MembersPage/RichardBarrington/TaviPostgreSQL (2006-09-17 14:19:49)

Tavi on PostgreSQL 7.4

Converting a Tavi 0.26 Wiki from MySQL 4.1 to PostgreSQL 7.4


This page documents the process of comverting a Tavi 0.26 Wiki running on MySQL 4.1, to run on PostgreSQL 7.4. It provides references to the previous works this process is based on, and to sources for the software required.

Abstract

Tavi is a PHP Wiki implementation running on a webserver. VEMS has been using Tavi for several years, however there have been ongoing problems with the MySQL database backend that Tavi requires. The problems appear to be index related, and have not been repeatable (therefore fixable) but have been addressed by regular dump and reloads. This should not be necessary, so an alternative was sought.

Tavi does not support databases other than MySQL by default so conversion to another Wiki rather than another database was initially considered. It is important not to lose data in the process, so either conversion may have been difficult. Fortunately, there is a patch to provide PostgreSQL support for Tavi 0.25, written by Juha-Mikko Ahonen and available from his website. Less fortunately, the performance when searching the database is not adequate (PostgreSQL does not support text indexes by default) and it has not been updated to the current version of Tavi.

The work by Ahonen was however a good place to start, so full credit must be given to him for the inspiration and initial patch.

The work described below extends Ahonens patch, updating it for 0.26 compatibility, refining the queries, and increasing search speed by an order of magnitude by using the TSearch2 functionality by Oleg Bartunov and Teodor Sigaev. The stable release of TSearch2 is provided with the PostgreSQL distribution.

Links to the software mentioned above are at the end of this document.

Aims

Prerequisites

The Conversion Process

As VEMS is using Tavi 0.26 with Captcha, the first step is to update Juha-Mikko Ahonens PostgreSQL port of Tavi from 0.25 to 0.26. The prototype conversion with TSearch2 (full text indexing) is documented on this page.

Once this prototype was working well enough to be usable, a more thorough conversion was performed from clean 0.26 sources. This process is documented on this page: MembersPage/RichardBarrington/TaviPostgreSQL/ConversionDetails


Prototype Conversion

The next logical step is optimisation of the PostgreSQL database queries, however to do this we need some test data loaded. So we'll change course and figure out how script the data transfer from MySQL Tavi to PostgreSQL Tavi.

Fortunately, we're using MySQL v4.1 which supports some useful options. To get a PostgreSQL compatible data dump, we can run:

This gives us an SQL data (only) script which can be run after we have created the Tavi schema in PostgreSQL. Creating the basic Tavi schema is straight forward, and is again based on Juha-Mikko Ahonens work. The only change being to add a CHECK constraint to the mutable field of the pages table.

The PostgeSQL user and database must be created, then we can import the data dumped from MySQL:

We can then apply the patch and copy Juha-Mikko Ahonens dbpg.php file into wiki/lib/ path, and ensure that the line DBClass = 'dbpg'; is in the Tavi config.php file. We now have a working Tavi 0.26 PostgreSQL system, but there is still much work to be done! Text searching is extremely slow, as is retrieving the list of recent updates and orphaned pages, etc.

Optimisation

VEMS uses two Tavi features extremely regularly, both of which are effectively crippled by the unoptimised PostgreSQL conversion - RecentChanges, and Search.

Firstly, the existing queries were examined to see if there was any room for improvement, as some needed to be modified for PostgreSQL anyway. A noticable improvement was found by using sub-selects in the large search queries.

Query improvement patch

The slowness of Search function was the second issue to be addressed. Two text search libraries for PostgreSQL were evaluated, FTI and TSearch2. FTI is an older library, but it was found to create an enormous index that had a negative impact on the speed of the database as a whole. TSearch2 had no such problems, although we did need to use the "simple" dictionary to effectively index technical/non-standard English as used on the VEMS wiki.

Firstly, the TSearch2 schema and functions must be loaded into the database. The script to do this is available in PostgreSQL contrib (/usr/share/postgresql/contrib on many systems) and must be run as a user with "C" permissions. If necessary, appropriate permissions must be granted to the Tavi user. The Tavi schema can then be modifed to use TSearch2, and the existind data can be indexed.

Apply the scripts to the database (as DBA user):

After indexing it may be desirable to optimise the database at the physical level, by clustering on the primary key index of the pages table. This will speed reads based on version and title information.

Finished patches

Combining the above work results in the following patches:

Thanks

Thank yous go out to Juha-Mikko Ahonen for the initial patch, Oleg Bartunov and Teodor Sigaev for their TSearch2 work, and of course to the Tavi and PostgreSQL teams for creating great software. Finally, thanks to VEMS for providing the hosting and a testbed for this patch.


Software:

Tavi Wiki:

Apache HTTP server:

PHP engine:

MySQL database:

PostgreSQL database:

Initial PostgreSQL patch for Tavi:

TSearch2 full-text-index search engine for PostgreSQL: