MembersPage/RichardBarrington/TaviPostgreSQL

Tavi on PostgreSQL 7.4

Converting a Tavi 0.26 Wiki from MySQL 4.1 to PostgreSQL 7.4

Richard Barrington - barri662@student.otago.ac.nz


This page documents the process of comverting a Tavi 0.26 Wiki running on MySQL 4.1, to run on PostgreSQL 7.4. It provides references to the previous works this process is based on, and to sources for the software required.

Abstract

Tavi is a PHP Wiki implementation running on a webserver. VEMS has been using Tavi for several years, however there have been ongoing problems with the MySQL database backend that Tavi requires. The problems appear to be index related, and have not been repeatable (therefore fixable) but have been addressed by regular dump and reloads. This should not be necessary, so an alternative was sought.

Tavi does not support databases other than MySQL by default so conversion to another Wiki rather than another database was initially considered. It is important not to lose data in the process, so either conversion may have been difficult. Fortunately, there is a patch to provide PostgreSQL support for Tavi 0.25, written by Juha-Mikko Ahonen and available from his website. Less fortunately, the performance when searching the database was not adequate (PostgreSQL does not support text indexes by default) and it has not been updated to the current version of Tavi.

The work by Ahonen was however a good place to start, so full credit must be given to him for the inspiration and initial patch.

Links to the software mentioned above are at the end of this document.

Aims

Prerequisites

The Conversion Process

As VEMS is using Tavi 0.26 with Captcha, the first step was to update Juha-Mikko Ahonens PostgreSQL port of Tavi from 0.25 to 0.26. To address some performance issues, the prototype was modified to use TSearch2 (full text indexing). Once this prototype was working well enough to be usable, a more thorough conversion was performed from clean 0.26 sources. This process is documented below.


Code Conversion Details

This section describes the process of converting Tavi 0.26 from MySQL to PostgreSQL from scratch, after initial prototyping with Juha-Mikko Ahonens patch.


The first thing to do is update the scripts to create the database, and configure the settings. The existing scripts were modified to include PostgreSQL options. Next, a data conversion script was added to the install directory to help users migrate from MySQL to PostgreSQL.

The lowest level interface to the database in Tavi is the WikiDB class, so a PostgreSQL WikiDB class was created. This class has additional debugging added to the query output.

References to WikiDB were then grepped for, with only one result. Pagestore.php was then modifed to require the appropriate library. Before making any further PostgreSQL addition, it was decided to grep for all query and result calls to ensure none were missed as a result of not testing all website functions manually. 47 query() calls were found, and 26 result() calls were found (not including db.php, dbpg.php, and tavidoc.php).

When tested at this point, the first error was a syntax error regarding a table locking statement on the rate table. PostgreSQL uses MVCC rather than MySQL style locking, and I'm not an expert in database locking, so I'm trusting the DBMS to do the right things here.

The first file to be converted was rate.php. This file needed a transaction added (replacing the lock statement), queries altered and optimised to use the CIDR type, and also needed to have timestamps explicitly updated (MySQL does that automatically, whereas PostgreSQL doesn't). Regexes were altered to suit the CIDR queries, and the associated text in parse/html.php and lang/default.php was updated to reflect the CIDR terminology (eg, 192.168.1.0/24 rather than 192.168.1.*).

While working with the rate and associated IP blocking admin pages, the locking admin pages were also examined. This highlighted a problem with the allpages query, which simply wouldn't parse. A simpler (although more costly) query was added to the method for PostgreSQL to use. Now that the locking page could generate a list of all pages, unlocking and locking was tested. Unlocking worked, while locking did not - there was a problem with the times again, so the lib/page.php queries were modified to update the timestamps explicitly.

The pagestore.php calls were next to be addressed. This was more of the same, with some areas of possible physical-level optimisation apparent. The timestamp output format was also adjusted to be compatible with the expected 3.x MySQL format (not using the hacked 4.1 code that the VEMS Wiki uses).

An hour was wasted chasing a difference between MySQL and PostgreSQL in the WantedPages query... turns out MySQL isn't too fussy about the case of text strings when it does JOINs. I'm not sure this is a good thing in a Wiki. Nice to know about though, I guess.

The result of that problem was a moderately complex query that required an additional index to speed up by a factor of 8. A side benefit of this is that searches are fast - even without the TSearch2 module. Granted, we don't have a huge amount of data to scan but this was a suprising find anyway. The ILIKE in PostgreSQL operator works as acceptably as the LIKE opertor in MySQL, and if it becomes too slow later, adding TSearch2 functionality can be scripted.

A gzipped tar archive may be downloaded here:

http://www.vems.hu/files/MembersPage/RichardBarrington/TaviPostgreSQL/tavi-postgresql-0.26.tar.gz

Minor customisation for VEMS

Any emergency switch-over should be transparent to the end users, so the first task was to move the config settings over, and the CSS stylesheet. Obviously the additional toolbar code in parse/html.php was needed as well, as was template/common.php. At this point, it was possible to switch between HomePage on www.vems.hu and on testbox.freedomwigs.com, and see no difference.

Logging of Captcha errors was added, using CIDR so that queries based on IP range could be easier. Next, the "rolling-code" Captcha was implemented, along with the requirement for entering a comment when saving. The edit and preview templates, along with the parse/html.php file were then edited to match the current VEMS template, and the lang/default.php file updated with the current edit and preview text.

At this point, the only visible differences are the format of the timestamp at the bottom of each page, and the slightly larger size of the Captcha text. Both of these are likely to be acceptable changes, as the date format is arugably nicer, and the Captcha is arguably easier to read.

The VEMS specific gzipped tar archive may be downloaded from here:

http://www.vems.hu/files/MembersPage/RichardBarrington/TaviPostgreSQL/tavi-postgresql-vems-0.26.tar.gz

The system may be migrated by create the PostgreSQL Tavi user and database, then running the createdb.pl (or createdb.sql) scripts, and then the transfer-mysql41-to-postgresql74.sh script. Don't forget set a cron job to run a vacuum full analyze; regularly too.

A server is currently available for testing at http://www.freedomwigs.com:81/wiki/index.php

Please poke it and write if any problems are found.


Prototype Conversion

The work described below extends Ahonens patch, updating it for 0.26 compatibility. It explores the use the TSearch2, and the process of migrating Tavi from MySQL to PostgreSQL


The first task was to modify the 0.25 patch to work with 0.26, and refine some of the queries where necessary. This wasn't too difficult, mostly involving some repositioning of lines in the original patch. However, it did not examine the complete Tavi source code and as a result, some database interaction was not ported.

The next logical step is optimisation of the PostgreSQL database queries, however to do this we need some test data loaded. So we'll change course and figure out how script the data transfer from MySQL Tavi to PostgreSQL Tavi.

Fortunately, we're using MySQL v4.1 which supports some useful options. To get a PostgreSQL compatible data dump, we can run:

This gives us an SQL data (only) script which can be run after we have created the Tavi schema in PostgreSQL. Creating the basic Tavi schema is straight forward, and is again based on Juha-Mikko Ahonens work. The only change being to add a CHECK constraint to the mutable field of the pages table.

The PostgeSQL user and database must be created, then we can import the data dumped from MySQL:

We can then apply the patch and copy Juha-Mikko Ahonens dbpg.php file into wiki/lib/ path, and ensure that the line DBClass = 'dbpg'; is in the Tavi config.php file. We now have a working Tavi 0.26 PostgreSQL system, but there is still much work to be done! Text searching is extremely slow, as is retrieving the list of recent updates and orphaned pages, etc.

Optimisation

VEMS uses two Tavi features extremely regularly, both of which are effectively crippled by the unoptimised PostgreSQL conversion - RecentChanges, and Search.

Firstly, the existing queries were examined to see if there was any room for improvement, as some needed to be modified for PostgreSQL anyway. A noticable improvement was found by using sub-selects in the large search queries (this despite the cost in the planner being higher).

Query improvement patch

The slowness of Search function was the second issue to be addressed. Two text search libraries for PostgreSQL were evaluated, FTI and TSearch2. FTI is an older library, but it was found to create an enormous index that had a negative impact on the speed of the database as a whole. TSearch2 had no such problems, although we did need to use the "simple" dictionary to effectively index technical/non-standard English as used on the VEMS wiki.

Firstly, the TSearch2 schema and functions must be loaded into the database. The script to do this is available in PostgreSQL contrib (/usr/share/postgresql/contrib on many systems) and must be run as a user with "C" permissions. If necessary, appropriate permissions must be granted to the Tavi user. The Tavi schema can then be modifed to use TSearch2, and the existind data can be indexed.

Apply the scripts to the database (as DBA user):

After indexing it may be desirable to optimise the database at the physical level, by clustering on the primary key index of the pages table. This will speed reads based on version and title information.

Thanks

Thank yous go out to Juha-Mikko Ahonen for the initial patch, Oleg Bartunov and Teodor Sigaev for their TSearch2 work, and of course to the Tavi and PostgreSQL teams for creating great software. Finally, thanks to VEMS for providing the hosting and a testbed for this patch.


Software:

Tavi Wiki:

Apache HTTP server:

PHP engine:

MySQL database:

PostgreSQL database:

Initial PostgreSQL patch for Tavi:

TSearch2 full-text-index search engine for PostgreSQL: