jlouis' Ramblings

Musings on tech, software, and other things

An Evil Postgres Bug

So, I have this QLGlicko project. It consists of a web scraper which takes in duel matches played in the game of Quake Live—and stores them in a Postgres database. These duels are then analyzed and I run the Glicko 2 (see http://glicko.net/) ranking system on them to tell people who are currently the best player on a given map.

The system works well, but I was tired of maintaining it on a Linux machine for several reasons. One, Linux is hard to maintain and things change underneath faster than I like. Two, I have much more experience maintaining FreeBSD machines. Three, I can get proper built-in ZFS support on FreeBSD. Hence I decided to move the database onto FreeBSD.

Most of the migration went well. I dumped the database, installed FreeBSD, installed Postgres 9.3 (to avoid having to toy with SysV shared memory). Tuned the database. And tried to import. First problem is that Linux used the fake locale called “en_dk.utf-8”—so I had to fake it and install that. Next problem was that I used the “uuid-ossp” EXTENSION in Linux. This one has several problems on FreeBSD, mostly related to PIC code and the fact that it will crash the database.

So I changed the code to inject the UUID data rather than producing it on the database side. And then I suddenly had duplicate key constraint problems on an UNIQUE INDEX. Running SELECT queries showed no such trouble with the index and there were no entries with more than a single entry. This is to be expected due to the index being UNIQUE. But the insertion or update code would soundly fail. I was wondering what went wrong and began digging.

Here is what happened: the “uuid-ossp” code had managed to insert a row into the database before it crashed. So this went under the radar of the index and now we had trouble! There is an extra entry in the database—violating the index—but it does not detect that.

Worse, when we query, the data can be served from the index alone, since most data in the table are frozen. This means we begin getting really evil and odd violations.

Running a DROP on the index and then trying to recreate the index again makes the error show up. And now I understood what went wrong.

The solution was pretty simple: Restore the database from a backup. Remove any trace of “uuid-ossp” and then start the database again. Now the index works as expected and the database doesn’t crash.

TL;DR—Beware of extensions that doesn’t work on your platform of choice!