Akonadi on a MySQL Server

Wouter described how to get Akonadi (the back-end for KDE PIM) to use PostgreSQL [1].

I don’t agree with his “MySQL is a toy” sentiment, but inspired by his post I decided to convert some of my systems to use a MySQL instance running on a server instead of one instance for each user. In the default configuration each user has a private MySQL installation that uses about 140M of disk space and 200M of RAM to hold about 24K of actual data (at least at the moment on the systems I run, maybe more in future).

Here’s some pseudo shell script to dump the database and get a new config:

# dump the existing per-user Akonadi database via its private socket
mysqldump --socket=$HOME/.local/share/akonadi/socket-$HOSTNAME/mysql.socket akonadi > dump.sql
# stop Akonadi and remove the per-user MySQL instance and old config
akonadictl stop
rm -rf $HOME/.config/akonadi $HOME/.local/share/akonadi
mkdir $HOME/.config/akonadi
# write a new config pointing at the server ($PASS and IP_OR_HOST are placeholders)
cat > $HOME/.config/akonadi/akonadiserverrc <<EOF
[%General]
Driver=QMYSQL
SizeThreshold=4096
ExternalPayload=false

[QMYSQL]
Name=${USER}_akonadi
Host=IP_OR_HOST
User=$USER
Password=$PASS
StartServer=false
Options=
ServerPath=/usr/sbin/mysqld

[Debug]
Tracer=null
EOF

Then with DBA privileges you need to run the following in the mysql client (substituting the real user name and password for $USER and $PASS):

create database $USER_akonadi;
GRANT ALL PRIVILEGES ON $USER_akonadi.* to '$USER'@'%' IDENTIFIED BY '$PASS';
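If you would rather let the shell do the substitution, something like the following should work – this is only a sketch, assuming that $PASS is set in the environment and that you have a DBA account on the server:

mysql -h IP_OR_HOST -u root -p <<EOF
CREATE DATABASE ${USER}_akonadi;
GRANT ALL PRIVILEGES ON ${USER}_akonadi.* TO '$USER'@'%' IDENTIFIED BY '$PASS';
EOF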

Then run the following to import the SQL data:

mysql $USER_akonadi < dump.sql
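If the import is run from the workstation rather than on the database server itself then the mysql client also needs the connection details, something like this (using the same placeholders as above):

mysql -h IP_OR_HOST -u $USER -p ${USER}_akonadi < dump.sql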

Ideally that would be it, but on my test installation (Debian/Squeeze MySQL server and Debian/Unstable KDE workstations) I needed to run the following SQL commands to deal with a table name case problem – the imported tables ended up in lower case while Akonadi expects mixed case, and MySQL table names are case-sensitive on Linux by default:
rename table schemaversiontable to SchemaVersionTable;
rename table resourcetable to ResourceTable;
rename table collectionattributetable to CollectionAttributeTable;
rename table collectionmimetyperelation to CollectionMimeTypeRelation;
rename table collectionpimitemrelation to CollectionPimItemRelation;
rename table collectiontable to CollectionTable;
rename table flagtable to FlagTable;
rename table mimetypetable to MimeTypeTable;
rename table parttable to PartTable;
rename table pimitemflagrelation to PimItemFlagRelation;
rename table pimitemtable to PimItemTable;
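The renames are only needed because table name comparisons are case-sensitive on Linux by default. An alternative that I haven’t tested (and which only makes sense if the server doesn’t already hold mixed-case tables for other applications) would be to have the server fold all table names to lower case before importing, by putting the following in the [mysqld] section of my.cnf on the database server:

lower_case_table_names = 1

After the import (and renames if they are needed) Akonadi can be started again with “akonadictl start”.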

I am not using any PIM features other than the AddressBook (which hasn’t been working well for a while), so I’m not sure that this is all working correctly. But I would prefer that something I don’t use (and which is probably broken anyway) take up disk space and RAM on a server rather than on a workstation.

Backing up MySQL

I run a number of MySQL databases. The number of mysqld installations that I run is something like 8, but I may have forgotten some – with the number of servers that I run on a “do nothing except when it breaks” basis it’s difficult to remember the details. The number of actual databases would be something like 30; four databases running on a database server (not counting the “mysql” database itself) is fairly common. Now I need to maintain some sort of automated backup of these – a fact which became obvious to me a couple of days ago when I found myself trying to recreate blog entries and comments from Google’s cache…

There are two types of database that I run: ones of significant size (more than 1GB) and tiny ones. I don’t think I run any database which has a MySQL dump file that is more than 20M and less than 2G in size.

For the machines with small databases I have the following script run daily from a cron job (with db1 etc replaced with real database names and /mysql-backup replaced by something more appropriate). The “--skip-extended-insert” option allows the possibility of running diff on the dump files, at the cost of increasing the file size. When the raw file size is less than 20M this overhead doesn’t matter – and gzip should handle the extra redundancy well.

#!/bin/bash -e
for n in db1 etc ; do
  /usr/bin/mysqldump --skip-extended-insert $n | gzip -9 > /mysql-backup/$n-`date +%Y-%m-%d`.gz
done

Then I have a backup server running the following script from a cron job to copy all the dump files off the machines.

#!/bin/bash -e
cd /backup-store
# copy each server's dumps into its own sub-directory (which must already exist)
for n in server1 server2 ; do
  scp $n:/mysql-backup/*-`date +%Y-%m-%d`.gz $n/
done

This script relies on being run after the script that generates the dump files, which is a little more tricky to arrange than it should be – it’s a pity that cron jobs can’t be set to have UTC run times. I could have run the dumps more frequently and used rsync to transfer the data, but the risk of losing one day’s worth of data seems acceptable. For my blog I can get any posts that I might lose in that time period from Planet installations.
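As an illustration of the ordering (the times and script paths here are hypothetical), the dump script can run early in the morning on each database server and the copy script a couple of hours later on the backup server:

# /etc/crontab entry on each database server
30 2 * * * root /usr/local/sbin/mysql-dump-daily
# /etc/crontab entry on the backup server
0 5 * * * root /usr/local/sbin/fetch-mysql-dumps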

For the bigger databases my backup method starts by putting the database and the binary log files on the same filesystem – not /var. This requires some minor hackery of the MySQL configuration. Then I use rsync to copy the contents of an LVM snapshot of the block device. The risks of data consistency problems involved in doing this should be no greater than the risks from an unexpected power fluctuation – and the system should be able to recover from that without any problems.
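A rough sketch of the snapshot and copy step is below – the volume group, logical volume, mount point, and destination names are all placeholders, and the snapshot size has to cover however much the database changes while the copy runs:

#!/bin/bash -e
# take a point-in-time snapshot of the filesystem holding the database and binary logs
lvcreate --snapshot --size 2G --name mysql-snap /dev/vg0/mysql
mkdir -p /mnt/mysql-snap
mount -o ro /dev/vg0/mysql-snap /mnt/mysql-snap
# copy the snapshot contents to the backup server
rsync -a --delete /mnt/mysql-snap/ backup-server:/backup-store/mysql-raw/
# clean up the snapshot
umount /mnt/mysql-snap
lvremove -f /dev/vg0/mysql-snap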

My experience with MySQL dumps is that they take too long and use too many system resources for large databases, so I only use them for backing up small databases (where a dump completes in a matter of seconds, so even without using a transaction it doesn’t hurt).

I Won’t Use Drizzle

A couple of days ago I attended a lecture about the Drizzle database server [1].

Drizzle is a re-write of MySQL for use in large web applications. It is only going to run on 64bit platforms because apparently everyone uses 64bit servers – except of course people who are Amazon EC2 customers, as the $0.10 per hour instances in EC2 are all 32bit. It’s also designed to use large amounts of RAM for more aggressive caching; it’s being optimised for machines with lots of RAM at the expense of performance on systems with small amounts of RAM. This is OK if you buy one (or many) new servers to dedicate to the task of running a database. But if the database is one of many tasks running on the machine, or if the machine is a Xen instance, then this isn’t going to be good.

There are currently no plans to support replication between MySQL and Drizzle databases (although it would not be impossible to write the support).

The good news is that regular MySQL development will apparently continue in the same manner as before, so people who have small systems, run on Xen instances, or use EC2 can keep using that. Drizzle seems to be aimed only at people who want to run very large sharded databases.

Now I just wish that they would introduce into MySQL the checksums on all data transfers and stores. I consider that to be a really significant feature of Drizzle.

Purging an RT Database

I had a problem where the email address circsales@washpost.com spammed a Request Tracker (RT) [1] installation (one of the rules for running a vacation program is that you never respond twice to the same address; another is that you never respond to automatically generated messages).

Deleting these tickets was not easy; the RT web interface only supports deleting 50 tickets at a time.

To delete them I first had to find the account ID in RT; the following query does that:
select id from Users where EmailAddress='circsales@washpost.com';

Then to mark the tickets as deleted I ran the following SQL command (where X was the ID):
update Tickets set Status='deleted' where Creator=X;

Finally, to purge the deleted entries from the database (which was growing overly large) I used the RTx-Shredder [2] tool. RTx-Shredder doesn’t seem to support deleting tickets based on submitter, which is why I had to mark them as deleted via SQL first.

I am currently using the following command to purge the tickets. The “limit,500” directive tells rtx-shredder to remove 500 tickets at one time (the default is to only remove 10 tickets).
./sbin/rtx-shredder --force --plugin 'Tickets=status,deleted;limit,500'

There are over 34,000 deleted tickets to remove, and rtx-shredder is currently proceeding at a rate of 9 tickets per minute, so it seems that it will take almost three days of database activity to clear them out (34,000 tickets at 9 per minute is about 63 hours).

I also need to purge some tickets that have been resolved for a long time; I’m running the following command to remove them:
./sbin/rtx-shredder --force --plugin 'Tickets=status,resolved;updated_before,2008-03-01 01:01:34;limit,500'

With both the rtx-shredder commands running at once I’m getting a rate of 15 tickets per minute, so it seems that the bottleneck is more related to rtx-shredder than to MySQL (which is what I expected) – although with two copies running at once top shows mysqld taking about 190% of CPU (two CPUs running at close to full capacity). The machine in question has two P4 CPUs with hyper-threading enabled, so maybe running two copies of rtx-shredder causes mysqld to become CPU bottlenecked. I’m not sure how to match up CPU use as reported by top to actual CPU power on a system with hyper-threading (the hyper-threaded virtual CPUs do not double the CPU power). I wonder if this means that the indexes on the RT tables are inadequate to the task.

I tried adding the following indexes (as suggested in the rtx-shredder documentation), but it didn’t seem to do any good – it might have improved performance by 10% but that could be due to sampling error.

CREATE INDEX SHREDDER_CGM1 ON CachedGroupMembers(MemberId, GroupId, Disabled);
CREATE INDEX SHREDDER_CGM2 ON CachedGroupMembers(ImmediateParentId, MemberId);
CREATE UNIQUE INDEX SHREDDER_GM1 ON GroupMembers(MemberId, GroupId);
CREATE INDEX SHREDDER_TXN1 ON Transactions(ReferenceType, OldReference);
CREATE INDEX SHREDDER_TXN2 ON Transactions(ReferenceType, NewReference);
CREATE INDEX SHREDDER_TXN3 ON Transactions(Type, OldValue);
CREATE INDEX SHREDDER_TXN4 ON Transactions(Type, NewValue);

MySQL security in Debian

Currently there is a problem with the MySQL default install in Debian/Etch (and probably other distributions too): it sets up “root” with DBA access and no password by default. The following mysql command gives a list of all MySQL accounts with Grant_priv access (one of the capabilities that gives great access to the database server) and shows their hashed passwords (as a matter of procedure I truncated the hash for my debian-sys-maint account). As you can see, the “root” and “debian-sys-maint” accounts have such access. The debian-sys-maint account is used by the Debian package management tools and its password is stored in the /etc/mysql/debian.cnf file.

$ echo "select Host,User,Password from user where Grant_priv='y'" | mysql -u root mysql
Host    User    Password
localhost       root
aeon    root
localhost       debian-sys-maint        *882F90515FCEE65506CBFCD7

It seems likely that most people who have installed MySQL won’t realise this problem and will continue to run their machines in that manner, which is a serious issue for multi-user machines. There is currently Debian bug #418672 about this. In my tests it affects Etch machines as well as machines running Unstable.
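The obvious workaround until the packaging is fixed is to set a password on the root account yourself. A minimal example (the password is just a placeholder, and note that this only covers the local root entry – other entries would need a SET PASSWORD statement for each Host/User pair):

# set a password for the local root account
mysqladmin -u root password 'CHOOSE_A_REAL_PASSWORD'
# then check that no account with Grant_priv is left without a password
echo "select Host,User from user where Grant_priv='y' and Password=''" | mysql -u root -p mysql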