Archives

Categories

More DRBD Performance tests

I’ve previously written Some Notes on DRBD [1] and a post about DRBD Benchmarking [2].

Previously I had determined that replication protocol C gives the best performance for DRBD, that the batch-time parameters for Ext4 aren’t worth touching for a single IDE disk, that barrier=0 gives a massive performance boost, and that DRBD gives a significant performance hit even when the secondary is not connected. Below are the results of some more tests of delivering mail from my Postal benchmark to my LMTP server which uses the Dovecot delivery agent to write it to disk, the rates are in messages per minute where each message is an average of 70K in size. The ext4 filesystem is used for all tests and the filesystem features list is “has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize“.

p4-2.8
Default Ext4 1663
barrier=0 2875
DRBD no secondary al-extents=7 645
DRBD no secondary default 2409
DRBD no secondary al-extents=1024 2513
DRBD no secondary al-extents=3389 2650
DRBD connected 1575
DRBD connected al-extents=1024 1560
DRBD connected al-extents=1024 Gig-E 1544

The al-extents option determines the size of the dirty areas that need to be resynced when a failed node rejoins the cluster. The default is 127 extents of 4M each for a block size of 508MB to be synchronised. The maximum is 3389 for a synchronisation block size of just over 13G. Even with fast disks and gigabit Ethernet it’s going to take a while to synchronise things if dirty zones are 13GB in size. In my tests using the maximum size of al-extents gives a 10% performance benefit in disconnected mode while a size of 1024 gives a 4% performance boost. Changing the al-extents size seems to make no significant difference for a connected DRBD device.

All the tests on connected DRBD devices were done with 100baseT apart from the last one which was a separate Gigabit Ethernet cable connecting the two systems.

Conclusions

For the level of traffic that I’m using it seems that Gigabit Ethernet provides no performance benefit, the fact that it gave a slightly lower result is not relevant as the difference is within the margin of error.

Increasing the al-extents value helps with disconnected performance, a value of 1024 gives a 4% performance boost. I’m not sure that a value of 3389 is a good idea though.

The ext4 barriers are disabled by DRBD so a disconnected DRBD device gives performance that is closer to a barrier=0 mount than a regular ext4 mount. With the significant performance difference between connected and disconnected modes it seems possible that for some usage scenarios it could be useful to disable the DRBD secondary at times of peak load – it depends on whether DRBD is used as a really current backup or a strict mirror.

Future Tests

I plan to do some tests of DRBD over Linux software RAID-1 and tests to compare RAID-1 with and without bitmap support. I also plan to do some tests with the BTRFS filesystem, I know it’s not ready for production but it would still be nice to know what the performance is like.

But I won’t use the same systems, they don’t have enough CPU power. In my previous tests I established that a 1.5GHz P4 isn’t capable of driving the 20G IDE disk to it’s maximum capacity and I’m not sure that the 2.8GHz P4 is capable of running a RAID to it’s capacity. So I will use a dual-core 64bit system with a pair of SATA disks for future tests. The difference in performance between 20G IDE disks and 160G SATA disks should be a lot less than the performance difference between a 2.8GHz P4 and a dual-core 64bit CPU.

2 comments to More DRBD Performance tests

  • What version of DRBD are you trying this with?

    In 8.3.9 a feature was introduced that will disable activity log updates when all blocks of an unconnected device is are out of sync. If you want to get to this point of not updating the activity log quicker you can “invalidate-remote” on an unconnected primary. This should help with the performance hit you’re seeing with a disconnected secondary, however keep in mind you’ll have to do a full sync once the resources are connected again.

    Also when setting the al-extents set it to a prime number, I doubt this will make a world of difference but every little bit helps.

    If you want to post your resource and global configs I may be able to provide additional feedback.

    Brian

  • etbe

    I’m using Debian/Squeeze which has kernel 2.6.32 and DRBD kernel code 8.3.7 as well as version 8.3.7 of the user-space code.