Linux, politics, and other interesting things
Here are some basic things to do when debugging storage performance problems on Linux. It’s deliberately not an advanced guide, I might write about more advanced things in a later post.
When a hard drive is failing it often has to read sectors several times to get the right data, this can dramatically reduce performance. As most hard drives aren’t monitored properly (email or SMS alerts on errors) it’s quite common for the first notification about an impending failure to be user complaints about performance.
View your kernel message log with the dmesg command and look in /var/log/kern.log (or wherever your system is configured to store kernel logs) for messages about disk read errors, bus resetting, and anything else unusual related to the drives.
If you use an advanced filesystem like BTRFS or ZFS there are system commands to get filesystem information about errors. For BTRFS you can run “btrfs device stats MOUNTPOINT” and for ZFS you can run “zpool status“.
Most performance problems aren’t caused by failing drives, but it’s a good idea to eliminate that possibility before you continue your investigation.
One other thing to look out for is a RAID array where one disk is noticeably slower than the others. For example if you have a RAID-5 or RAID-6 array every drive should have almost the same number of reads and writes, if one disk in the array is at 99% performance capacity and the other disks are at 5% then it’s an indication of a failing disk. This can happen even if SMART etc don’t report errors.
The iostat program in the Debian sysstat package tells you how much IO is going to each disk. If you have physical hard drives sda, sdb, and sdc you could run the command “iostat -x 10 sda sdb sdc” to tell you how much IO is going to each disk over 10 second periods. You can choose various durations but I find that 10 seconds is long enough to give results that are useful.
By default iostat will give stats on all block devices including LVM volumes, but that usually gives too much data to analyse easily.
The most useful things that iostat tells you are the %util (the percentage utilisation – anything over 90% is a serious problem), the reads per second “r/s“, and the writes per second “w/s“.
The parameters to iostat for block devices can be hard drives, partitions, LVM volumes, encrypted devices, or any other type of block device. After you have discovered which block devices are nearing their maximum load you can discover which of the partitions, RAID arrays, or swap devices on that disk are causing the load in question.
The iotop program in Debian (package iotop) gives a display that’s similar to that of top but for disk io. It generally isn’t essential (you can run “ps ax|grep D” to get most of that information), but it is handy. It will tell you which programs are causing IO on a busy filesystem. This can be good when you have a busy system and don’t know why. It isn’t very useful if you have a system that is used for one task, EG a database server that is known to be busy doing database stuff.
It’s generally a good idea to have sysstat and iotop installed on all systems. If a system is experiencing severe performance problems you might not want to wait for new packages to be installed.
In Debian the sysstat package includes the sar utility which can give historical information on system load. One benefit of using sar for diagnosing performance problems is that it shows you the time of day that has the most load which is the easiest time to diagnose performance problems.
Swap use sometimes confuses people. In many cases swap use decreases overall disk use, this is the design of the Linux paging algorithms. So if you have a server that accesses a lot of data it might swap out some unused programs to make more space for cache.
When you have multiple virtual machines on one system sharing the same disks it can be difficult to determine the best allocation for RAM. If one VM has some applications allocating a lot of RAM but not using it much then it might be best to give it less RAM and force those applications into swap so that another VM can cache all the data it accesses a lot.
The important thing is not the amount of swap that is allocated but the amount of IO that goes to the swap partition. Any significant amount of disk IO going to a swap device is a serious problem that can be solved by adding more RAM.
The ratio of reads to writes depends on the applications and the amount of RAM. Some applications can have most of their reads satisfied from cache. For example an ideal configuration of a mail server will have writes significantly outnumber reads (I’ve seen ratios of 5:1 for writes to reads on real mail servers). Ideally a mail server will cache all new mail for at least an hour and as the most prolific users check their mail more frequently than that most mail will be downloaded before it leaves the cache. If you have a mail server with reads outnumbering writes then it needs more RAM. RAM is cheap nowadays so if you don’t want to compete with Gmail it should be cheap to buy enough RAM to cache all recent mail.
The ratio of reads to writes is important because it’s one way of quickly determining if you have enough RAM and adding RAM is often the cheapest way of improving performance.
One common performance problem on systems with multiple disks is having more load going to some disks than to others. This might not be a problem (EG having cron jobs run on disks that are under heavy load while the web server accesses data from lightly loaded disks). But you need to consider whether it’s desirable to have some disks under more load than others.
The simplest solution to this problem is to just have a single RAID array for all data storage. This is also the solution that gives you the maximum available disk space if you use RAID-5 or RAID-6.
A more complex option is to use some SSDs for things that require performance and disks for things that don’t. This can be done with the ZIL and L2ARC features of ZFS or by just creating a filesystem on SSD for the data that is most frequently accessed.
I’m sure that I missed something, please let me know of any other basic things to do – or suggestions for a post on more advanced things.