A while ago I wrote a blog post debunking the idea that swap should be twice as large as RAM [1]. The issue of swap size has come up for discussion on a mailing list, so it seems worth another blog post.
Swap Size
In my previous post I suggested making swap space equal to RAM size for systems with less than 1G of RAM and half the RAM size for systems with 2-4G of RAM, with a maximum of 2G for any system. Generally it’s better to have the kernel OOM handler kill a memory-hungry process than to have the entire system go so slow that a hardware reset is needed. I wrote memlockd to lock the important system programs and the shared objects and data files they use into RAM, to make it easier to recover from some bad paging situations [2], but even that won’t always make it possible to log in to a thrashing system in a reasonable amount of time.
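The mechanism memlockd relies on is worth a quick illustration: mmap() the files you care about and then call mlockall() so the pages can’t be paged out. The following Python sketch only demonstrates that mechanism with a hypothetical file list – it is not memlockd’s actual code, and it needs root (or CAP_IPC_LOCK) to succeed:

    # Illustration of the mmap()+mlockall() mechanism that keeps files resident
    # in RAM; this is not memlockd's code and the file list is hypothetical.
    import ctypes, mmap, os

    MCL_CURRENT, MCL_FUTURE = 1, 2   # constants from <sys/mman.h> on Linux
    libc = ctypes.CDLL("libc.so.6", use_errno=True)

    def lock_files(paths):
        mappings = []
        for path in paths:
            fd = os.open(path, os.O_RDONLY)
            size = os.fstat(fd).st_size
            mappings.append(mmap.mmap(fd, size, prot=mmap.PROT_READ))
            os.close(fd)             # the mapping keeps the pages reachable
        # Lock every page currently mapped (and mapped in future) into RAM;
        # requires root or CAP_IPC_LOCK.
        if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        return mappings              # keep references so the mappings stay alive

    if __name__ == "__main__":
        lock_files(["/bin/sh", "/bin/login"])      # hypothetical list
        os.system("grep VmLck /proc/self/status")  # show how much is locked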
But it’s not always good to reduce swap size. Sometimes if you don’t have enough swap then performance sucks in situations where it would be OK if there were more swap. One factor in this regard is that pages mapped read-only from files are discarded when there is great memory pressure and no swap space to page out the read-write pages. This can mean that you get thrashing of the pages used for executables and shared objects while lots of rarely-used data pages sit in RAM unable to be paged out. One noteworthy corner case in this regard is Chromium, which seems to take a lot of RAM if you have many tabs open for a long time.
Another factor is that a reasonable amount of allocated memory will almost never get used (EG all the X-based workstations which have 6 gettys running on virtual consoles that will almost never be used). So a system which has a large amount of RAM relative to its use (EG a single-user workstation with 8G of RAM) can still benefit from having a small amount of swap to allow larger caches. One workstation I run has 8G of RAM for a single user and typically has about 200M of its 1G of swap space in use. It doesn’t have enough swap to make much difference if really memory-hungry programs are run, but having 4G of memory for cache instead of 3.8G might make a difference to system performance. So even systems which can run without using any swap can still be expected to give better performance if there is some swap space for programs that are very inactive.
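For reference, the swap and cache figures quoted above can be checked on any Linux system by reading /proc/meminfo; a minimal Python sketch:

    # Report swap in use and page cache size from /proc/meminfo (values in KiB).
    def meminfo():
        info = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                info[key] = int(rest.split()[0])
        return info

    m = meminfo()
    print("swap used: %d MiB" % ((m["SwapTotal"] - m["SwapFree"]) // 1024))
    print("page cache: %d MiB" % (m["Cached"] // 1024))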
I have some systems with as much as 4G of swap, but they are for corner cases where swap is allocated but there isn’t a lot of paging going on. For example I have a workstation with 3G of RAM and 4G of swap for the benefit of Chromium.
Hardware Developments
Disk IO performance hasn’t increased much when compared to increases in RAM size over the last ~15 years. A low-end 1988-era hard drive could sustain 512KB/s contiguous transfer rates and had random access times of about 28ms. A modern SATA disk will have contiguous transfer rates getting close to 200MB/s and random access times of less than 4ms. So we are looking at about 400* the contiguous performance and about 10* the random access performance since 1988. In 1988 2MB was a lot of RAM for a PC; now no-one would buy a new PC with less than 4G – so we are looking at something like 2000* the RAM size in the same period.
When comparing with 1993 (when I first ran a Linux server 24*7) my system had 4M of RAM, maybe 15ms random access times, and maybe 1MB/s contiguous IO – approximately double everything I had in 1988. But I’ll use the numbers from 1988 because I’m more certain of them, even though I never ran an OS capable of proper memory management on the 1988 PC.
In the most unrealistically ideal situation paging IO will now be a factor of 5* more expensive relative to RAM size than it was in 1988 (the 2000* increase in RAM size divided by the 400* increase in contiguous IO performance). But in the more realistic situation of random IO it will be about 200* more expensive in relative terms. In the early 90s a Linux system with swap use equal to or greater than RAM could perform well (my recollection is that a system with 4M of RAM and 4M of active swap performed well and the same system with 8M of active swap was usable). But nowadays there’s no chance of a system that has 4G of RAM and 4G of swap in active use being usable in any way, unless you have some corner case of an application allocating RAM and not using it much.
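To spell out the arithmetic behind those relative-cost figures, using the rounded 1988-vs-today numbers from above:

    ram_growth    = (4 * 1024) / 2   # 2M of RAM in 1988 -> 4G today, ~2000*
    contig_growth = 200000 / 512     # 512KB/s -> ~200MB/s contiguous, ~400*
    random_growth = 10               # ~28ms -> <4ms random access, call it 10*

    print(ram_growth / contig_growth)  # ~5*: the unrealistically ideal (contiguous IO) case
    print(ram_growth / random_growth)  # ~200*: the realistic (random IO) case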
Note that 4G of RAM doesn’t make a big system by today’s standards, so it might be reasonable to compare 2M in 1988 to 8G or 16G today. But I think that assuming 4G of RAM for the purpose of comparison makes my point.
Using Lots of Swap
It is probably possible to have some background Chromium tabs using 4G of paged-out RAM with good system performance. I’ve had a system use 2G of swap for Chromium and perform well overall. There are surely many other ways that a user can launch processes that use a lot of memory without using them actively. But this will probably require that the user know how the system works and alter their usage patterns. If you have a lot of swap in use and one application starts to run slowly, you really don’t want to flip to another application, as that would only make things worse.
If you use tmpfs for /tmp then you need enough swap to store everything that is written to /tmp. There is no real upper limit to how much data that could be, apart from the fact that applications and users generally don’t want to write big data files that disappear on reboot. I generally only use a tmpfs for /tmp on servers. Programs that run on servers tend not to store a large amount of data in /tmp, and that data is usually very temporary, unlike workstation users who store large video files in /tmp and leave them there until the next reboot.
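If you want to see how much data is currently sitting in a tmpfs /tmp (and therefore how much swap it could eventually require), statvfs() gives the numbers; a minimal Python sketch:

    # How much data is currently stored in /tmp (relevant when /tmp is a tmpfs,
    # since that data competes with everything else for RAM and swap).
    import os

    st = os.statvfs("/tmp")
    used = (st.f_blocks - st.f_bfree) * st.f_frsize // (1024 * 1024)
    total = st.f_blocks * st.f_frsize // (1024 * 1024)
    print("/tmp: %d MiB used of %d MiB" % (used, total))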
Are there any other common corner cases that allow using a lot of swap space without serious performance problems?
My Swap Space Allocation
When I set up Linux workstations for other people I generally use 1G of swap. For a system that often has multiple users (via the user-switching feature) I use 2G of swap if it only has 3G of RAM (I run many systems which are limited to 3G of RAM by their chipset).
For systems that I personally use I generally allocate 4G of swap. This is to cater for excessive RAM use by Chromium, and also because I test out random memory-hungry programs and can’t easily predict how much swap space will be needed – a typical end-user desktop system is a lot more predictable than a workstation used for software development.
For servers I generally allocate 512M of swap space. That’s enough to page out things that aren’t used and make space for cache memory. If that turns out to be inadequate then it’s probably best to just let the watchdog daemon reboot the system.
Suspend to disk requires that your swap allocation be the same size as or bigger than the amount of RAM.
If I were using Linux memory overcommit then I would generally agree with you. But because of the problems associated with Linux memory overcommit and the OOM killer I always disable it. With overcommit disabled there is again a need for more virtual memory, and with it more swap space, even if it is extremely unlikely to be used – I never actually use it. It is that small difference between extremely unlikely and never that makes the difference between the two methods.
Your post makes sense to me, but I have a problem with this:
“For servers I generally allocate 512M of swap space. That’s enough to page out things that aren’t used and make space for cache memory. If that turns out to be inadequate then it’s probably best to just let the watchdog daemon reboot the system.”
That sounds like your servers are very small (webservers?) and… reboot the system? Crashing systems is really *something* you want to avoid at all costs on production servers. Certainly the ones that take a *long* time to reboot.
I guess you haven’t tried to compile chromium or webkit-gtk with debug symbols on ;)
My laptop has 4GB of RAM and 8GB swap, and is running gentoo, so with every chromium update, I have to compile it.
During the linking phase of chromium, swap usage maxes out at about 4GB (with an X server, irssi, and a chromium instance with 21 tabs open all running, swap usage is about 4.2GB).
I think swap size shouldn’t be chosen according to any general rule; instead it should be based on use cases – not every system needs swap, and for some it’s essential.
Nice explanation. The only thing missing is the word “hibernation” (the sole reason I have 10GB of swap…)
On non-laptop systems I do agree with using as little swap as possible. Some tuning of vm.swappiness may be needed.
Of course, with solid state disks, which nowadays are standard, the numbers look quite different. I’m not saying that they change all the arguments, but basing a performance analysis on a plain old spinning disk in 2013 is probably misleading.
Jeremy and Bart-Jan: Good point, that’s a good reason for using larger swap spaces on laptops. Some people apparently use such suspend features on workstations, but I think it’s quite rare.
Bob: I haven’t had any great problems with overcommit, but I agree if you don’t want to use it then that changes things.
Claudio: I agree that forced reboots of servers are generally undesirable. But it’s also undesirable to have a system become unresponsive due to excessive paging and stay unresponsive until a sysadmin gets access to it. The fastest imaginable response by a sysadmin will be a lot slower than a reboot of a DomU on a virtual server. Of course a reboot of a physical server is going to take a lot longer (the BIOS part alone usually takes a minute), but it will probably still compare well to a sysadmin response – especially at night.
aukkras: I haven’t tried compiling those things. I agree that swap size needs to take usage into account and that development workstations need a lot more.
Patrick: Yes, SSD does change things. However one issue with SSD is that few people have enough confidence in the technology to use it much for swap, even though all the theoretical calculations indicate that if the devices work according to spec then it should never be a problem.
Patrick: I’ve just done a quick Bonnie++ test comparing an Intel 120G SSD running BTRFS to a pair of 3TB disks running BTRFS RAID-1 and Ext4 with Linux Software RAID-1 (both BTRFS and Ext4+Software RAID gave about the same performance). The most significant difference in the test results was for random seeks, which were about 7* faster on the SSD.
So it seems that if we count SSD as the modern disk for swap then instead of there being a factor of 200 disparity between improvements in random seek speed and RAM size we would have a factor of about 30. That is still very significant.