DRBD Benchmarking

I’ve got some performance problems with a mail server that’s using DRBD, so I’ve done some benchmark tests to try to improve things. I used Postal for testing delivery to an LMTP server [1]. The version of Postal I released a few days ago had a bug that made LMTP not work. I’ll release a new version to fix that next time I work on Postal – or when someone sends me a request for LMTP support (so far no-one has asked for it, so I presume that most users don’t mind that it’s not yet working).

The local spool on my test server is managed by Dovecot: the Dovecot delivery agent stores the mail and the Dovecot POP and IMAP servers provide user access. For delivery I’m using the LMTP server I wrote, which has been almost ready for GPL release for a couple of years. All I need to write is a command-line parser to support delivery options for different local delivery agents. Currently my LMTP server is hard-coded to run /usr/lib/dovecot/deliver and has its parameters hard-coded too. As an aside, if someone would like to contribute some GPL C/C++ code to convert a string like “/usr/lib/dovecot/deliver -e -f %from% -d %to% -n” into something that will populate an argv array for execvp() then that would be really appreciated.
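The logic of that conversion is small once quoting is left to an existing tokenizer. The sketch below is in Python rather than the C/C++ asked for, purely to illustrate the steps; the function name build_argv and the example addresses are made up for this example. It splits the template first and substitutes afterwards, so a value containing spaces cannot turn into extra arguments.

```python
import re
import shlex

def build_argv(template, values):
    """Split a command template into an argv list, then expand %name% tokens."""
    def expand(match):
        # An unknown placeholder raises KeyError rather than being passed through.
        return values[match.group(1)]
    return [re.sub(r"%(\w+)%", expand, token) for token in shlex.split(template)]

# Hypothetical addresses, for illustration only.
argv = build_argv("/usr/lib/dovecot/deliver -e -f %from% -d %to% -n",
                  {"from": "sender@example.com", "to": "user@example.com"})
print(argv)
# The resulting list is in the form that os.execvp(argv[0], argv) expects.
```

A C version would follow the same two steps: tokenize the template, then replace placeholders inside each token.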

Authentication is to a MySQL server running on a fast P4 system. The MySQL server was never at any significant fraction of its CPU or disk IO capacity, so using a different authentication system probably wouldn’t have given different results. I used MySQL because it’s what I’m using in production. Apart from my LMTP server and the new version of Postal all software involved in the testing is from Debian/Squeeze.

The Tests

All tests were done on a 20G IDE disk. I started testing with a Pentium-4 1.5GHz system with 768M of RAM but then moved to a Pentium-4 2.8GHz system with 1G of RAM when I found CPU time to be a bottleneck with barrier=0. All test results are for the average number of messages delivered per minute for a 19 minute test run where the first minute’s results are discarded. The delivery process used 12 threads to deliver mail.

Configuration                     P4-1.5   P4-2.8
Default Ext4                        1468     1663
Ext4 max_batch_time=30000           1385     1656
Ext4 barrier=0                      1997     2875
Ext4 on DRBD no secondary           1810     2409

When doing the above tests the 1.5GHz system was using 100% CPU time when the filesystem was mounted with barrier=0, about half of that was for system (although I didn’t make notes at the time). So the testing on the 1.5GHz system showed that increasing the Ext4 max_batch_time number doesn’t give a benefit for a single disk, that mounting with barrier=0 gives a significant performance benefit, and that using DRBD in disconnected mode gives a good performance benefit through forcing barrier=0. As an aside I wonder why they didn’t support barriers on DRBD given all the other features that they have for preserving data integrity.

The tests with the 2.8GHz system demonstrate the performance benefits of having adequate CPU power. As an aside, I hope that Ext4 is optimised for multi-core CPUs, because if a 20G IDE disk needs a 2.8GHz P4 then modern RAID arrays probably require more CPU power than a single core can provide.

It’s also interesting to note that a degraded DRBD device (where the secondary has never been enabled) only gives 84% of the performance of /dev/sda4 when mounted with barrier=0.

Configuration                                     P4-2.8
Default Ext4                                        1663
Ext4 max_batch_time=30000                           1656
Ext4 min_batch_time=15000,max_batch_time=30000      1626
Ext4 max_batch_time=0                               1625
Ext4 barrier=0                                      2875
Ext4 on DRBD no secondary                           2409
Ext4 on DRBD connected C                            1575
Ext4 on DRBD connected B                            1428
Ext4 on DRBD connected A                            1284

Of all the options for batch times that I tried it seemed that every change decreased the performance slightly but as the greatest decrease in performance was only slightly more than 2% it doesn’t matter much.

One thing that really surprised me was the test results from different replication protocols. The DRBD replication protocols are documented here [2]. Protocol C is fully synchronous – a write request doesn’t complete until the remote node has it on disk. Protocol B is memory synchronous, the write is complete when it’s on a local disk and in RAM on the other node. Protocol A is fully asynchronous, a write is complete when it’s on a local disk. I had expected protocol A to give the best performance as it has lower latency for critical write operations and for protocol C to be the worst. My theory is that DRBD has a performance bug for the protocols that the developers don’t recommend.
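The expectation described here can be written down as a toy latency model. This is only a sketch of the reasoning, not of DRBD internals: it assumes local and remote work proceed in parallel, ignores queuing and TCP buffering, and the latency figures are invented purely to show the ordering. It does not explain the measured results, which went the other way.

```python
def write_latency_ms(protocol, local_disk, network_rtt, remote_disk):
    """Toy model of when a write is reported complete (all times in ms)."""
    if protocol == "A":   # asynchronous: local disk only
        return local_disk
    if protocol == "B":   # memory synchronous: local disk and remote RAM
        return max(local_disk, network_rtt)
    if protocol == "C":   # fully synchronous: local disk and remote disk
        return max(local_disk, network_rtt + remote_disk)
    raise ValueError("unknown protocol: " + protocol)

# Hypothetical latencies, chosen only for illustration.
for protocol in "ABC":
    print(protocol, write_latency_ms(protocol, local_disk=8.0,
                                     network_rtt=0.5, remote_disk=8.0))
```

Under this model protocol A can never be slower than B, nor B slower than C, which is why the benchmark ordering is surprising.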

One other thing I can’t explain is that according to iostat the data partition on the secondary DRBD node had almost 1% more sectors written than the primary and the number of writes was more than 1% greater on the secondary. I had hoped that with protocol A the writes would be combined on the secondary node to give a lower disk IO load.

I filed Debian bug report #654206 about the kernel not exposing the correct value for max_batch_time. The fact that no-one else has reported that bug (which is in kernels from at least 2.6.32 to 3.1.0) is an indication that not many people have found it useful.

Conclusions

When using DRBD use protocol C as it gives better integrity and better performance.

Significant CPU power is apparently required for modern filesystems. The fact that a Maxtor 20G 7200rpm IDE disk [3] can’t be driven at full speed by a 1.5GHz P4 was a surprise to me.

DRBD significantly reduces performance when compared to a plain disk mounted with barrier=0 (for a fair comparison). The best that DRBD could do in my tests was 55% of native performance when connected and 84% of native performance when disconnected.
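The percentages can be checked directly from the P4-2.8 results in the table above. This is just the arithmetic, with barrier=0 on the plain disk taken as the 100% baseline; the function name is made up for the example.

```python
# Messages per minute on the P4-2.8, copied from the results table.
results = {
    "Ext4 barrier=0": 2875,
    "Ext4 on DRBD no secondary": 2409,
    "Ext4 on DRBD connected C": 1575,
    "Ext4 on DRBD connected B": 1428,
    "Ext4 on DRBD connected A": 1284,
}

def percent_of_native(name):
    """Throughput of a configuration relative to the barrier=0 plain disk."""
    return 100 * results[name] / results["Ext4 barrier=0"]

for name in results:
    print(f"{name}: {percent_of_native(name):.0f}% of native")
```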

When comparing a cluster of cheap machines running DRBD on RAID-1 arrays to a single system running RAID-6 with redundant PSUs etc the performance loss from DRBD is a serious problem that can push the economic benefit back towards the single system.

Next I will benchmark DRBD on RAID-1 and test the performance hit of using bitmaps with Linux software RAID-1.

If anyone knows how to make a HTML table look good then please let me know. It seems that the new blog theme that I’m using prevents borders.

Update:

I mentioned my Debian bug report about the mount option and the fact that it’s all on Debian/Squeeze.

Released Bonnie++ 1.96

I have released version 1.96 of Bonnie++ in the experimental branch [1].

The main changes are:

  1. Made it compile on Solaris again (version 1.95 broke that)
  2. Now supports more files for the small file creation test (16^10 files is the limit), and it handles an overflow better. Incidentally this will in some situations change the results so I changed the result version in the CSV file.
  3. Fixed some bugs in bon_csv2html and added some new features to give nicer looking displays and correct colors

I still plan to add support for semi-random data and validation of data when reading it back before making a 2.0 release. But 2.0 is getting close.

Vibration and Strange SATA Performance

Almost two years ago I blogged about a strange performance problem with SATA disks [1]. The problem was that certain regions of a disk gave poor linear read performance on some machines, but performed well on machines which appeared to be identical. I discovered what the problem was shortly after that but was prevented from disclosing the solution due to an SGI NDA. The fact that SGI now no longer exists as a separate company decreases my obligations under the NDA. The fact that the sysadmins of the University of Toronto published all the most important data entirely removes my obligations in this regard [2].

In their Wiki they write “after SGI installed rubber grommits around the 5 or 6 tiny fans in the xe210 nodes, the read and write plots now look like” and then some graphs showing good disk performance appear.

The problem was that a certain brand and model of disk was particularly sensitive to vibrations. When that model of disk was installed in some machines then the vibrations would interfere with disk reads. It seems that there was some sort of harmonic frequency between the vibration of the disk and that of the cooling fans which explains why some sections of the disk were read slowly and some gave normal performance (my previous post has the graphs which show a pattern). Some other servers of the same make and model didn’t have that problem, so it seemed that some slight manufacturing differences in the machines determined whether the vibration would affect the disk performance.

One thing that I’ve been meaning to do is to test the performance of disks while being vibrated. I was thinking of getting a large bass speaker, a powerful amplifier, and using the sound hardware in a PC to produce a range of frequencies. Then having the hard disk securely attached to a piece of plywood which would be in front of the speaker. But as I haven’t had time to do this over the last couple of years it seems unlikely that I will do it any time soon. Hopefully this blog post will inspire someone to do such tests. One thing to note if you want to do this is that it’s quite likely to damage the speaker, powerful bass sounds that are sustained can melt parts of the coil in a speaker. So buy a speaker second-hand.

If someone in my region (Melbourne) wants to try this then I can donate some old IDE disks. I can offer advice on how to run the tests for anyone who is interested.

Also it’s worth considering that systems which make less noise might deliver better performance.

New version of Bonnie++ and Violin Memory

I have just released version 1.03e of my Bonnie++ benchmark [1]. The only change is support for direct IO in Bonnie++ (via the -D command-line parameter). The patch for this was written by Dave Murch of Violin Memory [2]. Violin specialise in 2RU storage servers based on DRAM and/or Flash storage. One of their products is designed to handle a sustained load of 100,000 write IOPS (in 4K blocks) and 200,000 read IOPS for its 10 year life (but it’s not clear whether you could do 100,000 writes AND 200,000 reads in a second). The only pricing information that they have online is a claim that flash costs less than $50 per gig. That would be quite affordable for dozens of gigs and not really expensive for hundreds of gigs, but as they are discussing a device with 4TB capacity it sounds rather expensive – of course it would still be a lot cheaper than using hard disks if you need that combination of capacity and performance.

I wonder how much benefit you would get from using a Violin device to manage the journals for 100 servers in a data center. It seems that 1000 writes per second is near the upper end of the capacity of a 2RU server for many common work-loads, this is of course just a rough estimation based on observations of some servers that I run. If the main storage was on a SAN then using data journaling and putting the journals on a Violin device seems likely to improve latency (data is committed faster and the application can report success to the client sooner) while also reducing the load on the SAN disks (which are really expensive).

Now given that their price point is less than $50 per gig, it seems that a virtual hosting provider could provide really fast storage to their customers for a quite affordable price. $5 per month per gig for flash storage in a virtual hosting environment would be an attractive option for many people. Currently if you have a small service that you want hosted a virtual server is the best way to do it, and as most providers offer little information on the disk IO capacity of their services it seems quite unlikely that anyone has taken any serious steps to prevent high load from one customer from degrading the performance of the rest. With flash storage you not only get a much higher number of writes per second, but one customer writing data won’t seriously impact read speed for other customers (with hard drive one process that does a lot of writes can cripple the performance of processes that do reads).

The experimental versions of Bonnie++ have better support for testing some of these usage scenarios. One new feature is measuring the worst-case latency of all operations in each section of the test run. I will soon release Bonnie++ version 1.99 which includes direct IO support, it should show some significant benefits for all usage cases involving Violin devices, ZFS (when configured with multiple types of storage hardware), NetApp Filers, and other advanced storage options.

For a while I have been dithering about the exact feature list of Bonnie++ 2.x. After some pressure from a contributor to the OpenSolaris project I have decided to freeze the feature list at the current 1.94 level plus direct IO support. This doesn’t mean that I will stop adding new features in the 2.0x branch, but I will avoid doing anything that can change the results. So in future benchmark results made from Bonnie++ version 1.94 can be directly compared to results that will be made from version 2.0 and above. There is one minor issue, new versions of GCC have in the past made differences to some of the benchmark results (the per-character IO test was the main one) – but that’s not my problem. As far as I am concerned Bonnie++ benchmarks everything from the compiler to the mass storage device in terms of disk IO performance. If you compare two systems with different kernels, different versions of GCC, or other differences then it’s up to you to make appropriate notes of what was changed.

This means that the OpenSolaris people can now cease using the 1.0x branch of Bonnie++, and other distributions can do the same if they wish. I have just uploaded version 1.03e to Debian and will request that it goes in Lenny – I believe that it is way too late to put 1.9x in Lenny. But once Lenny is released I will upload version 2.00 to Debian/Unstable and that will be the only version supported in Debian after that time.

New HP Server

I’ve just started work on a new HP server running RHEL5 AS (needs to be AS to support more than 4 DomU’s). I still have the Xen issues that made me give up using it on Debian [1] (the killer one being that an AMD64 Xen Dom0 would kernel panic on any serious disk IO), but the Xen implementation in RHEL is quite solid.

The first thing I did was run zcav (part of my Bonnie++ benchmark suite) [2] to see how the array performs (as well as ensuring that the entire array actually appears to work). The result is below. For a single disk performance is expected to decrease as you read along the disk (from outer to inner tracks). I don’t know why the performance decreases until the half-way point and then starts with good performance again and again decreases.

zcav results from HP CCISS RAID-6 array

The next thing was to ensure that the machine had RAID-6 (I have been convinced that using only RAID-5 verges on professional malpractice). As the machine is rented from a hosting company there was no guarantee that they would follow my clear written instructions involving running RAID-6.

The machine is a HP rack-mounted server with a CCISS RAID controller, so to manage the array the command /usr/sbin/hpacucli is used.

The command hpacucli controller all show reveals that there is a “Smart Array P400 in Slot 1”.

The command hpacucli controller slot=1 show gives the following (amongst a lot of other output):
RAID 6 (ADG) Status: Enabled
Cache Board Present: True
Cache Status: OK
Accelerator Ratio: 25% Read / 75% Write
Drive Write Cache: Disabled
Total Cache Size: 512 MB
Battery Pack Count: 1
Battery Status: OK
SATA NCQ Supported: True

So the write-back cache is enabled, 384M of data is for the write-back cache and 128M is for the read cache (hopefully all for read-ahead – the OS should do all the real caching for reads).

The command hpacucli controller slot=1 array all show reveals that there is one array: “array A (SAS, Unused Space: 0 MB)”.

The command hpacucli controller slot=1 array a show status tells me that the status is “array AOK”.

Finally the command hpacucli controller slot=1 show config gives me the data that I really want at this time and says:
Smart Array P400 in Slot 1 (sn: *****)
array A (SAS, Unused Space: 0 MB)
logicaldrive 1 (820.2 GB, RAID 6 (ADG), OK)

Then it gives all the data on the disks. It would be nice if there was a command to just dump all that. I would like to be able to show the configuration of all controllers with a single command.
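Until such a command turns up, a small wrapper can chain the two commands already used above. This is a sketch under assumptions: it relies on the “controller all show” output containing lines of the form “Smart Array P400 in Slot 1”, it has not been checked against other controller models, and the function names are invented for the example.

```python
import re
import subprocess

HPACUCLI = "/usr/sbin/hpacucli"

def controller_slots(listing):
    """Extract slot numbers from 'hpacucli controller all show' output."""
    return [int(n) for n in re.findall(r"in Slot (\d+)", listing)]

def config_commands(slots):
    """Build one 'show config' command line per controller slot."""
    return [[HPACUCLI, "controller", f"slot={n}", "show", "config"] for n in slots]

def dump_all():
    """Print the configuration of every controller (needs hpacucli and root)."""
    listing = subprocess.run([HPACUCLI, "controller", "all", "show"],
                             capture_output=True, text=True, check=True).stdout
    for command in config_commands(controller_slots(listing)):
        print(subprocess.run(command, capture_output=True,
                             text=True, check=True).stdout)

# Parsing demonstrated on the line quoted in the post; dump_all() is for a real server.
print(config_commands(controller_slots("Smart Array P400 in Slot 1")))
```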

Also it would be nice if the fact that hpacucli is the tool to use for managing CCISS RAID arrays when using Linux on HP servers was more widely documented. It took me an unreasonable amount of effort to discover what tool to use for CCISS RAID management.

New ZCAV Development

I have just been running some ZCAV tests on some new supposedly 1TB disks (10^12 bytes is about 931*2^30 so is about 931G according to almost everyone in the computer industry who doesn’t work for a hard disk vendor).
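A quick check of that capacity arithmetic, taking a marketed 1TB to mean 10^12 bytes and a G to mean 2^30 bytes (the function name is only for illustration):

```python
def binary_gigabytes(size_bytes):
    """Express a byte count in units of 2^30 bytes."""
    return size_bytes / 2 ** 30

marketed_1tb = 10 ** 12  # decimal terabyte, as used by disk vendors
print(f"{binary_gigabytes(marketed_1tb):.1f}G")
```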

I’ve added a new graph to my ZCAV results page [1] with the results.

One interesting thing that I discovered is that the faster disks can deliver contiguous data at a speed of more than 110MB/s, previously the best I’d seen from a single disk was about 90MB/s. When I first wrote ZCAV the best disks I had to test with all had a maximum speed of about 10MB/s so KB/s was a reasonable unit. Now I plan to change the units to MB/s to make it easier to read the graphs. Of course it’s not that difficult to munge the data before graphing it, but I think that it will give a better result for most users if I just change the units.

The next interesting thing I discovered is that GNUplot defaults to using exponential notation at the value of 1,000,000 (or 1e+06). I’m sure that I could override that but it would still make it difficult to read for the users. So I guess it’s time to change the units to GB.

I idly considered using the hard drive manufacturer’s definition of GB so that a 1TB disk would actually display as having 1000GB (the Wikipedia page for Gibibyte has the different definitions [2]). But of course having decimal and binary prefixes used in the X and Y axis of a graph would be a horror. Also the block and chunk sizes used have to be multiples of a reasonably large power of two (at least 2^14) to get reasonable performance from the OS.

The next implication of this is that it’s a bad idea to have a default block size that is not a power of two. The previous block sizes were 100M and 200M (for 1.0x and 1.9x branches respectively). Expressing these as 0.0976G and 0.1953G respectively would not be user-friendly. So I’m currently planning on 0.25G as the block size for both branches.
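The block size reasoning can be checked with a few lines. This sketch treats M as 2^20 bytes and G as 2^30 bytes, as the surrounding text does; the helper names are invented for the example.

```python
def size_in_gb(size_mb):
    """Convert a size in binary megabytes to binary gigabytes."""
    return size_mb / 1024

def is_power_of_two(n):
    """True when n is a positive power of two."""
    return n > 0 and n & (n - 1) == 0

# Old default block sizes (100M, 200M) and the planned one (0.25G = 256M).
for size_mb in (100, 200, 256):
    print(size_mb, f"{size_in_gb(size_mb):.8f}", is_power_of_two(size_mb * 2 ** 20))
```

Only the 256M block gives both a short decimal representation in G and a power-of-two byte count.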

While changing the format it makes sense to change as many things as possible at once to reduce the number of incompatible file formats that are out there. The next thing I’m considering is the precision. In the past the speed in K/s was an integer. Obviously an integer for the speed in M/s is not going to work well for some of the slower devices that are still in use (EG a 4* CD-ROM drive maxes out at 600KB/s). Of course the accuracy of this is determined by the accuracy of the system clock. The gettimeofday() system call returns the time in micro-seconds, but I expect that most systems don’t approach micro-second accuracy. It’s not worth reporting with a precision that is greater than the accuracy, so there’s no point in making the precision of the speed any greater than the precision of the time.
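The problem with integer M/s for slow devices is easy to see with the CD-ROM figure. A minimal sketch, assuming M means 2^20 bytes and timing in micro-seconds as returned by gettimeofday(); the function name and the choice of two decimal places are illustrative, not what ZCAV does.

```python
def speed_mb_per_s(bytes_read, elapsed_us):
    """Throughput in M/s from a byte count and an elapsed time in micro-seconds."""
    return (bytes_read / 2 ** 20) / (elapsed_us / 1_000_000)

# A 4* CD-ROM drive: 600KB read in one second.
cd_speed = speed_mb_per_s(600 * 1024, 1_000_000)
print(int(cd_speed))        # integer M/s truncates to zero
print(f"{cd_speed:.2f}")    # two decimal places keep the information
```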

Things were easier with the Bonnie++ program when I just reduced the precision as needed to fit in an 80 column display. ;)

Finally I ran my tests on my new Dell T105 system. While I didn’t get time to do as many tests as I desired before putting the machine in production I did get to do a quick test of two disks running at full speed. Previously when testing desktop systems I had not found a system which when run with two disks of the same age as the machine could extract full performance from both disks simultaneously. While the Dell T105 is a server-class system, it is a rather low-end server and I had anticipated that it would lack performance in this regard. I was pleased to note that I could run both 1TB disks at full speed at the same time. I didn’t get a chance to test three or four disks though (maybe for scheduled down-time in the future).

Xen and Swap

The way Xen works is that the RAM used by a virtual machine is not swappable, so the only swapping that happens is to the swap device used by the virtual machine. I wondered whether I could improve swap performance by using a tmpfs for that swap space. The idea is that as only one out of several virtual machines might be using swap space, a tmpfs storage could cache the most recently used data and result in the virtual machine which is swapping heavily taking less time to complete the memory-hungry job.

I decided to test this on Debian/Etch (both DomU and Dom0).

RAM size vs time

Above is a graph of the time taken in seconds (on the Y axis) to complete the command “apt-get -y install psmisc openssh-client openssh-server linux-modules-2.6.18-5-xen-686 file less binutils strace ltrace bzip2 make m4 gcc g++”, while the X axis has the amount of RAM assigned to the DomU in megabytes.

The four graphs are for using a real disk (in this case an LVM logical volume) and for using tmpfs with different amounts of RAM backing it. The numbers 128000, 196000, and 256000 are the numbers of kilobytes of RAM assigned to the Dom0 (which manages the tmpfs). As you can see it’s only below about 20M of RAM that tmpfs provides a benefit. I don’t know why it didn’t provide a benefit with larger amounts of RAM. Below 48M the amount of time taken started increasing exponentially and I expected that there was the potential for a significant performance boost.

After finding that the benefits for a single active DomU were not that great I did some tests with three DomU’s running the same APT command. With 16M of RAM and swap on the hard drive it took an average of 408 seconds, but with swap on the tmpfs it took an average of 373 seconds – an improvement of 8.5%. With 32M of RAM the times were 225 and 221 seconds – a 1.8% improvement.

Incidentally to make the DomU boot successfully with less than 30M of RAM I had to use “MODULES=dep” in /etc/initramfs-tools/initramfs.conf. To get it to boot with less than 14M I had to manually hack the initramfs to remove LVM support (I create my initramfs in the Dom0 so it gets drivers that aren’t needed in the DomU). I was unable to get a DomU with 12M of RAM to boot with any reasonable amount of effort (I expect that compiling the kernel without an initramfs would have worked but couldn’t be bothered).

Future tests will have to be on another machine as the machine used for these tests caught on fire – this is really annoying, if someone suggests an extra test I can’t run it.

To plot the data I put the following in a file named “command” and then ran “gnuplot command”:
unset autoscale x
set autoscale xmax
unset autoscale y
set autoscale ymax
set xlabel "MB"
set ylabel "seconds"
plot "128000"
replot "196000"
replot "256000"
set terminal png
set output "xen-cache.png"
replot "disk"

In future when doing such tests I will use “time -p ” (for POSIX format) which means that it displays a count of seconds rather than minutes and seconds (and saves me from running sed and bc to fix things up).
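For results already recorded in minutes-and-seconds form, the conversion that sed and bc were doing fits in a few lines. This sketch assumes the bash built-in format such as 6m48.000s; the function name is made up for the example.

```python
import re

def to_seconds(stamp):
    """Convert a bash-style elapsed time such as '6m48.000s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError("unrecognised time format: " + stamp)
    return int(match.group(1)) * 60 + float(match.group(2))

print(to_seconds("6m48.000s"))
```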

I am idly considering writing a program to exercise virtual memory for the purpose of benchmarking swap on virtual machines.
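The core of such a program could be as simple as writing to every page of a buffer that is larger than the RAM of the virtual machine. The following is only a rough sketch of that idea, not a real benchmark: the 4096 byte page size is an assumption (typical for x86 Linux), the access pattern is purely sequential, and interpreter overhead will dominate until the machine actually starts swapping.

```python
import time

PAGE_SIZE = 4096  # assumed page size

def touch_pages(size_bytes, passes=1):
    """Write one byte in every page of a buffer; return (pages touched, seconds)."""
    buf = bytearray(size_bytes)
    pages = 0
    start = time.perf_counter()
    for _ in range(passes):
        for offset in range(0, size_bytes, PAGE_SIZE):
            buf[offset] = (buf[offset] + 1) % 256
            pages += 1
    return pages, time.perf_counter() - start

# 16M is only a demonstration size; a real test would exceed the DomU RAM.
pages, seconds = touch_pages(16 * 2 ** 20, passes=2)
print(f"touched {pages} pages in {seconds:.3f}s")
```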

My raw data is below:

New Bonnie++ Releases

Today I released new versions of my Bonnie++ [1] benchmark. The main new feature (in both the stable 1.03b version and the experimental 1.93d version) is the ability of zcav to write to devices. The feature in question was originally written at the request of some people who had strange performance results when testing SATA disks [2].

Now I plan to focus entirely on the 1.9x branch. I have uploaded 1.03b to Debian/unstable but shortly I plan to upload a 1.9x version and to have Lenny include Bonnie++ 2.0x.

One thing to note is that Bonnie++ in the 1.9x branch is multi-threaded which does mean that lower performance will be achieved with some combinations of OS and libc. I think that this is valid as many applications that you will care about (EG MySQL and probably all other modern database servers) will only support a threaded mode of operation (at least for the default configuration) and many other applications (EG Apache) will have a threaded option which can give performance benefits.

In any case the purpose of a benchmark is not to give a high number that you can boast about, but to identify areas of performance that need improvement. So doing things that your OS might not be best optimised for is a feature!

While on this topic, I will never add support for undocumented APIs to the main Bonnie++ and ZCAV programs. The 1.9x branch of Bonnie++ includes a program named getc_putc which is specifically written to test various ways of writing a byte at a time, among other things it uses getc_unlocked() and putc_unlocked() – both of which were undocumented at the time I started using them. Bonnie++ will continue using the locking versions of those functions, last time I tested it meant that the per-char IO tests in Bonnie++ on Linux gave significantly less performance than on Solaris (to the degree that it obviously wasn’t hardware). I think this is fine, everyone knows that IO one character at a time is not optimal anyway so whether your program sucks a little or a lot because of doing such things probably makes little difference.

RAID and Bus Bandwidth

As correctly pointed out by cmot [1] my previous post about software RAID [2] made no mention of bus bandwidth.

I have measured the bus bottlenecks of a couple of desktop machines running IDE disks with my ZCAV [3] benchmark (part of the Bonnie++ suite). The results show that two typical desktop machines had significant bottlenecks when running two disks for contiguous read operations [4]. Here is one of the graphs which shows that when two disks were active (on different IDE cables) the aggregate throughput was just under 80MB/s on a P4 1.5GHz while the disks were capable of delivering up to 120MB/s:
ZCAV result for two 300G disks on a P4 1.5GHz

On a system such as the above P4 using software RAID will give a performance hit when compared to a hardware RAID device which is capable of driving both disks at full speed. I did not benchmark the relative speeds of read and write operations (writing is often slightly slower), but if for the sake of discussion we assume that read and write give the same performance then software RAID would only give 2/3 the performance of a theoretical perfect hardware RAID-1 implementation for large contiguous writes.

On a RAID-5 array the bandwidth for large contiguous writes is the data size multiplied by N/(N-1) (where N is the number of disks), and on a RAID-6 array it is N/(N-2). For the case of a four disk RAID-6 array that would give the same overhead as writing to a RAID-1 and for the case of a minimal RAID-5 array it would be 50% more writes. So from the perspective of “I need X bandwidth, can my hardware deliver it” if I needed 40MB/s of bandwidth for contiguous writes then a 3 disk RAID-5 might work but a RAID-1 definitely would hit a bottleneck.
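The multipliers can be turned into a small calculator. This is a sketch of the arithmetic only: it assumes large contiguous writes with no controller caching, treats RAID-1 as writing a full copy to every disk, and uses the “just under 80MB/s” aggregate measured on the P4 above as the bus limit. The function name is invented for the example.

```python
BUS_LIMIT_MB_S = 80  # aggregate throughput of the P4 test machine was just under this

def bus_write_bandwidth(data_mb_s, level, disks):
    """Bus bandwidth needed for large contiguous writes to a software RAID array."""
    if level == 1:
        factor = disks                   # every disk receives a full copy
    elif level == 5:
        factor = disks / (disks - 1)     # one disk's worth of parity
    elif level == 6:
        factor = disks / (disks - 2)     # two disks' worth of parity
    else:
        raise ValueError("unsupported RAID level")
    return data_mb_s * factor

for level, disks in ((1, 2), (5, 3), (6, 4)):
    needed = bus_write_bandwidth(40, level, disks)
    print(f"RAID-{level} on {disks} disks: {needed:.0f}MB/s, fits: {needed < BUS_LIMIT_MB_S}")
```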

Given that large contiguous writes to a RAID-1 are a corner case and that minimal sized RAID-5 and RAID-6 arrays are rare, in most cases there should not be a significant overhead. As the number of seeks increases the actual amount of data transferred gets quite small. A few years ago I was running some mail servers which had a very intense IO load: four U320 SCSI disks in a hardware RAID-5 array were a system bottleneck – yet the IO was only 600KB/s of reads and 3MB/s of writes. In that case seeks were the bottleneck and write-back caching (which is another problem area for Linux software RAID) was necessary for good performance.

For the example of my P4 system, it is quite obvious that with a four disk software RAID array consisting of disks that are reasonably new (anything slightly newer than the machine) there would be some bottlenecks.

Another problem with Linux software RAID is that traditionally it has had to check the consistency of the entire RAID array in the case of an unexpected power failure. Such checks are the best way to get all disks in a RAID array fully utilised (Linux software RAID does not support reading from all disks in a mirror and checking that they are consistent for regular reads), so of course a bus bottleneck becomes a problem during such checks.

Of course the solution to these problems is to use a server for server tasks and then you will not run out of bus bandwidth so easily. In the days before PCI-X and PCIe there were people running Linux software RAID-0 across multiple 3ware hardware controllers to get better bandwidth. A good server will have multiple PCI buses so getting an aggregate throughput greater than PCI bus bandwidth is possible. Reports of 400MB/s transfer rates using two 64bit PCI buses (each limited to ~266MB/s) were not uncommon. Of course then you run into the same problem, but instead of being limited to the performance of IDE controllers on the motherboard in a desktop system (as in my test machine) you would be limited to the number of PCI buses and the speed of each bus.
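The ~266MB/s figure follows from the bus width and clock. The sketch below assumes the 33MHz clock of a standard 64bit PCI bus (the clock speed is not stated above) and compares the reported 400MB/s with the theoretical peak of two such buses.

```python
def pci_peak_mb_s(width_bits, clock_mhz):
    """Theoretical peak of a parallel PCI bus: one transfer per clock cycle."""
    return width_bits / 8 * clock_mhz

per_bus = pci_peak_mb_s(64, 33.33)  # assumed 33MHz clock
reported = 400                      # MB/s across two buses, as quoted above
print(f"per bus: {per_bus:.0f}MB/s")
print(f"fraction of two-bus peak: {reported / (2 * per_bus):.2f}")
```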

If you were to install enough disks to even come close to the performance limits of PCIe then I expect that you would find that the CPU utilisation for the XOR operations is something that you want to off-load. But then on such a system you would probably want the other benefits of hardware RAID (dynamic growth, having one RAID that has a number of LUNs exported to different machines, redundant RAID controllers in the same RAID box, etc).

I think that probably 12 disks is about the practical limit of Linux software RAID due to these issues and the RAID check speed. But it should be noted that the vast majority of RAID installations have significantly less than 12 disks.

One thing that cmot mentioned was a RAID controller that runs on the system bus and takes data from other devices on that bus. Does anyone know of such a device?

Bonnie++ and Postal shirts

Dear lazyweb, I want to design T-Shirts for my Bonnie++ and Postal projects. But representing those projects in a picture seems more difficult than SE Linux (see one of my SE Linux T-Shirt designs below). If you have any conceptual design ideas then please let me know.

Here are my current designs for SE Linux shirts:

Play Machine
t-shirt design with SE Linux play machine root password
SE Linux MLS
t-shirt design with SE Linux MLS logo