The Need for Speed
One of my clients has an important server running ZFS. They need a filesystem that detects corruption: regular RAID handles the case where a disk gives read errors, but it doesn't cover the case where a disk returns bad data and claims it to be good (something I've witnessed on both BTRFS and ZFS systems). BTRFS is good for the case of a single disk or a RAID-1 array, but I believe that the RAID-5 code for BTRFS is not sufficiently tested for business use. ZFS doesn't perform very well because the checksums on data and metadata require multiple writes for a single change, which also causes more fragmentation. This isn't a criticism of ZFS, it's just an engineering trade-off for the data integrity features.
ZFS supports read-caching on an SSD (the L2ARC) and write-back caching (the ZIL). To get the best benefit from the L2ARC and ZIL you need fast SSD storage. So now, with my client investigating 10 gigabit Ethernet, I have to investigate SSD storage too.
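As a rough sketch of what that looks like (the pool name "tank" and the partition layout are my assumptions, not the actual configuration on that server), adding SSD partitions as L2ARC and ZIL is done with zpool add:

  # read cache (L2ARC) on one NVMe partition
  zpool add tank cache /dev/nvme0n1p2
  # mirrored log device (ZIL) so that losing one SSD doesn't lose synchronous writes
  zpool add tank log mirror /dev/nvme0n1p1 /dev/nvme1n1p1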
For some time SSDs have been in the same price range as hard drives, starting at prices well below $100. Now there are some SSDs on sale for as little as $50. One issue with SATA for server use is that SATA 3.0 (which was released in 2009 and is most commonly used nowadays) is limited to 600MB/s. That isn’t nearly adequate if you want to serve files over 10 gigabit Ethernet. SATA 3.2 was released in 2013 and supports 1969MB/s but I doubt that there’s much hardware supporting that. See the SATA Wikipedia page for more information.
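For comparison, 10 gigabit Ethernet is 10,000Mbit/s, or roughly 1250MB/s before protocol overhead, so a single SATA 3.0 device at 600MB/s can't come close to filling the link.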
Another problem with SATA is getting the devices physically installed. My client has a new Dell server that has plenty of spare PCIe slots but no spare SATA connectors or SATA power connectors. I could have removed the DVD drive (as I did for some tests before deploying the server) but that's ugly and only gives 1 device, while you need 2 devices in a RAID-1 configuration for the ZIL.
M.2
M.2 is a new standard for expansion cards; it supports SATA and PCIe interfaces (and USB, but that isn't useful at this time). The Wikipedia page for M.2 is interesting to read for background knowledge but isn't helpful if you are about to buy hardware.
The first M.2 card I bought had a SATA interface, but then I was unable to find a local company that could sell me a SATA M.2 host adapter. So I bought an M.2 to SATA adapter which made it work like a regular 2.5″ SATA device. That's working well in one of my home PCs but isn't what I wanted. Apparently systems that have an M.2 socket on the motherboard will usually take either SATA or NVMe devices.
The most important thing I learned is to buy the SSD storage device and the host adapter from the same place, so that you are entitled to a refund if they don't work together.
The alternative to the SATA (AHCI) interface on an M.2 device is known as NVMe (Non-Volatile Memory Express); see the Wikipedia page for NVMe for details. NVMe not only gives higher throughput, it gives more command queues and more commands per queue, which should give significant performance benefits for a device with multiple banks of NVRAM. This is what you want for server use.
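If you are curious about how many queues a particular controller offers you can query it with nvme-cli (assuming the nvme-cli package is installed; feature 7 is the standard "Number of Queues" feature):

  # ask the first NVMe controller how many I/O queues it supports
  nvme get-feature /dev/nvme0 -f 7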
Eventually I got an M.2 NVMe device and a PCIe card for it. A quick test showed sustained transfer speeds of around 1500MB/s, which should permit saturating a 10 gigabit Ethernet link in some situations.
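For reference, a rough sequential read test can be done with dd, bypassing the page cache (this is only a sanity check, not the exact test I ran, and it assumes the device isn't otherwise in use):

  # read 16G sequentially with direct I/O
  dd if=/dev/nvme0n1 of=/dev/null bs=1M count=16384 iflag=direct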
One annoyance is that M.2 devices have a different naming convention to regular hard drives. I have devices /dev/nvme0n1 and /dev/nvme1n1; the "n1" refers to a namespace, as a single NVMe controller can present multiple storage devices. Partitions have device names like /dev/nvme0n1p1 and /dev/nvme0n1p2.
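Partitioning works as you would expect, just with the different names. A minimal sketch (the sizes are arbitrary, just to show the resulting device names):

  # GPT label, a small partition for the ZIL and the rest for the L2ARC
  parted -s /dev/nvme0n1 mklabel gpt
  parted -s /dev/nvme0n1 mkpart zil 1MiB 17GiB
  parted -s /dev/nvme0n1 mkpart l2arc 17GiB 100%
  ls /dev/nvme0n1*    # now lists nvme0n1, nvme0n1p1 and nvme0n1p2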
Power Use
I recently upgraded my Thinkpad T420 from a 320G hard drive to a 500G SSD which made it faster but also surprisingly quieter – you never realise how noisy hard drives are until they go away. My laptop seemed to feel cooler, but that might be my imagination.
The i5-2520M CPU in my Thinkpad has a TDP of 35W but uses a lot less than that as I almost never have all the cores in use. The Z7K320 320G hard drive is listed as having 0.8W "low power idle" and 1.8W for read-write; maybe Linux wasn't putting it in the "low power idle" mode. The Samsung 500G 850 EVO SSD is listed as taking 0.4W when idle and up to 3.5W when active (which would not be sustained for long on a laptop). If my CPU is taking an average of 10W then replacing the hard drive with a SSD might have reduced the power use of the non-screen part by 10%, but I doubt that I could notice such a small difference.
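The arithmetic behind that guess: 1.8W for the hard drive under load minus 0.4W for the SSD at idle is about 1.4W saved, and against 10W of CPU plus a few watts of RAM and chipset that's in the order of 10%.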
I’ve read some articles about power use on the net which can be summarised as “SSDs can draw more power than laptop hard drives but if you do the same amount of work then the SSD will be idle most of the time and not use much power”.
I wonder if the SSD being slightly thicker than the HDD it replaced has affected the airflow inside my Thinkpad.
From reading some of the reviews it seems that there are M.2 storage devices drawing over 7W! That’s going to create some cooling issues on desktop PCs but should be OK in a server. For laptop use they will hopefully release M.2 devices designed for low power consumption.
The Future
M.2 is an ideal format for laptops due to being much smaller and lighter than 2.5″ SSDs. Spinning media doesn’t belong in a modern laptop and using a SATA SSD is an ugly hack when compared to M.2 support on the motherboard.
Intel has released the X99 chipset with M.2 support (see the Wikipedia page for Intel X99) so it should be commonly available on desktops in the near future. For most desktop systems an M.2 device would provide all the storage that is needed (or 2*M.2 in a RAID-1 configuration for a workstation). That would give all the benefits of reduced noise and increased performance that regular SSDs provide, but with better performance and fewer cables inside the PC.
For a corporate desktop PC I think the ideal design would have only M.2 internal storage and no support for 3.5″ disks or a DVD drive. That would allow a design that is much smaller than a current SFF PC.
SSDs used for the ZFS ZIL are very comparable to Ceph journal devices, so there are good guides out there. One of them is this ever increasing list of devices:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
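The test from that article is a single threaded synchronous 4k write run with fio, something like the following (treat it as a sketch of the method rather than the exact command, and note that writing to the raw device destroys any data on it):

  # one job, queue depth 1, O_DIRECT synchronous 4k writes for 60 seconds
  fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test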
The NVMe devices require special BIOS support to be bootable and not all boards have that yet. But the PCIe M.2 SSDs do not (and also have the added benefit of just showing up under /dev like all other drives). There are some performance differences; NVMe is better but the PCIe ones are often easier to deal with.
Drives
* Samsung XP941 is a PCIe 2.0 x4 device
* Samsung SM951 is a PCIe 3.0 x4 device
* Samsung PM951 is a PCIe 3.0 x4 device
* Intel DC P3500 is a PCIe 3.0 x4 device
Adapters
* http://www.mfactors.com/m2p4a-pcie-x4-to-m-2-ngff-ssd-adapter/
* http://www.mfactors.com/m-2-pcie-to-pcie-3-0-x4-adapter/
* http://www.mfactors.com/m2p4s-m-2-ngff-pcie-base-ssd-to-pcie-x4-adapter/
* http://addonics.com/products/adm2px4.php
taggart: I presume that you mean SATA when you talk about the ones that aren't NVMe, as the SATA SSDs use AHCI and should be able to boot with any BIOS. As for booting Linux with an older BIOS, it's not a problem as there are lots of other ways to boot. You could have a USB stick as /boot; server class motherboards typically have an internal USB socket for this purpose. You could have a boot loader on a CD-ROM that chain-loads to the NVMe device.
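A minimal sketch of the /boot on a USB stick idea (the device name and UUID are placeholders for whatever the stick actually is):

  # find the UUID of the USB stick, then mount it as /boot via fstab
  blkid /dev/sdX1
  echo 'UUID=<uuid-from-blkid>  /boot  ext2  defaults  0  2' >> /etc/fstab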
Making an NVMe device the Windows C: drive is apparently a difficult task at this time, but there are people who report success.
Good call on btrfs raid5, if something of an understatement.
“Btrfs RAID 5/6 Code Found To Be Very Unsafe & Will Likely Require A Rewrite”
https://www.phoronix.com/scan.php?page=news_item&px=Btrfs-RAID-56-Is-Bad
etbe: no, I mean PCIe using AHCI, see https://en.wikipedia.org/wiki/M.2#Features