Archives

Categories

Isolating PHP Web Sites

If you have multiple PHP web sites on a server in a default configuration they will all be able to read each other’s files in a default configuration. If you have multiple PHP web sites that have stored data or passwords for databases in configuration files then there are significant problems if they aren’t all trusted. Even if the sites are all trusted (IE the same person configures them all) if there is a security problem in one site it’s ideal to prevent that being used to immediately attack all sites.

mpm_itk

The first thing I tried was mpm_itk [1]. This is a version of the traditional “prefork” module for Apache that has one process for each HTTP connection. When it’s installed you just put the directive “AssignUserID USER GROUP” in your VirtualHost section and that virtual host runs as the user:group in question. It will work with any Apache module that works with mpm_prefork. In my experiment with mpm_itk I first tried running with a different UID for each site, but that conflicted with the pagespeed module [2]. The pagespeed module optimises HTML and CSS files to improve performance and it has a directory tree where it stores cached versions of some of the files. It doesn’t like working with copies of itself under different UIDs writing to that tree. This isn’t a real problem, setting up the different PHP files with database passwords to be read by the desired group is easy enough. So I just ran each site with a different GID but used the same UID for all of them.

The first problem with mpm_itk is that the mpm_prefork code that it’s based on is the slowest mpm that is available and which is also incompatible with HTTP/2. A minor issue of mpm_itk is that it makes Apache take ages to stop or restart, I don’t know why and can’t be certain it’s not a configuration error on my part. As an aside here is a site for testing your server’s support for HTTP/2 [3]. To enable HTTP/2 you have to be running mpm_event and enable the “http2” module. Then for every virtual host that is to support it (generally all https virtual hosts) put the line “Protocols h2 h2c http/1.1” in the virtual host configuration.

A good feature of mpm_itk is that it has everything for the site running under the same UID, all Apache modules and Apache itself. So there’s no issue of one thing getting access to a file and another not getting access.

After a trial I decided not to keep using mpm_itk because I want HTTP/2 support.

php-fpm Pools

The Apache PHP module depends on mpm_prefork so it also has the issues of not working with HTTP/2 and of causing the web server to be slow. The solution is php-fpm, a separate server for running PHP code that uses the fastcgi protocol to talk to Apache. Here’s a link to the upstream documentation for php-fpm [4]. In Debian this is in the php7.3-fpm package.

In Debian the directory /etc/php/7.3/fpm/pool.d has the configuration for “pools”. Below is an example of a configuration file for a pool:

# cat /etc/php/7.3/fpm/pool.d/example.com.conf
[example.com]
user = example.com
group = example.com
listen = /run/php/php7.3-example.com.sock
listen.owner = www-data
listen.group = www-data
pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3

Here is the upstream documentation for fpm configuration [5].

Then for the Apache configuration for the site in question you could have something like the following:

ProxyPassMatch "^/(.*\.php(/.*)?)$" "unix:/run/php/php7.3-example.com.sock|fcgi://localhost/usr/share/wordpress/"

The “|fcgi://localhost” part is just part of the way of specifying a Unix domain socket. From the Apache Wiki it appears that the method for configuring the TCP connections is more obvious [6]. I chose Unix domain sockets because it allows putting the domain name in the socket address. Matching domains for the web server to port numbers is something that’s likely to be error prone while matching based on domain names is easier to check and also easier to put in Apache configuration macros.

There was some additional hassle with getting Apache to read the files created by PHP processes (the options include running PHP scripts with the www-data group, having SETGID directories for storing files, and having world-readable files). But this got things basically working.

Nginx

My Google searches for running multiple PHP sites under different UIDs didn’t turn up any good hits. It was only after I found the DigitalOcean page on doing this with Nginx [7] that I knew what to search for to find the way of doing it in Apache.

Links June 2020

Bruce Schneier wrote an informative post about Zoom security problems [1]. He recommends Jitsi which has a Debian package of their software and it’s free software.

Axel Beckert wrote an interesting post about keyboards with small numbers of keys, as few as 28 [2]. It’s not something I’d ever want to use, but interesting to read from a computer science and design perspective.

The Guardian has a disturbing article explaining why we might never get a good Covid19 vaccine [3]. If that happens it will change our society for years if not decades to come.

Matt Palmer wrote an informative blog post about private key redaction [4]. I learned a lot from that. Probably the simplest summary is that you should never publish sensitive data unless you are certain that all that you are publishing is suitable, if you don’t understand it then you don’t know if it’s suitable to be published!

This article by Umair Haque on eand.co has some interesting points about how Freedom is interpreted in the US [5].

This article by Umair Haque on eand.co has some good points about how messed up the US is economically [6]. I think that his analysis is seriously let down by omitting the savings that could be made by amending the US healthcare system without serious changes (EG by controlling drug prices) and by reducing the scale of the US military (there will never be another war like WW2 because any large scale war will be nuclear). If the US government could significantly cut spending in a couple of major areas they could then put the money towards fixing some of the structural problems and bootstrapping a first-world economic system.

The American Conservatrive has an insightful article “Seven Reasons Police Brutality is Systemic Not Anecdotal [7].

Scientific American has an informative article about how genetic engineering could be used to make a Covid-19 vaccine [8].

Rike wrote an insightful post about How Language Changes Our Concepts [9]. They cover the differences between the French, German, and English languages based on gender and on how the language limits thoughts. Then conclude with the need to remove terms like master/slave and blacklist/whitelist from our software, with a focus on Debian but it’s applicable to all software.

Gunnar Wolf also wrote an insightful post On Masters and Slaves, Whitelists and Blacklists [10], they started with why some people might not understand the importance of the issue and then explained some ways of addressing it. The list of suggested terms includes Primary-secondary, Leader-follower, and some other terms which have slightly different meanings and allow more precision in describing the computer science concepts used. We can be more precise when describing computer science while also not using terms that marginalise some groups of people, it’s a win-win!

Both Rike and Gunnar were responding to a LWN article about the plans to move away from Master/Slave and Blacklist/Whitelist in the Linux kernel [11]. One of the noteworthy points in the LWN article is that there are about 70,000 instances of words that need to be changed in the Linux kernel so this isn’t going to happen immediately. But it will happen eventually which is a good thing.

How Will the Pandemic Change Things?

The Bulwark has an interesting article on why they can’t “Reopen America” [1]. I wonder how many changes will be long term. According to the Wikipedia List of Epidemics [2] Covid-19 so far hasn’t had a high death toll when compared to other pandemics of the last 100 years. People’s reactions to this vary from doing nothing to significant isolation, the question is what changes in attitudes will be significant enough to change society.

Transport

One thing that has been happening recently is a transition in transport. It’s obvious that we need to reduce CO2 and while electric cars will address the transport part of the problem in the long term changing to electric public transport is the cheaper and faster way to do it in the short term. Before Covid-19 the peak hour public transport in my city was ridiculously overcrowded, having people unable to board trams due to overcrowding was really common. If the economy returns to it’s previous state then I predict less people on public transport, more traffic jams, and many more cars idling and polluting the atmosphere.

Can we have mass public transport that doesn’t give a significant disease risk? Maybe if we had significantly more trains and trams and better ventilation with more airflow designed to suck contaminated air out. But that would require significant engineering work to design new trams, trains, and buses as well as expense in refitting or replacing old ones.

Uber and similar companies have been taking over from taxi companies, one major feature of those companies is that the vehicles are not dedicated as taxis. Dedicated taxis could easily be designed to reduce the spread of disease, the famed Black Cab AKA Hackney Carriage [3] design in the UK has a separate compartment for passengers with little air flow to/from the driver compartment. It would be easy to design such taxis to have entirely separate airflow and if setup to only take EFTPOS and credit card payment could avoid all contact between the driver and passengers. I would prefer to have a Hackney Carriage design of vehicle instead of a regular taxi or Uber.

Autonomous cars have been shown to basically work. There are some concerns about safety issues as there are currently corner cases that car computers don’t handle as well as people, but of course there are also things computers do better than people. Having an autonomous taxi would be a benefit for anyone who wants to avoid other people. Maybe approval could be rushed through for autonomous cars that are limited to 40Km/h (the maximum collision speed at which a pedestrian is unlikely to die), in central city areas and inner suburbs you aren’t likely to drive much faster than that anyway.

Car share services have been becoming popular, for many people they are significantly cheaper than owning a car due to the costs of regular maintenance, insurance, and depreciation. As the full costs of car ownership aren’t obvious people may focus on the disease risk and keep buying cars.

Passenger jets are ridiculously cheap. But this relies on the airline companies being able to consistently fill the planes. If they were to add measures to reduce cross contamination between passengers which slightly reduces the capacity of planes then they need to increase ticket prices accordingly which then reduces demand. If passengers are just scared of flying in close proximity and they can’t fill planes then they will have to increase prices which again reduces demand and could lead to a death spiral. If in the long term there aren’t enough passengers to sustain the current number of planes in service then airline companies will have significant financial problems, planes are expensive assets that are expected to last for a long time, if they can’t use them all and can’t sell them then airline companies will go bankrupt.

It’s not reasonable to expect that the same number of people will be travelling internationally for years (if ever). Due to relying on economies of scale to provide low prices I don’t think it’s possible to keep prices the same no matter what they do. A new economic balance of flights costing 2-3 times more than we are used to while having significantly less passengers seems likely. Governments need to spend significant amounts of money to improve trains to take over from flights that are cancelled or too expensive.

Entertainment

The article on The Bulwark mentions Las Vegas as a city that will be hurt a lot by reductions in travel and crowds, the same thing will happen to tourist regions all around the world. Australia has a significant tourist industry that will be hurt a lot. But the mention of Las Vegas makes me wonder what will happen to the gambling in general. Will people avoid casinos and play poker with friends and relatives at home? It seems that small stakes poker games among friends will be much less socially damaging than casinos, will this be good for society?

The article also mentions cinemas which have been on the way out since the video rental stores all closed down. There’s lots of prime real estate used for cinemas and little potential for them to make enough money to cover the rent. Should we just assume that most uses of cinemas will be replaced by Netflix and other streaming services? What about teenage dates, will kissing in the back rows of cinemas be replaced by “Netflix and chill”? What will happen to all the prime real estate used by cinemas?

Professional sporting matches have been played for a TV-only audience during the pandemic. There’s no reason that they couldn’t make a return to live stadium audiences when there is a vaccine for the disease or the disease has been extinguished by social distancing. But I wonder if some fans will start to appreciate the merits of small groups watching large TVs and not want to go back to stadiums, can this change the typical behaviour of groups?

Restaurants and cafes are going to do really badly. I previously wrote about my experience running an Internet Cafe and why reopening businesses soon is a bad idea [4]. The question is how long this will go for and whether social norms about personal space will change things. If in the long term people expect 25% more space in a cafe or restaurant that’s enough to make a significant impact on profitability for many small businesses.

When I was young the standard thing was for people to have dinner at friends homes. Meeting friends for dinner at a restaurant was uncommon. Recently it seemed to be the most common practice for people to meet friends at a restaurant. There are real benefits to meeting at a restaurant in terms of effort and location. Maybe meeting friends at their home for a delivered dinner will become a common compromise, avoiding the effort of cooking while avoiding the extra expense and disease risk of eating out. Food delivery services will do well in the long term, it’s one of the few industry segments which might do better after the pandemic than before.

Work

Many companies are discovering the benefits of teleworking, getting it going effectively has required investing in faster Internet connections and hardware for employees. When we have a vaccine the equipment needed for teleworking will still be there and we will have a discussion about whether it should be used on a more routine basis. When employees spend more than 2 hours per day travelling to and from work (which is very common for people who work in major cities) that will obviously limit the amount of time per day that they can spend working. For the more enthusiastic permanent employees there seems to be a benefit to the employer to allow working from home. It’s obvious that some portion of the companies that were forced to try teleworking will find it effective enough to continue in some degree.

One company that I work for has quit their coworking space in part because they were concerned that the coworking company might go bankrupt due to the pandemic. They seem to have become a 100% work from home company for the office part of the work (only on site installation and stock management is done at corporate locations). Companies running coworking spaces and other shared offices will suffer first as their clients have short term leases. But all companies renting out office space in major cities will suffer due to teleworking. I wonder how this will affect the companies providing services to the office workers, the cafes and restaurants etc. Will there end up being so much unused space in central city areas that it’s not worth converting the city cinemas into useful space?

There’s been a lot of news about Zoom and similar technologies. Lots of other companies are trying to get into that business. One thing that isn’t getting much notice is remote access technologies for desktop support. If the IT people can’t visit your desk because you are working from home then they need to be able to remotely access it to fix things. When people make working from home a large part of their work time the issue of who owns peripherals and how they are tracked will get interesting. In a previous blog post I suggested that keyboards and mice not be treated as assets [5]. But what about monitors, 4G/Wifi access points, etc?

Some people have suggested that there will be business sectors benefiting from the pandemic, such as telecoms and e-commerce. If you have a bunch of people forced to stay home who aren’t broke (IE a large portion of the middle class in Australia) they will probably order delivery of stuff for entertainment. But in the long term e-commerce seems unlikely to change much, people will spend less due to economic uncertainty so while they may shift some purchasing to e-commerce apart from home delivery of groceries e-commerce probably won’t go up overall. Generally telecoms won’t gain anything from teleworking, the Internet access you need for good Netflix viewing is generally greater than that needed for good video-conferencing.

Money

I previously wrote about a Basic Income for Australia [6]. One of the most cited reasons for a Basic Income is to deal with robots replacing people. Now we are at the start of what could be a long term economic contraction caused by the pandemic which could reduce the scale of the economy by a similar degree while also improving the economic case for a robotic workforce. We should implement a Universal Basic Income now.

I previously wrote about the make-work jobs and how we could optimise society to achieve the worthwhile things with less work [7]. My ideas about optimising public transport and using more car share services may not work so well after the pandemic, but the rest should work well.

Business

There are a number of big companies that are not aiming for profitability in the short term. WeWork and Uber are well documented examples. Some of those companies will hopefully go bankrupt and make room for more responsible companies.

The co-working thing was always a precarious business. The companies renting out office space usually did so on a monthly basis as flexibility was one of their selling points, but they presumably rented buildings on an annual basis. As the profit margins weren’t particularly high having to pay rent on mostly empty buildings for a few months will hurt them badly. The long term trend in co-working spaces might be some sort of collaborative arrangement between the people who run them and the landlords similar to the way some of the hotel chains have profit sharing agreements with land owners to avoid both the capital outlay for buying land and the risk involved in renting. Also city hotels are very well equipped to run office space, they have the staff and the procedures for running such a business, most hotels also make significant profits from conventions and conferences.

The way the economy has been working in first world countries has been about being as competitive as possible. Just in time delivery to avoid using storage space and machines to package things in exactly the way that customers need and no more machines than needed for regular capacity. This means that there’s no spare capacity when things go wrong. A few years ago a company making bolts for the car industry went bankrupt because the car companies forced the prices down, then car manufacture stopped due to lack of bolts – this could have been a wake up call but was ignored. Now we have had problems with toilet paper shortages due to it being packaged in wholesale quantities for offices and schools not retail quantities for home use. Food was destroyed because it was created for restaurant packaging and couldn’t be packaged for home use in a reasonable amount of time.

Farmer’s markets alleviate some of the problems with packaging food etc. But they aren’t a good option when there’s a pandemic as disease risk makes them less appealing to customers and therefore less profitable for vendors.

Religion

Many religious groups have supported social distancing. Could this be the start of more decentralised religion? Maybe have people read the holy book of their religion and pray at home instead of being programmed at church? We can always hope.

Squirrelmail vs Roundcube

For some years I’ve had SquirrelMail running on one of my servers for the people who like such things. It seems that the upstream support for SquirrelMail has ended (according to the SquirrelMail Wikipedia page there will be no new releases just Subversion updates to fix bugs). One problem with SquirrelMail that seems unlikely to get fixed is the lack of support for base64 encoded From and Subject fields which are becoming increasingly popular nowadays as people who’s names don’t fit US-ASCII are encoding them in their preferred manner.

I’ve recently installed Roundcube to provide an alternative. Of course one of the few important users of webmail didn’t like it (apparently it doesn’t display well on a recent Samsung Galaxy Note), so now I have to support two webmail systems.

Below is a little Perl script to convert a SquirrelMail abook file into the csv format used for importing a RoundCube contact list.

#!/usr/bin/perl

print "First Name,Last Name,Display Name,E-mail Address\n";
while(<STDIN>)
{
  chomp;
  my @fields = split(/\|/, $_);
  printf("%s,%s,%s %s,%s\n", $fields[1], $fields[2], $fields[0], $fields[4], $fields[3]);
}

Storage Trends

In considering storage trends for the consumer side I’m looking at the current prices from MSY (where I usually buy computer parts). I know that other stores will have slightly different prices but they should be very similar as they all have low margins and wholesale prices are the main factor.

Small Hard Drives Aren’t Viable

The cheapest hard drive that MSY sells is $68 for 500G of storage. The cheapest SSD is $49 for 120G and the second cheapest is $59 for 240G. SSD is cheaper at the low end and significantly faster. If someone needed about 500G of storage there’s a 480G SSD for $97 which costs $29 more than a hard drive. With a modern PC if you have no hard drives you will notice that it’s quieter. For anyone who’s buying a new PC spending an extra $29 is definitely worthwhile for the performance, low power use, and silence.

The cheapest 1TB disk is $69 and the cheapest 1TB SSD is $159. Saving $90 on the cost of a new PC probably isn’t worth while.

For 2TB of storage the cheapest options are Samsung NVMe for $339, Crucial SSD for $335, or a hard drive for $95. Some people would choose to save $244 by getting a hard drive instead of NVMe, but if you are getting a whole system then allocating $244 to NVMe instead of a faster CPU would probably give more benefits overall.

Computer stores typically have small margins and computer parts tend to quickly either become cheaper or be obsoleted by better parts. So stores don’t want to stock parts unless they will sell quickly. Disks smaller than 2TB probably aren’t going to be profitable for stores for very long. The trend of SSD and NVMe becoming cheaper is going to make 2TB disks non-viable in the near future.

NVMe vs SSD

M.2 NVMe devices are at comparable prices to SATA SSDs. For some combinations of quality and capacity NVMe is about 50% more expensive and for some it’s slightly cheaper (EG Intel 1TB NVMe being cheaper than Samsung EVO 1TB SSD). Last time I checked about half the motherboards on sale had a single M.2 socket so for a new workstation that doesn’t need more than 2TB of storage (the largest NVMe that MSY sells) it wouldn’t make sense to use anything other than NVMe.

The benefit of NVMe is NOT throughput (even though NVMe devices can often sustain over 4GB/s), it’s low latency. Workstations can’t properly take advantage of this because RAM is so cheap ($198 for 32G of DDR4) that compiles etc mostly come from cache and because most filesystem writes on workstations aren’t synchronous. For servers a large portion of writes are synchronous, for example a mail server can’t acknowledge receiving mail until it knows that it’s really on disk, so there’s a lot of small writes that block server processes and the low latency of NVMe really improves performance. If you are doing a big compile on a workstation (the most common workstation task that uses a lot of disk IO) then the writes aren’t synchronised to disk and if the system crashes you will just do all the compilation again. While NVMe doesn’t give a lot of benefit over SSD for workstation use (I’ve uses laptops with SSD and NVMe and not noticed a great difference) of course I still want better performance. ;)

Last time I checked I couldn’t easily buy a PCIe card that supported 2*NVMe cards, I’m sure they are available somewhere but it would take longer to get and probably cost significantly more than twice as much. That means a RAID-1 of NVMe takes 2 PCIe slots if you don’t have an M.2 socket on the motherboard. This was OK when I installed 2*NVMe devices on a server that had 18 disks and lots of spare PCIe slots. But for some systems PCIe slots are an issue.

My home server has all PCIe slots used by a video card and Ethernet cards and the BIOS probably won’t support booting from NVMe. It’s a Dell server so I can’t just replace the motherboard with one that has more PCIe slots and M.2 on the motherboard. As it’s running nicely and doesn’t need replacing any time soon I won’t be using NVMe for home server stuff.

Small Servers

Most servers that I am responsible for have less than 2TB of storage. For my clients I now only recommend SSD storage for small servers and am recommending SSD for replacing any failed disks.

My home server has 2*500G SSDs in a BTRFS RAID-1 for the root filesystem, and 3*4TB disks in a BTRFS RAID-1 for storing big files. I bought the SSDs when 500G SSDs were about $250 each and bought 2*4TB disks when they were about $350 each. Currently that server has about 3.3TB of space used and I could probably get it down to about 2.5TB if I deleted things I don’t really need. If I was getting storage for that server now I’d use 2*2TB SSDs and 3*1TB hard drives for the stuff that doesn’t fit on SSDs (I have some spare 1TB disks that came with servers). If I didn’t have spare hard drives I’d get 3*2TB SSDs for that sort of server which would give 3TB of BTRFS RAID-1 storage.

Last time I checked Dell servers had a card for supporting M.2 as an optional extra so Dells probably won’t boot from NVMe without extra expense.

Ars Technica has an informative article about WD selling SMR disks as “NAS” disks [1]. The Shingled Magnetic Recording technology allows greater storage density on a platter which leads to either larger capacity or cheaper disks but at the cost of lower write performance and apparently extremely bad latency in some situations. NAS disks are supposed to be low latency as the expectation is that they will be used in a RAID array and kicked out of the array if they have problems. There are reports of ZFS kicking SMR disks from RAID sets. I think this will end the use of hard drives for small servers. For a server you don’t want to deal with this sort of thing, by definition when a server goes down multiple people will stop work (small server implies no clustering). Spending extra to get SSDs just to avoid the risk of unexpected SMR would be a good plan.

Medium Servers

The largest SSD and NVMe devices that are readily available are 2TB but 10TB disks are commodity items, there are reports of 20TB hard drives being available but I can’t find anyone in Australia selling them.

If you need to store dozens or hundreds of terabytes than hard drives have to be part of the mix at this time. There’s no technical reason why SSDs larger than 10TB can’t be made (the 2.5″ SATA form factor has more than 5* the volume of a 2TB M.2 card) and it’s likely that someone sells them outside the channels I buy from, but probably at a price higher than what my clients are willing to pay. If you want 100TB of affordable storage then a mid range server like the Dell PowerEdge T640 which can have up to 18*3.5″ disks is good. One of my clients has a PowerEdge T630 with 18*3.5″ disks in the 8TB-10TB range (we replace failed disks with the largest new commodity disks available, it used to have 6TB disks). ZFS version 0.8 introduced a “Special VDEV Class” which stores metadata and possibly small data blocks on faster media. So you could have some RAID-Z groups on hard drives for large storage and the metadata on a RAID-1 on NVMe for fast performance. For medium size arrays on hard drives having a “find /” operation take hours is not uncommon, for large arrays having it take days isn’t that uncommon. So far it seems that ZFS is the only filesystem to have taken the obvious step of storing metadata on SSD/NVMe while bulk data is on cheap large disks.

One problem with large arrays is that the vibration of disks can affect the performance and reliability of nearby disks. The ZFS server I run with 18 disks was originally setup with disks from smaller servers that never had ZFS checksum errors, but when disks from 2 small servers were put in one medium size server they started getting checksum errors presumably due to vibration. This alone is a sufficient reason for paying a premium for SSD storage.

Currently the cost of 2TB of SSD or NVMe is between the prices of 6TB and 8TB hard drives, and the ratio of price/capacity for SSD and NVMe is improving dramatically while the increase in hard drive capacity is slow. 4TB SSDs are available for $895 compared to a 10TB hard drive for $549, so it’s 4* more expensive on a price per TB. This is probably good for Windows systems, but for Linux systems where ZFS and “special VDEVs” is an option it’s probably not worth considering. Most Linux user cases where 4TB SSDs would work well would be better served by smaller NVMe and 10TB disks running ZFS. I don’t think that 4TB SSDs are at all popular at the moment (MSY doesn’t stock them), but prices will come down and they will become common soon enough. Probably by the end of the year SSDs will halve in price and no hard drives less than 4TB will be viable.

For rack mounted servers 2.5″ disks have been popular for a long time. It’s common for vendors to offer 2 versions of a rack mount server for 2.5″ and 3.5″ disks where the 2.5″ version takes twice as many disks. If the issue is total storage in a server 4TB SSDs can give the same capacity as 8TB HDDs.

SMR vs Regular Hard Drives

Rumour has it that you can buy 20TB SMR disks, I haven’t been able to find a reference to anyone who’s selling them in Australia (please comment if you know who sells them and especially if you know the price). I expect that the ZFS developers will soon develop a work-around to solve the problems with SMR disks. Then arrays of 20TB SMR disks with NVMe for “special VDEVs” will be an interesting possibility for storage. I expect that SMR disks will be the majority of the hard drive market by 2023 – if hard drives are still on the market. SSDs will be large enough and cheap enough that only SMR disks will offer enough capacity to be worth using.

I think that it is a possibility that hard drives won’t be manufactured in a few years. The volume of a 3.5″ disk is significantly greater than that of 10 M.2 devices so current technology obviously allows 20TB of NVMe or SSD storage in the space of a 3.5″ disk. If the price of 16TB NVMe and SSD devices comes down enough (to perhaps 3* the price of a 20TB hard drive) almost no-one would want the hard drive and it wouldn’t be viable to manufacture them.

It’s not impossible that in a few years time 3D XPoint and similar fast NVM technologies occupy the first level of storage (the ZFS “special VDEV”, OS swap device, log device for database servers, etc) and NVMe occupies the level for bulk storage with no space left in the market for spinning media.

Computer Cases

For servers I expect that models supporting 3.5″ storage devices will disappear. A 1RU server with 8*2.5″ storage devices or a 2RU server with 16*2.5″ storage devices will probably be of use to more people than a 1RU server with 4*3.5″ or a 2RU server with 8*3.5″.

My first IBM PC compatible system had a 5.25″ hard drive, a 5.25″ floppy drive, and a 3.5″ floppy drive in 1988. My current PC is almost a similar size and has a DVD drive (that I almost never use) 5 other 5.25″ drive bays that have never been used, and 5*3.5″ drive bays that I have never used (I have only used 2.5″ SSDs). It would make more sense to have PC cases designed around 2.5″ and maybe 3.5″ drives with no more than one 5.25″ drive bay.

The Intel NUC SFF PCs are going in the right direction. Many of them only have a single storage device but some of them have 2*M.2 sockets allowing RAID-1 of NVMe and some of them support ECC RAM so they could be used as small servers.

A USB DVD drive costs $36, it doesn’t make sense to have every PC designed around the size of an internal DVD drive that will probably only be used to install the OS when a $36 USB DVD drive can be used for every PC you own.

The only reason I don’t have a NUC for my personal workstation is that I get my workstations from e-waste. If I was going to pay for a PC then a NUC is the sort of thing I’d pay to have on my desk.

Comparing Compression

I just did a quick test of different compression options in Debian. The source file is a 1.1G MySQL dump file. The time is user CPU time on a i7-930 running under KVM, the compression programs may have different levels of optimisation for other CPU families.

Facebook people designed the zstd compression system (here’s a page giving an overview of it [1]). It has some interesting new features that can provide real differences at scale (like unusually large windows and pre-defined dictionaries), but I just tested the default mode and the -9 option for more compression. For the SQL file “zstd -9” provides significantly better compression than gzip while taking only slightly less CPU time than “gzip -9” while zstd with the default option (equivalent to “zstd -3”) gives much faster compression than “gzip -9” while also being slightly smaller. For this use case bzip2 is too slow for inline compression of a MySQL dump as the dump process locks tables and can hang clients. The lzma and xz compression algorithms provide significant benefits in size but the time taken is grossly disproportionate.

In a quick check of my collection of files compressed with gzip I was only able to fine 1 fild that got less compression with zstd with default options, and that file got better compression with “zstd -9”. So zstd seems to beat gzip everywhere by every measure.

The bzip2 compression seems to be obsolete, “zstd -9” is much faster and has slightly smaller output.

Both xz and lzma seem to offer a combination of compression and time taken that zstd can’t beat (for this file type at least). The ultra compression mode 22 gives 2% smaller output files but almost 28 minutes of CPU time for compression is a bit ridiculous. There is a threaded mode for zstd that could potentially allow a shorter wall clock time for “zstd --ultra -22” than lzma/xz while also giving better compression.

Compression Time Size
zstd 5.2s 130m
zstd -9 28.4s 114m
gzip -9 33.4s 141m
bzip2 -9 3m51 119m
lzma 6m20 97m
xz 6m36 97m
zstd -19 9m57 99m
zstd --ultra -22 27m46 95m

Conclusion

For distributions like Debian which have large archives of files that are compressed once and transferred a lot the “zstd --ultra -22” compression might be useful with multi-threaded compression. But given that Debian already has xz in use it might not be worth changing until faster CPUs with lots of cores become more commonly available. One could argue that for Debian it doesn’t make sense to change from xz as hard drives seem to be getting larger capacity (and also smaller physical size) faster than the Debian archive is growing. One possible reason for adopting zstd in a distribution like Debian is that there are more tuning options for things like memory use. It would be possible to have packages for an architecture like ARM that tends to have less RAM compressed in a way that decreases memory use on decompression.

For general compression such as compressing log files and making backups it seems that zstd is the clear winner. Even bzip2 is far too slow and in my tests zstd clearly beats gzip for every combination of compression and time taken. There may be some corner cases where gzip can compete on compression time due to CPU features, optimisation for CPUs, etc but I expect that in almost all cases zstd will win for compression size and time. As an aside I once noticed the 32bit of gzip compressing faster than the 64bit version on an Opteron system, the 32bit version had assembly optimisation and the 64bit version didn’t at that time.

To create a tar archive you can run “tar czf” or “tar cJf” to create an archive with gzip or xz compression. To create an archive with zstd compression you have to use “tar --zstd -cf”, that’s 7 extra characters to type. It’s likely that for most casual archive creation (EG for copying files around on a LAN or USB stick) saving 7 characters of typing is more of a benefit than saving a small amount of CPU time and storage space. It would be really good if tar got a single character option for zstd compression.

The external dictionary support in zstd would work really well with rsync for backups. Currently rsync only supports zlib, adding zstd support would be a good project for someone (unfortunately I don’t have enough spare time).

Now I will change my database backup scripts to use zstd.

Update:

The command “tar acvf a.zst filenames” will create a zstd compressed tar archive, the “a” option to GNU tar makes it autodetect the compression type from the file name. Thanks Enrico!

Cruises and Covid19

Problems With Cruises

GQ has an insightful and detailed article about Covid19 and the Diamond Princess [1], I recommend reading it.

FastCompany has a brief article about bookings for cruises in August [2]. There have been many negative comments about this online.

The first thing to note is that the cancellation policies on those cruises are more lenient than usual and the prices are lower. So it’s not unreasonable for someone to put down a deposit on a half price holiday in the hope that Covid19 goes away (as so many prominent people have been saying it will) in the knowledge that they will get it refunded if things don’t work out. Of course if the cruise line goes bankrupt then no-one will get a refund, but I think people are expecting that won’t happen.

The GQ article highlights some serious problems with the way cruise ships operate. They have staff crammed in to small cabins and the working areas allow transmission of disease. These problems can be alleviated, they could allocate more space to staff quarters and have more capable air conditioning systems to put in more fresh air. During the life of a cruise ship significant changes are often made, replacing engines with newer more efficient models, changing the size of various rooms for entertainment, installing new waterslides, and many other changes are routinely made. Changing the staff only areas to have better ventilation and more separate space (maybe capsule-hotel style cabins with fresh air piped in) would not be a difficult change. It would take some money and some dry-dock time which would be a significant expense for cruise companies.

Cruises Are Great

People like social environments, they want to have situations where there are as many people as possible without it becoming impossible to move. Cruise ships are carefully designed for the flow of passengers. Both the layout of the ship and the schedule of events are carefully planned to avoid excessive crowds. In terms of meeting the requirement of having as many people as possible in a small area without being unable to move cruise ships are probably ideal.

Because there is a large number of people in a restricted space there are economies of scale on a cruise ship that aren’t available anywhere else. For example the main items on the menu are made in a production line process, this can only be done when you have hundreds of people sitting down to order at the same time.

The same applies to all forms of entertainment on board, they plan the events based on statistical knowledge of what people want to attend. This makes it more economical to run than land based entertainment where people can decide to go elsewhere. On a ship a certain portion of the passengers will see whatever show is presented each night, regardless of whether it’s singing, dancing, or magic.

One major advantage of cruises is that they are all inclusive. If you are on a regular holiday would you pay to see a singing or dancing show? Probably not, but if it’s included then you might as well do it – and it will be pretty good. This benefit is really appreciated by people taking kids on holidays, if kids do things like refuse to attend a performance that you were going to see or reject food once it’s served then it won’t cost any extra.

People Who Criticise Cruises

For the people who sneer at cruises, do you like going to bars? Do you like going to restaurants? Live music shows? Visiting foreign beaches? A cruise gets you all that and more for a discount price.

If Groupon had a deal that gave you a cheap hotel stay with all meals included, free non-alcoholic drinks at bars, day long entertainment for kids at the kids clubs, and two live performances every evening how many of the people who reject cruises would buy it? A typical cruise is just like a Groupon deal for non-stop entertainment from 8AM to 11PM.

Will Cruises Restart?

The entertainment options that cruises offer are greatly desired by many people. Most cruises are aimed at budget travellers, the price is cheaper than a hotel in a major city. Such cruises greatly depend on economies of scale, if they can’t get the ships filled then they would need to raise prices (thus decreasing demand) to try to make a profit. I think that some older cruise ships will be scrapped in the near future and some of the newer ships will be sold to cruise lines that cater to cheap travel (IE P&O may scrap some ships and some of the older Princess ships may be transferred to them). Overall I predict a decrease in the number of middle-class cruise ships.

For the expensive cruises (where the cheapest cabins cost over $1000US per person per night) I don’t expect any real changes, maybe they will have fewer passengers and higher prices to allow more social distancing or something.

I am certain that cruises will start again, but it’s too early to predict when. Going on a cruise is about as safe as going to a concert or a major sporting event. No-one is predicting that sporting stadiums will be closed forever or live concerts will be cancelled forever, so really no-one should expect that cruises will be cancelled forever. Whether companies that own ships or stadiums go bankrupt in the mean time is yet to be determined.

One thing that’s been happening for years is themed cruises. A group can book out an entire ship or part of a ship for a themed cruise. I expect this to become much more popular when cruises start again as it will make it easier to fill ships. In the past it seems that cruise lines let companies book their ships for events but didn’t take much of an active role in the process. I think that the management of cruise lines will look to aggressively market themed cruises to anyone who might help, for starters they could reach out to every 80s and 90s pop group – those fans are all old enough to be interested in themed cruises and the musicians won’t be asking for too much money.

Conclusion

Humans are social creatures. People want to attend events with many other people. Covid 19 won’t be the last pandemic, and it may not even be eradicated in the near future. The possibility of having a society where no-one leaves home unless they are in a hazmat suit has been explored in science fiction, but I don’t think that’s a plausible scenario for the near future and I don’t think that it’s something that will be caused by Covid 19.

A Good Time to Upgrade PCs

PC hardware just keeps getting cheaper and faster. Now that so many people have been working from home the deficiencies of home PCs are becoming apparent. I’ll give Australian prices and URLs in this post, but I think that similar prices will be available everywhere that people read my blog.

From MSY (parts list PDF ) [1] 120G SATA SSDs are under $50 each. 120G is more than enough for a basic workstation, so you are looking at $42 or so for fast quiet storage or $84 or so for the same with RAID-1. Being quiet is a significant luxury feature and it’s also useful if you are going to be in video conferences.

For more serious storage NVMe starts at around $100 per unit, I think that $124 for a 500G Crucial NVMe is the best low end option (paying $95 for a 250G Kingston device doesn’t seem like enough savings to be worth it). So that’s $248 for 500G of very fast RAID-1 storage. There’s a Samsung 2TB NVMe device for $349 which is good if you need more storage, it’s interesting to note that this is significantly cheaper than the Samsung 2TB SSD which costs $455. I wonder if SATA SSD devices will go away in the future, it might end up being SATA for slow/cheap spinning media and M.2 NVMe for solid state storage. The SATA SSD devices are only good for use in older systems that don’t have M.2 sockets on the motherboard.

It seems that most new motherboards have one M.2 socket on the motherboard with NVMe support, and presumably support for booting from NVMe. But dual M.2 sockets is rare and the price difference is significantly greater than the cost of a PCIe M.2 card to support NVMe which is $14. So for NVMe RAID-1 it seems that the best option is a motherboard with a single NVMe socket (starting at $89 for a AM4 socket motherboard – the current standard for AMD CPUs) and a PCIe M.2 card.

One thing to note about NVMe is that different drivers are required. On Linux this means means building a new initrd before the migration (or afterwards when booted from a recovery image) and on Windows probably means a fresh install from special installation media with NVMe drivers.

All the AM4 motherboards seem to have RADEON Vega graphics built in which is capable of 4K resolution at a stated refresh of around 24Hz. The ones that give detail about the interfaces say that they have HDMI 1.4 which means a maximum of 30Hz at 4K resolution if you have the color encoding that suits text (IE for use other than just video). I covered this issue in detail in my blog post about DisplayPort and 4K resolution [2]. So a basic AM4 motherboard won’t give great 4K display support, but it will probably be good for a cheap start.

$89 for motherboard, $124 for 500G NVMe, $344 for a Ryzen 5 3600 CPU (not the cheapest AM4 but in the middle range and good value for money), and $99 for 16G of RAM (DDR4 RAM is cheaper than DDR3 RAM) gives the core of a very decent system for $656 (assuming you have a working system to upgrade and peripherals to go with it).

Currently Kogan has 4K resolution monitors starting at $329 [3]. They probably won’t be the greatest monitors but my experience of a past cheap 4K monitor from Kogan was that it is quite OK. Samsung 4K monitors started at about $400 last time I could check (Kogan currently has no stock of them and doesn’t display the price), I’d pay an extra $70 for Samsung, but the Kogan branded product is probably good enough for most people. So you are looking at under $1000 for a new system with fast CPU, DDR4 RAM, NVMe storage, and a 4K monitor if you already have the case, PSU, keyboard, mouse, etc.

It seems quite likely that the 4K video hardware on a cheap AM4 motherboard won’t be that great for games and it will definitely be lacking for watching TV documentaries. Whether such deficiencies are worth spending money on a PCIe video card (starting at $50 for a low end card but costing significantly more for 3D gaming at 4K resolution) is a matter of opinion. I probably wouldn’t have spent extra for a PCIe video card if I had 4K video on the motherboard. Not only does using built in video save money it means one less fan running (less background noise) and probably less electricity use too.

My Plans

I currently have a workstation with 2*500G SATA SSDs in a RAID-1 array, 16G of RAM, and a i5-2500 CPU (just under 1/4 the speed of the Ryzen 5 3600). If I had hard drives then I would definitely buy a new system right now. But as I have SSDs that work nicely (quiet and fast enough for most things) and almost all machines I personally use have SSDs (so I can’t get a benefit from moving my current SSDs to another system) I would just get CPU, motherboard, and RAM. So the question is whether to spend $532 for more than 4* the CPU performance. At the moment I’ll wait because I’ll probably get a free system with DDR4 RAM in the near future, while it probably won’t be as fast as a Ryzen 5 3600, it should be at least twice as fast as what I currently have.

IT Asset Management

In my last full-time position I managed the asset tracking database for my employer. It was one of those things that “someone” needed to do, and it seemed that only way that “someone” wouldn’t equate to “no-one” was for me to do it – which was ok. We used Snipe IT [1] to track the assets. I don’t have enough experience with asset tracking to say that Snipe is better or worse than average, but it basically did the job. Asset serial numbers are stored, you can have asset types that allow you to just add one more of the particular item, purchase dates are stored which makes warranty tracking easier, and every asset is associated with a person or listed as available. While I can’t say that Snipe IT is better than other products I can say that it will do the job reasonably well.

One problem that I didn’t discover until way too late was the fact that the finance people weren’t tracking serial numbers and that some assets in the database had the same asset IDs as the finance department and some had different ones. The best advice I can give to anyone who gets involved with asset tracking is to immediately chat to finance about how they track things, you need to know if the same asset IDs are used and if serial numbers are tracked by finance. I was pleased to discover that my colleagues were all honourable people as there was no apparent evaporation of valuable assets even though there was little ability to discover who might have been the last person to use some of the assets.

One problem that I’ve seen at many places is treating small items like keyboards and mice as “assets”. I think that anything that is worth less than 1 hour’s pay at the minimum wage (the price of a typical PC keyboard or mouse) isn’t worth tracking, treat it as a disposable item. If you hire a programmer who requests an unusually expensive keyboard or mouse (as some do) it still won’t be a lot of money when compared to their salary. Some of the older keyboards and mice that companies have are nasty, months of people eating lunch over them leaves them greasy and sticky. I think that the best thing to do with the keyboards and mice is to give them away when people leave and when new people join the company buy new hardware for them. If a company can’t spend $25 on a new keyboard and mouse for each new employee then they either have a massive problem of staff turnover or a lack of priority on morale.

About Reopening Businesses

Currently there is political debate about when businesses should be reopened after the Covid19 quarantine.

Small Businesses

One argument for reopening things is for the benefit of small businesses. The first thing to note is that the protests in the US say “I need a haircut” not “I need to cut people’s hair”. Small businesses won’t benefit from reopening sooner.

For every business there is a certain minimum number of customers needed to be profitable. There are many comments from small business owners that want it to remain shutdown. When the government has declared a shutdown and paused rent payments and provided social security to employees who aren’t working the small business can avoid bankruptcy. If they suddenly have to pay salaries or make redundancy payouts and have to pay rent while they can’t make a profit due to customers staying home they will go bankrupt.

Many restaurants and cafes make little or no profit at most times of the week (I used to be 1/3 owner of an Internet cafe and know this well). For such a company to be viable you have to be open most of the time so customers can expect you to be open. Generally you don’t keep a cafe open at 3PM to make money at 3PM, you keep it open so people can rely on there being a cafe open there, someone who buys a can of soda at 3PM one day might come back for lunch at 1:30PM the next day because they know you are open. A large portion of the opening hours of a most retail companies can be considered as either advertising for trade at the profitable hours or as loss making times that you can’t close because you can’t send an employee home for an hour.

If you have seating for 28 people (as my cafe did) then for about half the opening hours you will probably have 2 or fewer customers in there at any time, for about a quarter the opening hours you probably won’t cover the salary of the one person on duty. The weekend is when you make the real money, especially Friday and Saturday nights when you sometimes get all the seats full and people coming in for takeaway coffee and snacks. On Friday and Saturday nights the 60 seat restaurant next door to my cafe used to tell customers that my cafe made better coffee. It wasn’t economical for them to have a table full for an hour while they sell a few cups of coffee, they wanted customers to leave after dessert and free the table for someone who wants a meal with wine (alcohol is the real profit for many restaurants).

The plans of reopening with social distancing means that a 28 seat cafe can only have 14 chairs or less (some plans have 25% capacity which would mean 7 people maximum). That means decreasing the revenue of the most profitable times by 50% to 75% while also not decreasing the operating costs much. A small cafe has 2-3 staff when it’s crowded so there’s no possibility of reducing staff by 75% when reducing the revenue by 75%.

My Internet cafe would have closed immediately if forced to operate in the proposed social distancing model. It would have been 1/4 of the trade and about 1/8 of the profit at the most profitable times, even if enough customers are prepared to visit – and social distancing would kill the atmosphere. Most small businesses are barely profitable anyway, most small businesses don’t last 4 years in normal economic circumstances.

This reopen movement is about cutting unemployment benefits not about helping small business owners. Destroying small businesses is also good for big corporations, kill the small cafes and restaurants and McDonald’s and Starbucks will win. I think this is part of the motivation behind the astroturf campaign for reopening businesses.

Forbes has an article about this [1].

Psychological Issues

Some people claim that we should reopen businesses to help people who have psychological problems from isolation, to help victims of domestic violence who are trapped at home, to stop older people being unemployed for the rest of their lives, etc.

Here is one article with advice for policy makers from domestic violence experts [2]. One thing it mentions is that the primary US federal government program to deal with family violence had a budget of $130M in 2013. The main thing that should be done about family violence is to make it a priority at all times (not just when it can be a reason for avoiding other issues) and allocate some serious budget to it. An agency that deals with problems that affect families and only has a budget of $1 per family per year isn’t going to be able to do much.

There are ongoing issues of people stuck at home for various reasons. We could work on better public transport to help people who can’t drive. We could work on better healthcare to help some of the people who can’t leave home due to health problems. We could have more budget for carers to help people who can’t leave home without assistance. Wanting to reopen restaurants because some people feel isolated is ignoring the fact that social isolation is a long term ongoing issue for many people, and that many of the people who are affected can’t even afford to eat at a restaurant!

Employment discrimination against people in the 50+ age range is an ongoing thing, many people in that age range know that if they lose their job and can’t immediately find another they will be unemployed for the rest of their lives. Reopening small businesses won’t help that, businesses running at low capacity will have to lay people off and it will probably be the older people. Also the unemployment system doesn’t deal well with part time work. The Australian system (which I think is similar to most systems in this regard) reduces the unemployment benefits by $0.50 for every dollar that is earned in part time work, that effectively puts people who are doing part time work because they can’t get a full-time job in the highest tax bracket! If someone is going to pay for transport to get to work, work a few hours, then get half the money they earned deducted from unemployment benefits it hardly makes it worthwhile to work. While the exact health impacts of Covid19 aren’t well known at this stage it seems very clear that older people are disproportionately affected, so forcing older people to go back to work before there is a vaccine isn’t going to help them.

When it comes to these discussions I think we should be very suspicious of people who raise issues they haven’t previously shown interest in. If the discussion of reopening businesses seems to be someone’s first interest in the issues of mental health, social security, etc then they probably aren’t that concerned about such issues.

I believe that we should have a Universal Basic Income [3]. I believe that we need to provide better mental health care and challenge the gender ideas that hurt men and cause men to hurt women [4]. I believe that we have significant ongoing problems with inequality not small short term issues [5]. I don’t think that any of these issues require specific changes to our approach to preventing the transmission of disease. I also think that we can address multiple issues at the same time, so it is possible for the government to devote more resources to addressing unemployment, family violence, etc while also dealing with a pandemic.