
Redundancy in Network Infrastructure

It’s generally accepted that certain things need redundancy. RAID is regarded as essential for every server except the corner case of compute clusters where a few nodes can go offline without affecting the results (e.g. the Google servers). Redundant network cables with some sort of failover between big switches are regarded as a good idea, and multiple links to the Internet are regarded as essential for every serious data-center and are gaining acceptance in major corporate offices.

Determining whether a particular part of the infrastructure needs redundancy comes down to the cost of the redundant device (in hardware and the staff time to install it), the cost of having the service unavailable, and the extent to which redundancy would reduce the expected down-time.
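That decision rule can be sketched as a trivial calculation (every number below is invented purely for illustration):

```shell
# Hypothetical worked example: redundancy pays off when the expected
# downtime cost it avoids exceeds the cost of the redundant device.
device_cost=2000            # hardware plus staff time to install ($)
cost_per_hour_down=500      # cost of the service being unavailable ($/hour)
hours_saved=8               # expected down-time avoided over the device's life

saving=$((cost_per_hour_down * hours_saved))
if [ "$saving" -gt "$device_cost" ]; then
  echo "worthwhile: saves \$$saving against a \$$device_cost cost"
else
  echo "not worthwhile"
fi
```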

It’s also regarded as a good idea to have more than one person who knows how to run the servers. Jokes are often made about what might happen if a critical person “fell under a bus”, but more mundane things such as the desire to take an occasional holiday or a broken mobile phone can require a backup person.

One thing that doesn’t seem to get any attention is redundancy in the machine used for system administration. I’ve been using an EeePC [1] for supporting my clients, and it’s been working really well for me. Unfortunately I have misplaced the power supply, so I need to replace the machine (if only for the time it takes to find the PSU). I have some old Toshiba Satellite laptops; they are quite light by laptop standards (though still heavier than the EeePC) and only have 64M of RAM, but as mobile SSH clients they will do well. So my next task is to set up a Satellite as a backup machine for my network support work.

It seems that this problem is fairly widespread. I’ve worked in a few companies with reasonably large sysadmin teams. The best managed one had a support laptop that was assigned to the person who was on-call outside business hours. That laptop was not backed up (to the best of my knowledge, it was never connected to the corporate LAN so it seems that no-one had an opportunity to do so) and there was no second machine.

One thing I have been wondering is what happens to laptops with broken screens when the repair price exceeds the replacement cost. I wouldn’t mind buying an EeePC with a broken screen if it comes with a functional PSU, I could use it as a portable server.


Fixing the Correct Network Bottleneck

The latest news in the Australian IT industry is the new National Broadband Network (NBN) plan [1]. It will involve rolling out Fiber To The Home (FTTH) for 90% of the population at a planned cost to the government of $43,000,000,000, making it the biggest ever government project. Kevin Rudd used Twitter to say “Just announced biggest ever investment in Australian broadband – really exciting, infrastructure for the future” [2].

Now whenever someone says that a certain quantity of a resource is enough you can expect someone to try to refute that claim by mentioning that Bill Gates supposedly stated that “640K is enough” when referring to the RAM limits of the original IBM PC. As an aside, it’s generally believed that Bill Gates didn’t actually make that claim – Wikiquote has him denying ever saying any such thing [3], though he did say that he had hoped it would be enough for 10 years. I needed that disclaimer before stating that I think broadband speeds in Australia are high enough at the moment.

In any computer system one or more resources will be limited and will act as bottlenecks on overall performance. Adding more of the other resources will often make no difference that a user might notice.

On the machine I’m using right now to browse the web the bottleneck is RAM. A combination of bloated web pages and memory inefficient web browsers uses lots of memory: I have 1.5G of RAM, there is currently 1.3G of swap in use, and performance suffers because of it. It’s not uncommon for the machine to page so heavily that the mouse cursor becomes unresponsive while browsing the web.

My options for getting faster net access on this machine are to add more RAM (it can’t take more than 2G – so that doesn’t gain much), to use a more memory efficient web browser and X server, or to simply buy a new machine. Dell is currently selling desktop machines with 2G of RAM; as they are 64bit systems they will use more memory than 32bit systems for the same tasks, so they would probably give less performance than my 32bit machine with 1.5G of RAM for my usage patterns.

Also the latest EeePC [4] ships with 1G of RAM as standard and is limited to a maximum of 2G, I think that this is typical of Netbook class systems. I don’t use my EeePC for any serious work, but I know some people who do.

Does anyone have suggestions on memory efficient web browsers for Linux? I’m currently using Konqueror and Iceweasel (Firefox). Maybe the government could get a better return on their investment by spending a small amount of money sponsoring the development of free web browsers. A million dollars spent on optimising Firefox seems likely to provide good performance benefits for everyone.

My wife’s web browsing experience is bottlenecked by the speed of the video hardware in her machine (built-in video on a Dell PowerEdge T105 which is an ATI ES1000). The recent dramatic price reductions of large TFT monitors seem likely to make video performance more of an issue, and also increases the RAM used by the X server.

Someone who has reasonably good net access at the moment will have an ADSL2+ connection and a computer equivalent to a low-end new Dell machine (which is more powerful than the majority of systems in use). In that case the bottleneck will be the PC used for web browsing if you are doing anything serious (e.g. having dozens of windows open, including PDFs and other files that are commonly loaded from the web). If however a machine was used simply for downloading web pages with large pictures in a single session then FTTH would provide a real benefit, and downloading movies over the net would also benefit a lot. So it seems to me that browsing the web for research and education (which involves cross-referencing many sites) would gain more from new hardware (which will become cheap in a few years), while porn surfing and downloading movies would gain significantly from FTTH.

The NBN will have the potential to offer great bi-directional speeds. The ADSL technology imposes a limit on the combination of upload and download speeds, and due to interference it’s apparently technically easier to get a high download speed; the upload speeds could be improved a lot by using different DSLAMs. Being able to send out data at a reasonable speed (20Mbit/s or more) has the potential to significantly improve the use of the net in Australia. But if the major ISPs continue to have terms of service prohibiting the running of servers then that won’t make much difference to most users.

Finally there’s the issue of International data transfer which is slow and expensive. This is going to keep all affordable net access plans limited to a small quota (20G of downloads per month or less).

It seems to me that the best way of spending taxpayer money to improve net access would be to provide better connectivity to the rest of the world through subsidised International links.

Brendan makes an interesting point that the NBN is essentially a subsidy to the entertainment industry and that copyright law reform should be a higher priority [5].


Bridging and Redundancy

I’ve been working on a redundant wireless network for a client. The network has two sites with a pair of links (primary and backup) between them, each using dedicated wireless hardware (not 802.11 – each device has its own proprietary controller and is not an interface for a Linux box).

When I first started work the devices were configured in a fully bridged mode, so I decided to use Linux bridging (with brctl) to bridge an Ethernet port connected to the LAN with only one of the two wireless devices. The remote end had a Linux box that would bridge both the wireless devices at its end (there were four separate end-points as the primary and backup links were entirely independent). This meant of course that packets could go out over the active link and return via the inactive link, but needless data transfer on the unused link didn’t cause any problems.

The wireless devices claimed to implement bridging but didn’t implement STP (Spanning Tree Protocol) and they munged every packet to have the MAC address of the wireless device (unlike a Linux bridge which preserves the MAC address). The lack of STP meant that the devices couldn’t be connected at both ends. They also only forwarded IP packets so I couldn’t use STP implementations in Linux hosts or switches to prevent loops.

Below (in the part of this post which shouldn’t be in the RSS feed) I have included the script I wrote to manage a failover bridge. It pings the router at the other end when the primary link is in use; if it can’t reach it then it removes the Ethernet device that corresponds to the primary link from the bridge and adds the device for the secondary link. I had an hourly cron job that would flip it back to the primary link if it was on the secondary.

I ended up not using this in production because there were some other routers on the network which couldn’t cope with a MAC address changing and needed a reboot after such changes (even waiting 15 minutes didn’t result in the new MAC being reliably detected). So I’m posting it here for the benefit of anyone who is interested.
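The approach can be sketched as a few shell functions (this is a sketch of the idea, not the actual production script – the bridge name, interface names, and router address below are all invented for illustration):

```shell
#!/bin/bash
# Failover between two bridged links.  All names are assumptions.
BRIDGE=br0
PRIMARY=eth1     # Ethernet port connected to the primary wireless device
SECONDARY=eth2   # Ethernet port connected to the backup wireless device
ROUTER=10.0.0.1  # router at the far end of the link

# Report which link the bridge currently contains.
current_link() {
  if brctl show "$BRIDGE" | grep -q "$PRIMARY"; then
    echo primary
  else
    echo secondary
  fi
}

# Swap the bridge from the primary port to the secondary port.
fail_over() {
  brctl delif "$BRIDGE" "$PRIMARY"
  brctl addif "$BRIDGE" "$SECONDARY"
}

# If the primary link is in use but the remote router doesn't answer
# pings, switch the bridge to the secondary link.
check_link() {
  if [ "$(current_link)" = primary ] && \
     ! ping -c 3 -W 2 "$ROUTER" > /dev/null 2>&1; then
    fail_over
  fi
}
```

check_link is what a frequent cron job would run; a second hourly job doing the reverse swap restores the primary link.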


Gmail and Anti-Spam

I have just received an email with a question about SE Linux that was re-sent due to the first attempt being blocked by my anti-spam measures. I use the rfc-ignorant.org DNSBL services to stop some of the spam that is sent to me.

The purpose of rfc-ignorant.org is to list systems that are run by people who don’t know how to set up mail servers correctly. But the majority of mail that is blocked when using it comes from large servers owned by companies big enough that they almost certainly employ people who know the RFCs (or could hire such people for a trivial fraction of their budget). So it seems more a matter of deliberately violating the standards than of ignorance.

The person who sent me the email in question said “hopefully, Google knows how to make their MTA compliant with RFC 2142”; such hope is misplaced, as a search for gmail.com in the rfc-ignorant.org database shows that it is listed for not having a valid postmaster address [1]. A quick test revealed that two of the Gmail SMTP servers accept mail for the postmaster account (or at least they don’t give an error response to the RCPT TO command referenced in the complaint). However the Gmail administrators have not responded to the auto-removal requests, which suggests that postmaster@gmail.com is a /dev/null address.

However that is not a reason to avoid using Gmail. Some time ago Gmail took over the role of “mail server of last resort” from Hotmail. If you have trouble sending email to someone then using a free Gmail account seems to be the standard second option. Because so many people use Gmail and such a quantity of important mail is sent through that service (in my case mail from clients and prospective clients) it is not feasible to block Gmail. I have whitelisted Gmail for the rfc-ignorant.org tests and if Gmail starts failing other tests then I will consider additional white-lists for them.

Gmail essentially has a monopoly on a segment of the market (that of free webmail systems). They don’t have 100%, but they have enough market share that it’s possible to ignore their competitors (in my experience). When configuring mail servers for clients I make sure that whatever anti-spam measures they request don’t block Gmail. As a rule of thumb, when running a corporate mail server you have to set up anti-spam measures so that they don’t block the main ISPs in the country (for Australian companies this means not blocking Optus or Telstra BigPond) and don’t block Gmail. Not blocking Yahoo (for “Yahoo Groups”) is also a good thing, but I have had a client specifically request that I block Yahoo Groups in the past – so obviously there is a range of opinions about the value of Yahoo.

Someone contacted Biella regarding an email that they couldn’t send to me [2]. I have sent an email to Biella’s Gmail account from my Gmail account – that should avoid all possibility of blocking. If the person who contacted Biella also has a Gmail account then they can use that to send me email to my Gmail account (in the event that my own mail server rejects it – I have not whitelisted Gmail for all my anti-spam measures and it is quite possible for SpamAssassin to block mail from Gmail).

It turns out that the person in question used an account on Verizon’s server; according to rfc-ignorant.org Verizon has an unusually broken mail server [3].

If your ISP is Optus, BigPond, Verizon, or something similarly broken and you want to send mail to people in other countries (where your ISP is just another annoyance on the net and not a significant entity that gets special treatment) then I suggest that you consider using Gmail. If nothing else then your Gmail account will still work even after your sub-standard ISP “teaches you a lesson” [4].


The National Cost of Slow Internet Access

Australia has slow Internet access when compared to other first-world countries. The costs of hosting servers are larger, and residential access costs more with smaller quotas. I read news reports of people in other countries complaining about having their home net connection restricted after they transfer 300G in one month; I have two net connections at the moment and the big (expensive) one allows me 25G of downloads per month. I use Internode, here are their current prices [1] (which are typical for Australia – they weren’t the cheapest last time I compared but they offer a good service and I am quite happy with them).

Most people in Australia don’t want to pay $70 per month for net access; I believe that plans with limits of 10G of download or less are considerably more popular.

Last time I investigated hosting servers in Australia I found that it would be totally impractical. The prices offered for limits such as 10G per month (for a server!) were comparable to the prices offered by Linode [2] (and other ISPs in the US) for hundreds of gigs of transfer per month. I recently configured a DomU at Linode for a client; Linode conveniently offers a choice of server rooms around the US so I chose one in the same region as my client’s other servers – giving 7 hops according to traceroute and a ping time as low as 2.5ms!

Currently I am hosting www.coker.com.au and my blog in Germany thanks to the generosity of a German friend. An amount of bandwidth that would be rather expensive for hosting in Australia is, by German standards, unused capacity in a standard hosting plan. So I get to host my blog in Germany with higher speeds than my previous Australian hosting (which was bottlenecked due to overuse of its capacity) and no bandwidth quotas that I am likely to hit in the near future. This also allows me to do new and bigger things, for example one of my future plans is to assemble a collection of Xen images of SE Linux installations – a set of archives that are about 100MB in size. Even with bittorrent, transferring 100MB files from a server in Australia would be impractical.

Most Australians who access my blog and have reasonably fast net connections (cable or ADSL2+) will notice a performance improvement. Australians who use modems might notice a performance drop due to longer latencies of connections to Germany (an increase of about 350ms in ping times). But if I could have had a fast cheap server in Australia then all Australians would have benefited. People who access my blog and my web site from Europe (and to a slightly lesser extent from the US) should notice a massive performance increase, particularly when I start hosting big files.

It seems to me that the disadvantages of hosting in Australia due to bandwidth costs are hurting the country in many ways. For example I run servers in the US (both physical and Xen DomUs) for clients. My clients pay the US companies for managing the servers; these companies employ skilled staff in the US (who pay US income tax). It seems that the career opportunities for system administrators in the US and Europe are better than in Australia – which is why so many Australians choose to work in the US and Europe. Not only does this cost the country the tax money that those people might pay if employed here, it also costs the training of other people. It is impossible to estimate the cost of having some of the most skilled and dedicated people (the ones who desire the career opportunities that they can’t get at home) working in another country, contributing to users’ groups and professional societies, and sharing their skills with citizens of the country where they work.

Companies based in Europe and the US have an advantage in that they can pay for hosting in their own currency and not be subject to currency variations. People who run Australian based companies that rent servers in the US get anxious whenever the US dollar goes up in value.

To quickly investigate the hosting options chosen for various blogs I used the command “traceroute -T -p80” to do SYN traces to port 80 for some of the blogs syndicated on Planet Linux Australia [3]. Of the blogs I checked there were 13 hosted in Australia, 11 hosted independently in the US, and 5 hosted with major US based blog hosting services (WordPress.com, Blogspot, and LiveJournal). While this is a small fraction of the blogs syndicated on that Planet, and blog hosting is also a small fraction of the overall Internet traffic, I think it does give an indication of what choices people are making in terms of hosting.

Currently the Australian government is planning to censor the Internet with the aim of stopping child porn. Their general plan is to spend huge amounts of money filtering HTTP traffic in the hope that pedophiles don’t realise that they can use encrypted email, HTTPS, or even a VPN to transfer files without them getting blocked. If someone wanted to bring serious amounts of data to Australia, getting a tourist to bring back a few terabyte hard disks in their luggage would probably be the easiest and cheapest way to do it. Posting DVDs is also a viable option.

Given that the Internet censorship plan is doomed to failure, it would be best if they could spend the money on something useful. Getting a better Internet infrastructure in the country would be one option to consider. The cost of Internet connections to other countries is determined by the cost of the international cables – which cannot be upgraded quickly or cheaply. But even within Australia bandwidth is not as cheap as it could be. If the Telstra monopoly on the local loop was broken and the highest possible ADSL speeds were offered to everyone then it would be a good start towards improving Australia’s Internet access.

Australia and NZ seem to have a unique position on the Internet in terms of being first-world countries that are a long way from the nearest net connections and which therefore have slow net access to the rest of the world. It seems that the development of Content Delivery Network [4] technology could potentially provide more benefits for Australia than for most countries. CDN enabling some common applications (such as WordPress) would not require a huge investment but has the potential to decrease international data transfer while improving the performance for everyone. For example if I could have a WordPress slave server in Australia which directed all writes to my server in Germany and have my DNS server return an IP address for the server which matches the region where the request came from then I could give better performance to the 7% of my blog readers who appear to reside in Australia while decreasing International data transfer by about 300MB per month.


Jabber

I’ve just been setting up jabber.

I followed the advice from System Monitoring on setting up ejabberd [1]. I had previously tried the default jabber server but couldn’t get it working. ejabberd is written in Erlang [2], which has its own daemon that it launches: Erlang is designed for concurrent and distributed programming so it has an Erlang Port Mapper Daemon (epmd) to manage communications between nodes. I’ve written SE Linux policy for epmd and for ejabberd, but I’m not sure how well it will work when there are multiple Erlang programs running in different security contexts. It seems that I might be the first person to try running a serious Jabber server on SE Linux. The policy was written a while ago and didn’t support connecting to TCP port 5269 – the standard port for Jabber inter-server communication and the port used by the Gmail jabber server.

ejabberd has a default configuration file that only requires minor changes for any reasonable setup, and a command-line utility for managing it (adding users, changing passwords, etc). It’s so easy to set up that I got it working and wrote the SE Linux policy for it in less time than I had spent unsuccessfully trying to get jabber to work!

It seems that Jabber clients default to using the domain part of the address to determine which server to talk to (it is possible to change this). So I set up an A record for coker.com.au pointing to my Jabber server; I’ll have the same machine run a web server to redirect http://coker.com.au to http://www.coker.com.au.

For Jabber inter-server communication you need a SRV record [3] in your zone. I used the following line in my BIND configuration:

_xmpp-server._tcp IN SRV 0 5 5269 coker.com.au.
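Once the zone is reloaded the record can be checked from outside; a small helper like the following (using `dig` from the BIND client utilities, with the zone from the example above) will show what remote Jabber servers see:

```shell
# Look up the SRV record that remote Jabber servers use to find
# the server-to-server port for a domain.
check_xmpp_srv() {
  dig +short SRV "_xmpp-server._tcp.$1"
}
# check_xmpp_srv coker.com.au should print something like:
#   0 5 5269 coker.com.au.
```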

Also for conferencing the default is to use the hostname “conference” in the domain of your Jabber server. So I’ve created conference.coker.com.au to point to my server. This name is used both in Jabber clients and in sample directives in the ejabberd configuration file, so it seemed too difficult to try something different (and there’s nothing wrong with conference as an A record).

I tried using the cabber client (a simple text-mode client), but found two nasty bugs within minutes (SEGV when a field is missing from the config file – Debian bug #503424 and not resetting the terminal mode on exit – Debian bug #503422). So I gave up on cabber as a bad idea.

I am now testing Kopete (the KDE IM client) and Pidgin (formerly known as GAIM). One annoying bug in Kopete is that it won’t let me paste in a password (see Debian bug #50318). My wife is using Pidgin on CentOS 5.2 and finding it to work just as well as GAIM has always worked for her. One significant advantage of Pidgin is conferencing – it seems impossible to create a conference in Kopete. Also Kopete uses one window for each chat while Pidgin by default uses a single window with a tab for each chat (with an option to change it). I haven’t seen an option in Kopete to change this, so if you want a single window with tabs for all your chats and conferences then you might want to use Pidgin.

Another annoying thing about Kopete is that it insists on a wizard-based initial setup. I found it difficult to talk my mother through installing it because I couldn’t get my machine to show the same dialogs that were displayed on her machine. In retrospect I probably should have run “ssh -X test@localhost” to run it under a different account.

RPC and SE Linux

One ongoing problem with TCP networking is the combination of RPC services and port-based services on the same host. If you have an RPC service that uses a port less than 1024 then typically it will start at 1023 and try lower ports until it finds one that works. A problem that I have had in the past is that an RPC service used port 631 and I then couldn’t start CUPS (which uses that port). A similar problem can arise in a more insidious manner if you have strange networking devices such as a BMC [1] which uses the same IP address as the host and just snarfs connections for itself (as documented by pantz.org [2]); this means that according to the OS the port in question is not in use, but connections to that port will go to the hardware BMC and the OS won’t see them.

Another solution is to give an SE Linux security context to the port which prevents the RPC service from binding to it. RPC applications seem to be happy to make as many bind attempts as necessary to get an available port (thousands of attempts if necessary) so reserving a few ports is not going to cause any problems. As far as I recall, my problems with CUPS and RPC services were a motivating factor in some of my early work on writing SE Linux policy to restrict port access.
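As a sketch of how such a reservation might look with the runtime tools (on reference policy systems TCP port 631 already carries the `ipp_port_t` type used by the CUPS policy; the extra port number below is an invented example):

```shell
# Show which ports are currently labelled with the CUPS port type.
show_cups_ports() {
  semanage port -l | grep -w ipp_port_t
}

# Label an additional TCP port with ipp_port_t.  A confined RPC
# service that is not allowed to name_bind ipp_port_t will get its
# bind refused and simply try the next port down.
reserve_cups_port() {
  semanage port -a -t ipp_port_t -p tcp "$1"
}
```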

Of course the best thing to do is to assign IP addresses for IPMI that are different from the OS IP addresses. This is easy to do and merely requires an extra IP address for each port. As a typical server will have two Ethernet ports on the baseboard (one for the front-end network and one for the private network) that means an extra two IP addresses (you want to use both interfaces for redundancy in case the problem which cripples a server is related to one of the Ethernet ports). But for people who don’t have spare IP addresses, SE Linux port labeling could really help.


Switches and Cables

I’ve just read an amusing series of blog posts about bad wiring [1]. I’ve seen my share of wiring horror in the past. There are some easy ways of minimising wiring problems which seem to never get implemented.

The first thing to do is to have switches near computers. Having 48 port switches in a server room with wires going across the building causes mess and is difficult to manage. A desktop machine doesn’t need a dedicated Gig-E (or even 100baseT) connection to the network backbone. Cheap desktop switches installed on desks allow one cable to go to each group of desks (or two cables if you have separate networks for VOIP and data). If you have a large office area then a fast switch in the corner of the room connecting to desktop switches on the desks is a good way to reduce the cabling requirements. The only potential down-side is that some switches are noisy: the switches with big fans can be easily eliminated by a casual examination, but the ones that make whistling sounds from the PSU need to be tested first. The staff at your local electronics store should be happy to open one item for inspection and plug it in if you are about to purchase a moderate number (they will usually do so even if you are buying a single item).

A common objection to this is the perceived lack of reliability of desktop switches. One mitigating factor is that if a spare switch is available the people who work in the area can replace a broken switch themselves. Another is my observation that misconfiguration of big expensive switches causes significantly more down-time than hardware failures of cheap switches ever could. A cheap switch that needs to be power-cycled once a month will cause little interruption to work, while a big expensive switch (which can only be configured by the “network experts” – not regular sysadmins such as me) can easily cause an hour of down-time for most of an office during peak hours. Finally the reliability of the cables themselves is also an issue; having two cables running to the local switch in every office allows an easy fix for a cable failure – it can be done without involving the IT department (who just make sure that both cables are connected to the switch in the server room). If there is exactly one cable running to each PC from the server room and one of the cables fails then someone’s PC will be offline for a while.

In server rooms the typical size of a rack is 42RU (42 Rack Units). If using 1RU servers that means up to 42 Ethernet cables. A single switch can have 48 Ethernet ports in a 1RU mount (for the more dense switches), others have 24 ports or less. So a single rack can handle 41 small servers and a switch with 48 ports (two ports to go to the upstream switch and five spare ports). If using 2RU servers a single rack could handle 20 servers and a 24 port switch that has two connections to the upstream switch and two spare ports. Also it’s generally desirable to have at least two Ethernet connections to each server (public addresses and private addresses for connecting to databases and management). For 1RU servers you could have two 48 port switches and 40 servers in a rack. For 2RU servers you could have 20 servers and either two 24 port switches or one 48 port switch that supports VLANs (I prefer two switches – it’s more difficult to mess things up when there are two switches, if one switch fails you can login via the other switch to probe it, and it’s also cheaper). If the majority of Ethernet cables are terminated in the same rack it’s much harder for things to get messed up. Also it’s very important to leave some spare switch ports available, as it’s a common occurrence for people to bring laptops into a server room to diagnose problems and you really don’t want them to unplug server A to diagnose a problem with server B…
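The rack arithmetic above is easy to check:

```shell
rack_units=42

# 1RU servers sharing the rack with a single 1RU 48 port switch:
servers=$((rack_units - 1))        # 41 servers
used_ports=$((servers + 2))        # server ports plus two uplink ports
spare=$((48 - used_ports))         # 5 ports left over
echo "1RU: $servers servers, $spare spare switch ports"

# 2RU servers sharing the rack with a single 24 port switch:
servers2=$(((rack_units - 1) / 2)) # 20 servers fit in the remaining space
used2=$((servers2 + 2))            # 22 of the 24 ports in use
spare2=$((24 - used2))             # 2 ports left over
echo "2RU: $servers2 servers, $spare2 spare switch ports"
```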

Switches should go in the middle of the rack. While it may look nicer to have the switch at the top or the bottom, that means that the server which is above or below it will have the cables for all the other switches going past it. Ideally the cables would go in neat cable runs at the side of the rack but in my experience they usually end up just dangling in front. If the patch cables are reasonably short and they only dangle across half the servers things won’t get too ugly (this is harm minimisation in server room design).

The low end of network requirements is usually the home office. My approach to network design for my home office is quite different: I have no switches! I bought a bunch of dual-port Ethernet cards and now every machine that I own has at least two Ethernet ports (and some have as many as four). My main router and gateway has four ports which allows connections from all parts of my house, and every desktop machine has at least two ports so that I can connect a laptop in any part of the house. This avoids the energy use of switches (I previously used a 24 port switch that drew 45W [2]); switches of course also make some noise and are an extra point of failure. While switches are more reliable than PCs, as I have to fix any PC that breaks anyway my overall network reliability is increased by not using switches.

For connecting the machines in my home I mostly use bridging (only the Internet gateway acts as a router). I have STP enabled on all machines that have any risk of having their ports cross connected, but disabled on some desktop machines with two ports (so that I can plug my EeePC in and quickly start work on small tasks).
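A minimal sketch of that configuration for one such machine (the bridge and interface names here are assumptions, not my actual config):

```shell
# Bridge the two on-board Ethernet ports so the machine forwards
# traffic between the parts of the house it is connected to, with
# STP enabled so that accidentally cross-connected ports are blocked
# instead of creating a loop.
setup_home_bridge() {
  brctl addbr br0
  brctl addif br0 eth0
  brctl addif br0 eth1
  brctl stp br0 on          # participate in spanning tree
  ip link set dev br0 up
}
```

On the desktop machines where STP is deliberately off, the same function would use `brctl stp br0 off`.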


New Net Connections

On Thursday my new Internode ADSL2+ service was connected [1]. I needed a connection with a larger download cap and a better upload speed because one of my clients wants me to transfer some significant amounts of data as well as hosting some Xen DomUs for him. Strangely Internode couldn’t offer a regular ADSL service but could offer Naked DSL (which means DSL without a phone service on the same pair of wires). So I now have Naked ADSL2+ [2], although unfortunately the line speed is reported as being 8263/814 Kbps – ADSL2 speed. For the moment this will do, but I’ll investigate the possibility of improving it eventually. Another strange thing is that Optus is the carrier for the ADSL line; Telstra is the monopoly telco with the vast majority of local-loop copper pairs, so it’s surprising that I end up with Optus owning my copper – the wires in question were used by the previous owner of my house for a Telstra connection!

On Friday I converted my network to the new ADSL link, and on Saturday I got my SE Linux Play Machine online again [3]. I could have managed the transition without ~20 hours of down-time for the Play Machine, but as I only get a few logins per day it didn’t seem worth the effort of rushing it.

Also on Friday I got a new 3G modem from Three [4]. They advertise that the USB or PC-Card modem will cost $5 per month on a 24 month plan, but when I ordered it I discovered that as I have an existing mobile phone plan with them the $5 per month is waived. So all that I have to pay is $15 per month for a 1GB data allowance (which is about the best deal available). A client is paying for this so that I can provide remote support for his network. I had previously written about my search for an ideal mobile SSH client [5], I ended up getting an EeePC 701 which cost $300, an 8G SD card and 8G USB stick (for expanding the internal storage and for moving files around respectively) which cost $83, and now $15 per month over 24 months for net access. That gives a total of $743 for two years of mobile net access. This compares well to the $960 that an iPhone would cost over two years and provides a lot more utility (while admittedly not fitting into a pocket).

Dave Hall [6] gave me a lot of great advice about selecting and using 3G modems. He recommended the E220 modem as it’s easy to configure (mine was easy on Debian/Lenny but didn’t work with a Debian/Etch kernel as the necessary driver was not available). Three also sells another model, the E169G, which is apparently tricky to set up. There was one mistake in the design of the E220: the cable is attached in such a way that the LED indicating connection status faces down when the modem is connected to almost any laptop with horizontally mounted USB ports (which includes all the thinner ones).

Here is the PPP chatscript for use with an E220 modem (suggested file name /etc/chatscripts/three):
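
I won’t vouch for the exact details of every carrier, but a working chatscript for an E220-style modem generally looks like the following sketch (the APN string is a placeholder – substitute whatever your carrier specifies):

```
ABORT BUSY
ABORT 'NO CARRIER'
ABORT ERROR
'' ATZ
OK 'AT+CGDCONT=1,"IP","your.apn.here"'
OK ATD*99#
CONNECT ''
```

The AT+CGDCONT line sets up the PDP context with the carrier’s APN and ATD*99# makes the data call; pppd then takes over on CONNECT.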

7

Pollution and Servers

There is a lot of interest in making organisations “green” nowadays. One issue is how to make the IT industry green. People are talking about buying “offsets” for CO2 production, but the concern is that some of the offset schemes are fraudulent. Of course the best thing to do is to minimise the use of dirty power as much as possible.

The first thing to do is to pay for “green power” (if available) and, where possible, install solar PV systems on building roofs. While the roof space of a modern server room would only supply a small amount of the electricity needed (maybe less than needed to power the cooling), every little bit helps. The roof space of an office building can supply a significant portion of its electricity needs; two years ago Google started work on installing solar PV panels on the roof of the “Googleplex” [1] with the aim of supplying 30% of the building’s power needs.

For desktop machines a significant amount of power can be saved if they are turned off overnight. For typical office work the desktop machines should be idle most of the time, so if the machine is turned off outside business hours then it will use something close to 45/168 of the power that it might otherwise use. Of course this requires that the OS support hibernation (which isn’t supported well enough in Linux for me to want to use it) or that applications can be easily stopped and restarted so that the system can be booted every morning. One particular corner case is that instant-messaging systems need to be server based with an architecture that supports storing messages on the server (as Jabber does [2]) rather than requiring that users stay connected (as IRC does). Of course there are a variety of programs to proxy the IRC protocol and using screen on a server to maintain a persistent IRC presence is popular among technical users (for a while I used that at a client site so that I could hibernate the PowerMac I had on my desktop when I left the office).
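
The screen trick is simple enough; a sketch of it (assuming irssi as the IRC client and a server you can SSH to):

```sh
# start the IRC client in a detached screen session on the server
ssh server screen -dmS irc irssi
# later, reattach from any machine (-t allocates the terminal screen needs)
ssh -t server screen -r irc
# detach again with Ctrl-A d and the client keeps running on the server
```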

It seems that most recent machines have BIOS support for booting at a pre-set time. This would allow the sys-admin to configure the desktop machines to boot at 8:00AM on every day that the office is open, so most employees would arrive at work to find their computer already booted up and waiting for them. Keep in mind that when comparing minimum pay (about $13 per hour in Australia) with typical electricity costs ($0.14 per kWh – which means that a desktop computer might use about $0.14 of electricity per day), there is no chance of saving money if employee time is wasted. While companies are prepared to lose some money in the process of going green, they want to minimise that loss as much as possible.
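
To put rough numbers on that (the 100W desktop figure is my assumption; the other figures are from above):

```sh
# fraction of the week a desktop runs if powered for ~45 business hours:
awk 'BEGIN { printf "%.2f\n", 45/168 }'            # 0.27
# daily cost of a 100W desktop running 10 hours at $0.14 per kWh:
awk 'BEGIN { printf "%.2f\n", 100/1000*10*0.14 }'  # 0.14
# seconds of $13/hour employee time that the same $0.14 buys:
awk 'BEGIN { printf "%.0f\n", 0.14/13*3600 }'      # 39
```

So a boot-up scheme that wastes even a minute of each employee’s morning costs more than it saves.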

The LessWatts.org project, dedicated to saving energy on Linux systems, reports that Gigabit Ethernet uses about 2W more power than 100baseT on the same adapter [3]. It seems likely that similar savings can be achieved on other operating systems and with other network hardware. So running at 100baseT speed would not only save about 2W at the desktop end, it would also save about 2W at the switch in the server-room and maybe 1W in cooling as well. If you have a 1RU switch with 24 Gig-E ports then running the entire switch at 100baseT speed could save 48W; compared to a modern 1RU server which might draw a minimum of 200W, that isn’t very significant.

The choice of server is going to be quite critical to power use. It seems that all vendors are producing machines that consume less power (if only so that they can get more servers installed without adding more air-conditioning), so some effort in assessing power use before purchase could produce some good savings. When it comes time to decommission old servers it is a good idea to measure power use and decommission the most power hungry ones first whenever convenient. I am not running any P4 systems 24*7 but have a bunch of P3 systems running as servers, which saves me about 40W per machine.

It’s usually the case that the idle power is a significant portion of the maximum power use. In the small amount of testing I’ve done I’ve never found a case where idle power was less than 50% of the maximum – of course if I spun down a large number of disks when idle this might not be the case. So if you can use one virtual server that’s mostly busy instead of a number of mostly idle servers then you can save a significant amount of power. Before I started using Xen I had quite a number of test and development machines and often left some running idle for weeks (if I was interrupted in the middle of a debugging session it might take some time to get back to it). Now if one of my Xen DomUs doesn’t get used for a few weeks it uses little electricity that wouldn’t otherwise be used. It is also possible to suspend Xen DomUs to disk when they are not being used, but I haven’t tried going that far.
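
For the record, suspending a DomU to disk with the Xen 3 tools is just the following (the domain name and save-file path are examples):

```sh
# save the DomU's state to a file and free its RAM
xm save devel1 /var/lib/xen/save/devel1
# later, restore it exactly where it left off
xm restore /var/lib/xen/save/devel1
```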

Xen has a reputation for preventing the use of power saving features in hardware. For a workstation this may be a problem, but for a server that is actually getting used most of the time it should not be an issue. KVM development is apparently making good progress, and KVM does not suffer from any such problems. Of course the down-side to KVM is that it requires an AMD64 (or Intel clone) system with hardware virtualisation, and such systems often aren’t the most energy efficient. A P3 system running Xen will use significantly less power than a Pentium-D running KVM – server consolidation on a P3 server really saves power!

I am unsure of the energy benefits of thin-client computing. I suspect that thin clients can save some energy as the clients take ~30W instead of ~100W so even if a server for a dozen users takes 400W there will still be a net benefit. One of my clients does a lot of thin-client work so I’ll have to measure the electricity use of their systems.
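
Using the wattages above (which are my estimates, not measurements), the arithmetic for a dozen users looks like this:

```sh
# a dozen ~100W desktops:
awk 'BEGIN { print 12*100 }'       # 1200 watts
# a dozen ~30W thin clients plus a 400W terminal server:
awk 'BEGIN { print 12*30 + 400 }'  # 760 watts
```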

Disks take a significant amount of power. For a desktop system they can be spun down at times (an office machine can be configured so that the disks spin down during a lunch break). This can save 7W per disk – the exact amount depends on the type of disk and the efficiency of the PSU (see the Compaq SFF P3 results and the HP/Compaq Celeron 2.4GHz results on my computer power use page [4]). Network booting of diskless workstations could save 7W for the disk (and also reduce noise, which makes the users happy) but would drive the need for Gigabit Ethernet, which then wastes 4W per machine (2W at each end of the Ethernet cable).
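
One way of doing the lunch-break spin-down on Linux is with hdparm (the device name is an example; this has to be run as root):

```sh
# set a 1 hour idle timeout (-S values 241-251 mean (n-240)*30 minutes)
hdparm -S 242 /dev/sda
# or force an immediate spin-down, e.g. from a cron job at lunch time
hdparm -y /dev/sda
```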

Recently I’ve been reading about the NetApp devices [5]. By all accounts the advanced features of the NetApp devices (which includes their algorithms for the use of NVRAM as write-back cache and the filesystem journaling which allows most writes to be full stripes of the RAID) allow them to deliver performance that is significantly greater than a basic RAID array with a typical filesystem. It seems to me that there is the possibility of using a small number of disks in a NetApp device to replace a larger number of disks that are directly connected to hosts. Therefore use of NetApp devices could save electricity.

Tele-commuting has the potential to save significant amounts of energy in employee travel. A good instant-messaging system such as Jabber could assist tele-commuters (it seems that a Jabber server is required for saving energy in a modern corporate environment).

Have I missed any ways that sys-admins can be involved in saving energy use in a corporation?

Update: Albert pointed out that SSDs (Solid State Disks) can save some power. They also reduce the noise of the machine, both by removing a moving part and by reducing heat (and therefore the operation of the cooling fan). They are smaller than hard disks, but large enough for an OS to boot from (some companies deliberately use only a small portion of the hard drives in desktop machines to save space on backup tapes). It’s strange that I forgot to mention this as I’m about to buy a laptop with an SSD.