SE Linux Policy Packaging for a Distribution

Caleb Case (Ubuntu contributor and Tresys employee) has written about the benefits of using separate packages for SE Linux policy modules [1].

Firstly I think it’s useful to consider some other large packages that could be split into multiple packages. The first example that springs to mind is coreutils, which used to be textutils, shellutils, and fileutils. Each of those packages contained many programs and could conceivably have been split. Some of the utilities in that package have been superseded for most uses; for example hardly anyone uses the cksum utility, as md5sum and sha1sum (which are in the same package) are generally used instead. Also the pinky command probably isn’t even known by most users, who use finger instead (apart from newer Unix users who don’t even know what finger is). So in spite of the potential benefit of splitting the package (or maintaining the previous split) it was decided that it would be easier for everyone to have a single package. The merge of the three packages was performed upstream, but there was nothing preventing the Debian package maintainer from splitting the package – apart from the inconvenience to everyone. The coreutils package in Etch takes 10M of disk space when installed; as it’s almost impossible to buy a new hard drive smaller than 80G, that doesn’t seem to be a problem for most users.

The second example is the X server, which has separate packages for each video card. One thing to keep in mind about the X server is that the video drivers don’t change often. While it is quite possible to remove a hard drive from one machine and install it in another, or to duplicate a hard drive to save the effort of a re-install (I have done both many times), these are not common operations in the life of a system. Of course when you do require such an update you need to first install the correct package (out of about 60 choices), which can be a challenge. I suspect that most Debian systems have all the video driver packages installed (along with drivers for Wacom tablets and other hardware devices that might be used) as that appears to be the default. So it seems likely that a significant portion of the users have all the packages installed and therefore get no benefit from the split packages.

Now let’s consider the disk space use of the selinux-policy-default package – it’s 24M when installed. Of that 4.9M is in the base.pp file (the core part of the policy which is required), then there’s 848K for the X server (which is going to be loaded on all Debian systems that have X clients installed – due to an issue with /tmp/.ICE-unix labelling [2]). Then there’s 784K for the Postfix policy (which is larger than it needs to be – I’ve been planning to fix this for the past four years or so) and 696K for the SSH policy (used by almost everyone). The next largest is 592K for the Unconfined policy, the number of people who choose not to use this will be small, and as it’s enabled by default it seems impractical to provide a way of removing it.

One possibility for splitting the policy is to create a separate package of modules for the less common daemons and services. If modules for INN, Cyrus, distcc, ipsec, kerberos, ktalk, nis, PCMCIA, pcscd, RADIUS, rshd, SASL, and UUCP were in a separate package then that would reduce the installed size of the main package by 1.9M while providing no change in functionality for the majority of users.

One thing to keep in mind is that each package at a minimum will have a changelog and a copyright file (residing in a separate directory under /usr/share/doc) and three files as part of the dpkg data store, each of which takes up at least one allocation unit on disk (usually 4K). So adding one extra package will add at least 24K of disk space to every system that installs it (or 32K if the package has postinst and postrm scripts). This is actually a highly optimal case, the current policy packages (selinux-policy-default and selinux-policy-mls) each take 72K of disk space for their doc directory.

One of my SE Linux server systems (randomly selected) has 23 policy modules installed. If they were in separate packages there would be a minimum of 552K of disk space used by packaging (736K if there were postinst and postrm scripts, and as much as 2M if the doc directory for each package was similar to the current doc directories). As the system in question needs 5796K of policy modules, the 2M of overhead would make it approach 8M of disk space, so it would only be a saving of 16M over the current situation. While saving that amount of disk space is a good thing, I think that when balanced against the usability issues it’s not worthwhile.

Currently the SE Linux policy packages will determine what applications are installed and automatically load policy modules to match. I don’t believe that it’s possible to have a package postinst script install other packages (and if it is possible I don’t think it’s desirable). Therefore having separate packages would make a significant difference to the ease of use; it seems that the best way to manage it would be to have the core policy package include a script to install the other packages.
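A sketch of what such a script might look like (the daemon to policy-package mapping and the selinux-policy-* package names are hypothetical, as no such split packages currently exist):

#!/bin/sh
# For each daemon:policy-package pair, install the policy package if
# the daemon is installed but its policy package is not.
for pair in postfix:selinux-policy-postfix ssh:selinux-policy-ssh; do
  prog=${pair%%:*}
  pkg=${pair#*:}
  if dpkg -s "$prog" > /dev/null 2>&1 && ! dpkg -s "$pkg" > /dev/null 2>&1; then
    apt-get -y install "$pkg"
  fi
done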

Finally there’s the issue of when you recognise the need for a policy module. It’s not uncommon for me to do some work for a client while on a train, bus, or plane journey. I will grab the packages needed to simulate a configuration that the client desires and then work out how to get it going correctly while on the journey. While it would not be a problem for me (I always have the SE Linux policy source and all packages on hand), I expect that many people who have similar needs might find themselves a long way from net access without the policy package that they need to do their work. Sure, such people could do their work in permissive mode, but that would encourage them to deploy in permissive mode too and thus defeat the goals of the SE Linux project (in terms of having wide-spread adoption).

My next post on this topic will cover the issue of custom policy.

Updated to note that Caleb is a contributor to Ubuntu not a developer.

Australian Business and IT Expo

I’ve just visited the Australian Business and IT Expo (ABITE) [1]. I haven’t been to such an event for a while, but Peter Baker sent a link for a free ticket to the LUV mailing list and I was a bit bored so I attended.

The event was a poor shadow of previous events that I had attended. The exhibition space was shared with an event promoting recreational activities for retirees, and an event promoting wine and gourmet food. I’m not sure why the three events were in the same room, maybe they figured that IT people and senior citizens both like gourmet food and wine.

The amount of space used for the computer stands was small, and there was no great crowd of delegates – when they can’t get a good crowd for Saturday afternoon it’s a bad sign for the show.

I have previously blogged about the idea of putting advertising on people’s butts [2]. One company had two women working on its stand with the company’s name on the back of their shorts.

A representative of a company in the business of Internet advertising asked me how many hits I get on my blog, I told him 2,000 unique visitors a month (according to Webalizer) which seemed to impress him. Actually it’s about 2,000 unique visitors a day. I should rsync my Webalizer stats to my EeePC so I can give detailed answers to such questions.

The IT event seemed mostly aimed at managers. There were some interesting products on display, one of which was a device from TabletPC.com.au which had quite good handwriting recognition (but the vocabulary seemed limited as it couldn’t recognise a swear-word I used as a test).

Generally the event was fun (including the wine and cheese tasting) and I don’t regret going. If I had paid $10 for a ticket I probably would have been less happy with it.

Updated to fix the spelling of “wine”. Not a “wind tasting”.

Starting to Blog

The best way to run a blog is to run your own blog server. This can mean running an instance on someone else’s web server (some ISPs have special hosting deals for bloggers on popular platforms such as WordPress), but usually means having shell access to your own server (I’ve previously written about my search for good cheap Xen hosting [1]).

There are platforms that allow you to host your own blog without any technical effort. Three popular ones are WordPress.com, LiveJournal.com, and Blogger.com. But they give you less control over your own data, particularly if you don’t use your own DNS name (Blogger allows you to use their service with your own DNS name).

Currently it seems to me that WordPress is the best blog software by many metrics. It has a good feature set, a plugin interface with lots of modules available, and the code is free. The down-side is that it’s written in PHP and has the security issues that tend to be associated with large PHP applications.

Here is a good summary of the features of various blog server software [2]. One that interests me is Blojsom – a blog server written in Java [3]. The Java language was designed in a way that leads to less risk of security problems than most programming languages; as it seems unlikely that anyone will write a blog server in Ada, Java is probably the best option for such things. I am not planning to switch, but if I was starting from scratch I would seriously consider Blojsom.

But for your first effort at blogging it might be best to start with one of the free hosted options. You can always change later on and import the old posts into your new blog. If you end up not blogging seriously then using one of the free hosted services saves you the effort of ongoing maintenance.

CPU vs RAM

When configuring servers the trade-offs between RAM and disk are well known. If your storage is a little slow then you can often alleviate the performance problems by installing more RAM for caching and to avoid swapping. If you have more than adequate disk IO capacity then you can over-commit memory and swap out the things that don’t get used much.

One trade-off that often doesn’t get considered is between RAM and CPU. I just migrated a server image from a machine with two P4 CPUs to a DomU on a machine with Opteron CPUs. The P4 system seemed lightly loaded (a maximum of 30% CPU time in use over any 5 minute period) so I figured that if two P4 CPUs are 30% busy then a single Opteron core should do the job. It seems that when running 32bit code, 30% of 2*3.0GHz P4 CPUs is close to the CPU power of one core of an Opteron 2352 (2.1GHz). I’m not sure whether this is due to hyper-threading actually doing some good or inefficiencies in running 32bit code on the Opteron – but the Opteron is not giving the performance I expected in this regard.

Now having about 90% of the power of that CPU core in use might not be a problem, except that the load came in bursts. When a burst took the machine to 100% CPU power a server process kept forking off children to answer requests. As all the CPU power was being used it took a long time to answer queries (several seconds) so the queue started growing without end. Eventually there were enough processes running that all memory was used, the machine started thrashing, and eventually the kernel out of memory handler started killing things.

I rebooted the DomU with two VCPUs (two Opteron cores) and there was no problem, performance was good, and because the load bursts last less than a minute the load average seems to stay below 1.

It seems that the use of virtual machines increases the scope of this problem. The advantage of virtual machines is that you can add extra virtual hardware more easily (up to the limit of the physical hardware of course) – I could give the DomU in question 6 Opteron cores in a matter of minutes if it was necessary. The disadvantage is that the CPU use of other virtual machines can impact the operation. As there seems to be an exponential relationship between the number of CPU cores in a system and the overall price it’s not feasible to just put in 32 core Xen servers. While CPU power has generally been increasing faster than disk performance for a long time (at least the last 20 years) it seems that virtualisation provides a way of using a lot of that CPU power. It is possible to have a 1:1 mapping of real CPUs and VCPUs in the Xen DomUs; if you were to install 8 DomUs that each had one VCPU on a server with 8 cores then there would be no competition between DomUs for CPU time – but that would significantly increase the cost of running them (some ISPs offer this for a premium price).

In this example, if I had a burst of load for the service in question at the same time as other DomUs were using a lot of CPU time (which is a possibility as the other DomUs are the clients for the service in question) then I might end up with the same problem in spite of having assigned two VCPUs to the DomU.

The real solution is to configure the server to limit the number of children that it forks off; the limit can be set high enough to allow 100% CPU use at times of peak load while being low enough to avoid swapping.
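As a hypothetical example using Postfix (the server software in question isn’t named above, so treat this as illustrative), the cap on concurrent server processes could be set like this:

# Limit Postfix to 50 concurrent processes so a load burst queues
# work instead of forking until the machine starts thrashing.
postconf -e default_process_limit=50
postfix reload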

I wonder how this goes with ISPs that offer Xen hosting. It seems that you would only need to have one customer who shares the same Xen server as you experiencing such a situation to cause enough disk IO to cripple the performance that you get.

SpamAssassin During SMTP

For some time people have been telling me about the benefits of SpamAssassin (SA). I have installed it once for a client (at their demand and against my recommendation) but was not satisfied with the result (managing the spam folder was too complex for their users).

The typical configuration of SA has it run after mail has been accepted by the server. Messages that it regards as spam are put into a spam folder. This means that when someone phones you about some important message you didn’t receive then you have to check that folder. Someone who sends mail to a user who has such a SA configuration can not expect that the message will either be received or rejected (thus giving them a bounce message).

Even worse it seems to be quite common for technical users to train the Bayesian part of SA on messages from the spam folder – without reviewing them! Submitting a folder of spam that has been carefully reviewed for Bayesian training can increase the accuracy of classification (including taking account for locality and language differences in spam). Submitting a folder which is not reviewed means that when a false-positive gets into that folder (which will eventually happen) it is used as training for spam recognition thus increasing the incidence of false-positives!

Spam has been becoming more of a problem for me recently, on a typical day between 20 and 40 spam messages would get past the array of DNSBL services I use and be re-sent to pass the grey-listing. Also I have been receiving complaints from people who want to send email to me about some of the DNSBL and RHSBL services I use (the rfc-ignorant.org service gets a lot of complaints – there are a huge number of ignorant and lazy people running mail servers).

So now I have installed spamass-milter to have SA run during the SMTP protocol. Then if the SA checks indicate that the message is spam my mail server can just reject the message with a 55x code, which will cause the sending mail server to generate a local bounce (if it’s a legitimate message) or to just discard it in the case of a spam server. Here is how to set it up on Debian/Lenny and CentOS 5:

Install the packages with yum install spamass-milter or apt-get install spamass-milter spamassassin spamc (spamassassin seems to be installed by default on CentOS). On a Debian system the milter will be set up and running. On CentOS you have to run the following commands:
useradd -m -c "Spamassassin Milter" -s /bin/false spamass-milter
mkdir /var/run/spamass-milter
chown spamass-milter /var/run/spamass-milter
chmod 711 /var/run/spamass-milter
echo SOCKET="/var/run/spamass-milter/spamass.sock" >> /etc/sysconfig/spamass-milter

On CentOS edit /etc/init.d/spamass-milter and change the daemon start line to 'runuser - spamass-milter -s /bin/bash -c "/usr/sbin/spamass-milter -p $SOCKET -f $EXTRA_FLAGS"'. Then add the following lines below it:
chown postfix:postfix /var/run/spamass-milter/spamass.sock
chmod 660 /var/run/spamass-milter/spamass.sock

The spamass-milter program talks to the SpamAssassin daemon spamd.

On both Debian and CentOS run the command “useradd -c Spamassassin -m -s /bin/false spamassassin” to create an account for SA. The Debian bug #486914 [1] has a request to have SA not run as root by default.

On CentOS it seems that SA wants to use a directory under the spamass-milter home directory; the following commands allow this. It would be good to have it not do that, or maybe it would be better to have the one Unix account used for SA and the milter.
chmod 711 ~spamass-milter
mkdir ~spamassassin/.spamassassin
chown spamassassin ~spamassassin/.spamassassin

On Debian edit the file /etc/default/spamassassin and add “-u spamassassin -g spamassassin” to the OPTIONS line. On CentOS edit the file /etc/sysconfig/spamassassin and add “-u spamassassin -g spamassassin” to the SPAMDOPTIONS line.
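On Debian the resulting line should look something like this (the first three options are the stock ones shipped at the time; check your own file before editing):

OPTIONS="--create-prefs --max-children 5 --helper-home-dir -u spamassassin -g spamassassin"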

To enable the daemons, on CentOS you need to run “chkconfig spamass-milter on ; chkconfig spamassassin on“, on Debian edit the file /etc/default/spamassassin and set ENABLED=1.

Now start the daemons, on CentOS use the command “service spamassassin start ; service spamass-milter start“, on Debian use the command “/etc/init.d/spamassassin start“.

Now you have to edit the mail server configuration, for Postfix on CentOS the command “postconf -e smtpd_milters=unix:/var/run/spamass-milter/spamass.sock” will do it, for Postfix on Debian the command “postconf -e smtpd_milters=unix:/var/spool/postfix/spamass/spamass.sock” will do it.

Now restart Postfix and it should be working.

For correct operation you need to ensure that the score needed for a bounce is specified as the same number in both the spamass-milter and SA configuration. If you have a lower number for the spamass-milter configuration (as is the default in Debian) then bounces can be generated – you should never generate a bounce for a spam. The config file /etc/default/spamass-milter allows you to specify the score for rejecting mail, I am currently using a score of 5. Any changes to the score need matching changes to /etc/mail/spamassassin/local.cf (which has a default required_score of 5 in Debian).
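As a hedged example (check the option syntax against your installed version of spamass-milter), the two settings that have to agree are the -r reject score for the milter and required_score for SA; your OPTIONS line will have other flags as well:

# /etc/default/spamass-milter
OPTIONS="-r 5"

# /etc/mail/spamassassin/local.cf
required_score 5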

You can grep for “spamd..result..Y” in your mail log to see entries for messages that were rejected.

One problem that I have with this configuration on Debian (not on CentOS) is that spamd is logging messages such as “spamd: handle_user unable to find user: ‘russell’“. I don’t want it to look for ~russell when processing mail for russell@coker.com.au because I have a virtual domain set up and the delivery mailbox has a different name. Ideally I could configure it to know the mapping between users and mailboxes (maybe by parsing the /etc/postfix/virtual.db file). But having it simply not attempt to access per-user configuration would be good too. Any suggestions would be appreciated.

Now that I have SpamAssassin running it seems that I am getting about 5 spams a day, the difference is significant. The next thing I will do is make some of the DNSBL checks that are prone to false-positives become SpamAssassin scores instead.

When I started writing this post I was not planning to compare the sys-admin experiences of CentOS and Debian. But it does seem that there is less work involved in the task of installing Debian packages.

Compassion for Windows Users

In a discussion which covered some of the differences between Linux and Windows, a Windows using friend asked me if I felt compassion for Windows users.

I feel some compassion for people who have bad working environments. While using an operating system that has poor support for the business tasks does decrease the quality of the working environment, there are bigger issues. For example a while ago I was doing some sys-admin work for a financial organisation. I had to use Windows for running the SSH client to connect to Linux servers, which was annoying and decreased my productivity due to the inability to script connections etc. My productivity was also decreased by my unfamiliarity with the Windows environment; it seems reasonable to assume that when you hire a Linux sys-admin they will have some experience of Linux on the desktop and be quite productive with a Linux desktop system, while the same can not be said for a Windows desktop.

But what really made the working environment awful was the paperwork and the procedures. If a server doesn’t work properly and someone says “please fix it now” and I only have a VT100 terminal then I’ll be reasonably happy with that work environment (really – I wouldn’t mind a contract where the only thing on my desk was a VT100 connected to a Linux server). But when a server process hangs in the same way several times a week, when the cause of the problem is known and the fix (restarting the process) is known, it really pains me to have to wait for a management discussion about the scope of the problem before restarting it.

But I don’t have a great deal of sympathy for people who end up in bad working environments such as the one I was briefly in. Anyone who is capable of getting such a job is capable of getting a job with a better working environment while still earning significantly more than the median income. The people I feel sorry for are the ones who work on the minimum wage. I don’t think that the difference between Linux and Windows on the desktop would matter much if you were getting the minimum wage, and people who are on the minimum wage don’t have a lot of choice in regard to employment (I think that all options for them suck).

I don’t have much sympathy for adults who use Windows at home. I have to admit that there are some benefits to running Windows at home, mainly that the hardware vendors support it better (few companies sell PCs with Linux pre-loaded) and there are some commercial games which are in some ways better than the free games (of course there are more than enough free Linux games to waste all your time – and some games are best suited to a console). Linux has significantly lower hardware requirements than Windows (my main machine which I am using to write this blog post is more than three years old and has less power than any other machine on sale today apart from some ultra-mobile PCs), so any long-term Windows user can install Linux on one of their machines which lacks the resources to run the latest version of Windows.

The only Windows users for whom I have much sympathy are children. When I was young every PC came with a BASIC interpreter and everyone shared source code. Books were published which taught children how to program in BASIC and included fairly complete example programs. For the cases where proprietary software was needed the prices used to be quite low (admittedly the programs were much less complex – so pricing is probably in line with the effort of writing the code). Now it seems that computers are often being provided to children as closed systems that they can’t manipulate; the web browser has replaced the TV.

I believe that Linux is the ideal OS for a child to use. There is a wide range of free educational programs (including kturtle – the traditional Logo turtle) and there are also a range of free powerful programs which can be used by any child. Few parents would buy Photoshop or Adobe Illustrator for a child to play with, but anyone can give a child a $100 PC with GIMP and Inkscape installed. They might as well give 3yo children access to the GIMP – it will be less messy than fingerpainting!

I expect that some parents would not consider Linux for their children because they don’t know how to use it. Fortunately Linux is easy enough to use that a child can install it without effort. Some time ago the 11yo daughter of a friend who was visiting asked if she could play some computer games. I gave her a Fedora CD and one of the PCs from my test lab and told her that she had to install the OS first. Within a small amount of time she had Fedora installed and was playing games. While the games she played were not particularly educational, the fact that she could install the OS on a computer was a useful lesson.

It seems to me that children who are raised as Windows users are less likely to learn how computers work or be able to control them properly. I expect that on average a child who is raised in such a manner will have fewer career options in today’s environment than one who properly understands and controls computers.

Executable Stacks in Lenny

One thing that I would like to get fixed for Lenny is the shared objects which can reduce the security of a system. Almost a year ago I blogged about the libsmpeg0 library which is listed as requiring an executable stack [1]. I submitted a two-line patch which fixes the problem while making no code changes (the patch gives the same result as running “execstack -c” on the resulting shared object).

My previous post documents the results of the problem when running SE Linux (a process is not permitted to run and an AVC message is logged). Some people might incorrectly think that this is merely a SE Linux functionality issue.

The program paxtest (which is in Debian but is i386 only) tests for a variety of kernel security features in terms of memory management. To demonstrate the problem that is caused by this issue I ran the commands “paxtest kiddie” and “LD_PRELOAD=/usr/lib/libsmpeg-0.4.so.0 paxtest kiddie“. The difference is that the test named “Executable stack” returns a result of Vulnerable when the object is loaded.

This means for example that attacks which rely on an executable stack will be permitted if the libsmpeg-0.4.so.0 shared object is loaded. So for example a program that loads the library and which takes data from the Internet (EG FreeCiv in network mode) will become vulnerable to attacks which rely on an executable stack because of this bug!

My Etch SE Linux repository has had a libsmpeg0 package which fixes this bug on i386 for almost a year [2] (the AMD64 packages are more recent). I have now added packages to fix this bug to my Lenny SE Linux repository [3]. I have also volunteered to NMU the package for Lenny. It would be rather embarrassing for everyone concerned if systems were vulnerable to attack because of a two-line patch not being applied for almost a year.

I expect that the Release Team will be very accepting of package updates for Lenny which have patches to address this issue. A patch that has one line per assembler file (in the worst-case) to mark the object code is very easy to review. The results of the patch can be tested easily, and failure to have such a patch opens potential security holes. Package maintainers who can’t fix the assembly code can always run “execstack -c” in the build scripts to give the same result.

Lintian performs checks for executable stacks and the results are archived here [4]. There are currently 36 packages which contain binaries listed as needing executable stacks; I would be surprised if more than 6 of them actually contain shared objects that need an executable stack. If you use a package that is on that list then please test whether an executable stack is required by running “execstack -c” on the shared object and seeing if it still works. If a test of most of the high-level operations of the program in question can be completed successfully without an executable stack then it’s a strong indication that it’s not needed. Note that execstack is in the prelink package. I am happy to help with writing the patches to the packages and using my repositories to distribute the packages, but am not going to do so unless I can work with someone who uses the program in question and can test its functions. As an example of such testing I played a game of Frozen Bubble to test out the libsmpeg0 patch.
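As a worked example with the library from this post:

# Query the flag: X means an executable stack is required, - means it isn't.
execstack -q /usr/lib/libsmpeg-0.4.so.0
# Clear the flag, then re-test the program's main functions.
execstack -c /usr/lib/libsmpeg-0.4.so.0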

Xen CPU use per Domain

The command “xm list” displays the number of seconds of CPU time used by each Xen domain. This makes it easy to compare the CPU use of the various domains if they were all started at the same time (usually system boot), but it is not very helpful if they were started at different times.

I wrote a little Perl program to display the percentage of one CPU that has been used by each domain, here is a sample of the output:

Domain-0 uses 7.70% of one CPU
demo uses 0.06% of one CPU
lenny32 uses 2.07% of one CPU
unstable uses 0.30% of one CPU

Now the command “xm top” can give you the amount of CPU time used at any moment (which is very useful). But it’s also good to be able to see how much is being used over the course of some days of operation. For example if a domain is using the equivalent of 34% of one CPU over the course of a week (as one domain that I run is doing) then it makes sense to allocate more than one VCPU to it so that things don’t slow down at peak times or when cron jobs are running.

I believe that it’s best to limit the number of CPUs allocated to Xen domains. For example I am running a Xen server with 8 CPU cores. I could grant each domain access to 8 VCPUs, but then any domain could use all that CPU power if it ran wild. While if I give no domain more than 2 VCPUs then one domain could use all the CPU resources allocated to it without the other domains being impacted. I realise that there are scheduling algorithms in the Xen kernel that are designed to deal with such situations, but I believe that simply denying access to excessive resource use is more effective and reliable.
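For example (the domain name is hypothetical), the cap can be set in the domain’s config file or adjusted for a running domain:

# /etc/xen/example.cfg
vcpus = 2
# for a running domain (can't exceed the configured maximum):
xm vcpu-set example 2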

I have not filed a bug report requesting that my script be added to one of the Xen packages as I’m not sure which one it would belong in (and also it’s a bit of a hack). It’s licensed under the GPL so anyone who wants to use it can do what they want. Any distribution package maintainer who wants to include it in a Xen utilities package is welcome to do so.
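A minimal sketch of the calculation (in shell rather than Perl, and assuming that the output of “xm list --long” includes cpu_time and start_time fields; check this against your version of Xen):

#!/bin/sh
# Print the percentage of one CPU used by each domain since it started.
now=$(date +%s)
for dom in $(xm list | awk 'NR > 1 {print $1}'); do
  xm list --long "$dom" | tr -d '()' | awk -v now="$now" -v dom="$dom" '
    $1 == "cpu_time" { cpu = $2 }
    $1 == "start_time" { start = $2 }
    END { if (start) printf "%s uses %.2f%% of one CPU\n", dom, 100 * cpu / (now - start) }'
done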

A Basic IPVS Configuration

I have just configured IPVS on a Xen server for load balancing between multiple virtual hosts. The benefit is not load balancing but management. With two virtual machines providing a service I can gracefully shut one down for maintenance and have the other take the load. When there are two machines providing a service a load balancing configuration is much better than a hot-spare, one reason is the fact that there may be application scaling issues that prevent one machine with twice the resources from giving as much performance as two smaller machines. Another is the fact that if you have a machine configured but never used there will always be some doubt as to whether it would work…

The first thing to do is to assign the IP address of the service to the front-end machine so that other machines on the segment (IE routers) will be able to send data to it. If the address for the service is 10.0.0.5 then the command “ip addr add dev eth0 10.0.0.5/24 broadcast +” will make it a secondary address on the eth0 interface. On a Debian system you would add the line “up ip addr add dev eth0 10.0.0.5/24 broadcast + || true” to the appropriate section of /etc/network/interfaces, for a Red Hat system it seems that /etc/rc.local is the best place for it. I expect that it would be possible to merely advertise the IP address via ARP without adding it to the interface, but the ability to ping the IPVS server on the service address seems useful and there seems no benefit in not assigning the address.

There are three methods used by IPVS for forwarding packets, gatewaying/routing (the default), IPIP encapsulation (tunneling), and masquerading. The gatewaying/routing method requires the back-end server to respond to requests on the service address. That would mean assigning the address to the back-end server without advertising it via ARP (which seems likely to have some issues for managing the system). The IPIP encapsulation method requires setting up IPIP which seemed like it would be excessively difficult (although maybe not more than required to set up masquerading). The masquerading option (which I initially chose) rewrites the packets to have the IP address of the real server. So for example if the service address is 10.0.0.5 and the back-end server has the address 10.0.1.5 then it will see packets addressed to 10.0.1.5. A benefit of masquerading is that it allows you to use different ports, so for example you could have a non-virtualised mail server listening on port 25 and a back-end server for a virtual service listening on port 26. While there is no practical limit to the number of private IP addresses that you might use it seems easier to manage servers listening on different ports with the same IP address – and there is the issue of server programs that are not written to support binding to an IP address.

ipvsadm -A -t 10.0.0.5:25 -s lblc -p
ipvsadm -a -t 10.0.0.5:25 -r 10.0.1.5 -m

The above two commands create an IPVS configuration that listens on port 25 of IP address 10.0.0.5 and then masquerades connections to 10.0.1.5 on port 25 (the default is to use the same port).
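As an example of the port mapping mentioned above, a back-end server listening on port 26 (hypothetical) would be added with an explicit port on the real server:

ipvsadm -a -t 10.0.0.5:25 -r 10.0.1.5:26 -m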

Now the problem is in getting the packets to return via the IPVS server. If the IPVS server happens to be your default gateway then it’s not a problem and it will already be working after the above two commands (if a service is listening on 10.0.1.5 port 25).

If the IPVS server is not the default gateway and you have only one IP address on the back-end server then this will require using netfilter to mark the packets and then route based on the packet matching. Marking via netfilter also seems to be the only well documented way of doing similar things. I spent some time working on this and didn’t get it working. However having multiple IP addresses per server is a recommended practice anyway (a back-end interface for communication between servers as well as a front-end interface for public data).

ip rule add from 10.0.1.5 table 1
ip route add default via 10.0.0.1 table 1

I use the above two commands to set up a new routing table for the data for the virtual service. The first line causes any packets from 10.0.1.5 to be sent to routing table 1 (I currently have a rough plan to have table numbers match ethernet device numbers, the data in question is going out device eth1). The second line adds a default router to table 1 which sends all packets to 10.0.0.1 (the private IP address of the IPVS server).
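The resulting configuration can be checked with:

ip rule list
ip route show table 1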

Then it SHOULD all be working, but in the network that I’m using (RHEL4 DomU and RHEL5 Dom0 and IPVS) it doesn’t. For some reason the data packets from the DomU are not seen as part of the same TCP stream (both in Net Filter connection tracking and by the TCP code in the kernel). So I get an established connection (3 way handshake completed) but no data transfer. The server sends the SMTP greeting repeatedly but nothing is received. At this stage I’m not sure whether there is something missing in my configuration or whether there’s a bug in IPVS. I would be happy to send tcpdump output to anyone who wants to try and figure it out.

My next attempt at this was via routing. I removed the “-m” option from the ipvsadm command and added the service IP address to the back-end with the command “ifconfig lo:0 10.0.0.5 netmask 255.255.255.255” and configured the mail server to bind to port 25 on address 10.0.0.5. Success at last!

Now I just have to get Piranha working to remove back-end servers from the list when they fail.

Update: It’s quite important that when adding a single IP address to device lo:0 you use a netmask of 255.255.255.255. If you use the same netmask as the front-end device (which would seem like a reasonable thing to do) then (with RHEL4 kernels at least) you get proxy ARPs by default. For example if you used netmask 255.255.255.0 to add address 10.0.0.5 to device lo:0 then on device eth0 the machine will start answering ARP requests for 10.0.0.6 etc. Havoc then ensues.

Time Zones and Remote Servers

It’s widely regarded that the best practice is to set the time zone of a server to UTC if people are going to be doing sys-admin work from various countries. I’m currently running some RHEL4 servers that are set to Los Angeles time, so I have to convert from Melbourne time to UTC and then from UTC to LA time when tracking down log entries. This isn’t really difficult for recent times (within the last few minutes) as my KDE clock applet allows me to select various time zones to display on a pop-up. For other times I can use the GNU date command to convert from other time zones to the local zone of the machine, for example the command date -d "2008-08-06 10:57 +1000" will display that Melbourne time (which is in the +1000 time zone) converted to the local time zone. But it is still painful.
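Setting the TZ environment variable avoids doing the conversion in two steps, for example to display a Melbourne timestamp in LA time regardless of the machine’s own zone:

TZ=America/Los_Angeles date -d "2008-08-06 10:57 +1000"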

In RHEL5, CentOS 5, and apparently all versions of Fedora newer than Fedora Core 4 (including Fedora Core 4 updates) the command system-config-date allows you to select Etc/GMT as the time zone to get GMT. For reference, selecting London is not a good option, particularly at the moment as it’s apparently daylight saving time there.

For RHEL4 and CentOS 4 the solution is to edit /etc/sysconfig/clock and change the first line to ZONE="Etc/GMT" (the quotes are important), and then run the command ln -sf /usr/share/zoneinfo/Etc/GMT /etc/localtime. Thanks to the Red Hat support guy who found the solution to this, it took a while but it worked in the end! Hopefully this blog post will allow others to fix this without needing to call Red Hat.

In Debian the command tzconfig allows you to select 12 (none of the above) and then GMT or UTC to set the zone. This works in Etch; I’m not sure about earlier versions (tzconfig always worked but I never tried setting UTC). In Lenny the tzconfig command seems to have disappeared; now to configure the time zone you use the command dpkg-reconfigure tzdata, which has an option of Etc at the end of the list.

Updated to describe how to do this in Lenny, thanks for the comments.