Michael Janke is writing a series of posts about estimating the availability of systems; here is a link to the introduction [1]. He covers lots of things that people often miss (such as cooling). If you aren’t about to implement a system that needs to be reliable then it’s an interesting read. If you are about to implement a system where reliability is required and you have control of the system (not paying someone else to run it and hoping for the best) then it’s an essential read. It would probably also be good to give this URL to managers who make decisions about such things.
There is an interesting summary of the connections between the Iraq war and the oil industry in the Reid Report [2]. The suggestion made by one of the sources she cites is that the intention of the war was to reduce the supply of Iraqi oil to increase prices. Sam Varghese has written an essay about this which summarises where the Iraqi oil goes [3]. It seems that half of Iraq’s oil goes to US military use, the other half is used domestically, and some oil is imported as well! So because of the US occupation the country with the second largest known oil reserves is importing petroleum products! If the US military were to cease operations world-wide then the oil price would drop significantly; this doesn’t just mean the occupation of Iraq and the various actions in South America, but also the bases in Germany and Japan.
Interesting paper by Alexander Sotirov and Mark Dowd about Bypassing Browser Memory Protection in Windows [4]. This paper is good for people who are interested in computer security but don’t generally use Windows (such as me), if you want to learn about the latest things happening in Windows land then this is a good place to start.
A well researched article by Rick Moen about the unintended effects of anti-gay-marriage laws [5]. Maybe some of the “conservatives” who advocate such laws should get themselves and their spouses tested. It would be amusing if someone like Rush Limbaugh turned out to be involved in a “gay marriage”.
What Sysadmins should know about exposure to hazardous materials [6]. High-level overview of the issues, probably a good start for some google searches to get the details.
Diamond John McCain is an interesting blog about the 73-year-old (who was born in Panama) candidate in the US presidential election [7].
Update: Corrected my statement about Iraq’s oil reserves based on a comment by Sam.
I just read an interesting post about latency and how it affects web sites [1]. The post has some good ideas but unfortunately mixes information about esoteric technologies such as InfiniBand (which are not generally applicable) with material that is of wide use (such as ping times).
The post starts by describing the latency requirements of Amazon and stock broking companies. It’s obvious that stock brokers have a great desire to reduce latency, and it’s also not surprising that Google and Amazon analyse the statistics of their operations and make changes to improve their results by a few percent. But it seems to be a widely held belief that personal web sites are exempt from such requirements. The purpose of creating content on a web site is to have people read it; if you can get an increase in traffic of a few percent by having a faster site, and if those readers refer others, then there is the potential to significantly improve the result. Note that an increase in readership through a better experience is likely to be exponential, and an exponential increase of a few percent a year will eventually add up (an increase of 4% a year will double the traffic in 18 years).
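As a quick sanity check on that figure (a sketch, assuming the bc calculator is installed), the doubling time for 4% annual growth is log(2)/log(1.04):
echo 'l(2)/l(1.04)' | bc -l
That gives roughly 17.7, so 18 years is about right.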
I have been considering hosting my blog somewhere else for a while. My blog is currently doing about 3G of traffic a month, which averages out to just over 1KB/s; peaks will of course be a lot greater than that, and the 512Kb/s of the Internet connection would probably be a limit even if it wasn’t for the other sites on the same link. The link in question is being used for serving about 8G of web data per month and there is some mail server use which also takes bandwidth. So performance is often unpleasantly slow.
For a small site such as mine the most relevant issues seem to be available bandwidth, swap space use (or the lack thereof), disk IO (for when things don’t fit in cache), and whether the available CPU power exceeds the requirements.
For hosting in Australia (as I do right now) bandwidth is a problem. Internet connectivity is not cheap in any way and bandwidth is always limited. Also the latency of connections from Australia to other parts of the world is often worse than desired (especially if using cheap hosting as I currently do).
According to Webalizer only 3.14% of the people who access my blog are from Australia; they will get better access to my site if it is hosted in Australia, and maybe the 0.15% of people who access my blog from New Zealand will also benefit from the locality of sites hosted in Australia. But the 37% of readers who are described as “US Commercial” (presumably .com) and the 6% described as “United States” (presumably .us) will benefit from US hosting, as will most of the 30% who are described as “Network” (.net I guess).
For getting good network bandwidth it seems that the best option is to choose what seems to be the best ISP in the US that I can afford, where determining what is “best” is largely based on rumour.
One of the comments on my post about virtual servers and swap space [2] suggested just not using swap and referenced the Amazon EC2 (Elastic Compute Cloud) service and the Gandi.net hosting (which is in limited beta and not generally available).
The Amazon EC2 cloud service [3] has a minimum offering of 1.7G of RAM, 1 EC2 Compute Unit (equivalent to a 1.0-1.2GHz 2007 Opteron or 2007 Xeon processor), and 160G of “instance storage” (local disk for an instance) running 32-bit software. Currently my server is using 12% of a Celeron 2.4GHz CPU on average (which includes a mail server with lots of anti-spam measures, Venus, and other things). Running just the web sites on 1 EC2 Compute Unit should use significantly less than 25% of a 1.0GHz Opteron. I’m currently using 400M of RAM for my DomU (although the MySQL server is in a different DomU). 1.7G of RAM for my web sites is heaps even when including a MySQL server. Currently a MySQL dump of my blog is just under 10M of data, so with 1.7G of RAM the database should stay entirely in RAM which will avoid the disk IO issues. I could probably use about 1/3 of that much RAM and still not swap.
The cost of EC2 is $US0.10 per hour of uptime (for a small server), so that’s $US74.40 per month. The cost for data transfer is 17 cents per GB for sending and 10 cents per GB for receiving (bulk discounts are available for multiple terabytes per month).
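As a rough worked example for a site like mine (assuming a 31 day month and about 3G of outbound transfer, as per the numbers above):
echo '744 * 0.10' | bc
echo '3 * 0.17' | bc
That’s $74.40 for the instance plus about 51 cents for data transfer, so at this scale the transfer charges are negligible compared to the instance charge.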
I am not going to pay $74 per month to host my blog. But sharing that cost with other people might be a viable option. An EC2 instance provides up to 5 “Elastic IP addresses” (public addresses that can be mapped to instances) which are free when they are being used (there is a cost of one cent per hour for unused addresses – not a problem for me as I want 24*7 uptime). So it should be relatively easy to divide the costs of an EC2 instance among five people by accounting for data transfer per IP address. Hosting five web sites that use the same software (MySQL and Apache for example) should reduce memory use and allow more effective caching. A small server on EC2 costs about five times more than one of the cheap DomU systems that I have previously investigated [4] but provides ten times the RAM.
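One way of doing the per-address accounting (a sketch, not something I have tested on EC2 – and note that Elastic IPs are NATed, so the rules would probably need to match instance-local addresses) would be an iptables counting rule per site; the addresses here are hypothetical:
iptables -A OUTPUT -s 10.0.0.11
iptables -A OUTPUT -s 10.0.0.12
iptables -L OUTPUT -v -n -x
A rule with no target doesn’t affect whether packets are accepted, it just maintains packet and byte counters which the last command displays.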
While the RAM is impressive, I have to wonder about CPU scheduling and disk IO performance. I guess I can avoid disk IO on the critical paths by relying on caching and not doing synchronous writes to log files. That just leaves CPU scheduling as a potential area where it could fall down.
Here is an interesting post describing how to use EC2 [5].
Another thing to consider is changing blog software. I currently use WordPress which is more CPU intensive than some other options (due to being written in PHP), is slightly memory hungry (PHP and MySQL), and doesn’t have the best security history. It seems that an ideal blog design would use a language such as Java or PHP for comments and use static pages for the main article (with the comments in a frame or loaded by JavaScript). Then the main article would load quickly and comments (which probably aren’t read by most users) would get loaded later.
In the mid 90’s I was part-owner of a small ISP. We had given out Trumpet Winsock [1] to a large number of customers and couldn’t convert them to anything else. Unfortunately a new release of the Linux kernel (from memory I think it was 2.0) happened to not work with Trumpet Winsock. Not wanting to stick to the old kernel I decided to install a Linux machine running a 1.2.x kernel for the sole purpose of proxying connections for the Winsock users. I had a 386 machine with 8M of RAM that was suitable for the purpose.
At that time hard disks were moderately expensive, and the servers were stored in a hot place which tended to make drives die more rapidly than they might otherwise. So I didn’t want to use a hard disk for that purpose.
I configured the machine to boot from a floppy disk (CD-ROM drives also weren’t cheap then) and use an NFS root filesystem. The problem was that it needed slightly more than 8M of RAM and swapping to NFS was not supported. My solution was to mount the floppy disk read-write and use a swap file on the floppy. The performance difference between floppy disks and hard disks was probably about a factor of 10 or 20 – but they were both glacially slow when compared to main memory. After running for about half an hour the machine achieved a state where about 400K of unused data was paged out and the floppy drive would then hardly ever be used.
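For reference, setting up a swap file on a mounted floppy goes something like this (a sketch; the mount point and size here are illustrative rather than exactly what I used):
dd if=/dev/zero of=/floppy/swapfile bs=1024 count=800
mkswap /floppy/swapfile
swapon /floppy/swapfile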
I had initially expected that the floppy disk would get a lot of use and wear out, so I had prepared a few spare disks that could be swapped in case of read errors. But in about a year of service I don’t recall having a bad sector on a floppy (I replaced the floppy whenever I upgraded the kernel or rebooted for any other reason, as a routine precaution).
Does anyone have an anecdote to beat that?
The Problem:
A problem with virtual machines is the fact that one rogue DomU can destroy the performance of all the others by inappropriate resource use. CPU scheduling is designed to allow reasonable sharing of computational resources, but it is unfortunately not well documented; the XenSource wiki currently doesn’t document the “credit” scheduler which is used in Debian/Etch and CentOS 5 [1]. One interesting fact is that CPU scheduling in Xen can have a significant effect on IO performance, as demonstrated in the paper by Ludmila Cherkasova, Diwaker Gupta and Amin Vahdat [2]. But they only showed a factor of two performance difference (which while bad is not THAT bad).
A more significant problem is managing virtual memory: when there is excessive paging, performance can drop by a factor of 100 and even the most basic tasks become impossible.
The design of Xen is that every DomU is allocated some physical RAM and has its own swap space. I have previously written about my experiments to optimise swap usage on Xen systems by using a tmpfs in the Dom0 [3]. The aim was to have every Xen DomU swap data out to a tmpfs, so that if one DomU was paging heavily and the other DomUs were not paging then the paging might take place in the Dom0’s RAM and not hit disk. The experiments were not particularly successful but I would be interested in seeing further research in this area as there might be some potential to do some good.
I have previously written about the issues related to swap space sizing on Linux [4]. My conclusion is that following the “twice RAM” myth will lead to systems becoming unusable due to excessive swapping in situations where they might otherwise be usable if the kernel killed some processes instead (naturally there are exceptions to my general rule due to different application memory use patterns – but I think that my general rule is a lot better than the “twice RAM” one).
One thing that I didn’t consider at the time is the implications of this problem for Xen servers. If you have 10 physical machines and one starts paging excessively then you have one machine to reboot. If you have 10 Xen DomUs on a single host and one starts paging heavily then you end up with one machine that is unusable due to thrashing and nine machines that deliver very poor disk read performance – which might make them unusable too. Read performance can particularly suffer when one process or VM is writing heavily to disk, due to the way that disk queuing works. It’s not uncommon for an application to read dozens or hundreds of blocks from disk to satisfy a single trivial request from a user, and if each of these block read requests has to wait for a large amount of data to be written out from the write-back cache then performance will suck badly (I have seen this in experiments on single disks and on Linux software RAID – but have not had the opportunity to do good tests on a hardware RAID array).
Currently for Xen DomUs I am allocating swap spaces no larger than 512M, as anything larger than that is likely to cause excessive performance loss to the rest of the server if it is actually used.
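For an LVM backed DomU that just means creating a small dedicated volume, along these lines (the volume group and LV names are only examples):
lvcreate -n demo-swap -L 512M vg0
mkswap /dev/vg0/demo-swap
The DomU configuration then references it with an entry such as 'phy:/dev/vg0/demo-swap,xvda2,w' in the disk list.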
A Solution for Similar Problems:
A well known optimisation technique for desktop systems is to use a separate disk for swap; in desktop machines people often use the old disk as swap after buying a new larger disk for main storage. The benefit of this is that swap use will not interfere with other disk use, for example the disk reads needed to run the “ps” and “kill” programs won’t be blocked by the memory hog that you want to kill. I believe that similar techniques can be applied to Xen servers and give even greater benefits. When a desktop machine starts paging excessively the user will eventually take a coffee break and let the machine recover, but when an Internet server such as a web server starts paging excessively the requests keep coming in and the number of active processes increases. So it seems likely that using a different device for swap will allow some processes to satisfy requests by reading data from disk while other processes are waiting to be paged in.
Applying it to Xen Servers:
The first thing that needs to be considered for such a design is the importance of reliable swap. When it comes to low-end servers there is ongoing discussion about the relative merits of RAID-0 and RAID-1 for swap. The benefit of RAID-0 is performance (at least in perception – I can imagine some OS swapping algorithms that could potentially give better performance on RAID-1, and I am not aware of any research in this area). The benefit of RAID-1 is reliability. There are two issues in regard to reliability: one is continuity of service (EG being able to hot-swap a failed disk while the server is running), and the other is the absence of data loss. For some systems it may be acceptable to have a process SEGV (which I presume is the result if a page-in request fails) due to a dead disk (reserving the data loss protection of RAID for files). One issue related to this is the ability to regain control of a server after a problem. For example if the host OS of a machine had non-RAID swap then a disk failure could prevent a page-in of data related to sshd or some similar process and thus make it impossible to recover the machine without hardware access. But if the swap for a virtual machine was on a non-RAID disk and the host had RAID for its swap then the sysadmin could login to the host and reboot the DomU after creating a new swap space on a working disk.
Now if you have a server with 8 or 12 disks (both of which seem to be reasonably common configurations for modern 2RU servers) and you decide that RAID is not required for the swap space of DomUs, then it would be possible to assign single disks as swap space for groups of virtual machines. So if one client had several virtual machines they could share the same single disk for swap, and a thrashing server would only affect the performance of other VMs from the same client. One possible configuration would be a 12 disk server that has a four disk RAID-5 array for main storage and 8 single disks for swap. 8 CPU cores is common for a modern 2RU server, so it would be possible to lock 8 groups of DomUs so that they share CPUs and swap spaces. Another possibility would be to have four groups of DomUs where each group had a RAID-1 array for swap and two CPU cores.
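As a sketch of what a DomU configuration in such a group might look like (the device names and volume group are examples, not a tested setup), the root filesystem stays on the RAID-5 array while the swap device points at a partition on the group’s dedicated disk:
disk = [ 'phy:/dev/vg_raid5/client1-web-root,xvda1,w', 'phy:/dev/sdc5,xvda2,w' ]
That way a thrashing DomU only queues IO on its group’s disk and the RAID array keeps serving file access for everyone else.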
I am not sure of the aggregate performance impact of such a configuration; I suspect that a group of single disks would give better performance for swap than a single RAID array, and that RAID-1 would outperform RAID-5. For a single DomU it seems most likely that using part of a large RAID array for swap space would give better performance. But the benefit in partitioning the server seems clear. An architecture where each DomU had its own dedicated disk for swap space is something that I would consider a significant benefit if renting a Xen DomU. I would rather have the risk of down-time (which should be short with hot-swap disks and hardware monitoring) in the rare case of a disk failure than have bad performance regularly in the common situation of someone else’s DomU being overloaded.
Failing that, having a separate RAID array for swap would be a significant benefit. If every process that isn’t being paged out could deliver full performance while one DomU was thrashing then it would be a significant improvement over the situation where any DomU can thrash and kill the file access performance of all other DomUs. A single RAID-1 array should handle all the swap space requirements for a small or medium size Xen server.
One thing that I have not tested is the operation of LVM when one disk goes bad. In the case of a disk with bad sectors it’s possible to move the LVs that are not affected to other disks, and to remove the LV that was affected and re-create it after removing the bad disk. The case of a disk that is totally dead (i.e. the PV header can’t be read or written) might cause some additional complications.
Update Nov 2012: This post was discussed on the Linode forum:
Comments include “The whole etbe blog is pretty interesting” and “Russell Coker is a long-time Debian maintainer and all-round smart guy” [5]. Thanks for that!
In a comment on my AppArmor is dead post [1] someone complained that SE Linux is not “Unixish”.
The security model in Unix is almost exclusively Discretionary Access Control (DAC) [2]. This means that any process that owns a resource can grant access to the resource to other processes without restriction. For example a user can run “chmod 777 ~” and grant every other user on the system the ability to access their files (and take over their account by modifying ~/.login and similar files). I say that it’s almost exclusively DAC because there are some things that a user can not give away, for example they can not permit a program running under a different non-root UID to ptrace their processes. But for file and directory access it’s entirely discretionary.
SE Linux is based around the concept of Mandatory Access Control (MAC) [3]. This means that the system security policy (as defined by the people who developed the distribution and the local sysadmin) can not be overridden by the user. When a daemon is prevented from accessing files in a user’s home directory by the SE Linux policy and the user is not running in the unconfined_t domain there is no possibility of them granting access.
SE Linux has separate measures for protecting integrity and confidentiality. An option is to use MultiLevel Security (MLS) [4], but a more user-friendly option is MCS (Multi-Category Security).
The design of SE Linux is based on the concept of having as much of the security policy as possible loaded at boot time. The design of the Unix permissions model was based on the concept of using the minimal amount of memory, at a time when 1M of RAM was a big machine. An access control policy is comprised of two parts: file labels (UID, GID, permissions, and maybe ACLs for Unix access controls; a “security context” for SE Linux) and a policy which determines how those file labels are used. The policy in the Unix system is compiled into the kernel and is essentially impossible to change. The SE Linux policy is loaded at boot time, and the most extreme changes to the policy will at most require a reboot.
The policy language used for SE Linux is based on the concept of deny by default (everything that is not specifically permitted is denied) and access controls apply to all operations. Unix access control is mostly permissive, and many operations (such as seeing more privileged processes in the output of “ps”) can not be denied on a standard Unix system.
So it seems that in many ways SE Linux is not “Unixish”, and it seems to me that any system which makes a Unix system reasonably secure could also be considered to be “not Unixish”. Unix just wasn’t designed for security, not that it is bad by the standards of modern server and desktop OSs.
Of course many of the compromises in the design of Unix (such as having all login sessions recorded in a single /var/run/utmp file and having all user accounts stored in a single /etc/passwd file) impact SE Linux systems. But some of them can be worked around, and others will be fixed eventually.
The Linux kernel has a number of code sections which look at the apparent size of the machine and determine what would be the best size for buffers. For physical hardware this makes sense as the hardware doesn’t change at runtime. There are many situations where performance can be improved by using more memory for buffers, enabling large buffers for those situations when the machine has a lot of memory makes it convenient for the sysadmin.
Virtual machines change things as the memory available to the kernel may change at run-time. For Xen the most common case is the Dom0 automatically shrinking when memory is taken by a DomU – but it also supports removing memory from a DomU via the xm mem-set command (the use of xm mem-set seems very rare).
Now a server that is purchased for the purpose of running Xen will have a moderate amount of RAM. In recent times the smallest machine I’ve seen purchased for running Xen had 4G of RAM – and it has spare DIMM slots for another 4G if necessary. While a non-virtual server with 8G of RAM would be an unusually powerful machine dedicated to some demanding application, a Xen server with 8G or 16G of RAM is not excessively big, it’s merely got space for more DomUs. For example one of my Xen servers has 8 CPU cores, 8G of RAM, and 14 DomUs. Each DomU has on average just over half a gig of RAM and half of a CPU core – not particularly big.
In a default configuration the Dom0 will start by using all the RAM in the machine, which in this case meant that the buffer sizes were appropriate for a machine with 8G of RAM. Then as DomUs are started memory is removed from the Dom0 and these buffers become a problem. This ended up forcing a reboot of the machine by preventing Xen virtual network access to most of the DomUs. I was seeing many messages in the Dom0 kernel message log such as “xen_net: Memory squeeze in netback driver” and most DomUs were inaccessible from the Internet (I didn’t verify that all DomUs were partially or fully unavailable or test the back-end network as I was in a hurry to shut it down and reboot before too many customers complained).
The solution to this is to have the Dom0 start by using a small amount of RAM. To do this I edited the GRUB configuration file and put “dom0_mem=256000” at the end of the Xen kernel line (that is the line starting with “kernel /xen.gz“). This gives the Dom0 kernel just under 256M of RAM from when it is first loaded and prevents allocation of bad buffer sizes, it’s the only solution to this network problem that a quick Google search (the kind you do when trying to fix a serious outage before your client notices (*)) could find.
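For reference, the resulting menu.lst stanza looks something like the following (the Xen and kernel versions and the root device are examples, not necessarily what is on the machine in question):
title Xen 3.2 / Debian GNU/Linux, kernel 2.6.26-2-xen-686
root (hd0,0)
kernel /xen.gz dom0_mem=256000
module /vmlinuz-2.6.26-2-xen-686 root=/dev/md1 ro console=tty0
module /initrd.img-2.6.26-2-xen-686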
One thing to note is that my belief that it’s kernel buffer sizes that are at the root cause of this problem is based on my knowledge of how some of the buffers are allocated plus an observation of the symptoms. I don’t have a test machine with anything near 8G of RAM so I really can’t do anything more to track this down.
There is another benefit to limiting the Dom0 memory: I have found that on smaller machines it’s impossible to reduce the Dom0 memory below a certain limit at run-time. In the past I’ve had problems in reducing the memory of a Dom0 below about 250M; while such a reduction is hardly desirable on a machine with 8G of RAM, when running an old P3 machine with 512M of RAM there are serious benefits to making Dom0 smaller than that. As a general rule I recommend having a limit on the memory of the Dom0 on all Xen servers. If you use the model of having no services running on the Dom0 there is no benefit in having much RAM assigned to it.
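For completeness, the run-time equivalent is the xm mem-set command mentioned earlier, e.g.:
xm mem-set Domain-0 256
xm list
where xm list shows the current memory allocations so you can confirm the change took effect.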
(*) Hiding problems from a client is a bad idea and is not something I recommend. But being able to fix a problem and then tell the client that it’s already fixed is much better than having them call you when you don’t know how long the fix will take.
From the 13th to the 14th of August my Play Machine [1] was offline. There was a power failure for a few seconds and the machine didn’t boot correctly. As I had a lot of work to do I left it offline for a day before fixing it. The reason it didn’t boot was that due to an issue with the GRUB package it was trying to boot a non-Xen kernel with Xen; this would cause the Xen Dom0 load to abort and it would then reboot after 5 seconds – and automatically repeat the process. The problem is that update-grub in Lenny will generate boot entries for Xen kernels to boot without Xen and for non-Xen kernels to boot with Xen.
Two days ago someone launched a DoS attack on my Play Machine and I’ve only just put it back online. I’ve changed the ulimit settings a bit; that won’t make DoS attacks impossible, just force the attacker to use a little bit more effort.
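I won’t publish the exact settings, but the sort of change in question goes in /etc/security/limits.conf; the values below are purely illustrative and not what I actually used:
*    hard    nproc    40
*    hard    as       100000
Here nproc limits the number of processes per user and as limits the address space size in kilobytes, which caps the damage that a fork-bomb or a memory hog can do.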
For some time there have been two mainstream Mandatory Access Control (MAC) [1] systems for Linux. SE Linux [2] and AppArmor [3].
In late 2007 Novell laid off almost all the developers of AppArmor [4] with the aim of having the community do all the coding. Crispin Cowan (the founder and leader of the AppArmor project) was later hired by Microsoft, which probably killed the chances for ongoing community development [5]. Crispin has an MSDN blog, but with only one post so far (describing UAC) [6]; hopefully he will start blogging more prolifically in future.
Now SUSE is including SE Linux support in OpenSUSE 11.1 [7]. They say that they will not ship policies and SE Linux specific tools such as “checkpolicy”, but instead they will be available from “repositories”. Maybe this is some strange SUSE thing, but for most Linux users when something is in a “repository” then it’s shipped as part of the distribution. The SUSE announcement also included the line “This is particularly important for organizations that have already standardized on SELinux, but could not even test-drive SUSE Linux Enterprise before without major work and changes“. The next step will be to make SE Linux the default and AppArmor the one that exists in a repository, and the step after that will be to remove AppArmor.
In a way it’s a pity that AppArmor is going away so quickly. The lack of competition is not good for the market, and homogeneity isn’t good for security. But OTOH this means more resources will be available for SE Linux development which will be a good thing.
Update: I’ve written some more about this topic in a later post [8].
I’ve just read an amusing series of blog posts about bad wiring [1]. I’ve seen my share of wiring horror in the past. There are some easy ways of minimising wiring problems which seem to never get implemented.
The first thing to do is to have switches near computers. Having 48-port switches in a server room and wires going across the building causes mess and is difficult to manage. A desktop machine doesn’t need a dedicated Gig-E (or even 100baseT) connection to the network backbone. Cheap desktop switches installed on desks allow one cable to go to each group of desks (or two cables if you have separate networks for VOIP and data). If you have a large office area then a fast switch in the corner of the room connecting to desktop switches on the desks is a good way to reduce the cabling requirements. The only potential down-side is that some switches are noisy; the switches with big fans can be easily eliminated by a casual examination, but the ones that make whistling sounds from the PSU need to be tested first. The staff at your local electronics store should be very happy to open one item for inspection and plug it in if you are about to purchase a moderate number (they will usually do so even if you are buying a single item).
A common objection to this is the perceived lack of reliability of desktop switches. One mitigating factor is that if a spare switch is available the people who work in the area can replace a broken switch. Another is my observation that misconfiguration of big expensive switches causes significantly more down-time than hardware failures on cheap switches ever could. A cheap switch that needs to be power-cycled once a month will cause little interruption to work, while a big expensive switch (which can only be configured by the “network experts” – not regular sysadmins such as me) can easily cause an hour of down-time for most of an office during peak hours. Finally the reliability of the cables themselves is also an issue: having two cables running to the local switch in every office allows an easy replacement to fix a problem – it can be done without involving the IT department (who just make sure that both cables are connected to the switch in the server room). If there is exactly one cable running to each PC from the server room and one of the cables fails then someone’s PC will be offline for a while.
In server rooms the typical size of a rack is 42RU (42 Rack Units). If using 1RU servers that means 42 Ethernet cables. A single switch can handle 48 Ethernet ports in a 1RU mount (for the more dense switches), others have 24 ports or less. So a single rack can handle 41 small servers and a switch with 48 ports (two ports to go to the upstream switch and five spare ports). If using 2RU servers a single rack could handle 20 servers and a 24-port switch that has two connections to the upstream switch and two spare ports. Also it’s generally desirable to have at least two Ethernet connections to each server (public addresses and private addresses for connecting to databases and management). For 1RU servers you could have two 48-port switches and 40 servers in a rack. For 2RU servers you could have 20 servers and either two 24-port switches or one 48-port switch that supports VLANs (I prefer two switches – it’s more difficult to mess things up when there are two switches, if one switch fails you can login via the other switch to probe it, and it’s also cheaper). If the majority of Ethernet cables are terminated in the same rack it’s much harder for things to get messed up. Also it’s very important to leave some spare switch ports available as it’s a common occurrence for people to bring laptops into a server room to diagnose problems and you really don’t want them to unplug server A to diagnose a problem with server B…
Switches should go in the middle of the rack. While it may look nicer to have the switch at the top or the bottom, that means that the server which is above or below it will have the cables for all the other switches going past it. Ideally the cables would go in neat cable runs at the side of the rack but in my experience they usually end up just dangling in front. If the patch cables are reasonably short and they only dangle across half the servers things won’t get too ugly (this is harm minimisation in server room design).
The low end of network requirements is usually the home office. My approach to network design for my home office is quite different, I have no switches! I bought a bunch of dual-port Ethernet cards and now every machine that I own has at least two Ethernet ports (and some have as many as four). My main router and gateway has four ports which allows connections from all parts of my house. Then every desktop machine has at least two ports so that I can connect a laptop in any part of the house. This avoids the energy use of switches (I previously used a 24-port switch that drew 45W [2]); switches of course also make some noise and are an extra point of failure. While switches are more reliable than PCs, as I have to fix any PC that breaks anyway my overall network reliability is increased by not using switches.
For connecting the machines in my home I mostly use bridging (only the Internet gateway acts as a router). I have STP enabled on all machines that have any risk of having their ports cross-connected, but disable it on some desktop machines with two ports (so that I can plug my EeePC in and quickly start work on small tasks).
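The bridge setup on each machine is along these lines (a sketch using brctl; interface names and the address are examples):
brctl addbr br0
brctl addif br0 eth0
brctl addif br0 eth1
ifconfig eth0 0.0.0.0 up
ifconfig eth1 0.0.0.0 up
brctl stp br0 on
ifconfig br0 192.168.0.2 netmask 255.255.255.0 up
The 'brctl stp br0 on' line is the one I leave out on the desktop machines where I want a port to come up immediately for a laptop, as STP imposes a listening/learning delay before a port starts forwarding.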
I had a problem where the email address circsales@washpost.com spammed a Request Tracker (RT) [1] installation (one of the rules for running a vacation program is that you never respond twice to the same address, another rule is that you never respond to automatically generated messages).
Deleting these tickets was not easy, the RT web interface only supports deleting 50 tickets at a time.
To delete them I first had to find the account ID in RT, the following query does that:
select id from Users where EmailAddress='circsales@washpost.com';
Then to mark the tickets as deleted I ran the following SQL command (where X was the ID):
update Tickets set Status='deleted' where Creator=X;
Finally, to purge the deleted entries from the database (which was growing overly large) I used the RTx-Shredder [2] tool. RTx-Shredder doesn’t seem to support selecting tickets by submitter, which is why I had to mark them as deleted via SQL first.
I am currently using the following command to purge the tickets. The “limit,500” directive tells rtx-shredder to remove 500 tickets at one time (the default is to only remove 10 tickets).
./sbin/rtx-shredder --force --plugin 'Tickets=status,deleted;limit,500'
There are currently over 34,000 deleted tickets to remove, and rtx-shredder is currently proceeding at a rate of 9 tickets per minute, so it seems that it will take almost three days of database activity to clear the tickets out.
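That estimate is just the ticket count divided by the deletion rate:
echo 'scale=1; 34000 / 9 / 60 / 24' | bc
which gives about 2.6 days.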
I also need to purge some tickets that have been resolved for a long time, I’m running the following command to remove them:
./sbin/rtx-shredder --force --plugin 'Tickets=status,resolved;updated_before,2008-03-01 01:01:34;limit,500'
With both the rtx-shredder commands running at once I’m getting a rate of 15 tickets per minute, so it seems that the bottleneck is more related to rtx-shredder than MySQL (which is what I expected). Although with two copies running at once I have mysqld listed as taking about 190% of CPU (two CPUs running at capacity). The machine in question has two P4 CPUs with hyper-threading enabled, so maybe running two copies of rtx-shredder causes mysqld to become CPU bottlenecked. I’m not sure how to match up CPU use as reported via top to actual CPU power in a system with hyper-threading (the hyper-threaded virtual CPUs do not double the CPU power). I wonder if this means that the indexes on the RT tables are inadequate to the task.
I tried adding the following indexes (as suggested in the rtx-shredder documentation), but it didn’t seem to do any good – it might have improved performance by 10% but that could be due to sampling error.
CREATE INDEX SHREDDER_CGM1 ON CachedGroupMembers(MemberId, GroupId, Disabled);
CREATE INDEX SHREDDER_CGM2 ON CachedGroupMembers(ImmediateParentId, MemberId);
CREATE UNIQUE INDEX SHREDDER_GM1 ON GroupMembers(MemberId, GroupId);
CREATE INDEX SHREDDER_TXN1 ON Transactions(ReferenceType, OldReference);
CREATE INDEX SHREDDER_TXN2 ON Transactions(ReferenceType, NewReference);
CREATE INDEX SHREDDER_TXN3 ON Transactions(Type, OldValue);
CREATE INDEX SHREDDER_TXN4 ON Transactions(Type, NewValue);