One thing that concerns me about using any online service is security. When that service is a virtual server running in another country, the risks are greater than average.
I’m currently investigating the Amazon EC2 service for some clients, and naturally I’m concerned about its security. First, they appear to have implemented a good range of Xen-based security mechanisms, and their documentation is worth reading by anyone who plans to run a Xen server for multiple users [1]. It would be a good thing if other providers followed their example in documenting the ways that they protect their customers.
Next, they seem to have done a good job of securing access to the service: all requests use public key cryptography, and Amazon generates the keypair. While later in this article I identify some areas that could be improved, I want to make it clear that overall I think EC2 is a good service, and it seems generally better than average in every way. But it’s a high-profile service which deserves a good deal of scrutiny, and I’ve found some things that need to be improved.
The first problem comes when downloading anything of importance (kernel modules for use in a machine image, utility programs for managing AMIs, etc). All downloads are done via http (not https) and the files are not signed in any way. This is an obvious risk: anyone who controls a router could compromise EC2 instances by causing people to download hostile versions of the tools. The solution is to use https for the downloads AND to use GPG to sign the files. Https is the most user-friendly way of authenticating the files (although it could be argued that anyone who lacks the skill needed to use GPG will never run a secure server anyway), while GPG allows end-to-end verification and would allow me to check files that a client is using if the signature was downloaded at the same time.
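As a sketch of what that combination could look like (the URL, file names, and key setup below are hypothetical placeholders, not real Amazon download locations), a download script could fetch the tools over https and refuse to use them unless a detached GPG signature verifies:

```python
#!/usr/bin/env python3
# Hypothetical sketch: fetch a tool archive over https and check a detached
# GPG signature before using it. The URL and file names are placeholders.
import subprocess
import sys
import urllib.request

BASE = "https://example.com/ec2-tools"      # placeholder, not a real Amazon URL
ARCHIVE = "ec2-api-tools.zip"
SIGNATURE = ARCHIVE + ".asc"                # hypothetical detached signature

def fetch(name):
    # https authenticates the server; the GPG signature authenticates the file itself
    with urllib.request.urlopen(BASE + "/" + name) as resp, open(name, "wb") as out:
        out.write(resp.read())

def verify(archive, signature):
    # Relies on the publisher's public key already being in the local GPG keyring
    result = subprocess.run(["gpg", "--verify", signature, archive])
    return result.returncode == 0

if __name__ == "__main__":
    fetch(ARCHIVE)
    fetch(SIGNATURE)
    if not verify(ARCHIVE, SIGNATURE):
        sys.exit("signature verification failed, refusing to use the download")
    print("signature OK")
```

The signature file could then be kept alongside the archive so that the same check can be re-run later on a client's copy of the tools.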
More likely problems start with the machine images that they provide. They have images of Fedora Core 4, Fedora Core 6, and Fedora 8 available. Fedora releases are maintained until one month after the release of two subsequent versions [6], so Fedora 8 support will end one month after the release of Fedora 10 (which will be quite soon), and Fedora Core 6 and Fedora Core 4 have been out of support for a long time. I expect that someone who wanted to 0wn some well-connected servers could get a list of exploits that work on FC4 or FC6 and try them out on machines running on EC2. While it is theoretically possible for Amazon staff to patch the FC4 images for all security holes that are discovered, it would be a lot of work, and it wouldn’t cover all the repositories of FC4 software. So making FC4 usable as a secure base for an online service really isn’t a viable option.
Amazon’s page on “Tips for Securing Your EC2 Instance” [3] mostly covers setting up ssh; I wonder whether anyone who needs advice on setting up ssh can ever hope to run a secure server on the net. It does have some useful information on managing the EC2 firewall that will be of general interest.
One of the services that Amazon offers is “shared images”, where any Amazon customer can share an image with the world. Amazon has a document about AMI security issues [4], but it seems to only be useful against clueless security mistakes by the person who creates an image, not against malice. If a hostile party creates a machine image you can expect that you won’t discover the problem by looking for open ports and checking for strange processes. The Amazon web page says “you should treat shared AMIs as you would any foreign code that you might consider deploying in your own data center and perform the appropriate due diligence”; the difference of course is that most foreign code that you might consider deploying comes from companies and is shipped in shrink-wrap packaging. I don’t count the high quality free software available in a typical Linux distribution in the same category as this “foreign code”.
While some companies have accidentally shipped viruses on installation media in the past, it has been quite rare. But I expect hostile AMIs on EC2 to be considerably more common. Amazon recommends that people know the source of the AMIs that they use. Of course there is a simple way of encouraging this: Amazon could refrain from providing a global directory of AMIs without descriptions (the output of “ec2dim -x all”) and instead force customers to subscribe to channels containing AMIs that have not been approved by Amazon staff (the images that purport to be from Oracle and Red Hat could easily have their sources verified and be listed as trusted images if they are what they appear to be).
There seems to be no way of properly tracking the identity of the person who created a machine image within the Amazon service. The ec2dim command only gives an ID number for the creator (and there seems to be no API tool to get information on a user based on their ID). The web interface gives a name and an Amazon account name.
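As an illustration of the kind of provenance check that is possible (this uses the boto3 library and hypothetical account IDs, which are my assumptions rather than anything the 2008-era API tools provide), a customer can at least restrict image listings to owner accounts that they have verified out of band:

```python
# Sketch only: list AMIs from explicitly trusted owner accounts instead of
# browsing the global directory. The library (boto3) and the account IDs are
# assumptions for illustration.
import boto3

TRUSTED_OWNERS = ["123456789012"]   # hypothetical account IDs verified out of band

ec2 = boto3.client("ec2", region_name="us-east-1")
for image in ec2.describe_images(Owners=TRUSTED_OWNERS)["Images"]:
    # OwnerId is all the service gives you; establishing who is actually
    # behind that ID still has to happen outside the service.
    print(image["ImageId"], image.get("Name", "<unnamed>"), image["OwnerId"])
```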
The next issue is that of the kernel. Amazon notes that they include the “vmsplice root exploit patch” in the 2.6.18 kernel image that they supply [2]. However a number of other Linux kernel security problems have been found since then, and plenty of security issues for 2.6.18 were patched before the vmsplice bug was discovered – were they patched as well? The 20th Feb 2008 date stamp on the kernel image and module files indicates that there are a few kernel security issues which are not patched in the Amazon kernel.
The obvious fix is to use a modern distribution image. Of course without knowing what other patches they include (they mention a patch for better network performance) this is going to be difficult. It seems that we need distribution packages of kernels designed for EC2, incorporating the Amazon patches and the Amazon configuration as well as all the latest security updates. I’ve started looking at the Amazon EC2 kernel image to see what I should incorporate from it to make a Debian kernel image; it would be good if we could get such a package included in an update to Debian/Lenny. Also Red Hat is partnering with Amazon to offer RHEL on EC2 [5], and I’m sure that they provide good kernels as part of that service – but as the cost of RHEL on EC2 more than doubles the cost of the cheapest EC2 instance, I expect that only customers who need the larger instances all the time will use it. The source for the RHEL kernels will of course be good for CentOS (binaries produced from such sources may be in CentOS already, I haven’t checked).
This is not an exhaustive list of security issues related to EC2; I may end up writing a series of posts about this.
Update: Jef Spaleta has written an interesting post that references this one [7]. He is a bit harsher than I am, but his points are all well supported by the evidence.
I’ve updated my package of the Amazon EC2 API Tools for Debian [1]. Now it uses the Sun JDK. Kaffe doesn’t work due to not supporting annotations; I haven’t filed a bug because Kaffe is known to be incomplete.
OpenJDK doesn’t work – apparently because it doesn’t include trusted root certificates (see Debian bug #501643) [2].
GCJ doesn’t work; I’m not sure why, so I filed Debian bug #501743 [3].
I don’t think that Java is an ideal language choice for utility programs. Perl might be a better option as it’s supported everywhere and has always been free (the Sun JVM has only just started to become free). The lack of freedom of Java results in lower quality, and in this case several hours of my time wasted.
A couple of days ago I attended a lecture about the Drizzle database server [1].
Drizzle is a re-write of MySQL for use in large web applications. It is only going to run on 64bit platforms because apparently everyone uses 64bit servers – except of course Amazon EC2 customers, as the $0.10 per hour instances in EC2 are all 32bit. It’s also designed to use large amounts of RAM for more aggressive caching; it’s being optimised for large RAM at the expense of performance on systems with small amounts of RAM. This is OK if you buy one (or many) new servers to dedicate to the task of running a database. But if the database is one of many tasks running on the machine, or if the machine is a Xen instance, then this isn’t going to be good.
There are currently no plans to support replication between MySQL and Drizzle databases (although it would not be impossible to write the support).
The good news is that regular MySQL development will apparently continue in the same manner as before, so people who have small systems, run on Xen instances, or use EC2 can keep using that. Drizzle seems aimed only at people who want to run very large sharded databases.
Now I just wish that they would introduce checksums on all data transfers and stores into MySQL. I consider that to be a really significant feature of Drizzle.
I just watched an interesting TED.com talk about video games [1]. The talk focussed to a large degree on emotional involvement in games, so it seems likely that there will be many more virtual girlfriend services [2] (I’m not sure that “game” is the correct term for such things) in the future. The only reference I could find to a virtual boyfriend was a deleted Wikipedia page for V-Boy, but I expect that they will be developed soon enough. I wonder if such a service could be used by astronauts on long missions. An advantage of a virtual SO would be that there is no need to have a partner who is qualified, and if a human couple got divorced on the way to Mars then it could be a really long journey for everyone on the mission.
VR training has been used for a long time in the airline industry (if you have never visited an airline company and sat in a VR trainer for a heavy passenger jet then I strongly recommend that you do it). It seems that there are many other possible uses for this. The current prison system is widely regarded as a training ground for criminals; people who are sent to prison for minor crimes come out as hardened criminals. I wonder if a virtual environment for prisoners could do some good. Instead of prisoners having to deal with other prisoners they could deal with virtual characters who encourage normal social relationships, and prisoners who didn’t want to meet other prisoners could be given the option of spending their entire sentence in “solitary confinement” with virtual characters, multi-player games, and Internet access if they behave well. Game systems such as the Nintendo Wii [3] would give prisoners adequate exercise, so after being released from a VR prison the ex-con would probably be fitter, healthier, and better able to fit into normal society than most parolees. Finally it seems likely that someone who gets used to spending most of their spare time playing computer games will be less likely to commit crimes.
It seems to me that the potential for the use of virtual environments in schools is very similar to that of prisons, for similar reasons.
Update: Currently Google Adsense is showing picture adverts on this page for the “Shaiya” game; the pictures are of a female character wearing a bikini with the caption “your goddess awaits”. This might be evidence to support my point about virtual girlfriends.
A recent news item is the “hacking” of the Yahoo mailbox used by Sarah Palin (the Republican VP candidate) [1]. It seems most likely that it was a simple social-engineering attack on the password reset process of Yahoo (although we are unlikely to learn the details unless the case comes to trial). The email address in question had been used for some time to avoid government data-retention legislation but had only been “hacked” after she was listed as the VP candidate. The reason of course is that most people don’t care much about who is the governor of one of the least populous US states.
Remote attack on a mailbox (which is what we presume happened) is only one possible problem. Another of course is the integrity of the staff at the ISP. While I know nothing about what happens inside Yahoo, I have observed instances of unethical actions by employees at some ISPs where I have previously worked, and I have no doubt that such people would have read the email of a VP candidate without a second thought if they had sufficient access to do so. If an ISP stores unencrypted passwords then the way things usually work is that the helpdesk people are granted read access to the password data so that they can log in to customer accounts to reproduce problems – this is a benefit for the customers in terms of convenience. But it also means that they can read the email of any customer at any time. I believe that my account on Gmail (the only webmail service I use) is relatively safe, as there are a huge number of people who are more important than me who use Gmail. But if I was ever considered to have a reasonable chance of becoming Prime Minister then I would avoid using a Gmail account as a precaution.
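For contrast, here is a minimal sketch of the salted-hash alternative to storing plaintext passwords (the parameters and names are illustrative only, not what any particular ISP uses); staff with read access to such a credential store can verify a password a customer gives them but cannot recover it to log in themselves:

```python
# Sketch: store a salted hash instead of the password itself, so staff with
# read access to the credential store cannot log in as the customer.
# The scrypt parameters are illustrative, not a tuning recommendation.
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    salt = salt or os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def check_password(password, salt, stored):
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, stored)

salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("wrong guess", salt, stored))                   # False
```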
There is a rumoured Chinese proverb and curse in three parts:
May you live in interesting times
May you come to the attention of those in authority
May you find what you are looking for [2]
In terms of your email, everyone who has root access to the machine which stores it (which includes employees of the companies that provide warranty service to all the hardware in the server room) and every help-desk person who can login to your account to diagnose problems is in a position of authority. Being merely one of thousands of customers (or millions of customers for a larger service) is a measure of safety.
As for the “interesting times” issue, the Republican party is trying to keep the debate focussed on the wars instead of on the economy. The problem with basing a campaign on wars is that many people will come to the conclusion that the election is not merely about people losing some money, but about people dying. This could be sufficient to convince someone that the right thing to do is not to abide by the usual standards for ethical behavior when dealing with private data, but instead to try to find something that can be used to affect the result of an election.
Mail that is not encrypted (most mail isn’t) and which is not transferred with TLS (so few mail servers support TLS that it hardly seems worth the effort of implementing it) can be intercepted at many locations (the sending system and routers are two options). But the receiving system is the easiest location. A big advantage for a hostile party in getting mail from the receiving system is that it can be polled quickly (an external attacker could use an open wireless access-point and move on long before anyone could catch them), and if it is polled from inside the company that runs the mail server there is almost never any useful audit trail (if a sysadmin logs in to a server 10 times a day for real work reasons, an 11th login to copy some files will not be noticed).
One of the problems with leaks of secret data is that it is often impossible to know whether they have happened. While there is public evidence of one attack on Sarah Palin’s Yahoo account, there is no evidence that it was the first attack. If someone had obtained the password (through an insider in Yahoo, or through a compromised client machine) then they could have been copying all the mail for months without being noticed.
It seems to me that your choice of ISP needs to be partly determined by how many hostile parties will want to access your mail and what resources they may be prepared to devote to it. For a significant political candidate, using a government email address seems like the best option, with the alternative being a server owned and run by the political party in question; if you can have the staff fired for leaking your mail then your email will be a lot safer!
One ongoing problem with TCP networking is the combination of RPC services and port-based services on the same host. If you have an RPC service that uses a port below 1024 then typically it will start at 1023 and try lower ports until it finds one that works. A problem that I have had in the past is that an RPC service took port 631 and I then couldn’t start CUPS (which uses that port). A similar problem can arise in a more insidious manner if you have strange networking devices such as a BMC [1] which uses the same IP address as the host and just snarfs connections for itself (as documented by pantz.org [2]); this means that according to the OS the port in question is not in use, but connections to that port will go to the hardware BMC and the OS won’t see them.
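To illustrate the downward scan described above, here is a small sketch (an illustration only, not the actual RPC library code) of a service starting at port 1023 and trying lower ports until bind() succeeds; if CUPS has not started yet, nothing stops the scan from landing on port 631:

```python
# Sketch of the downward scan an RPC-style service performs when it wants a
# privileged port: start at 1023 and keep trying until bind() succeeds.
import socket

def bind_privileged_port(start=1023, stop=512):
    for port in range(start, stop, -1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("0.0.0.0", port))   # binding below 1024 needs root
        except OSError:
            sock.close()                   # port in use or not permitted, try the next one
            continue
        print("bound to port", port)
        return sock
    raise RuntimeError("no free privileged port found")

if __name__ == "__main__":
    listener = bind_privileged_port()
    listener.listen(5)
```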
One solution to the port conflict is to give an SE Linux security context to the port which prevents the RPC service from binding to it. RPC applications seem to be happy to make as many bind attempts as necessary to get an available port (thousands of attempts if necessary), so reserving a few ports is not going to cause any problems. As far as I recall, my problems with CUPS and RPC services were a motivating factor in some of my early work on writing SE Linux policy to restrict port access.
Of course the best thing to do is to assign IP addresses for IPMI that are different from the OS IP addresses. This is easy to do and merely requires an extra IP address for each port. As a typical server will have two Ethernet ports on the baseboard (one for the front-end network and one for the private network) that means an extra two IP addresses (you want to use both interfaces for redundancy in case the problem which cripples a server is related to one of the Ethernet ports). But for people who don’t have spare IP addresses, SE Linux port labeling could really help.
The first thing you need to do to get started using the Amazon Elastic Compute Cloud (EC2) [1] is to install the tools to manage the service. The service is run in a client-server manner. You install the client software on your PC to manage the EC2 services that you use.
There are the AMI tools to manage the machine images [2] and the API tools to launch and manage instances [3].
The AMI tools come as both a ZIP file and an RPM package and contain Ruby code, while the API tools are written in Java and only come as a ZIP file.
I have not seen any clear license documents for any of the software in question; I recall one mention on one of the many confusing web pages of the code being “proprietary” but nothing else. While it seems most likely (but far from certain) that Amazon owns the copyright to the code in question, there is no information on how the software may be used – apart from an implied license that if you are a paying EC2 customer then you can use the tools (as there is no other way to use EC2). If anyone can find a proper license agreement for this software then please let me know.
To get software working in the most desirable manner it needs to be packaged for the distribution on which it is going to be used; as I prefer to use Debian, that means packaging it for Debian. Also when packaging the software you can fix some of the silly things that get included in software that is designed for non-packaged release (such as demanding that environment variables be set to specify where the software is installed). So I have built packages for Debian/Lenny for the benefit of myself and some friends and colleagues who use Debian and EC2.
As I can’t be sure of what Amazon would permit me to do with their code, I have to assume that they don’t want me to publish Debian packages for the benefit of all Debian and Ubuntu users who are (or might become) EC2 customers. So instead I have published the .diff.gz files from my Debian/Lenny packages [4] to allow other people to build identical packages after downloading the source from Amazon. At the moment the packages are a little rough, and as I haven’t actually got an EC2 service running with them yet they may have some really bad bugs. But getting the software to basically work took more time than expected. So even if there happen to be some bugs that make it unusable in its current state (the code for determining where it looks for PEM files at best needs a feature enhancement and at worst may be broken at the moment), it would still save people some time to use my packages and fix whatever needs fixing.
Currently we have a problem with the Debian list server and Gmail. Gmail signs all mail that it sends with both DKIM and DomainKeys (DomainKeys has been obsoleted by DKIM, so most mail servers implement only one of the two standards, although apart from space there is no reason not to use both). The Debian list servers change the message body without removing the signatures, and therefore send out mail with invalid signatures.
DKIM has an option to specify the length of the body part that it signs. If that option is used then an intermediate system can append data to the body without breaking the signature. This could be bad if a hostile party could intercept messages and append something damaging, but has the advantage that mailing list footers will not affect the signature. Of course if the list server modifies the Subject to include the list name in brackets at the start of the subject line then it will still break the signature. However Gmail is configured to not use the length field, and a Gmail user has no option to change this (AFAIK – if anyone knows how to make Gmail use the DKIM length field for their own account then please let me know).
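As a rough sketch of what the length field changes (this assumes the dkimpy library and its length= option; the selector, domain, and key file are placeholders), a signature with the l= tag covers only the original body, so a footer appended by a list server does not invalidate the body hash:

```python
# Sketch: sign a message with and without the DKIM l= (body length) tag.
# Assumes the dkimpy library; the selector, domain, and key file are
# placeholders, and real verification also needs the public key published
# in DNS under selector._domainkey.example.com.
import dkim

with open("private.key", "rb") as f:    # hypothetical RSA key in PEM format
    privkey = f.read()

message = (b"From: someone@example.com\r\n"
           b"To: list@lists.example.org\r\n"
           b"Subject: test\r\n"
           b"\r\n"
           b"Hello world\r\n")

# length=True asks dkimpy to add the l= tag, so only the original body length
# is covered and an appended mailing list footer will not break the body hash.
sig_with_length = dkim.sign(message, b"selector", b"example.com", privkey, length=True)

# Without it the whole body is covered, and any appended footer breaks the signature.
sig_without_length = dkim.sign(message, b"selector", b"example.com", privkey)

print(b"l=" in sig_with_length)      # True
print(b"l=" in sig_without_length)   # False
```

A Subject modification would still break either signature, since the Subject header is signed regardless of the length field.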
I believe that the ideal functionality of a sending mail server would be for the configuration of its DKIM milter to allow specifying which addresses should have the length field used. For example I would like all mail sent to an address matching @lists\. to have the length field used (as well as some other list servers that don’t match that naming scheme), and I would also like to be able to specify which recipient addresses should have no DKIM signatures (for example list servers that modify the subject line). I have filed Debian bug #500967 against the dkim-filter package requesting this feature [1].
For correct operation of a list server, the minimal functionality is implemented in the Mailman package in Lenny: stripping off DKIM and DomainKey signatures. The ideal functionality for a list server would be to leave a DKIM header that uses the length field for lists that are not configured to modify the Subject line, and otherwise remove the DKIM header. I have filed Debian bug #500965 against lists.debian.org requesting that the configuration be changed to strip the headers in question in a similar manner [2] (the Debian list servers appear to use SmartList – I have not checked whether the latest version of SmartList does the right thing in this regard – if not it deserves another bug report).
I have also filed Debian bug report #500966 requesting that the list servers sign all outbound mail with DKIM [3]. I believe that protecting the integrity of the Debian mail infrastructure is important, preventing forged mail is a good thing, and that the small amount of CPU time needed for this is worth the effort.
Also, the Debian project is in a position of leadership in the community. We should be among the first to adopt new technologies that benefit society, both to encourage others to do the same and to help find and fix bugs.
There is a lot of discussion and speculation about The Singularity. The term seems to be defined by Ray Kurzweil’s book “The Singularity Is Near” [1] which focuses on a near-future technological singularity defined by significant increases in medical science (life extension and methods to increase mental capacity) and an accelerating rate of scientific advance.
In popular culture the idea that there will only be one singularity seems to be well accepted, so the discussion is based on when it will happen. One of the definitions for a singularity is that it is a set of events that change society in significant ways such that predictions are impossible – based on the concept of the Gravitational Singularity (black hole) [2]. Science fiction abounds with stories about what happens after someone enters a black hole, so the concept of a singularity not being a single event (sic) is not unknown, but it seems to me that based on our knowledge of science no-one considers there to be a black hole with multiple singularities – not even when confusing the event horizon with the singularity.
If we consider a singularity to merely consist of a significant technological change (or set of changes) that alters society in ways that could not have been predicted (not merely changes that were not predicted), then it seems that there have been several already. Here are the ones that seem to be likely candidates:
0) The development of speech was a significant change for our species (and a significant change OF our species). Maybe we should consider that to be singularity 0 as hominids that can’t speak probably can’t be considered human.
1) The adoption of significant tool use and training children in making and using tools (as opposed to just letting them learn by observation) made a significant change to human society. I don’t think that with the knowledge available to bands of humans without tools it would have been possible to imagine that making stone axes and spears would enable them to dominate the environment and immediately become the top of the food chain. In fact as pre-tool hominids were generally not near the top of the food chain they probably would have had difficulty imagining being rulers of the world. I’m sure that it led to an immediate arms race too.
2) The development of agriculture was a significant change to society that seems to have greatly exceeded the expectations that anyone could have had at the time. I’m sure that people started farming merely as a way of ensuring that the next time they migrated to an area there was food available (just sowing seeds along traditional migration routes for a hunter-gatherer existence). They could not have expected that the result would be a significant increase in the ability to support children, a significant increase in the number of people who could be sustained by a given land area, massive population growth, new political structures to deal with greater population density, and then the wiping out of hunter-gatherer societies in surrounding regions. It seems likely to me that the mental processes needed to predict the actions of a domestic animal (in terms of making it a friend, worker, or docile source of food) differ from those needed to predict the actions of other humans (whose mental processes are similar) and from those needed to predict the actions of prey that is being hunted (you only need to understand enough to kill it).
3) The invention of writing allowed the creation of larger empires through better administration. All manner of scientific and political development was permitted by writing.
4) The work of Louis Pasteur sparked a significant development in biology which led to much greater medical technology [3]. This permitted much greater population densities (both in cities and in armies) without the limitation of significant disease problems. It seems that among other things the world-wars depended on developments in preventing disease which were linked to Pasteur’s work. Large populations densely congregated in urban areas permit larger universities and a better exchange of knowledge, which permitted further significant developments in technology. It seems unlikely that a population suffering the health problems that were common in 1850 could have simultaneously supported large-scale industrial warfare and major research projects such as the Manhattan Project.
5) The latest significant change in society has been the development of the Internet and mobile phones. Mobile phones were fairly obvious in concept, but have made structural changes to society. For example I doubt that hand-writing is going to be needed to any great extent in the future [4], the traditional letter has disappeared, and “Dates” are now based on “I’ll call your mobile when I’m in the area” instead of meeting at a precise time – but this is the trivial stuff. Scientific development and education have dramatically increased due to using the Internet and business now moves a lot faster due to mobile phones. It seems that nowadays any young person who doesn’t want to be single and unemployed needs to have either a mobile phone or Internet access – and preferably both. When mobile phones were first released I never expected that almost everyone would feel compelled to have one, and when I first started using the Internet in 1992 I never expected it to have the rich collaborative environment of Wikipedia, blogging, social networking, etc (I didn’t imagine anything much more advanced than file exchange and email).
Of these changes the latest (Internet and mobile phones) seems at first glance to be the least significant – but let’s not forget that it’s still an ongoing process. The other changes became standard parts of society long ago. So it seems that we could count as many as six singularities, but it seems that even the most conservative count would have three singularities (tool use, agriculture, and writing).
It seems to me that the major factors for a singularity are an increased population density (through couples being able to support more children, through medical technology extending the life expectancy, through greater food supplies permitting more people to live in an area, or through social structures which manage the disputes that arise when there is a great population density) and increased mental abilities (which includes better education and communication). Research into education methods is continuing, so even without genetically modified humans, surgically connecting computers to human brains, or AI we can expect intelligent beings with a significant incremental advance over current humans in the near future. Communications technology is continually being improved, with some significant advances in the user-interfaces. Even if we don’t get surgically attached communications devices giving something similar to “telepathy” (which is not far from current technology), there are possibilities for significant increments in communication ability through 3D video-conferencing, better time management of communication (inappropriate instant communication destroys productivity), and increased communication skills (they really should replace some of the time-filler subjects at high-school with something useful like how to write effective diagrams).
It seems to me that going from the current situation of something significantly less than one billion people with current (poor) education and limited communications access (which most people don’t know how to use properly) to six billion people with devices that are more user-friendly and powerful than today’s computers and mobile phones combined with better education as to how to use them has the potential to increase the overall rate of scientific development by more than an order of magnitude. This in itself might comprise a singularity depending on the criteria you use to assess it. Of course that would take at least a generation to implement, a significant advance in medical technology or AI could bring about a singularity much sooner.
But I feel safe in predicting that people who expect the world to remain as it is forever will be proven wrong yet again, and I also feel safe in predicting that most of them will still be alive to see it.
I believe that we will have a technological singularity (which will be nothing like the “rapture” which was invented by some of the most imaginative interpretations of the bible). I don’t believe that it will be the final singularity unless we happen to make our species extinct (in which case there will most likely be another species to take over the Earth and have its own singularities).
Currently we have a huge housing crisis in the US which involves significant political corruption, including the federal government preventing state governments from stopping predatory banking practices [1].
The corrupt plan to solve this is to simply give the banks a lot of taxpayer money, so the banking business model then becomes to do whatever it takes to make a short-term profit and then rely on federal funds for long-term viability. The problem was caused by bank employees aggressively selling mortgages to people who could never repay them if housing prices stabilised – let alone if the prices fell.
If the aim is to protect families, then the first requirement is that they not be evicted from their homes. The solution is to void the mortgage of any resident home owner who purchased a house on the basis of false advice from the bank, or who unknowingly entered into a mortgage that any reasonable person who is good at maths can recognise as being impossible for them to repay. The bank would end up with clear title to the property and the ex-homeowner would end up with no debt. Then such properties could be set at a controlled rent for a reasonable period of time (say 5 years). The bank (or its creditors) would have the option of renting the property to the ex-mortgagee for a minimum of five years or selling the property to someone else who was willing to do so. Of course the ex-mortgagee (who would not be bankrupt) would have the option of seeking out a new mortgage at reasonable rates and then buying their home again.
Also to benefit families the rent control period could be extended for as long as they have dependent children.
The losers in this would be the banks and the people who purchased multiple investment properties (the ones who caused all the problems).
Finally what is needed is a cultural shift towards austerity (as described by Juan Enriquez and Jorge Dominguez) [2].
Glen makes an interesting point about the irony of typical homeowners in the US demonstrating more financial literacy than the people who run banks [3].