unaligned access on IA64

I recently had some problems with unaligned access on IA64. Messages about unaligned access were being logged via printk and I couldn’t determine the cause – or even how to track it down. To test what an unaligned access actually means (which wasn’t documented anywhere that a quick Google search could find) I wrote the test program in the second half of this post. Below is the output of the test program, which accesses an integer at various offsets. As you can see it’s addresses that are congruent to 5, 6, and 7 mod 8 that cause the errors. As an int is 4 bytes long, it seems that an unaligned access error is caused by accessing a data type that crosses an 8 byte boundary. So a pointer or long long has to be aligned to an 8 byte boundary, an int has to be at an address that is congruent to 4 or less mod 8, and a character can be anywhere.

Also, if the sysctl /proc/sys/kernel/ignore-unaligned-usertrap is set to 1 then these messages will be disabled. But you really don’t want to do that: such errors apparently cause a significant performance loss, so you want to file bug reports against programs that trigger them instead.

# ./a.out
index: 0
index: 1
a.out(10393): unaligned access to 0x607fffffff34ee25, ip=0x4000000000000961
index: 2
a.out(10393): unaligned access to 0x607fffffff34ee26, ip=0x4000000000000961
index: 3
a.out(10393): unaligned access to 0x607fffffff34ee27, ip=0x4000000000000961
index: 4
index: 5
index: 6
index: 7
index: 8
index: 9
a.out(10393): unaligned access to 0x607fffffff34ee2d, ip=0x4000000000000961
index: 10
a.out(10393): unaligned access to 0x607fffffff34ee2e, ip=0x4000000000000961
index: 11
a.out(10393): unaligned access to 0x607fffffff34ee2f, ip=0x4000000000000961
index: 12
index: 13

Below is the test program I used:
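The original listing was truncated in this copy of the post, so here is a minimal sketch of a program that produces output like the above, assuming it simply reads an int through a pointer at each byte offset of a buffer (the exact offsets that trap depend on where the buffer happens to start):

#include <stdio.h>
#include <string.h>

int main(void)
{
  char buf[64];
  int i;

  memset(buf, 0, sizeof(buf));
  for (i = 0; i < 14; i++)
  {
    volatile int val;

    printf("index: %d\n", i);
    fflush(stdout);            /* so the kernel message lands after the index line */
    val = *(int *)(buf + i);   /* a possibly unaligned 4 byte read */
    (void)val;
  }
  return 0;
}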

Planet Linux Jobs Victoria

As part of my ongoing plan to make things easier for Linux job applicants and advertisers I have created a Planet for Linux Jobs in Victoria, Australia.

The LUV President had suggested that I make a proposal to the LUV committee about this. I have offered them ownership of the Victorian aspects of this idea as well as volunteering to run the services for them.

Fragmenting Information about Jobs

A comment on my previous post about my Linux Jobs Blog suggested that I shouldn’t fragment the information.

However I believe that fragmenting the information is ideal, because RSS syndication drives the cost of coalescing it back together to almost zero!

Currently there is a Linux job web site run by Linux Australia. It doesn’t have many adverts and isn’t even linked from the main Linux Australia web site. I believe that we can do better for the people who want Linux jobs and the people who have such jobs to advertise.

If you have a central site the jobs have to be moderated (which takes work and delays listings), and the larger the area that the site covers the more work this requires.

The solution is to have a distributed system with different people running listings for various regions and a syndication service to aggregate them. To start this I have created a blog which will have categories for the states and territories of Australia. Someone who is only interested in one region can visit the category for that region. Then recruiting agencies and companies which regularly hire Linux people can start their own RSS feeds to be syndicated in a Planet instance for each state and territory. This gives a faster and more efficient response: adverts will appear quickly, can be changed or removed at any time, and require less moderation effort. I expect that recruiting agencies will occasionally post off-topic entries – but when their feed gets removed from the Planet installation they will probably make a commitment to do the right thing in future.

Planet installations can syndicate other Planet installations, so we can easily have a Linux jobs Planet for Australia (possibly run by Linux Australia) that syndicates the feeds from each state and territory.

In the long term I think that the best way of running this is to have Linux Australia run the central Planet instance and a LUG in each region run the local site. I didn’t get a positive response when suggesting this to the relevant people in Linux Australia and LUV, but that didn’t deter me, so I set it up myself. If the idea takes off and Linux Australia and the LUGs want to take it over I’ll be happy to use HTTP redirects to send the traffic to them – and help them with the sys-admin work if asked.

Also there is nothing specific to Australia in this idea. I am only interested in Australia because if I was to attempt to do it in a larger area (such as the EU) then I could spend all my time on it without gaining a critical mass in any region.

If you are interested in running this in your region then you need to set up a blog (for adverts that are sent to you via email) and a Planet installation that feeds from your blog and any other Linux job advert blogs in your region. If your country doesn’t already have a central Planet for jobs then creating a separate Planet installation for your country would be a good idea too. I will be happy to run a Planet installation for world-wide Linux job adverts (at least until I can convince an organization such as Linux International to run it) if this idea takes off in other countries.

Some people have asked what the benefit of a Planet is over a mailing list. You have to subscribe to a mailing list in advance, while with a Planet you can just visit it the moment you decide to find a new job. Subscribing to mailing lists for jobs from all countries would never work, but visiting a Planet for Linux jobs world-wide and then visiting a sub-planet for regional jobs would be quick and easy.

Linux Job Ads

It seems to me that we need to have syndicated feeds for Linux job adverts. To start this I have created a new blog for Linux job adverts. It will have categories for the states and territories of Australia (with a feed for each category) and I will also create Planet installations that take feeds from the blog as well as any other Linux jobs RSS feeds. I will create the Planet installations when I have feeds to add.

If your company regularly advertises Linux jobs in Australia then please create an RSS feed of the adverts and I will syndicate it. If your company occasionally advertises Linux jobs then you can email them to me.

As of this moment there are no positions advertised, but I hope that will change soon.

Committing Data to Disk

I’ve just watched the video of Stewart Smith’s LCA talk Eat My Data about writing applications to store data reliably and not lose it. The reason I watched it was not to learn about how to correctly write such programs, but so that I could recommend it to other people.

Recently I have had problems with a system (that I won’t name) which used fwrite() to write data to disk and then used fflush() to commit it! Below is a section from the fflush(3) man page:

NOTES
       Note that fflush() only flushes the user space buffers provided by  the
       C  library.   To  ensure that the data is physically stored on disk the
       kernel buffers must be flushed too, e.g. with sync(2) or fsync(2).

Does no-one read the man pages for library calls that they use?

Then recently I discovered (after losing some data) that both dpkg and rpm do not call fsync() after writing package files to disk. The vast majority of Linux systems use either dpkg or rpm to manage their packages. All those systems are vulnerable to data loss if the power fails, a cluster STONITH event occurs, or any other unexpected reboot happens shortly after a package is installed. This means that you can use the distribution defined interface for installing a package, be told that the package was successfully installed, have a crash or power failure, and then find that only some parts of the package were installed. So far I have agreement from Jeff Johnson that RPM 5 will use fsync(), no agreement from Debian people that this would be a good idea, and I have not yet reported it as a bug in SUSE and Red Hat (I’d rather get it fixed upstream first).

During his talk Stewart says sarcastically “everyone uses the same filesystem because it’s the one true way“. Unfortunately I’m getting this reaction from many people when reporting data consistency issues that arise on XFS. The fact that Ext3 by default will delay writes by up to 5 seconds for performance (which can be changed by a mount option) and that XFS will default to delaying up to 30 seconds means that some race conditions will be more likely to occur on XFS than in the default configuration of Ext3. This doesn’t mean that they won’t occur on Ext3, and certainly doesn’t mean that you can rely on such programs working on Ext3.

Ext3 does however have the data=ordered mount option (which seems to be the default configuration on Debian and Red Hat systems), which means that meta-data is committed to disk after the data blocks that it refers to. So writing to a temporary file and then renaming it should give the desired result. Of course it’s bad luck for dpkg and rpm users who use Ext3 but decided to use data=writeback, as they get better performance but significantly less reliability.
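For reference, here is a minimal sketch of that write-then-rename pattern in C (the function and file names are just illustrative): fflush() empties the stdio buffer, fsync() asks the kernel to commit the data, and only then is the temporary file renamed over the old one. For complete safety the containing directory should also be fsync()d so that the rename itself is durable.

#include <stdio.h>
#include <unistd.h>

/* Write new contents to a temporary file, force them to disk, then
 * rename the temporary file over the old one so that a crash leaves
 * either the complete old version or the complete new version. */
int save_file(const char *path, const char *tmp_path,
              const void *data, size_t len)
{
  FILE *f = fopen(tmp_path, "w");
  int ok;

  if (f == NULL)
    return -1;
  ok = fwrite(data, 1, len, f) == len
    && fflush(f) == 0                 /* empty the C library buffer */
    && fsync(fileno(f)) == 0;         /* commit the kernel buffers to disk */
  ok = (fclose(f) == 0) && ok;
  if (!ok)
  {
    unlink(tmp_path);
    return -1;
  }
  return rename(tmp_path, path);      /* atomically replace the old file */
}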

Also we have to consider the range of filesystems that may be used. Debian supports Linux and HURD kernels as main projects and there are less well supported sub-projects for the NetBSD, FreeBSD, and OpenBSD kernels as well as Solaris. Each of these kernels has its own implementations of the filesystems that they have in common, and some have native filesystems that are not supported on Linux at all. It is not reasonable to assume that all of these filesystems have the same caching algorithms as Ext3 or that they are unlike XFS. The RPM tool is mainly known for being used on Red Hat distributions (Fedora and RHEL) and on SuSE – these distributions include support for Ext2/3, ReiserFS, and XFS as root filesystems. RPM is also used on BSD Unix and on other platforms that have different filesystems and different caching algorithms.

One objection that was made to using fsync() was the fact that cheap and nasty hard drives have write-back caches that are volatile (their contents disappear on power loss). As reliable operation is impossible with such drives, why not just give up! Pity about the people with good hard drives that don’t do such foolishness; maybe they are expected to lose data as an expression of solidarity with people who buy the cheap and nasty hardware.

Package installation would be expected to be slower if all files are sync’d. One method of mitigating this is to write a large number of files (e.g. up to a maximum of 900) and then call fsync() on each of them in a loop, as sketched below. After the last file has been written the first file may have been entirely committed to disk, and calling fsync() on one file may result in other files being synchronised too. Another issue is that the only time package installation speed really matters is during an initial OS install. It should not be difficult to provide an option to not call fsync() for use during the OS install (where any error would result in aborting the install anyway).
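A sketch of that batching idea (error handling omitted for brevity, and the batch size is just the example figure from the text): all the files are written first, then fsync() is called on each in a second pass, so the syncs overlap with writeback that has already started instead of serialising on every file.

#include <stdio.h>
#include <unistd.h>

#define BATCH_SIZE 900   /* example maximum from the text */

/* Flush and sync a batch of already written files in a second pass.
 * By the time the loop returns to the first file much of its data may
 * already be on disk, so each fsync() has less work left to do. */
void sync_batch(FILE *files[], int count)
{
  int i;

  for (i = 0; i < count; i++)
    fflush(files[i]);                 /* push stdio buffers to the kernel */
  for (i = 0; i < count; i++)
    fsync(fileno(files[i]));          /* then commit each file to disk */
  for (i = 0; i < count; i++)
    fclose(files[i]);
}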

Update: If you are interested in disk performance then you might want to visit the Benchmark category of my blog, my Bonnie++ storage benchmark and my Postal mail server benchmark.

Update: This is the most popular post I’ve written so far. I would appreciate some comments about what you are interested in so I can write more posts that get such interest. Also please see the Future Posts page for any other general suggestions.

LUG talks today

Today I gave three talks at my local LUG. The first was my latest SE Linux talk (I’ll put the notes online soon). The second was a talk about voting.

I asked for a show of hands: who has already decided which party they will vote for at the next federal election? About 12 people put their hands up. I then asked people to put their hands down if they were not a member of the party that they intend to vote for; only two raised hands were left in the room, including mine!

With the way party politics works nowadays the major parties are not very interested in representing their core voters. Why try to please people who will vote for you anyway? Instead they try to appeal to swinging voters and pressure groups. If you have decided to vote for a party they have no reason to try to impress you. Therefore you should join the party and try to influence the policy decision making process from within.

The issues that I believe are most important to the Linux community are free software use in government, sane intellectual property laws, the right to a fair trial, and not pandering to the US (which is related to the previous two points).

If you have already decided who to vote for then you should join that party and make your vote count in the party room.

One member of the audience said that he had been a member of one of the major parties but that the internal politics turned him off. If that is your experience then I think you should ask yourself whether you want to vote for a group of people that you can’t work with.

The final talk I gave was about getting speakers for Linux Users’ Groups. There is always difficulty in finding speakers for clubs. Ideally we would have meetings planned a few months ahead of time so that they could be advertised in various ways. Newspapers often have columns dedicated to providing information about public meetings but the lead time is usually at least a week (and the meeting would have to be advertised at least two weeks in advance – so more than a month’s planning ahead is required).

Getting a larger number and variety of speakers will attract new members, encourage existing members to attend more meetings, and inspire members in their Linux work.

Talks can be given by almost anyone. There is a constant demand for speakers who have expert knowledge in the topic, but anyone who is a decent speaker and has the confidence to stand up at the podium can give a good talk. For expert speakers possibilities include academics, industry leaders, leaders of free software development projects, and journalists. But that’s not all, anyone who wants to spend the time researching a topic can give a talk on it. For example I’ve been learning about MySQL recently for my own servers and will probably offer a talk about MySQL aimed at sys-admins who don’t want to become DBAs but who just want to get a database running. I’m not a MySQL expert (and don’t plan to become one) but I believe that there are many people who want to do the things I do with MySQL and who could benefit from a talk that I might give.

The best place to find speakers is a conference or trade-show. If they give a talk that works well you can suggest that they give it again for your local LUG. You can also find speakers at conferences that you can’t attend. If someone visits your country for a trade-show in a different city you could send them an email saying “unfortunately I can’t attend your talk, but if you are interested in visiting my city in the same trip then there will be an audience of X people interested in seeing you”.

There’s no harm in asking; the worst that they can do is decline. Ask everyone who you think can do a good job. Also make sure that you don’t make any commitment (unless you are a member of the LUG committee).

Linux Tour Bus

I have seen tour buses that contain bunk beds. If one or more such buses were hired then a group of Linux people could go on a moving Linux conference. This would have to take place in a region with many reasonably sized cities close together and a good number of Linux people in those cities. The EU is probably the only such region.

A bus (or several buses depending on demand) would then take a group of Linux people through the major cities and have a conference in each one.

Currently there are conferences such as the Debian conference DebConf which receive sufficient sponsorship money to pay for many speakers to attend. Having a similar conference traveling around Europe shouldn’t cost any more money and will give plenty of time for the people in the bus to do some coding along the way.

We already have the Geek Cruises, my idea is to do a similar thing but on the road. Also it isn’t practical to transport an entire conference, so it would probably just be speakers on buses and the audiences would vary from city to city.

BLUG

This weekend I went to the Ballarat install-fest, mini-conf, and inaugural meeting of the Ballarat Linux Users’ Group (BLUG).

This was the second install-fest; the first one was quite successful so it was decided that there was demand for a second. I suggested that we get some of the more experienced members of LUV to attend and give talks about their areas of expertise to make a mini-conference. I also suggested that we hire a large vehicle to take a number of people to the meeting. Both my suggestions were accepted.

So on Friday evening I was in a Kia XXX with five other people from LUV on our way to Ballarat.

On Saturday we had the install-fest. We started at about 10AM, there were about a dozen people getting help installing Linux and many more attending the mini-conf and just hanging out. For lunch we had a BBQ. In the afternoon I gave a talk on SE Linux and then a brief impromptu talk on Poly-Instantiated Directories while the next speaker was setting up their laptop.

At the end there was the inaugural meeting of BLUG. The president was appointed, and there were some brief discussions about when to schedule meetings. I suggested that BLUG meetings should be either the day before or the day after LUV meetings to increase the chance of speakers from other regions attending both meetings (LUV is a larger group and has a better ability to get speakers from other regions); my suggestion was still being seriously considered when the meeting adjourned. It was also agreed that a weekend combined LUV and BLUG meeting would be arranged twice a year.

I traveled back to Melbourne by train which was cheap at $9 and comfortable. There was even a power point in the carriage (which I didn’t use as my laptop was charged and the location was not convenient). For the next such event I’ll try and arrange a group to travel on the train together.

The next thing to do is to find other regional centers in Victoria where we can do the same thing. Bendigo might be a possibility.

Also if you are a member of a LUG in a city please consider the possibilities for helping form a LUG in a regional center that’s nearby. I would be happy to provide whatever advice I can to help people replicate this success in areas surrounding other cities, so please email me if you have any questions.

meeting people at Linux conferences

One thing that has always surprised me is how few people talk to speakers after they have finished their lecture. A lecture might attract many questions and question time may have to be cut off, yet when the speaker leaves the room they will usually do so alone.

When I give lectures at conferences I’m always happy to spend more time talking to people who are interested in the topic and disappointed that so few people choose to do so. It seems that other people have similar experiences, there have been several occasions when I have invited speakers to join me for lunch and no-one else has shown interest in joining us.

Usually the most significant factor in making someone offer a talk at a Linux conference is the opportunity to teach other people about the technology that they are working on. People with that motivation will take the opportunity to teach people at lunch, dinner, whenever.

Linux Conf Au has an event called the “Professional Delegates Networking Session” which is regarded by some people as the way to meet speakers (about half the delegates don’t attend so the ratio of speakers to delegates is significantly better than at other conference events). But it seems to me that it’s more efficient to just offer to buy them dinner. When I worked for Red Hat the maximum value for a gift I could accept was $100US; I expect that Red Hat has not changed this policy since then and that most companies that employ speakers at Linux conferences have similar policies. $100US is more than a meal costs at most restaurants that are near a Linux conference.

If I was a manager at a company that sent employees to a Linux conference I would first send email to some speakers who were working in areas of Linux development that were related to the projects that the employees were working on. I would ask the speakers if they would be interested in having dinner bought for them by my company and give them the option of bringing one or two friends along for a free meal (the friends would probably be people who work in similar areas).

some random Linux tips

  • echo 1 > /proc/sys/vm/block_dump
    The above command sets a sysctl to cause the kernel to log all disk writes. Below is a sample of the output from it. Beware that there is a lot of data.
    Jan 10 09:05:53 aeon kernel: kjournald(1048): WRITE block XXX152 on dm-6
    Jan 10 09:05:53 aeon kernel: kjournald(1048): WRITE block XXX160 on dm-6
    Jan 10 09:05:53 aeon kernel: kjournald(1048): WRITE block XXX168 on dm-6
    Jan 10 09:05:54 aeon kernel: kpowersave(5671): READ block XXX384 on dm-7
    Jan 10 09:05:54 aeon kernel: kpowersave(5671): READ block XXX400 on dm-7
    Jan 10 09:05:54 aeon kernel: kpowersave(5671): READ block XXX408 on dm-7
    Jan 10 09:05:54 aeon kernel: bash(5803): dirtied inode XXXXXX1943 (block_dump) on proc
  • Prefixing a bash command with ‘ ‘ will prevent the ! operator from running it. For example if you had just entered the command ” ls -al /” then “!l” would not repeat it but would instead match the preceding command that started with ‘l’. On SLES-10 a preceding space also makes the command not appear in the history, while on Debian/etch it does (both run Bash 3.1).
  • LD_PRELOAD=/lib/libmemusage.so ls > /dev/null

    The above LD_PRELOAD will cause a dump to stderr of data about all memory allocations performed by the program in question. Below is a sample of the output.

    Memory usage summary: heap total: 28543, heap peak: 20135, stack peak: 9844
             total calls   total memory   failed calls
     malloc|         85          28543              0
    realloc|         11              0              0   (in place: 11, dec: 11)
     calloc|          0              0              0
       free|         21          12107
    Histogram for block sizes:
        0-15             29  30% ==================================================
       16-31              5   5% ========
       32-47             10  10% =================
       48-63             14  14% ========================
       64-79              4   4% ======
       80-95              1   1% =
       96-111            20  20% ==================================
      112-127             2   2% ===
      208-223             1   1% =
      352-367             4   4% ======
      384-399             1   1% =
      480-495             1   1% =
     1536-1551            1   1% =
     4096-4111            1   1% =
     4112-4127            1   1% =
    12800-12815           1   1% =