Linux, politics, and other interesting things
Recently I’ve been unfortunate enough to be the sysadmin of some systems running CA software. The specific horror in this case is Siteminder.
The latest excitement was when an important machine abruptly stopped working and logged the error “ff ff ff ff” in the Apache error log. I was already familiar with the error message “ff ff ff”, which means that the Siteminder policy server cannot be contacted. But it took me a while to discover a message in the policy server logs indicating that a client was connecting to it with an invalid shared secret. It seems that the policy server had suddenly changed its shared secret for no reason I could determine.
A Google search for this issue turned up a single blog entry about it, which reports the “ff ff ff ff” error message as appearing in the case where the “ff ff ff” error occurs on the machines I run. Maybe I’m running a newer version, or maybe drax0r wrote the wrong error message by mistake. My colleagues have also seen the error message “ff ff”; we are still unsure of what that means.
For people who haven’t used Siteminder I’ll briefly describe how it works. There is a 2MB Apache module (larger than httpd and all the modules shipped in the RHEL package combined) that implements the access control and content management. It is compiled with -g, presumably because it will SEGV if compiled with -O2. This module spawns a daemon from Apache. Unfortunately the daemon code drops the root UID but does not drop the root GID (fun for security). I wrote a patch to the runuser program that can be used to address this by changing the GID before running Apache. All communication between Apache and the policy server then goes through the daemon process using System V IPC. Of course, if the daemon crashes then the IPC resources are not freed, and it won’t restart until the system is rebooted or the semaphores are manually removed.
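For anyone stuck doing the manual semaphore removal, here is a rough sketch of how it could be scripted. It assumes the util-linux versions of ipcs and ipcrm, and it assumes the stale semaphores belong to a user named apache. Both are guesses on my part, as are the function names, so check what ipcs -s shows on your own system first. It should only be run when Apache and the daemon are stopped, because it can't tell a stale semaphore from one that is in use.

```python
import subprocess


def semaphore_ids(ipcs_output, owner):
    """Return the semaphore array IDs owned by `owner` in `ipcs -s` output."""
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows have the form: key semid owner perms nsems
        # and the key column is printed in hex, so it starts with "0x".
        # Banner and column-header lines fail this check and are skipped.
        if len(fields) >= 5 and fields[0].startswith("0x") and fields[2] == owner:
            ids.append(int(fields[1]))
    return ids


def remove_semaphores(owner, dry_run=True):
    """Remove every semaphore array owned by `owner` (hypothetical helper).

    With dry_run=True the ipcrm commands are only printed, not executed.
    """
    output = subprocess.run(
        ["ipcs", "-s"], capture_output=True, text=True, check=True
    ).stdout
    for semid in semaphore_ids(output, owner):
        command = ["ipcrm", "-s", str(semid)]
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

Calling remove_semaphores("apache") prints the ipcrm commands it would run, and passing dry_run=False actually runs them. The dry run is the default because removing a semaphore that belongs to a live process will cause a different kind of excitement.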