How fork(2) ought to be

Richard Kettlewell

There are a couple of problems with the UNIX mechanism for managing processes.

Waiting For Processes

Firstly, you can't directly call select to wait for a process to terminate. Dan Bernstein describes a workaround for this, and I sometimes use something quite similar to that in my own code.
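A minimal sketch of this kind of workaround, assuming the usual "self-pipe" arrangement: a SIGCHLD handler writes a byte into a pipe, and the main loop selects on the read end. The helper name wait_via_selfpipe is made up for illustration, and a real program would select on its other descriptors at the same time.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int selfpipe[2] = { -1, -1 };

/* SIGCHLD handler: write one byte so that select() wakes up.
 * write() is async-signal-safe; errno is saved and restored because the
 * handler may interrupt code that is about to inspect it. */
static void on_sigchld(int sig) {
    int saved = errno;
    (void)sig;
    if (write(selfpipe[1], "", 1) < 0) {
        /* Pipe full: a wakeup is already pending, nothing to do. */
    }
    errno = saved;
}

/* Hypothetical helper: fork a child that exits with `code`, block in
 * select() until SIGCHLD has been seen, then reap the child.
 * Returns the child's exit status, or -1 on error. */
int wait_via_selfpipe(int code) {
    struct sigaction sa;
    fd_set rfds;
    pid_t pid;
    int status, i;
    char buf[16];

    if (pipe(selfpipe) < 0) return -1;
    for (i = 0; i < 2; i++)  /* non-blocking, so neither end can stall */
        fcntl(selfpipe[i], F_SETFL, fcntl(selfpipe[i], F_GETFL) | O_NONBLOCK);

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) < 0) return -1;

    pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(code);

    for (;;) {
        FD_ZERO(&rfds);
        FD_SET(selfpipe[0], &rfds);
        if (select(selfpipe[0] + 1, &rfds, NULL, NULL, NULL) > 0) break;
        if (errno != EINTR) return -1;  /* EINTR: the handler just ran */
    }
    while (read(selfpipe[0], buf, sizeof buf) > 0)
        ;  /* drain the wakeup bytes */

    if (waitpid(pid, &status, 0) != pid) return -1;

    signal(SIGCHLD, SIG_DFL);
    close(selfpipe[0]);
    close(selfpipe[1]);
    selfpipe[0] = selfpipe[1] = -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling wait_via_selfpipe(7) should return 7 once the child has exited. The pipe only says that some child has terminated; the program still has to call waitpid to find out which one and with what status.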


Secondly, pidfiles don't work. The reason that daemons create pidfiles is so that other processes can find out their process ID in order to signal them. But if the daemon terminates without removing the pidfile, or if it terminates after another process has read the pidfile but before it has acted on it, then it's quite possible that the PID read from the pidfile refers either to no process at all, or (since PIDs are re-used) to the wrong process entirely.

If this causes a signal to be delivered to the wrong process, the results can range from inconvenient to disastrous.

The races involving PID re-use are particularly annoying as there doesn't seem to be any sane way around them. This is true for the general case, not just pidfiles - even if you get a PID from the output of ps there is a small chance that the process will terminate and the PID be re-used by the time you've used that PID in a later system call.

In fact, about the only thing you can safely do is wait for a particular PID that you got from fork; if it's not a child of the current process (any more) then you will get an ECHILD error. (Actually it may be even worse than that. If you are writing library code, and thus don't have full control over what the program you're running in will do, then the process may be waited for outside your control, after which the PID might get re-used for another child of the same process. However, your library can document that it won't work under such pathological conditions.)
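The ECHILD behaviour is easy to check. The following is a small sketch, with a made-up helper name, that forks a child, reaps it, and then waits for the same PID a second time.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, reap it, then wait for the same PID
 * again.  Returns the errno from the second waitpid(), 0 if that call
 * unexpectedly succeeded, or -1 if the setup itself failed. */
int second_wait_errno(void) {
    int status;
    pid_t pid = fork();

    if (pid < 0) return -1;
    if (pid == 0) _exit(0);                  /* child: exit immediately */

    /* First wait reaps the child; the PID is now free for re-use. */
    if (waitpid(pid, &status, 0) != pid) return -1;

    /* Second wait: the PID no longer names a child of this process. */
    if (waitpid(pid, &status, 0) == -1) return errno;
    return 0;
}
```

In a process with no other children, second_wait_errno() should return ECHILD.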

Everything's A File

It's sometimes said of UNIX that everything is a file. Processes are an example of something that isn't; the problems above can be addressed by changing this.

A Better fork(2)

As Dan Bernstein observes, what you really want is for fork(2) to return a file descriptor. Then you can select on it to detect when the process terminates. Reading from the file would be the same as waiting for the process. It'd be possible with this interface to provide a mechanism where multiple different processes could pick up the exit status of some process. I don't know if this would be useful.
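The "descriptor that becomes readable when the process terminates" half of this can be approximated in user space today. The sketch below assumes a hypothetical fork_fd() wrapper: the child holds the write end of a pipe for its whole lifetime, so the parent's read end reaches end-of-file when the child exits. It is only an approximation of the proposal, since the descriptor carries no exit status and a grandchild that inherits the write end would delay the end-of-file.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fork_fd(): like fork(), but in the parent it also stores in
 * *fdp a descriptor that reaches end-of-file once the child has exited. */
pid_t fork_fd(int *fdp) {
    int p[2];
    pid_t pid;

    if (pipe(p) < 0) return -1;
    pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);   /* child keeps p[1]; the kernel closes it at exit */
        return 0;
    }
    close(p[1]);       /* parent keeps only the read end */
    *fdp = p[0];
    return pid;
}

/* Demo: the child exits with `code`; the parent selects on the descriptor
 * and only then collects the status.  Returns the status, or -1 on error. */
int exit_code_via_fork_fd(int code) {
    int fd, status;
    fd_set rfds;
    pid_t pid = fork_fd(&fd);

    if (pid < 0) return -1;
    if (pid == 0) _exit(code);

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    /* End-of-file on the pipe counts as "readable" for select(). */
    if (select(fd + 1, &rfds, NULL, NULL, NULL) != 1) {
        close(fd);
        return -1;
    }
    close(fd);

    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The separate waitpid call is the part a real fork-returns-a-descriptor interface would fold into reading from the descriptor.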

You might send the process signals by writing bytes into it.

flink(2) and frename(2)

That leaves the question of fixing pidfiles. The answer is to be able to turn this file descriptor back into a name in the filesystem. This would be a globally useful call; the usual name for it is flink(2).

When flink is discussed someone will usually claim that it's a security hole. If you have a file descriptor that was opened O_RDONLY, but the file permissions happen to provide write access, then by creating a new name for it you could open the file O_RDWR.

However, even ignoring the obvious silliness of a piece of software making its security dependent on this, you can achieve the supposedly insecure effect already via /proc/self/fd on (e.g.) Linux systems.
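As a sketch of that existing route, assuming a Linux system with /proc mounted: the helper below (reopen_rdwr is a made-up name) takes a read-only descriptor and opens the same file again for writing through /proc/self/fd. The usual permission check against the file's mode still applies, which is the point: nothing is gained that the file permissions did not already allow.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (Linux-specific): given any open descriptor, open the
 * same file again read-write via /proc.  Returns the new descriptor, or -1. */
int reopen_rdwr(int fd) {
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    return open(path, O_RDWR);
}

/* Demo: create a file we own, keep only an O_RDONLY descriptor to it,
 * remove its name, and then write to it anyway.  Returns 0 on success. */
int demo_reopen(void) {
    char name[] = "/tmp/reopen-demo-XXXXXX";
    char c = 0;
    int ro, rw, ok;
    int tmp = mkstemp(name);     /* created with mode 0600, owned by us */

    if (tmp < 0) return -1;
    close(tmp);

    ro = open(name, O_RDONLY);
    unlink(name);                /* the file now has no name at all */
    if (ro < 0) return -1;

    rw = reopen_rdwr(ro);
    if (rw < 0) {
        close(ro);
        return -1;
    }
    /* Write through the new descriptor, read back through the old one. */
    ok = write(rw, "x", 1) == 1 && pread(ro, &c, 1, 0) == 1 && c == 'x';
    close(rw);
    close(ro);
    return ok ? 0 : -1;
}
```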

link(2) won't overwrite an existing name, so presumably flink shouldn't either. But if you had an overwriting version, you could implement the common rename-into-place idiom by opening a file, unlinking it, writing it, then giving the FD the target name; if your program suffered a fatal error in the meantime, it wouldn't need to explicitly remove the temporary file (which might be impossible if the fatal error is a crash or SIGKILL). Presumably such a call would be named frename.
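For comparison, here is a minimal sketch of the rename-into-place idiom as it has to be written with today's calls, mkstemp(3) and rename(2). The function name replace_file is made up for illustration, and a short write is simply treated as failure to keep the example small.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: atomically replace `target` with `data` by writing a
 * temporary file in the same directory and renaming it over the target.
 * Returns 0 on success, -1 on failure. */
int replace_file(const char *target, const char *data) {
    char tmp[4096];
    size_t len = strlen(data);
    int n, fd;

    /* The temporary name must be on the same filesystem as the target. */
    n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", target);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;

    fd = mkstemp(tmp);
    if (fd < 0) return -1;

    if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);   /* explicit cleanup: skipped if we crash instead */
        return -1;
    }
    if (close(fd) < 0 || rename(tmp, target) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Every error path has to remember to unlink the temporary name, and none of them run after a crash or SIGKILL; that leftover file is what an unlink-first, name-last sequence would avoid.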

(A potentially useful third call would be a version of open(2) that created an anonymous file on a nominated filesystem. But you can do without this if you're prepared to put up with a small time window in which your temporary file has a name.)

Better Than Pidfiles

Once you've got flink(2) then writing a pidfile is something the parent does, and it actually involves linking the file descriptor it got back from fork into the filesystem. Depending on the application it might chown and/or chmod it first.

If you want a process to be able to do this for itself, instead of relying on its parent, there could be a system call to open a controlling file descriptor for a given process - which might amount to opening /proc/PID/controlfd or /proc/self/controlfd (for example).

Catching Signals

Another related idea, inspired by that of sending signals by writing to a suitably magic file descriptor, is to provide a file descriptor from which a process can read signals sent to it (instead of having those signals delivered in the normal fashion). For this to be useful, a process should probably be able to choose which signals are delivered this way and which are handled in the traditional manner. (And of course, SIGKILL should remain untrappable.)

(Update: signalfd exists in Linux from 2.6.22 onwards.)
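A minimal sketch of how signalfd is used, assuming a single-threaded Linux program. The helper name read_signal_as_data is made up; it blocks one signal, sends it to the current process, and reads it back from the descriptor as ordinary data.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux-specific): route `signo` to a signalfd, send it
 * to ourselves, and read it back.  Returns the signal number read from the
 * descriptor, or -1 on error. */
int read_signal_as_data(int signo) {
    sigset_t mask, old;
    struct signalfd_siginfo si;
    int fd, result = -1;

    sigemptyset(&mask);
    sigaddset(&mask, signo);

    /* Block normal delivery, so the signal stays pending for the fd. */
    if (sigprocmask(SIG_BLOCK, &mask, &old) < 0) return -1;

    fd = signalfd(-1, &mask, 0);   /* -1: create a new descriptor */
    if (fd >= 0) {
        if (kill(getpid(), signo) == 0 &&
            read(fd, &si, sizeof si) == (ssize_t)sizeof si)
            result = (int)si.ssi_signo;   /* reading consumes the signal */
        close(fd);
    }

    sigprocmask(SIG_SETMASK, &old, NULL);  /* restore the previous mask */
    return result;
}
```

The mask passed to signalfd is the selection mechanism described above: signals in the mask are read from the descriptor, and everything else is still handled in the traditional manner. The descriptor can also be passed to select alongside any others.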

Copyright © 1999-2002, 2006 Richard Kettlewell.
