
Crossed Signals, A 15 Year Old Bug in a Feature You’ve Never Heard Of

3 Nov 2014

Update — 11 Nov 14

My interpretation (as well as Stevens’) of POSIX was incorrect. Section B.1.3 states that the rationale guidance is for the application developer, not the systems developer. Setting the sa_mask, in the application, is the appropriate way to achieve cross platform priority delivery. See Stevens’ own errata entry (p. 102) and Geoff Clare’s reply (#211) to a bug filed against POSIX.1-2004. Sorry for not catching this sooner and for any confusion this mistake has caused.

Your production operating system does not implement realtime signals correctly. Don’t worry, no one’s does—not unless they are running HP-UX64 11.11i. And if you are one of those five people, congratulations, you can now justify the coin you dropped for that license.

We first run the program under Solaris 2.6, but the output is not what is expected. The nine signals are queued, but the three signals are generated starting with the highest signal number (we expect the lowest signal number to be generated first). Then for a given signal, the queued signals appear to be delivered in LIFO, not FIFO, order…We now run the program under Digital Unix 4.0B and see the expected results…The Solaris 2.6 implementation appears to have a bug. Ste99 §5.7

It appears to have a bug indeed! I admire the dry euphemism deployed by the great Richard Stevens, but I must be blunt: not only did Solaris 2.6 have a bug, it got delivery order completely wrong. Was it opposite day when this code was written? Did anyone bother to verify the POSIX semantics were met? It seems not.

And lest you think I’m picking on old man Solaris, you’ll be happy to know modern day kernels such as illumos, FreeBSD, and yes, even your coveted Linux, all get it wrong. If I were to give this bug a theme, in tradition with the recent Heartbleed and Shellshock bugs, I’d have to call it the EverybodyGotItWrongButNoOneNoticed bug. I’m still working on a logo.

Realtime signals were codified 21 years ago in POSIX.1b. Yet they have been implemented incorrectly in most operating systems this entire time. I hope that my post may effect change, and we can finally close that bug Stevens reported 15 years ago.

POSIX.1b and Realtime Signals

Realtime signals are an extension to the familiar base signals used every day, such as the SIGSEGV trap that my C programs like to produce. They were standardized by POSIX.1b, published in 1993. (The original name was POSIX.4, an eponym of the working group. But someone couldn’t leave good enough alone. There was a “Grand Renumbering.” All POSIX.1 extensions were renamed to POSIX.1x. Just know that POSIX.4, POSIX 1003.1b-1993, and POSIX.1b all refer to the same thing: the realtime extensions to POSIX.1 Gal95.) The same document specified the ubiquitous mmap(2) and fsync(3C), and gave us better portable semaphores, timers, and IPC. Realtime signals extend base signals in three significant ways.

  1. Queueing: When the same signal number is generated multiple times, it shall be delivered multiple times. E.g., if realtime signal S is being delivered while two more S signals are generated, then those additional signals will form a queue. It is undefined whether base signals queue; implementations are free to collapse them.

  2. FIFO ordering: Multiple pending signals of the same number must be delivered in the order they were generated.

  3. Priority: Lower signal numbers must be delivered first. They preempt higher numbered signals. There is no defined order among base signals, nor for the case when both base and realtime signals are pending; it is up to the implementation. E.g., Linux gives base signals priority over realtime signals.

    Overloading the signal number and giving higher priority to lower numbers is not accidental. This allowed reuse of the existing delivery mechanism while remaining efficient and simple to implement IEEE13 §B.2.4.

The concern of this post is with the last two points.
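
To watch these three properties on your own system, here is a minimal sketch in the spirit of Stevens’ test (it is not his program). It blocks the three highest realtime signals, queues each one three times with sigqueue(), unblocks them, and records the order in which the handler sees them. The names rt_delivery_order and rt_record are mine, and the handler deliberately leaves sa_mask empty, so the recorded order is whatever your platform happens to produce.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define RT_SIGS    3                       /* distinct realtime signal numbers */
#define RT_REPEATS 3                       /* times each number is queued */
#define RT_TOTAL   (RT_SIGS * RT_REPEATS)

/* Signal numbers in the order the handler saw them. A sketch only: the
 * bookkeeping is not hardened against a handler interrupted mid-update. */
static volatile sig_atomic_t rt_seen[RT_TOTAL];
static volatile sig_atomic_t rt_count;

static void rt_record(int signo, siginfo_t *info, void *ctx)
{
    (void)info;
    (void)ctx;
    if (rt_count < RT_TOTAL)
        rt_seen[rt_count++] = signo;
}

/*
 * Queue RT_REPEATS instances of each of the three highest realtime signals
 * while they are blocked, unblock them, and copy the observed delivery order
 * into out[]. Returns the number of deliveries recorded, or -1 on error.
 */
int rt_delivery_order(int out[RT_TOTAL])
{
    struct sigaction sa;
    sigset_t set;
    union sigval val;
    int i, j, n;

    assert(out != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = rt_record;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);              /* deliberately empty: no help from us */
    sigemptyset(&set);
    rt_count = 0;

    for (i = 0; i < RT_SIGS; i++) {
        if (sigaction(SIGRTMAX - i, &sa, NULL) == -1)
            return -1;
        sigaddset(&set, SIGRTMAX - i);
    }
    if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
        return -1;

    /* Highest number first, so generation order is the reverse of priority. */
    for (i = 0; i < RT_SIGS; i++) {
        for (j = 0; j < RT_REPEATS; j++) {
            val.sival_int = j;
            if (sigqueue(getpid(), SIGRTMAX - i, val) == -1)
                return -1;
        }
    }

    /* Pending signals are delivered once they are unblocked. */
    if (sigprocmask(SIG_UNBLOCK, &set, NULL) == -1)
        return -1;

    n = rt_count;
    for (i = 0; i < n; i++)
        out[i] = rt_seen[i];
    return n;
}
```

Nine recorded deliveries are evidence of queueing. Whether the lowest signal number shows up first in out[] is the part that varies by platform.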

A 15 Year Old Bug

Figure 1 shows the output of Stevens’ realtime signals test running on OmniOS r151012 (a descendant of Solaris 2.6). Figure 2 shows the expected output. In the generation-and-a-half since Stevens ran his test, the LIFO bug has been fixed but priority inversion persists.

    $ ./test1
    SIGRTMIN = 42, SIGRTMAX = 73
    sent signal 73, val = 0
    sent signal 73, val = 1
    sent signal 73, val = 2
    sent signal 72, val = 0
    sent signal 72, val = 1
    sent signal 72, val = 2
    sent signal 71, val = 0
    sent signal 71, val = 1
    sent signal 71, val = 2
    received signal #73, code = -2, ival = 0
    received signal #73, code = -2, ival = 1
    received signal #73, code = -2, ival = 2
    received signal #72, code = -2, ival = 0
    received signal #72, code = -2, ival = 1
    received signal #72, code = -2, ival = 2
    received signal #71, code = -2, ival = 0
    received signal #71, code = -2, ival = 1
    received signal #71, code = -2, ival = 2

Fig. 1: incorrect output

    $ ./test1
    SIGRTMIN = 42, SIGRTMAX = 73
    sent signal 73, val = 0
    sent signal 73, val = 1
    sent signal 73, val = 2
    sent signal 72, val = 0
    sent signal 72, val = 1
    sent signal 72, val = 2
    sent signal 71, val = 0
    sent signal 71, val = 1
    sent signal 71, val = 2
    received signal #71, code = -2, ival = 0
    received signal #71, code = -2, ival = 1
    received signal #71, code = -2, ival = 2
    received signal #72, code = -2, ival = 0
    received signal #72, code = -2, ival = 1
    received signal #72, code = -2, ival = 2
    received signal #73, code = -2, ival = 0
    received signal #73, code = -2, ival = 1
    received signal #73, code = -2, ival = 2

Fig. 2: correct output

(The test, rtsignals/test1.c, can be obtained at the UNPV22e website).

The good news is FreeBSD and Ubuntu produce the same incorrect output. But three popular kernels being wrong in the same way doesn’t change the fact that they are wrong. Surely, in the time that has passed since Fight Club was #1 at the box office, someone else must have noticed POSIX.1b’s inflamed sense of rejection.

According to the POSIX standard, multiple real-time signals pending to a process should be delivered in a strict order. Specifically, the lowest-numbered signal should be delivered first and multiple occurrences of signals with the same number should be delivered in FIFO order.

Current Linux kernel delivers the highest-numbered signals pending to a process first, not the lowest-numbered ones. This contradicts to the requirement explained above. The problem can be demonstrated by the following test program… Sal07a

This LKML thread from 2007 is describing the same bug discovered by Stevens (along with an attached test and a patch for Linux 2.6.22.1). The author received exactly one response.

I believe you should check that you mask or signal in your signal handler. If you don’t the high-prio handler will be prempted by low-prio, and they will be executed in the reverse order. CAS07

On one hand, this person is correct, masking off the higher signals will prevent preemption. On the other hand, this solution is annoying. The operating system should enforce POSIX semantics, not the user! It turns out I’m not the only one who feels this way. The next reply mentions that two different Linux kernels showed different behavior.

When I ran the old test program using a vendor-specific heavily patched kernel the signals order was as the POSIX standard specified. Another kernel, which was closer to the vanilla kernel, did not show the expected behavior. Instead, the signals were handled in the reversed order. Sal07b

But enough with the history. Let’s get to the fun part, the actual reason for this bug.

Kernel and libc Sitting in a Tree: P-R-E-E-M-P-T

As a process exits a syscall, the kernel checks for pending signals. If a handler is registered for the signal, by a previous call to sigaction(2), the kernel will invoke libc to perform delivery. In illumos this leads to call_user_handler(). Just before executing the handler, this function calls lwp_sigmask() to block the current signal being delivered and any signals specified in sa_mask. After setting the mask, but before returning, lwp_sigmask() checks for additional pending signals. If any exist, it sets the t_sig_check flag, alerting the kernel to read pending signals on the next syscall exit. This is all fine and good, except for one small thing: lwp_sigmask() is a syscall.

Upon exiting lwp_sigmask(), the kernel notices t_sig_check is set and starts the delivery process all over again, preempting the delivery that was already in motion. This process repeats until all unique signal numbers are seen and masked off by the thread’s t_hold field. At this point the stack can begin to unwind, but the damage has already been done. The kernel delivers the signals in the correct order only to have userland invert them on the stack. Below is the output of a DTrace script demonstrating this effect.

See the complete output.

    # ./test1.d -c '/home/ryan/unpv22e/rtsignals/test1'
    ...
    USER STACK:
    libc.so.1`syscall+0x5
    libc.so.1`thr_sigsetmask+0x1c2
    libc.so.1`sigprocmask+0x52
    test1`Sigprocmask+0x1f
    test1`main+0x139
    test1`_start+0x83
    => fsig(k_sigset_t: {0, 0, 448}, ...), kthread_t->t_hold {0, 0, 0}
    <= fsig returned signal: 71
    => lwp_sigmask(3, 0, 0, 64)
    <= lwp_sigmask returned, t_sig_check: 1, kthread_t->t_hold {0, 0, 64}
    USER STACK:
    libc.so.1`syscall+0x5
    libc.so.1`call_user_handler+0x1f1
    libc.so.1`sigacthandler+0x77
    libc.so.1`syscall+0x5
    libc.so.1`thr_sigsetmask+0x1c2
    libc.so.1`sigprocmask+0x52
    test1`Sigprocmask+0x1f
    test1`main+0x139
    test1`_start+0x83
    => fsig(k_sigset_t: {0, 0, 448}, ...), kthread_t->t_hold {0, 0, 64}
    <= fsig returned signal: 72
    => lwp_sigmask(3, 0, 0, 192)
    <= lwp_sigmask returned, t_sig_check: 1, kthread_t->t_hold {0, 0, 192}
    USER STACK:
    libc.so.1`syscall+0x5
    libc.so.1`call_user_handler+0x1f1
    libc.so.1`sigacthandler+0x77
    libc.so.1`syscall+0x5
    libc.so.1`call_user_handler+0x1f1
    libc.so.1`sigacthandler+0x77
    libc.so.1`syscall+0x5
    libc.so.1`thr_sigsetmask+0x1c2
    libc.so.1`sigprocmask+0x52
    test1`Sigprocmask+0x1f
    test1`main+0x139
    test1`_start+0x83
    => fsig(k_sigset_t: {0, 0, 448}, ...), kthread_t->t_hold {0, 0, 192}
    <= fsig returned signal: 73
    => lwp_sigmask(3, 0, 0, 448)
    <= lwp_sigmask returned, t_sig_check: 0, kthread_t->t_hold {0, 0, 448}
    USER STACK:
    libc.so.1`syscall+0x5
    libc.so.1`call_user_handler+0x1f1
    libc.so.1`sigacthandler+0x77
    libc.so.1`syscall+0x5
    libc.so.1`call_user_handler+0x1f1
    libc.so.1`sigacthandler+0x77
    libc.so.1`syscall+0x5
    libc.so.1`thr_sigsetmask+0x1c2
    libc.so.1`sigprocmask+0x52
    test1`Sigprocmask+0x1f
    test1`main+0x139
    test1`_start+0x83
    => fsig(k_sigset_t: {0, 0, 448}, ...), kthread_t->t_hold {0, 0, 192}
    <= fsig returned signal: 73
    => lwp_sigmask(3, 0, 0, 448)
    <= lwp_sigmask returned, t_sig_check: 0, kthread_t->t_hold {0, 0, 448}
    ...

Fig. 3: test1.d output
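
If the mask values in the trace look opaque, the following toy model may help. It is not illumos code. It assumes each of the three mask words is 32 bits wide and that signal n occupies bit (n - 1) mod 32 of word (n - 1) / 32, which is consistent with the numbers in the trace. The names sig_word, sig_bit, and model_nested_delivery are hypothetical. Each round picks the lowest pending signal that is not yet held, pushes a handler frame for it, and adds its bit to the hold mask. The frames then unwind newest first.

```c
#include <assert.h>

#define SIG_WORD_BITS 32   /* assumption: each mask word is 32 bits wide */

/* Index of the mask word that holds signo (signal numbers start at 1). */
int sig_word(int signo)
{
    return (signo - 1) / SIG_WORD_BITS;
}

/* Value of signo's bit within its mask word. */
unsigned int sig_bit(int signo)
{
    return 1u << ((signo - 1) % SIG_WORD_BITS);
}

/*
 * Toy model of nested delivery. pending[] holds n distinct signal numbers
 * that all live in the same mask word. hold_trace[] receives the hold mask
 * after each frame is pushed, and run_order[] receives the order in which
 * the handlers finally run. Both arrays need room for n entries.
 * Returns the number of frames pushed.
 */
int model_nested_delivery(const int *pending, int n,
                          int *run_order, unsigned int *hold_trace)
{
    int stack[SIG_WORD_BITS];
    unsigned int hold = 0;
    int depth = 0;
    int i;

    assert(n > 0 && n <= SIG_WORD_BITS);

    for (;;) {
        int lowest = 0;

        for (i = 0; i < n; i++) {
            if (hold & sig_bit(pending[i]))
                continue;              /* already held: cannot be delivered */
            if (lowest == 0 || pending[i] < lowest)
                lowest = pending[i];
        }
        if (lowest == 0)
            break;                     /* nothing deliverable: start unwinding */

        stack[depth] = lowest;         /* a new frame preempts the previous one */
        hold |= sig_bit(lowest);
        hold_trace[depth] = hold;
        depth++;
    }

    for (i = 0; i < depth; i++)        /* last frame pushed runs first */
        run_order[i] = stack[depth - 1 - i];
    return depth;
}
```

With pending set to {71, 72, 73}, sig_word() puts all three in word 2, hold_trace comes out as 64, 192, 448 (the t_hold values in the trace), and run_order comes out as 73, 72, 71: selected in priority order, executed in reverse.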

Ironically, the fix is not only simple but is spelled out directly in POSIX.

Given the specified selection of the lowest numeric unblocked pending signal, preemptive priority signal delivery can be achieved using signal numbers and signal masks by ensuring that the sa_mask for each signal number blocks all signals with a higher numeric value. IEEE13 §B.2.4

If SIGRTMAX=73 and realtime signal 47 is being delivered then call_user_handler() should set sa_mask to block signals 48 through 73. That way only lower-numbered, higher-priority signals may preempt its delivery. But wait, what if multiple instances of 47 are pending? It would preempt itself and cause LIFO ordering. To honor both priority and FIFO semantics, all signals greater than or equal to the realtime signal being delivered must be blocked during delivery. That brings me to the topic of non-deferred signals.
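
The same masking can be done from the application when registering a handler. A minimal sketch, assuming a system that defines SIGRTMIN and SIGRTMAX; the helper names rt_priority_mask and rt_install are mine, not part of any library:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/*
 * Fill *mask with every realtime signal numbered >= signo, so that only
 * lower-numbered (higher-priority) realtime signals can preempt the handler
 * for signo. Returns 0 on success, -1 if signo is not a realtime signal.
 */
int rt_priority_mask(int signo, sigset_t *mask)
{
    int s;

    assert(mask != NULL);

    if (signo < SIGRTMIN || signo > SIGRTMAX)
        return -1;
    if (sigemptyset(mask) == -1)
        return -1;
    for (s = signo; s <= SIGRTMAX; s++) {
        if (sigaddset(mask, s) == -1)
            return -1;
    }
    return 0;
}

/* Register handler for signo using the priority-preserving sa_mask. */
int rt_install(int signo, void (*handler)(int, siginfo_t *, void *))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;          /* no SA_NODEFER: signo stays blocked */
    if (rt_priority_mask(signo, &sa.sa_mask) == -1)
        return -1;
    return sigaction(signo, &sa, NULL);
}
```

Including signo itself in the mask is redundant when SA_NODEFER is not set, since the signal being delivered is blocked anyway, but it keeps the mask correct regardless of how the flags are chosen.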

All signals are deferred by default. The signal being delivered is blocked while being delivered, preventing preemption by the same signal number. When registering a handler, the option SA_NODEFER is used to disable this behavior. POSIX does not discuss mixing realtime and nodefer together, but it should have. It is nonsensical to mix the two. SA_NODEFER is in fundamental disagreement with FIFO ordering. Either sigaction(2) should fail with EINVAL or the delivery implementation should ignore the option when delivering a realtime signal.

Too Late to be Portable?

The only operating system I know to produce correct results is HP-UX64 11.11i. The other seven—OmniOS r151012, FreeBSD 10.0, NetBSD 6.1.5, Ubuntu 14.04, IRIX64 6.5.29, AIX 5.3, and AIX 6—managed to produce four unique orderings. The output of the rtsignal test for each operating system can be found in results.txt. Even if all kernels are patched tomorrow, realtime signal ordering cannot be truly portable without introducing some type of compile-time check. Until then, if you want maximum portability, you should manually block signals to achieve the proper ordering.

If you’d like to test your operating system then I have three options for you: 1) Stevens’s original test program, 2) my rtsignal test—a hybrid of the Stevens and LKML tests, or 3) rt_sig_test.c, based off the patch I wrote for illumos.

Acknowledgements

Thank you to my friend Andrew Thompson for firing up his HP c8000 and SGI Octane to run tests on the proprietary Unices; to Jared Morrow and Tom Santero for reviewing this post; to Garret D’Amore for reviewing my illumos patch; and to all those who provided input during my investigation: Bryan Cantrill, Bob Friesenhahn, Andrew Gabriel, Robert Mustacchi, and Rafel Vanoni.

References

CAS07
Gal95
IEEE13
Sal07a
Sal07b
Ste99