C11 Fails to Define Data Races Involving Interrupts

Posted on July 7th, 2017 by David Young
Posted in RV-Predict

Users of our data-race detector for C/C++ programs, RV-Predict/C, frequently tell us at Runtime Verification, Inc., (RV) that they would like to see a variant that detects data races between threads and interrupts in embedded systems. A solitary application thread may share data locations with one or more interrupt service routines (ISRs). Data accesses by the ISRs and the application thread may conflict. The conflicting accesses can lead to unexpected behavior in safety-critical software. Users hope to detect conflicting accesses using runtime verification, and fix their software to avoid the conflicts.

What does it mean to have a "data race" between a thread and an interrupt? We quickly realized at RV that we resorted to a vague analogy between threads and interrupts to define a data race. It was important to produce a formal definition. We are concerned with interrupts only in C programs, so we sought an authoritative definition in the C11 standard. We were surprised when we could not find one!

In the C11 standard1, data races are conflicting actions between threads. Section 5.1.2.4 paragraph 25 says:

The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

In the C standard, the closest thing to an ISR is an asynchronous signal handler. Nowhere does the standard speak of a data race between a asynchronous signal handler and the thread that it interrupts. Rather, the standard speaks of unspecified and undefined behavior by a signal handler. According to section 5.1.2.3 paragraph 5, the values of some objects are unspecified in a signal handler, other values are undefined, and with certain exceptions, modifying objects is undefined:

When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects that are neither lock-free atomic objects nor of type volatile sig_atomic_t are unspecified, and the value of any object that is modified by the handler that is neither a lock-free atomic object nor of type volatile sig_atomic_t becomes undefined.

Curiously, the standard repeats itself about unspecified/undefined behavior in signal handlers, in section 7.14.1.1 paragraph 5, adding some qualifications about storage duration:

If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if the signal handler refers to any object with static or thread storage duration that is not a lock-free atomic object other than by assigning a value to an object declared as volatile sig_atomic_t, or ...

The storage-duration qualification is perplexing. If referring to objects of static or thread storage duration in a signal handler is undefined, then surely referring to objects of allocated storage or to objects of automatic storage (other than the handler's own automatic storage), is undefined, too. §7.14.1.1 ¶ 5 probably just contains a typo.

Since the C11 standard provides no definition for data races between threads and interrupts, RV has let common sense and user needs guide us. Users are concerned with detecting accesses by both an interrupt and a thread when those accesses are both conflicting and asynchronous. Accesses by a thread and an interrupt conflict if both refer to the same object, and one access modifies the object. Every access by an interrupt is synchronous with respect to an access by a thread only if the interruption happens before the thread access, or the thread access happens before the interruption. Otherwise, the interrupt's accesses are asynchronous. When we speak of a "data race" involving an interrupt, we mean just these asynchronous, conflicting accesses. Figure 1 shows a signal handler that aborts the program if a shared variable, shared.count, exceeds 25. The signal handler is established on the alarm signal, SIGALRM. Figure 2 is a fragment of code that shows how a conflicting access to the shared variable can be synchronized with signals:

static struct { volatile int count; } shared = { .count = ATOMIC_VAR_INIT(false) }; static void alarm_handler(int signum __unused) { if (shared.count > 25) abort(); }

Figure 1: alarm_handler shares a global variable with the application

sigset_t blockingset, oldset; sigemptyset(&blockingset); sigaddset(&blockingset, SIGALRM); // save signal mask; block SIGALRM pthread_sigmask(SIG_BLOCK, &blockingset, &oldset); // modify shared variable shared.count++; // restore signal mask pthread_sigmask(SIG_SETMASK, &oldset, NULL);

Figure 2: synchronizing access between a thread and a signal

In recent work on RV-Predict/C, we have added new instrumentation and enhanced our causal model so that we can detect when programs that use interrupts or UNIX signals contain data races, according to our definition, above. In a future blog post, we will tell you more about this new development.

Notes

  1. Throughout this post I refer to the N1548 Committee Draft — December 2, 2010 ISO/IEC 9899:201x.