14% of SV-COMP’s “Correct Programs” are Undefined!

Capture

Last April (2016), I gave a tutorial on K at ETAPS'16 in Eindhoven, Netherlands, where I also demonstrated RV-Match.  During the week that I spent there, I heard several friends and colleagues who were involved with the Competition on Software Verification, SV-COMP, that some of the benchmark's correct programs appear to be undefined.  What?  So some of the assumed-correct C programs that are used to evaluate the best program verifiers in the world are actually wrong programs?  That immediately required for action, so I informed the RV-Match team, which took the challenge (special thanks to Manasvi Saxena, then an employee of RV, now a grad student at UIUC).  All their experiments are public and reproducible, you can find them on Github.

RV team's findings were almost unbelievable: in only a few hours of playing, they found that 14% of the supposedly correct programs were undefined! Recall that an undefined program is allowed, by virtue of the ISO C11 standard, to behave in any possible way when compiled and executed: it may or may not yield a compilation error (none of the SV-COMP programs had compilation errors), and then when executed may return 0, 1, 1000, etc., it can read and mail your password file, can erase your disk, etc., etc., it can literally do anything; see some funny examples of undefined programs in RV-Match's demo.  Therefore, 14% of the programs that the state-of-the-art program verifiers can prove correct, are in fact incorrect programs.  Just think about it ... Can we really trust any program verifiers for C these days?

The morale here is that a program verifier for a language, be it C or any other language, is only as good as the language model that it defines or encodes internally.  Defining C rigorously is a highly nontrivial task, but there is no other way around!  Program verifiers must define it if they want to claim they verify C and not some other toy-ish C-like language.  By allowing undefined programs in the SV-COMP benchmark the program verifiers are encouraged to take shortcuts wrt the correctness of their C model, which will ultimately lead to lack of trust in them.  The SV-COMP organizers should take undefinedness of their benchmark programs very seriously, if they expect their competition to be taken seriously.  In my view, they should have an undefinedness checking preliminary section, which each tool for C should be required to pass first, to make sure that the tool indeed speaks C.

matchIn the meanwhile, a simple solution is for the participating teams and organizers to simply run their benchmarks with RV-Match.  It is free for academia and research, after all.  And at our knowledge, it is the only tool of its kind that is based on a formal (operational) C semantics, so the participating teams can consult the semantics for answers if not sure about how C should behave in certain tricky cases.

 

Note (added after the article above was published): I was informed that the SV-COMP organizers do actually seriously consider eliminating the undefined behavior from their benchmark.  Also, that in the meanwhile the participating verifiers that check for undefined behavior have to switch this feature off in order to not experience a disadvantage (and that was criticized by the participants and needs to be addressed).  There also seems to be some confusion wrt what undefined behavior is, in that some participants think that it is about non-determinism: that is, undefined programs allow some non-deterministic behaviors that violate some property, so tools can simply disallow those non-deterministic paths/behaviors.  Note that undefined behavior is very tricky, to say the least, and has nothing to do with non-determinism.  Consider, for example, this basic undefined program:

int main() {
  int x = 0;
  return (x = 1) + (x = 2);
}

Believe it or not, when compiled with gcc this program returns 4 ('echo $?' to see the returned value):

$ gcc 1-unsequenced-side-effect.c
$ ./a.out ; echo $?
4

Now consider the following apparently equivalent program, at least in terms of non-determinism, which performs the assignment in a separate function:

int asgn(int *x, int v) {
  return (*x = v);
}

int main() {
  int x = 0;
  return asgn(&x,1) + asgn(&x,2);
}

This program is well-defined.  It has absolutely no problem whatsoever, albeit one could say that it allows the same number of paths as the previous program.  Why is this one defined?  Well, it is one of the particularities of C, of its ~200 different types of undefined behavior it allows by design (see the ISO C11 standard).

4 thoughts on “14% of SV-COMP’s “Correct Programs” are Undefined!

  1. Agreed, John, but note that kcc was run with the gcc profile enabled, so the C semantics used was customized to gcc wrt the implementation defined behaviors. Although some of the undefined behaviors were trickier (see https://github.com/runtimeverification/evaluation/tree/master/svcomp-benchmark/results for all of them) and thus it is not surprising that some of the tools missed them, most of them were in fact quite straightforward, such as using uninitialized variables. It is scary that the program verifiers evaluated on the benchmark didn't detect these trivial errors.

  2. It seems there are some false positives: reading an indeterminate value is only undefined behaviour when the object has automatic storage duration. Thus int *p=malloc(sizeof(int)); if(p && *p) { ... } is actually not undefined behaviour.

    • Thanks, that's an interesting case. It is a false positive with respect to GCC, but it's implementation-specified whether that code might have undefined behavior in ANSI C - also the detailed error message is bad, in not mentioning or citing the second source of undefined behavior that produces the resulting error code. You're quite right that merely reading an object of indeterminate value isn't inherently undefined behavior if it has allocated storage duration, but indeterminate values include trap representations (per 3.19.2), and it's implementation-specified whether int has trap representations (per 6.2.6.2); GCC specifies its integers have no trap representation ( https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Integers-implementation.html#Integers-implementation ). We aim to scrupulously reflect the C standard and find most potentially-undefined behavior, but moderating that by implementation specified behavior in the profiles is driven a bit more by customer demand - from a practical point of view, this sort of situation is probably unintentional even if not undefined.

Leave a Reply

Your email address will not be published. Required fields are marked *