Detecting popular data races in Java using RV-Predict

Posted on July 25th, 2015 by Yilong Li

Data races are a common kind of concurrency bug in multithreaded applications. A data race occurs when two threads access the same shared memory location concurrently and at least one of the accesses is a write. Data races are notoriously difficult to find and reproduce because they often happen only under very specific circumstances. As a result, your tests may pass most of the time, yet once in a while one of them fails with a mysterious error message that points far away from the root cause of the data race.

Despite all the effort spent on this problem, detecting data races effectively and efficiently remains a challenge in practice. RV-Predict aims to change that. In this blog post, we summarize some of the most common kinds of data races in Java and show you how to catch them using RV-Predict. The examples in this post are included in the RV-Predict distribution under the examples/ directory.

1. Simple race

The simplest data race is also the most frequent one in practice: two threads accessing a shared variable without any synchronization. In Java, a shared variable is either a field (instance or static) or an array element. See JLS section 17.4.1 for the precise definition.

Consider the following code:

package examples;

public class SimpleRace {

    static int sharedVar;

    public static void main(String[] args) {
        new ThreadRunner() {
            @Override
            public void thread1() {
                sharedVar++;
            }

            @Override
            public void thread2() {
                sharedVar++;
            }
        };
    }
}

Here, ThreadRunner (defined below) is a utility class containing boilerplate code that creates and starts two threads running the tasks we define. We are going to use it throughout this blog post to simplify our example code.

package examples;

public abstract class ThreadRunner {

    private final Thread thread1;

    private final Thread thread2;

    public abstract void thread1();

    public abstract void thread2();

    public ThreadRunner() {
        thread1 = new Thread(new Runnable() {
            @Override
            public void run() {
                thread1();
            }
        });
        thread2 = new Thread(new Runnable() {
            @Override
            public void run() {
                thread2();
            }
        });
        thread1.start();
        thread2.start();
    }

}

Running RV-Predict on this example gives the following race report immediately:

Data race on field examples.SimpleRace.sharedVar: {{{
    Concurrent write in thread T10 (locks held: {})
 ---->  at examples.SimpleRace$1.thread1(SimpleRace.java:11)
        at examples.ThreadRunner$1.run(ThreadRunner.java:17)
    T10 is created by T1
        at examples.ThreadRunner.<init>(ThreadRunner.java:26)

    Concurrent read in thread T11 (locks held: {})
 ---->  at examples.SimpleRace$1.thread2(SimpleRace.java:16)
        at examples.ThreadRunner$2.run(ThreadRunner.java:23)
    T11 is created by T1
        at examples.ThreadRunner.<init>(ThreadRunner.java:27)
}}}

Well, this one is trivial to spot. But even such a simple mistake can become a real pain in the neck to debug when you are working with thousands of lines of code. Luckily, RV-Predict can make your life much simpler by telling you the precise locations of the conflicting accesses, the stack traces of the two threads, who created these threads, and the locks held by each thread. To fix this data race, you can either protect the shared variable with a lock or simply replace it with an atomic variable.
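As a rough illustration of the second fix, here is a minimal sketch that swaps the plain int for a java.util.concurrent.atomic.AtomicInteger. The class name AtomicCounterSketch and the runOnce() helper are our own inventions for this post, and we use plain threads instead of ThreadRunner so that we can wait for both threads before reading the result.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterSketch {

    // Stands in for the plain "static int sharedVar" of SimpleRace.
    static final AtomicInteger sharedVar = new AtomicInteger();

    // Starts two threads that each increment the counter once,
    // waits for both, and returns the final value.
    static int runOnce() {
        sharedVar.set(0);
        Thread t1 = new Thread(() -> sharedVar.incrementAndGet());
        Thread t2 = new Thread(() -> sharedVar.incrementAndGet());
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sharedVar.get();
    }

    public static void main(String[] args) {
        System.out.println("sharedVar = " + runOnce());
    }
}
```

Because incrementAndGet() is a single atomic read-modify-write, neither increment can be lost, so the final value is always 2. Protecting sharedVar++ with a synchronized block on a common lock object would work just as well.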

2. Using non-thread-safe class without synchronization

Many classes are not designed to be used in a multithreaded environment, e.g., java.util.ArrayList, java.util.HashMap, and many other classes in the Java Collections Framework.

Start with a simple example:

package examples;

import java.util.ArrayList;
import java.util.List;

public class RaceOnArrayList {

    static List<Integer> list = new ArrayList<>();

    public static void main(String[] args) {
        new ThreadRunner() {
            @Override
            public void thread1() {
                list.add(0);
            }

            @Override
            public void thread2() {
                list.add(1);
            }
        };
    }
}

Both threads try to add an element to the ArrayList without synchronization. Running it with plain java, you might, if you are lucky, get an exception or an obviously wrong result (such as a missing element) letting you know something is wrong. But there is no guarantee. RV-Predict, on the other hand, catches the data race without effort:

Data race on field java.util.ArrayList.$state: {{{
    Concurrent read in thread T10 (locks held: {})
 ---->  at examples.RaceOnArrayList$1.thread1(RaceOnArrayList.java:14)
        at examples.ThreadRunner$1.run(ThreadRunner.java:17)
    T10 is created by T1
        at examples.ThreadRunner.<init>(ThreadRunner.java:26)

    Concurrent write in thread T11 (locks held: {})
 ---->  at examples.RaceOnArrayList$1.thread2(RaceOnArrayList.java:19)
        at examples.ThreadRunner$2.run(ThreadRunner.java:23)
    T11 is created by T1
        at examples.ThreadRunner.<init>(ThreadRunner.java:27)
}}}

Note that RV-Predict reports the race on an imaginary field ArrayList.$state rather than on the actual racy fields inside class ArrayList. RV-Predict's error messages deliberately abstract away from the low-level implementation details of the Java class library, because users found that this makes it much easier to identify the root cause of a data race. To fix this example, you can either synchronize the calls yourself or, better, wrap the list in a thread-safe wrapper using Collections.synchronizedList(new ArrayList<>()).
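A minimal sketch of the wrapper-based fix might look as follows. The class name SynchronizedListSketch and the runOnce() helper are hypothetical; only Collections.synchronizedList is the actual library call being demonstrated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedListSketch {

    // Two threads add one element each to a list wrapped by
    // Collections.synchronizedList; returns the list after both finish.
    static List<Integer> runOnce() {
        final List<Integer> list =
                Collections.synchronizedList(new ArrayList<Integer>());
        Thread t1 = new Thread(() -> list.add(0));
        Thread t2 = new Thread(() -> list.add(1));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list;
    }

    public static void main(String[] args) {
        // The order of the two elements depends on thread scheduling.
        System.out.println("list = " + runOnce());
    }
}
```

Every call to add() now goes through the wrapper's internal lock, so both elements always end up in the list; only their order is left to the scheduler.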

To make things more interesting, here is another example:

package examples;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class RaceOnSynchronizedMap {

    static Map<Integer, Integer> map = Collections.synchronizedMap(new HashMap<>());

    public static void main(String[] args) {
        new ThreadRunner() {
            @Override
            public void thread1() {
                map.put(1, 1);
            }

            @Override
            public void thread2() {
                Set<Integer> keySet = map.keySet();
                synchronized (keySet) {
                    for (int k : keySet) {
                        System.out.println("key = " + k);
                    }
                }
            }
        };
    }
}

Here, map is already a synchronized map backed by a HashMap. While the first thread puts a key-value pair into the map, the second thread acquires the monitor of keySet and iterates over the key set of the map. Hmmm, is it thread-safe? Unfortunately, no. The documentation of Collections.synchronizedMap states that "it is imperative that the user manually synchronize on the returned map when iterating over any of its collection views". Thread 2 in the above example incorrectly synchronizes on keySet instead of map. As you can see in the following race report, the two threads are indeed holding different monitors.

Data race on field java.util.HashMap.$state: {{{
    Concurrent write in thread T10 (locks held: {Monitor@722c41f4})
 ---->  at examples.RaceOnSynchronizedMap$1.thread1(RaceOnSynchronizedMap.java:16)
        - locked Monitor@722c41f4 at examples.RaceOnSynchronizedMap$1.thread1(RaceOnSynchronizedMap.java:16)
        at examples.ThreadRunner$1.run(ThreadRunner.java:17)
    T10 is created by T1
        at examples.ThreadRunner.<init>(ThreadRunner.java:26)

    Concurrent read in thread T11 (locks held: {Monitor@1f72ae1d})
 ---->  at examples.RaceOnSynchronizedMap$1.thread2(RaceOnSynchronizedMap.java:23)
        - locked Monitor@1f72ae1d at examples.RaceOnSynchronizedMap$1.thread2(RaceOnSynchronizedMap.java:22)
        at examples.ThreadRunner$2.run(ThreadRunner.java:23)
    T11 is created by T1
        at examples.ThreadRunner.<init>(ThreadRunner.java:27)
}}}

The fix is as simple as replacing keySet at line 22 with map.
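For illustration, the corrected iteration could be sketched like this. The class name SynchronizedMapIterationSketch and the helpers countKeys() and runOnce() are made up for this post; the only point is that the iteration holds the monitor of map rather than that of the key set.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapIterationSketch {

    static final Map<Integer, Integer> map =
            Collections.synchronizedMap(new HashMap<Integer, Integer>());

    // Iterates over the key view while holding the monitor of the map itself,
    // which is the same monitor that map.put() acquires internally.
    static int countKeys() {
        int count = 0;
        synchronized (map) {
            for (int k : map.keySet()) {
                System.out.println("key = " + k);
                count++;
            }
        }
        return count;
    }

    // One thread writes while another iterates; returns the number of keys
    // in the map after both threads have finished.
    static int runOnce() {
        map.clear();
        Thread t1 = new Thread(() -> map.put(1, 1));
        Thread t2 = new Thread(() -> countKeys());
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return countKeys();
    }

    public static void main(String[] args) {
        System.out.println("keys after both threads finished: " + runOnce());
    }
}
```

The iterating thread may still run before or after the put, so it may print either zero or one key, but the two accesses can no longer overlap.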

3. Broken spinning loop

Sometimes we want to synchronize multiple threads based on whether some condition has been met. And it's a common pattern to use a while loop that repeatedly checks that condition:

package examples;

public class BrokenSpinningLoop {

    static int sharedVar;

    static boolean condition = false;

    public static void main(String[] args) {
        new ThreadRunner() {
            @Override
            public void thread1() {
                sharedVar = 1;
                condition = true;
            }

            @Override
            public void thread2() {
                while (!condition) {
                    Thread.yield();
                }
                if (sharedVar != 1) {
                    throw new RuntimeException("How is this possible!?");
                }
            }
        };
    }
}

As you probably noticed immediately, there is a data race on condition. But what harm does it do? After all, in Java, reads and writes of object references and of fields that are 32 bits or smaller are atomic (JLS section 17.7), so thread 2 cannot see a corrupted value of condition that messes up the program. The problem is that even after thread 2 exits the while loop, it can still read the value 0 instead of 1 from sharedVar, for reasons that include instruction reordering and caching effects. Put simply, the Java Memory Model (JMM) allows such counter-intuitive behavior only when the program contains data races. The easiest way to ensure that the write to sharedVar in thread 1 is visible to the read in thread 2 is to declare the variable condition as volatile. For a detailed explanation of what volatile means in Java, we strongly recommend this blog post by Jeremy Manson. In the meantime, be careful with so-called "benign data races", because they can be very treacherous; it is better to avoid them altogether in your code.
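Here is a minimal sketch of the volatile fix. The class name VolatileSpinningLoopSketch and the runOnce() helper are hypothetical, and we again use plain threads so that the value observed by the reader can be returned to the caller.

```java
class VolatileSpinningLoopSketch {

    static int sharedVar;

    // The only change compared to BrokenSpinningLoop: the flag is volatile.
    static volatile boolean condition;

    // Returns the value of sharedVar that the reader thread observes
    // after it leaves the spinning loop.
    static int runOnce() {
        sharedVar = 0;
        condition = false;
        final int[] observed = new int[1];
        Thread writer = new Thread(() -> {
            sharedVar = 1;
            condition = true;   // volatile write publishes sharedVar = 1
        });
        Thread reader = new Thread(() -> {
            while (!condition) { // volatile read
                Thread.yield();
            }
            observed[0] = sharedVar;
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println("reader saw sharedVar = " + runOnce());
    }
}
```

The volatile write to condition happens-before the volatile read that observes true, so everything thread 1 wrote before setting the flag, including sharedVar = 1, is guaranteed to be visible to thread 2 once the loop exits. Note that sharedVar itself does not need to be volatile.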

For a real-life example, this Stack Overflow question describes a concurrency bug that eventually cost 12 million dollars! Too bad there was no RV-Predict back in 2013.

4. Write under reader lock

This usually happens when you use a java.util.concurrent.locks.ReadWriteLock to increase the level of concurrency but mistakenly write to the protected data while holding only the read lock:

package examples;

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WriteUnderReadLock {

    static int sharedVar;

    static ReadWriteLock lock = new ReentrantReadWriteLock();

    public static void main(String[] args) {
        new ThreadRunner() {
            @Override
            public void thread1() {
                lock.readLock().lock();
                sharedVar++;
                lock.readLock().unlock();
            }

            @Override
            public void thread2() {
                lock.readLock().lock();
                sharedVar++;
                lock.readLock().unlock();
            }
        };
    }
}

RV-Predict's race report clearly shows that, although the two threads are holding the same lock, they hold it only in read mode:

Data race on field examples.WriteUnderReadLock.sharedVar: {{{
    Concurrent write in thread T12 (locks held: {ReadLock@3fb20f62})
 ---->  at examples.WriteUnderReadLock$1.thread1(WriteUnderReadLock.java:17)
        - locked ReadLock@3fb20f62 at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock(ReentrantReadWriteLock.java:n/a) 
        at examples.ThreadRunner$1.run(ThreadRunner.java:64)
    T12 is created by T1
        at examples.ThreadRunner.<init>(ThreadRunner.java:73)

    Concurrent read in thread T13 (locks held: {ReadLock@3fb20f62})
 ---->  at examples.WriteUnderReadLock$1.thread2(WriteUnderReadLock.java:24)
        - locked ReadLock@3fb20f62 at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock(ReentrantReadWriteLock.java:n/a) 
        at examples.ThreadRunner$2.run(ThreadRunner.java:70)
    T13 is created by T1
        at examples.ThreadRunner.<init>(ThreadRunner.java:74)
}}}

Note that RV-Predict also reports the acquisition event of the ReadLock together with the stack trace, just as it does for intrinsic monitor locks.
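The natural fix is to take the write lock whenever the protected data is modified. Here is a minimal sketch; the class name WriteLockSketch and the helpers increment(), read() and runOnce() are our own, and we additionally release the locks in finally blocks, which the example above omits for brevity.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class WriteLockSketch {

    static int sharedVar;

    static final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Modifications take the write lock, which is exclusive.
    static void increment() {
        lock.writeLock().lock();
        try {
            sharedVar++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Pure reads may share the read lock with other readers.
    static int read() {
        lock.readLock().lock();
        try {
            return sharedVar;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Two threads increment once each; returns the final value.
    static int runOnce() {
        sharedVar = 0;
        Thread t1 = new Thread(() -> increment());
        Thread t2 = new Thread(() -> increment());
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return read();
    }

    public static void main(String[] args) {
        System.out.println("sharedVar = " + runOnce());
    }
}
```

Any number of threads can hold the read lock at the same time, which is exactly why it offers no protection for writes; the write lock excludes both readers and other writers.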

5. Double-checked locking

Double-checked locking is a pattern for implementing lazy initialization in a multithreaded environment. However, it will not work reliably in Java when implemented in its original form such as in the following example:

package examples;

public class DoubleCheckedLocking {

    static class Helper {
        Object data;

        Helper() {
            data = new Object();
        }
    }

    public static void main(String[] args) {
        new ThreadRunner() {

            private Helper helper;

            private Helper getHelper() {
                if (helper == null) {
                    synchronized (this) {
                        if (helper == null) {
                            helper = new Helper();
                        }
                    }
                }
                return helper;
            }

            @Override
            public void thread1() {
                getHelper();
            }

            @Override
            public void thread2() {
                getHelper();
            }
        };
    }
}

Similar to the broken spinning loop example, the data race on the helper field can cause very confusing results: a thread can see a non-null helper field but also a null value of the data field in the Helper object. This is a rather rare situation, which is precisely what makes it hard to debug. Yet, RV-Predict easily detects the race from one execution:

Data race on field examples.DoubleCheckedLocking$1.helper: {{{
    Concurrent read in thread T11 (locks held: {})
 ---->  at examples.DoubleCheckedLocking$1.getHelper(DoubleCheckedLocking.java:19)
        at examples.DoubleCheckedLocking$1.thread2(DoubleCheckedLocking.java:36)
        at examples.ThreadRunner$2.run(ThreadRunner.java:23)
    T11 is created by T1
        at examples.ThreadRunner.<init>(ThreadRunner.java:27)

    Concurrent write in thread T10 (locks held: {Monitor@3ae72373})
 ---->  at examples.DoubleCheckedLocking$1.getHelper(DoubleCheckedLocking.java:22)
        - locked Monitor@3ae72373 at examples.DoubleCheckedLocking$1.getHelper(DoubleCheckedLocking.java:20)
        at examples.DoubleCheckedLocking$1.thread1(DoubleCheckedLocking.java:31)
        at examples.ThreadRunner$1.run(ThreadRunner.java:17)
    T10 is created by T1
        at examples.ThreadRunner.<init>(ThreadRunner.java:26)
}}}

One way to patch this broken implementation is, of course, to declare helper as volatile. Yet there is a better way: making the Helper class immutable by declaring its data field as final. The semantics of final fields guarantee that all threads will always see the correctly initialized value of a final field, even if a data race is used to pass references to the immutable object between threads. If you go with the first fix, RV-Predict will, of course, report no races afterwards. However, if you go with the second fix, RV-Predict will still report data races on the helper field because the program, despite being correct, does contain data races. Note that this is probably the only approved pattern of benign races in Java, and it is very rare that you would want to take advantage of it in your code. If you are absolutely certain about what you are doing, you can suppress the race reports on a certain field in RV-Predict with option "--suppress ", e.g., "--suppress examples.DoubleCheckedLocking$1.helper".
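For reference, the volatile-based fix could be sketched as follows. The class name VolatileDoubleCheckedLockingSketch and the runOnce() helper are hypothetical. Reading the field once into a local variable is a common refinement that we add here on our own; it is not part of the example above.

```java
class VolatileDoubleCheckedLockingSketch {

    static class Helper {
        final Object data = new Object();
    }

    // volatile makes the publication of the Helper instance race-free.
    private volatile Helper helper;

    Helper getHelper() {
        Helper result = helper;          // single volatile read on the fast path
        if (result == null) {
            synchronized (this) {
                result = helper;         // re-check under the lock
                if (result == null) {
                    result = new Helper();
                    helper = result;
                }
            }
        }
        return result;
    }

    // Two threads request the helper concurrently; returns true if both
    // received the same, fully initialized instance.
    static boolean runOnce() {
        final VolatileDoubleCheckedLockingSketch holder =
                new VolatileDoubleCheckedLockingSketch();
        final Helper[] seen = new Helper[2];
        Thread t1 = new Thread(() -> { seen[0] = holder.getHelper(); });
        Thread t2 = new Thread(() -> { seen[1] = holder.getHelper(); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0] != null && seen[0] == seen[1] && seen[0].data != null;
    }

    public static void main(String[] args) {
        System.out.println("same initialized helper: " + runOnce());
    }
}
```

With helper declared volatile, a thread that sees a non-null reference is also guaranteed to see the fully constructed Helper object behind it.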

For the tiny examples in this post, it is easy enough to identify the data race pattern by simply reading the code. However, when you are dealing with a large project that may use all kinds of synchronization mechanisms, debugging becomes much more difficult and often requires a thorough understanding of the system. Luckily, RV-Predict can help you with this task regardless of the size of the project. In future posts, we will show you how easy it is to use RV-Predict to find rare data races in real-world applications.