Formally Verifying Loops: Part 1

Posted on September 26th, 2024 by Raoul Schaffranek

Last updated on October 7th, 2024

Posted in Smart Contracts, Verification, Kontrol

DALL·E 2024-09-25 17.50.48 - A futuristic rollercoaster with multiple loopings, representing symbolic execution and bounded loop unrolling. The rollercoaster track is set against .webp

This is the first of two posts explaining the formal verification of loops in Solidity and EVM smart contracts. It's surprising to many programmers why loops impose such a challenge in the field of formal verification. In this first post, we’ll introduce the path explosion problem and bounded loop unrolling as an incomplete yet practical solution. We will use powerful tools, such as the Certora Prover, Halmos, hevm, and Kontrol to formally reason about an example smart contract. Through this examination, we aim to understand the limitations of bounded loop unrolling in verifying properties of smart contracts with unbounded loops.

The second post explains loop invariants and how they overcome the reasoning gaps left behind by bounded loop unrolling. To illustrate loop invariants, we will use Kontrol to formally verify programs with unbounded loops.

In this blog post, you will learn:

The path explosion problem
What is loop unrolling, and where are the limitations?
Why symbolic execution usually has better coverage than fuzz testing
Why fuzz testing sometimes has better coverage than symbolic execution
How the Certora Prover, Halmos, hevm, and Kontrol tackle loops
Why some loops are invisible
Why young Carl Friedrich Gauss was not a likable schoolboy

A Word of Caution

It’s surprising to many developers new to formal verification that loops are notoriously challenging. To fully understand the problem's origins, we must go back to one of the most exciting chapters of math history: the time of David Hilbert, Alonzo Church, Kurt Gödel, and Alan Turing. It’s simultaneously one of the most inspiring stories - marking the beginning of the Computer Science era - with one of the most tragic twists, in which David Hilbert (undoubtedly one of the most brilliant logicians in the world) resigned. This blog post is not about that story. After all, one of the greatest stories of mathematics ever deserves to be told properly.

We only provide this background as a warning to avoid oversimplifying the complexity of loops. In fact, reasoning about loops is considered one of the most difficult challenges of formal verification. Heed our advice: do not go easy on loops; stay alert, and be cautious! Question your reasoning, don’t trust your intuition, and don’t blindly trust your formal verification tools.

The Path Explosion Problem

There are strong theoretical reasons that explain the difficulty of reasoning about loops, most notably Rice’s theorem which tells us that all non-trivial semantic properties of a program (such as termination) are undecidable, as well as practical limitations such as the path explosion problem. In this article, we focus on the more practical issues.

The path explosion problem occurs during symbolic execution when there are too many execution paths to consider. Let’s see how this problem usually unfolds.

First, we should consider the difference between concrete and symbolic execution of an if-statement to set the foundation for what happens in loop execution. When we execute an if-statement concretely, we evaluate the condition and then descend into the then-branch or the else-branch. When symbolically executing a program, things are a little different. Sometimes we encounter if-statements with symbolic conditions that we cannot simplify to true or false. In these cases, we must consider both alternatives - the then-branch and the else-branch. In other words, the concrete interpreter only looks at a single execution path, while the symbolic execution engine looks at both paths.

Similarly, when we interpret a loop concretely, we evaluate the condition and then descend into the loop body or we continue after the loop. When we are done executing the loop body, we repeat the process. When symbolically executing a loop with a condition that does not simplify to true or false, we descend into the loop body and continue after the loop. At the end of the loop iteration, we repeat the process. While the interpreter only looks at a single path (with potentially many loop iterations), the symbolic execution engine looks at all possible paths (with potentially many loop iterations). For a loop that can iterate between 0 and 10 times, we must explore 11 paths. For unbounded loops, we must explore infinitely many paths.

To make matters worse, nested ifs, loops, and recursive functions exponentially magnify the problem. For example, two nested if-statements result in 2^2 paths, three nested if-statements in 2^3 paths, and so on.

How can we solve this problem? The sobering truth is that theory forbids a solution that is both fully automated and guarantees 100% path coverage. In other words, all tools aiming to mechanize formal verification must compromise on either automation or the completeness of their reasoning. Of the four tools covered in this article, Kontrol is the only tool that offers two modi: One for automation - and one for coverage. All other tools only support automated reasoning.

A Running Example Inspired by Gauss

For the history fans we want to tell another story that is so small is practically an anecdote. Our protagonists are school-aged Carl Friedrich Gauss and his teacher Büttner. Even at this young age, Gauss showed signs of being a mathematical genius. The trouble with geniuses is that they get bored, and bored kids have ideas, and ideas - as all teachers know - tend to turn into trouble. So one day, when Gauss was showing signs of genius again, Büttner tasked him with an exercise that would hopefully keep him busy until noon (1). The task was summing up all the numbers from 1 to 100. But to Büttner's surprise, Gauss could tell him the correct result after just a few minutes: 5050 (another unlikable thing about geniuses - they tend to outsmart their teachers). How did Gauss arrive at this result so quickly? The myth says that Gauss discovered the following closed formula to sum up the first n numbers:

$\forall n \in \mathbb{N}: \sum_{i = 0}^{n}i = \frac{n(n+1)}{2}$

So he only needed to substitute n for 100 to arrive at the result:

$\sum_{i=0}^{100} i = \frac{100(100+1)}{2} = 5050$

(1) The adult Gauss, who’s coincidentally the only source of this anecdote, would probably disagree with this framing

Gauss’ Looping Problem in Solidity

How does all this relate to smart contracts and formal verification? The key insight here is that Gauss bypassed an iterative loop (sum up N numbers) by reducing it to a closed formula. In other words, Gauss identified a loop invariant. We make this connection more precise in the second article of this series. For now, we’re just interested in this example due to its simplicity.

Let’s translate Gauss’ observation into Solidity. We start by encoding the iterative algorithm, to sum up the first N natural numbers. The algorithm that Gauss never executed to 100.

function sumToN(uint256 n) public pure returns (uint256) {
    uint256 result = 0;
    uint256 i = 0;
    while (i < n) {
        i = i + 1;
        result = result + i;
    }
    return result;
}

This is a classic textbook example of an unbounded loop. Notice that unbounded is not the same as an infinite loop. Unbounded means that the number of loop iterations is unknown at compile-time and depends on some inputs.

Next, we translate Gauss’ closed formula to Solidity - the shortcut Gauss used when outsmarting his teacher. This function is also known as the triangle number.

function triangleNumber(uint256 n) public pure returns (uint256) {
    return n * (n + 1) / 2;
}

Finally, let’s write a property test to show the equivalence of these functions.

function check_equivalence(uint256 n) public pure returns (bool) {
    vm.assume(n < 2**128); // prevent overflow
    assert(sumToN(n) == triangleNumber(n));
}

The property test is directly executable with Halmos, hevm, and Kontrol. Later, we will also translate this property to the Certora Verification Language (CVL).

The full source code used in this blog post series can be found on GitHub.

An Automated Approach: Bounded Loop Unrolling

Formal Verification is often associated with mathematical proofs. The goal is to prove that a program (smart contract) satisfies a formal specification. However, it should be noted that in the presence of loops, most formal verification tools do not produce irrefutable proofs in the mathematical sense, but fall back to weaker forms of reasoning - one of which is bounded loop unrolling. For the sake of this article, we maintain a clear distinction between the words proving, which we reserve for irrefutable mathematical proofs, and reasoning, which we use more broadly for less rigorous arguments.

As we previously mentioned, no fully automated and exhaustive solution exists to overcome the path-explosion problem, and most currently available tools favor automation over coverage. Applied to looping programs, they will limit the number of loop iterations, thereby enforcing termination and potentially cutting off reachable program paths.

Let’s look at a visualization of loop unrolling:

A visualization of loop unrolling

As you can see the control flow graph on the right-hand side branches on every loop iteration. The left branch jumps into the loop body, the right branch jumps out of the loop. For unbounded loops, this process could be repeated infinitely - even though the loop is guaranteed to terminate for each concrete input. Bounded loop unrolling just aborts this process after a finite number of iterations.

This technique is practiced by the Certora Prover, Halmos, and hevm. Kontrol does not bound loops by default - but has opt-in support. The preferred usage of Kontrol is based on loop invariants (addressed in part two of this series). Bounded loop unrolling is a prime example of a formal reasoning technique that does not produce irrefutable proofs.

Let’s take a closer look at the existing EVM symbolic execution engines and see how they respond to unbounded loops.

Halmos

The check_equivalence function is directly executable with Halmos.

$ halmos --match-contract GaussSpec --match-test equivalence

Running 1 tests for src/GaussSpec.sol:GaussSpec
[PASS] check_equivalence(uint256) (paths: 8, time: 5.56s, bounds: [])
WARNING:halmos:check_equivalence(uint256): paths have not been fully explored due to the loop unrolling bound: 2
(see https://github.com/a16z/halmos/wiki/warnings#loop-bound)
Symbolic test result: 1 passed; 0 failed; time: 5.56s

Halmos tells us that it only executed the loop twice and then stopped exploring, although it found some inputs that would require more loop iterations. We can increase the loop unrolling limit with the --loop parameter. For larger values of --loop the Halmos team recommends using the ycies-smt2 solver:

$ halmos --match-contract GaussSpec --match-test equivalence --solver-command yices-smt2 --loop 256

Running 1 tests for src/GaussSpec.sol:GaussSpec
Generating SMT queries in /tmp/check_equivalence-324ee4ff94f748c49ec4e3c24616c426
[PASS] check_equivalence(uint256) (paths: 540, time: 3.98s, bounds: [])
WARNING:halmos:check_equivalence(uint256): paths have not been fully explored due to the loop unrolling bound: 256
(see https://github.com/a16z/halmos/wiki/warnings#loop-bound)
Symbolic test result: 1 passed; 0 failed; time: 4.00s

Pay attention to the explored paths (8 vs 540) and wall-time (5.56s vs 4.00s).

hevm

hevm uses an eager approach to symbolic execution - which means it first explores all paths and only later filters out unreachable ones. Because hevm doesn't come with a default loop unrolling limit, we must explicitly set a boundary with the --max-iterations parameter or we will run out of resources.

$ BYTECODE=$(jq .deployedBytecode.object -r out/GaussSpec.sol/GaussSpec.json)
$ hevm symbolic --sig "check_equivalence(uint256)" --code $BYTECODE --max-iterations 2

Exploring contract
Simplifying expression
Explored contract (11 branches)

WARNING: hevm was only able to partially explore the given contract due to the following issues:

  - Max Iterations Reached in contract: symbolic(entrypoint) pc: 860

Checking for reachability of 3 potential property violation(s)

QED: No reachable property violations discovered

And just like halmos, hevm warns us because it cut off reachable paths.

Kontrol

Kontrol also does not cap loops by default, but we can specify the --bmc-depth parameter to enforce it. The preferred way to use Kontrol is based on loop invariants instead of loop unrolling (covered in the next episode).

$ kontrol build
$ kontrol prove --match-test GaussSpec.check_equivalence --bmc-depth 3

🏃 Running Kontrol proofs 🏃 
 Add `--verbose` to `kontrol prove` for more details!
Selected functions: src%GaussSpec.check_equivalence(uint256)
Running setup functions in parallel: 

Running test functions in parallel: src%GaussSpec.check_equivalence(uint256)
WARNING 2024-09-25 15:21:54,573 pyk.proof.reachability - Bounded node src%GaussSpec.check_equivalence(uint256):0: 21 at bmc depth 3                                 
  0:01:42 src%GaussSpec.check_equivalence(uint256):0 Finished PASSED: 21 nodes: 0 pending|4 passed|0 failing|0 vacuous|0 refuted|0 stuck
✨ PROOF PASSED ✨ src%GaussSpec.check_equivalence(uint256):0
⏳ Time: 1m 43s ⏳

Kontrol also warns us about the cut-off program path.

Certora Prover

The Certora Prover uses a custom specification language called Certroa Verification Language (CVL) instead of Solidity tests as a specification format. This means we must first rewrite our test in CVL before feeding it into the Certora Prover. That said, the translation of our example to CVL is straightforward:

rule check_equivalence(uint n) {
    env e;
    require e.msg.value == 0;
    require n < 2^128; // Prevent overflow
    mathint sum_to_n = sumToN(e, n);
    mathint triangle_number = triangleNumber(e, n);
    assert sum_to_n == triangle_number;
}

We can then invoke the Cetrora Prover as follows:

$ certoraRun src/Gauss.sol --verify Gauss:src/GaussCertora.spec --solc solc

Connecting to server...

Job submitted to server

Manage your jobs at https://prover.certora.com
Follow your job and see verification results at https://prover.certora.com/output/471186/f2396ce7d4434238a6ccd5f76bf805ff?anonymousKey=7d7fe223476b6bfc5216d1fd8e7b7099614ea5c5

The command queues a job on the Certora server and prints a link to the console. After the job is executed, we can use the link to inspect the result via a convenient web interface:

The Certora Prover informs us about a failed loop has terminated-assertion. This is an artificial assertion that Certora Prover injects to warn us about reachable paths being cut off. It’s important to note that this is not a user-supplied assertion and doesn’t mean it found a counter-example that falsifies the original claim. It’s the way the Certora Prover tells us it gave up after 1 iteration. The limit can be increased using the --loop_iter parameter.

Comparison to Property Testing

Bounding the loop at X iterations means excluding all inputs N > X from our reasoning. In other words, the symbolic execution engines we used here give us very weak correctness guarantees. Setting the loop bound to 2 (halmos' default) means we only cover the input values 0, 1, and 2. If we set the loop bound to 1 (Certora's default) we only covered inputs values 0 and 1. In both cases, we could just have used Foundry to arrive at the same level of confidence, only much faster.

But this is not the general case. Usually, a program has more than one input or state variable, and only some variables cause unbounded loops. Let's say, for example, we have a function foo(uint256 a, uint256 b, uint256 c) with an unbounded loop that only depends on c. In this case, bounded symbolic execution can usually explore 100% of the values for a and b, but only a handful of values for c. In contrast, a fuzzer will sample a small fraction of possible values for a, b, and c - yet in absolute numbers, it might get better coverage for c with some thousand samples. That's why it’s highly advisable to use formal verification always in conjunction with unit and random testing. Halmos, hevm, and Kontrol make that particularly easy, as they can consume Foundry tests. Conversely, foundry now also supports Kontrol-specific cheatcodes - increasing the compatibility between the two tools even further. If we want to get to 100% coverage for c we must resort to more rigorous methods based on loop invariants - a feature that is currently only offered by Kontrol.

Hidden Loops

Be aware that your EVM bytecode might contain loops, even if your Solidity code does not. The compiler frequently injects bytecode-level loops to copy data from one memory location to another. As a rule of thumb, if your contract makes use of strings, arrays, bytes or some other dynamically sized data structure, then there will be byte-code level loops! Further indicators of hidden loops are recursive and external calls.

Finally, there is another profound implication: Provers that solely depend on bounded loop unrolling generally cannot prove contract invariants. Any attempt to prove the contract invariant must eventually prove a subgoal about the unbounded loop, so the reasoning gap expands from loops to contract invariants.

Conclusion

Symbolic execution engines use bounded loop unrolling to terminate the reasoning process forcefully. The engine stops executing loops after a pre-defined amount of iterations. All tools presented here support loop unrolling, allow us to configure the number of iterations, and warn us when they reach the iteration bound. Forcefully bounding loops preserves automation but can cut off reachable program paths from the reasoning and hence miss bugs and vulnerabilities in unexplored paths.

In the next article in this series, we will cover loop invariants as a complementary technique to bounded loop unrolling. Loop invariants allow us to explore the full state space of looping programs, and not just a fraction of it. This gives us much stronger correctness guarantees but also requires some additional work.