Using Kontrol to Tackle Complexities Caused by Dynamically-Sized Constructs

Posted on August 19th, 2024 by Petar Maksimović

opt2 twitter.png

This is the second post of a three-part series about our recent Optimism engagement, in which we verified their pausability mechanism for L1 contracts. This installment explains how Kontrol can be used to tackle the complexities caused by dynamically-sized constructs and the challenges associated with the loops that result from them.

In real-world scenarios, Solidity programmers commonly use dynamically-sized constructs, such as strings, bytes, and arrays. Establishing properties of code involving these constructs comes with an additional dimension of complexity, as the associated algorithms normally use loops to traverse these structures. For example, if we are working with an array of buffers (`bytes[]`), we could easily end up having two nested loops, the outside one on the number of elements and the inside one on the length of each element. This further weakens the assurances that can be obtained from concrete unit testing or fuzzing, and even advanced techniques such as symbolic analysis find this reasoning highly challenging.

The remainder of this blog focuses on symbolic execution, as this is the main program analysis technique offered by Kontrol. In this context, the literature offers two approaches to addressing the looping challenges, both of which are supported by Kontrol:

  • unrolling loops to a given bound, which is easy to implement but increases the number of paths symbolic execution has to traverse and limits the correctness guarantees provided, and
  • using loop invariants and circular reasoning, which establishes full correctness but is difficult to implement and reason about automatically.

The sections that follow showcase how we combined these two approaches to perform formal verification of real-world Optimism code, focusing on the two main challenges we encountered and giving a perspective on the limits of unbounded reasoning about EVM bytecode.

The Setting: Proving correctness of the OptimismPortal pausability mechanism

One of the key tasks in our engagement with Optimism was to prove the correctness of the pausability mechanism of the Optimism portal: specifically, that when the portal is paused, no withdrawal can be proven. We will use this task to demonstrate what it means to reason symbolically about dynamically-sized constructs. In particular, we will focus on the test that captures the desired correctness property, which we slightly stylize for the purposes of this presentation:

function test_proveWithdrawalTransaction_paused( Types.WithdrawalTransaction memory _tx, uint256 _l2OutputIndex, Types.OutputRootProof calldata _outputRootProof, bytes[] memory _withdrawalProof ) internal { setUpInlined(); // When: the Optimism Portal is paused vm.prank(optimismPortal.guardian()); superchainConfig.pause("identifier"); // Then: proving a withdrawal transaction reverts vm.expectRevert(abi.encodeWithSelector(CallPaused.selector)); optimismPortal.proveWithdrawalTransaction(_tx, _l2OutputIndex, _outputRootProof, _withdrawalProof); }

and the associated WithdrawalTransaction structure that is used by the test:

struct WithdrawalTransaction { uint256 nonce; address sender; address target; uint256 value; uint256 gasLimit; bytes data; }

Proving the above test in Kontrol for the appropriate values of the function parameters using symbolic execution poses two technical challenges related to the handling of the dynamic components of these parameters; specifically:

  1. the copying of the `data` field of the `WithdrawalTransaction` structure, which can be of arbitrary length; and
  2. the overall handling of the `bytes[] memory _withdrawalProof` array of buffers, which, by construction, may have at most 10 elements, each of which can be at most of length 600.

Challenge 1: memory-to-memory copying of symbolic bytes

When Kontrol runs a symbolic test, such as test_proveWithdrawalTransaction_paused shown above, it does so by making all of the parameters symbolic. For struct parameters, this works by making all of the fields symbolic. In particular, this means that the Types.WithdrawalTransaction memory _tx parameter will hold a byte array bytes data, with symbolic content of arbitrary length. This, in itself, does not pose an issue for Kontrol, as it is able to reason automatically and efficiently about symbolic byte arrays.

However, in preparation of the function call to optimismPortal.proveWithdrawalTransaction, _tx, has to be copied into the memory that will serve as the corresponding call data. The Solidity compiler 1 handles this copying for the data field of _tx by using a loop that copies 32 bytes at a time, the Yul code for which is

function copy_memory_to_memory(src, dst, length) { let i := 0 for { } lt(i, length) { i := add(i, 32) } { mstore(add(dst, i), mload(add(src, i))) } if gt(i, length) { // clear end mstore(add(dst, length), 0) } }

meaning that we have on our hands a loop on the unbounded length of data, which creates an infinite loop in symbolic execution.

The way that we resolve this issue is by writing the appropriate loop invariant, showcasing the expressiveness of K and maintaining full generality of the correctness guarantees provided with respect to _tx. For the interested reader proficient in the low-level workings of the EVM and KEVM, we give the loop invariant below, written as a KEVM rule, inlining the (still highly technical) explanations:

rule [copy-memory-to-memory-summary]: <k> #execute ... </k> // Gas calculations are turned off and the Ethereum fork used is Shanghai <useGas> false </useGas> <schedule> SHANGHAI </schedule> // The jump destinations, program, and program counter are all symbolic <jumpDests> JUMPDESTS </jumpDests> <program> PROGRAM </program> // The program counter starts from PCOUNT when the loop starts and ends // at PCOUNT +Int 53 when the loop finishes <pc> PCOUNT => PCOUNT +Int 53 </pc> // The word stack has the appropriate form, as per the compiled code <wordStack> LENGTH : _ : SRC : DEST : WS </wordStack> // The program copies LENGTH bytes of memory from SRC +Int 32 to DEST +Int OFFSET, // padded with zeros in case LENGTH is not divisible by 32. The `notMaxUInt5` // macro denotes the binary number `11111111111111111111111111100000`, which is // effectively a mask isolating the first 27 bits of a 32-bit integer and is used // for rounding the length of the copied memory up to a multiple of 32. <localMem> LM => LM [ DEST +Int 32 := #range ( LM, SRC +Int 32, LENGTH ) +Bytes #buf ( ( ( notMaxUInt5 &Int ( LENGTH +Int 31 ) ) -Int LENGTH ) , 0 ) +Bytes #buf ( ( ( ( 32 -Int ( ( notMaxUInt5 &Int ( LENGTH +Int 31 ) ) -Int LENGTH ) ) ) modInt 32 ), 0 ) ] </localMem> requires // The current program we are executing differs from the original one (written as the // Bytes literal b"..."), only in the hardcoded jump addresses, now relative to PCOUNT #range(PROGRAM, PCOUNT, 53) ==K b"`\x00[\x81\x81\x10\x15b\x00\x81`W` \x81\x85\x01\x81\x01Q\x86\x83\x01\x82\x01R\x01 b\x00\x81BV[\x81\x81\x11\x15b\x00\x81sW`\x00` \x83\x87\x01\x01R[P" [ 08 := #buf(3, PCOUNT +Int 32) ] [ 28 := #buf(3, PCOUNT +Int 2) ] [ 38 := #buf(3, PCOUNT +Int 51) ] // Various well-formedness constraints that ensure that the overflow checks introduced // by the compiler pass. The program counter has to be non-negative, the word stack must // not go beyond its maximum size of 1024, and the indices into the memory have to be // smaller than 2^63, which is a reasonable constraint as EVM memories never reach // that size. andBool 0 <=Int PCOUNT andBool 0 <=Int LENGTH andBool LENGTH <Int 2 ^Int 63 andBool 0 <=Int SRC andBool SRC <Int 2 ^Int 63 andBool 0 <=Int DEST andBool DEST <Int 2 ^Int 63 andBool #sizeWordStack(WS) <=Int 1015 // No overlap between source and destination andBool SRC +Int LENGTH <=Int DEST // Destination starts within current memory andBool DEST <=Int lengthBytes(LM) // All jump destinations in the program are valid andBool (PCOUNT +Int 2) in JUMPDESTS andBool (PCOUNT +Int 32) in JUMPDESTS andBool (PCOUNT +Int 51) in JUMPDESTS // and fit into two bytes andBool PCOUNT +Int 51 <Int 2 ^Int 16

While we are able to construct and use this invariant successfully in our proofs, there still remain a number of issues that make invariant inference and general reasoning about them difficult. Chief among them is the instability of compiled code, in that there are no guarantees that if the Solidity compiler compiled a certain piece of code one way it would not compile it in a different way if, for example, the rest of the code changes or if it uses more or fewer optimization cycles. This remains the case even if that code is low-level internal Yul code, such as the copy_memory_to_memory function. Luckily, loops of this particular form no longer occur in the Dencun Ethereum fork given the introduction of the MCOPY instruction, which is able to copy arbitrary memory ranges and which is used starting from version 0.8.25 of the Solidity compiler.

Challenge 2: bytes[] memory _withdrawalProof

The second challenge we faced was to run the above-mentioned `test_proveWithdrawalTransaction_paused` test for all possible values of the `bytes[] memory _withdrawalProof` parameter, which, according to the specification given to us by Optimism, may have between 0 and 10 elements, each of which may be at most 600 bytes long.

For this part, we opted to use bounded loop unrolling and fix the length of each element of the array; in particular: 

  • we run 11 separate proofs, corresponding to the number of elements of `_withdrawalProof` ranging from 0 to 10 inclusive; and
  • we set the length of each element in each of the proofs to the maximal possible length, 600; and
  • we make the contents of each of these elements fully symbolic.

In support of our strategy, we leveraged devdoc annotations for easy customization of byte arrays and arrays of byte arrays. For example, providing the following annotations:

/// @custom:length _withdrawalProof: 5, /// @custom:element-length _withdrawalProof: 600, function prove_proveWithdrawalTransaction_paused

will run a version of the test in which _withdrawalProof has 5 elements of fixed length 600.

We chose this approach for the following reasons:

  • it allowed us to extend Kontrol with basic symbolic reasoning about dynamic data structures, a feature that, to our knowledge, is not supported by the other tools;
  • we were confident that we would be able to implement it in the time allocated for the engagement and that the resulting proofs would run in acceptable times as part of the Optimism CI; and
  • it allowed us to strike a balance between the fully concrete approach, which would not be expressive enough, and the fully symbolic approach, which was not feasible at this point due to the instability of compiled EVM loop code and the amount of loop invariants that would have to be provided in absence of the `MCOPY` instruction.

Towards fully abstract reasoning about dynamic data structures

As part of the future development of Kontrol, we plan to develop abstractions that would allow us to handle dynamic data structures fully symbolically in our tests. These abstractions will need to capture the exact shape of these structures in memory, and will require numerous additional lemmas to be operational, but are nonetheless supported in principle by the expressivity of the K framework. As an illustration, we give a sketch of one potential such abstraction:

#bytesArray(M, START, SIZE, CONTENTS)

which states that the memory M contains an array of byte arrays starting from offset START, with SIZE elements that are stored at offsets OFFSETS and hold contents CONTENTS. This abstraction could be defined as follows:

syntax Bool ::= #bytesArray ( Bytes, Int, Int, List, List ) [function] rule #bytesArray(M, START, SIZE, OFFSETS, CONTENTS) => // Initial 32 bytes denote the length of the array SIZE ==Int #range(M, START, 32) // the next 32 *Int SIZE bytes are reserved for the offsets of the elements andBool #bytesArrayOffsets(M, START +Int 32, SIZE, OFFSETS) // and the element contents are captured by CONTENTS andBool #bytesArrayContents(M, START, OFFSETS, CONTENTS) [simplification]

where then the #bytesArrayOffsets and #bytesArrayContents abstractions would be defined recursively to capture the precise structure of the offsets and elements in memory, as described by the official EVM documentation.

If you have any questions or want to know more about Kontrol or the engagement, you can reach our team directly in Discord. We will publish our third and last part of this series really soon!

Footnotes

  1. More precisely, the version of the Solidity compiler that the Optimism code uses.