End-to-End Formal Verification of Ethereum 2.0 Deposit Smart Contract

Posted on January 20th, 2020 by Daejun Park

Posted in K, News, Smart Contracts, Verification

We, at Runtime Verification, are happy to report our successful completion of formal verification of the Ethereum 2.0 deposit contract, arguably one of the most important smart contracts to be deployed this year for the Ethereum community.

Deposit Contract

The deposit contract is a gateway to join Ethereum 2.0. To be validators who drive the entire Proof-of-Stake (PoS) chain, called Beacon chain, of Ethereum 2.0, you need to deposit a certain amount of Ether, as a "stake", by sending a transaction (over the Ethereum 1.x network) to the deposit contract. The deposit contract records the history of deposits, and locks all the deposits in the Ethereum 1.x chain, which can be later claimed at the Beacon chain of Ethereum 2.0. Note that at an early stage of Ethereum 2.0, the deposit contract is a one-way function; you can move your funds from Ethereum 1.x to Ethereum 2.0, but not vice versa.

The deposit contract, written in Vyper, employs the Merkle tree data structure to efficiently store the deposit history, where the tree is dynamically updated (i.e., leaf nodes are incrementally filled in order from left to right) whenever a new deposit is received. Here the Merkle tree employed in the contract is very large---a Merkle tree of height 32, which can store up to 2^32 deposits, is implemented in the contract. Since the size of the Merkle tree is huge, it is not practical to reconstruct the whole tree every time a new deposit is received.

To reduce both time and space complexities, thus saving the gas cost significantly, the contract implements an incremental Merkle tree algorithm. The incremental algorithm enjoys O(h) time and space complexities to reconstruct (more precisely, compute the root of) a Merkle tree of height h, while a naive algorithm would require O(2^h) time or space complexity.

The efficient incremental algorithm, however, leads to the deposit contract implementation being unintuitive, and makes it non-trivial to ensure its correctness. The correctness of the deposit contract, however, is critical for the security of Ethereum 2.0. Considering the utmost importance of the deposit contract, formal verification is demanded and the only known way to ultimately guarantee its correctness.

Scope and Methodology

The scope of our formal verification engagement was to ensure the correctness of the deposit contract bytecode within a single transaction, without considering transaction-level or off-chain behaviors. We took the compiled bytecode as the verification target to avoid the need to trust the compiler. (Indeed, we found several critical bugs of the Vyper compiler in the process of formal verification. We will explain more details about our findings later in this post.)

Methodologically, we adopted the well-known refinement-based verification approach. Our verification effort consists of the following two tasks:

Task 1: Verify that the incremental Merkle tree algorithm implemented in the deposit contract is correct w.r.t. the original full-construction algorithm.
Task 2: Verify that the compiled bytecode is correctly generated from the source code of the deposit contract.

Intuitively, the first task amounts to ensuring the correctness of the contract source code, while the second task amounts to ensuring the compiled bytecode being a sound refinement of the source code (i.e., translation validation of the compiler). This refinement-based approach allows us to avoid reasoning about the complex algorithm details, especially specifying and verifying loop invariants, directly at the bytecode level. This separation of concerns helped us to save a significant amount of verification effort.

In the previous blog post, we covered Task 1, with a brief introduction to the deposit contract and the incremental Merkle tree algorithm. In this post, we will focus on Task 2.

Check out our final report for the full details of the formal verification of the deposit contract.

Bytecode Verification of Deposit Contract

We verified the functional correctness of the compiled bytecode of the deposit contract. For each public function, we verified that its return value and its storage state update, if any, are correct.

Also, we carefully specified and verified its byte manipulation behavior. It includes:

return values being correctly serialized to byte sequences according to the contract ABI specification,
correct zero-padding for the 32-byte alignment,
correct conversion from big-endian to little-endian,
event log data being correctly encoded according to the contract event ABI specification,
input bytes of the SHA2-256 hash function being correctly constructed, and
deposit data hashes (called Merkleization) being correctly computed according to the SimpleSerialize (SSZ) specification.

We also verified the liveness property that the contract is always able to accept a new (valid) deposit as long as a sufficient amount of gas is provided. This liveness is not trivial since it needs to hold even in any future hard-fork where the gas fee schedule is changed. Indeed, we found a bug of the Vyper compiler that could make the deposit contract non-functional in a certain future hard-fork, where the contract would always fail due to the out-of-gas exception no matter how much amount of gas is supplied. (We will come back to this later in this post.)

We verified both positive and negative behaviors. The positive behaviors describe the desired behaviors of the contracts in a legitimate input state. The negative behaviors, on the other hand, describe how the contracts handle exceptional cases (e.g., when benign users feed invalid inputs by mistake, or malicious users feed crafted inputs to take advantage of the contracts). The negative behaviors are mostly related to security properties.

The full specification of the verified bytecode behaviors of the deposit contract can be found here.

Findings

In the course of our formal verification effort, we found subtle bugs (Issues 26, 1341, and 1357) of the deposit contract, which has been fixed in the latest version, as well as a couple of refactoring suggestions (Issues 27, 28, and 38) that can improve the code readability and reduce the gas cost. The subtle bugs of the deposit contract are partly due to another hidden bugs of the Vyper compiler (Vyper Issues 1563, 1599, 1610, and 1761) that we revealed in the verification process.

Below we elaborate on the bugs we found. We note that all the bugs of the deposit contract have been reported, confirmed, and properly fixed in the latest version (v0.10.0).

ABI Standard Conformance of `get_deposit_count()` Function

In the previous version of the deposit contract, the get_deposit_count() function does not conform to the ABI standard, where its return value contains incorrect zero-padding, due to a Vyper compiler bug.

Specifically, in the buggy version of the compiled bytecode, the get_deposit_count() function, whose return type is bytes[8], returns the following 96 bytes (in hexadecimal notation):

0x0000000000000000000000000000000000000000000000000000000000000020
  0000000000000000000000000000000000000000000000000000000000000008
  deadbeefdeadbeef000000000000000000000000000000000000000000000020
                                                                ^^

where:

the first 32 bytes (in the first line) denote the offset (32 = 0x20) to the byte array,
the second 32 bytes (in the second line) denote the size of the byte array (8 = 0x8), and
the next 8 bytes "deadbeefdeadbeef" (at the beginning of the third line) denotes the actual content of the return byte array (of type bytes[8]), i.e., the (little-endian) byte representation of the deposit count.

Here the problem is that the last byte (marked with ^^) is 0x20 while it should be 0x00. According to the ABI specification, the last 24 bytes must be all zero, serving as zero-pad for the 32-byte alignment. Thus the above return value does not conform to the ABI standard, which is problematic because any contract (written in either Solidity or Vyper) that calls to (the buggy version of) the deposit contract, expecting that the get_deposit_count() function conforms to the ABI standard, could have misbehaved.

This buggy behavior is mainly due to a subtle Vyper compiler bug that fails to correctly compile a function whose return type is bytes[n] where n is less than 16, which leads to the compiled function returning an incorrect byte sequence that does not conform to the ABI standard, having insufficient zero-padding as shown above.

We note that this bug could not have been detected if we did not take the bytecode as the verification target. This reconfirms that the bytecode-level verification is critical to ensure the ultimate correctness (unless we formally verify the underlying compiler), because we cannot (and should not) trust the compiler.

Checking Well-Formedness of Calldata

The calldata decoding process in the previous version of the compiled bytecode does not have sufficient runtime-checks for the well-formedness of calldata. As such, it fails to detect certain ill-formed calldata, causing invalid deposit data to be put into the Merkle tree. This is problematic especially when clients make mistakes and send deposit transactions with incorrectly encoded calldata, which may result in losing their deposit fund.

For example, consider the following ill-formed calldata for the deposit() function (in hexadecimal notation):

0xc47e300d
  0000000000000000000000000000000000000000000000000000000000000060
  0000000000000000000000000000000000000000000000000000000000000080
  00000000000000000000000000000000000000000000000000000000000000a0
  0000000000000000000000000000000000000000000000000000000000000030
  0000000000000000000000000000000000000000000000000000000000000020
  0000000000000000000000000000000000000000000000000000000000000060

The first four bytes (in the first line) denote the signature hash of the deposit() function, and the remaining bytes correspond to the tuple of three arguments, pubkey, withdrawal_credentials, and signature. This calldata, however, is clearly ill-formed thus not valid, simply because the size (196 bytes) is much less than that of valid calldata (356 bytes).

The problem, however, is that the deposit() function does not reject the above ill-formed calldata, but simply inserts certain invalid (garbage) deposit data in the Merkle tree. Since the invalid deposit data cannot pass the signature validation later, no one can claim the deposited fund associated with this, and the deposit owner loses the fund.

Note that this happens even though the deposit() function contains the following length-checking assertions at the beginning of the function, which is quite unintuitive:

assert len(pubkey) == PUBKEY_LENGTH
assert len(withdrawal_credentials) == WITHDRAWAL_CREDENTIALS_LENGTH
assert len(signature) == SIGNATURE_LENGTH

This problem would not exist if the Vyper compiler thoroughly generated runtime checks to ensure the well-formedness of calldata. However, since it was not trivial to fix the compiler to generate such runtime checks, we suggested several ways to improve the deposit contract to prevent this behavior without fixing the compiler. After careful discussion with the deposit contract development team, we together decided to employ a checksum-based approach where the deposit() function takes as an additional input a checksum for the deposit data, and rejects any ill-formed calldata using the checksum. The checksum-based approach is the most non-intrusive and the most gas-efficient of all the suggested fixes. (More details of other suggested fixes can be found here.)

We note that this issue was found when we were verifying the negative behaviors of the deposit contract. This shows the importance of having the formal specification to include not only positive behaviors but also negative ones.

Liveness

The previous version of the deposit contract fails to satisfy the liveness property in that it may not be able to accept a new deposit, even if it is valid, in a certain future hard-fork that updates the gas fee schedule. This was mainly due to another subtle Vyper compiler bug that generates bytecode where a hard-coded amount of gas is supplied when calling to certain precompiled contracts. Although this hard-coded amount of gas is sufficient in the current hard-fork (codenamed Istanbul), it may not be sufficient in a certain future hard-fork that increases the gas fee schedule of the precompiled contracts. In such a future hard-fork, the buggy version of the deposit contract will always fail due to the out-of-gas exception, regardless of how much amount of gas is initially supplied. This issue has been fixed in the latest version.

We admit that we could not find this issue until the deposit contract development team carefully reviewed and discussed with us about the formal specification of the bytecode. Initially, we considered only the behaviors of the bytecode in the current hard-fork, without identifying the requirement that the contract bytecode should work in any future hard-fork. We identified the missing requirement, and found this liveness issue, at a very late stage of the formal verification process. Because of that, we did not have enough time to follow the standard process of fixing compiler-introduced bugs, that is, waiting for the next compiler release that fixes the bugs, re-generating the bytecode using the new compiler version, and re-verifying the newly generated bytecode. In order to speed up the process, we asked the Vyper compiler team to release a custom hotfix version (named 1761-HOTFIX-v0.1.0-beta.13) to minimize the time to wait for the bugfix release. Employing the custom hotfix version also minimized the time to re-verify the newly generated bytecode, because the difference between the previous bytecode and the newly generated bytecode becomes minimal, involving only changes for fixing the bug.

This anecdotal experience essentially illustrates the well-known problem caused by the gap between the intended behaviors (that typically exists only informally) by developers, and the formal specification written by verification engineers. To reduce this gap, the two groups should work closely together, or ideally, they should be in the same team. For the former, the formal verification process should involve developers more frequently. For the latter, the formal verification tools should become much easier to use without requiring advanced knowledge of formal methods.

Conclusion

Upon the completion of our end-to-end formal verification of the Ethereum 2.0 deposit contract, we conclude that the latest deposit contract bytecode is correct, that is, more precisely, that it will behave as specified in the formal specification.

We adopted the refinement-based verification approach to ensure the end-to-end correctness of the contract while minimizing the verification effort. Specifically, we first proved that the incremental Merkle tree algorithm is correctly implemented in the contract, and then verified that the compiled bytecode is correctly generated from the source code.

Although we found several critical issues of the deposit contract during the formal verification process, some of which were due to subtle hidden Vyper compiler bugs, all of the issues of the deposit contract have been properly fixed in the latest version (v0.10.0) of the deposit contract, compiled by the Vyper compiler version 1761-HOTFIX-v0.1.0-beta.13.

We note that this formal verification result is established without trusting the Vyper compiler, which means that the formally verified bytecode is correct even if the Vyper compiler is buggy. Indeed, the Vyper compiler has been improved enough to generate a correct bytecode from the deposit contract. In other words, remaining Vyper compiler bugs, if any, have not been triggered when generating the specific bytecode we formally verified. So, you do not need to worry about all of the concerns about the Vyper compiler, regarding the safety of the deposit contract.

Acknowledgments

We would like to warmly thank the Ethereum Foundation for funding this effort, and Danny Ryan, Carl Beekhuizen, Justin Drake, Martin Lundfall, Nicholas Lin, Hsiao-Wei Wang, Bryant Eisenbach, Jacques Wagener, Charles Cooper, and David Sanders for their discussion and promptly addressing the reported issues.