Posted on June 12, 2019 by Daejun Park
Ethereum 2.0 is coming. And rest assured, it will be formally specified and verified!
Ethereum 2.0 is a new sharded PoS protocol that, at its early stage (called Phase 0), lives in parallel with the existing PoW chain (called Eth1 chain). While the Eth1 chain is powered by miners, the new PoS chain (called Beacon chain) will be driven by validators.
In the Beacon chain, the role of validators is to create (called propose) and validate (called attest) new blocks. The consensus protocol of the Beacon chain is built on top of important gadgets, Casper FFG for finalizing blocks, LMD GHOST for a fork-choice rule, and RANDAO for generating random numbers. (Check out our previous engagement in formal verification of Casper FFG and statistical verification of RANDAO.) The consensus protocol guarantees (under certain mild assumptions) the desired safety and liveness properties as long as the majority of validators faithfully follow the protocol when creating and validating new blocks.
The set of validators is dynamic, meaning that new validators can join and existing validators can exit over time. To be a (new) validator, one needs to deposit a certain amount of Ether, as a "stake", by sending a transaction (over the Eth1 network) to a designated smart contract (called deposit contract). The deposit contract records the history of deposits, which is retrieved by the Beacon chain to maintain the dynamic validator set. (This deposit process will change at a later stage, though.)
The deposit contract, written in Vyper, employs the Merkle tree data structure to efficiently store the deposit history, where the tree is dynamically updated (i.e., leaf nodes are incrementaly added in order from left to right) whenever a new deposit is received. Here the Merkle tree employed in the contract is expected to be very large. Indeed, a Merkle tree of height
32, which can store up to
2^32 deposits, is implemented in the current version of the contract. Since the size of the Merkle tree is huge, it is not practical to reconstruct the whole tree every time a new deposit is received.
To reduce both time and space requirements, thus saving the gas cost, the contract implements the incremental Merkle tree algorithm. The incremental algorithm enjoys
O(h) time and space complexity to reconstruct (more precisely, compute the root of) a Merkle tree of height
h, while a naive algorithm would require
O(2^h) time or space complexity. Specifically, the algorithm maintains two arrays of length
h, and each reconstruction of an updated tree requires to compute only a chain starting from the new leaf (i.e., the new deposit) to the root, where the computation of the chain requires only the two arrays, thus achieving the linear time and space complexity in the height of a tree.
The efficient incremental algorithm, however, leads to the deposit contract implementation being unintuitive, and makes it non-trivial to ensure its correctness. Considering the utmost importance of the deposit contract, formal verification is demanded and the only known way to ultimately guarantee the correctness of the contract.
Funded by Ethereum Foundation, we, at Runtime Verification, embarked on formal verification of the deposit contract, and today we are happy to announce our achievement of the first milestone, formal verification of the incremental Merkle tree algorithm.
Specifically, we first rigorously formalized the incremental Merkle tree algorithm. Then, we extracted a pseudocode implementation of the algorithm employed in the deposit contract, and formally proved the correctness of the pseudocode implementation. This means that the deposit contract is correct at the source-code level, that is, it will correctly compute the Merkle tree root incrementally, provided that there are no compile-time errors introduced by the Vyper compiler or EVM-bytecode-level functional correctness bugs. (Indeed, our next task is to ensure this bytecode-level correctness.) Check out our report for the formalization of the algorithm and the correctness proof.
In the course of our formalization and correctness proof effort, we found a subtle bug of the deposit contract, which has been fixed in the latest version, as well as a couple of refactoring suggestions that can improve the code readability and reduce the gas cost.
Let us elaborate on the subtle bug. In the version of the contract that we were asked to verify, the bug is triggered when all of the leaf nodes of a Merkle tree are filled with deposit data, in which case the contract (specifically, the
get_deposit_root function) incorrectly computes the root hash of a tree, returning the zero root hash (i.e., the root hash of an empty Merkle tree) regardless of the content of leaf nodes. For example, suppose that we have a Merkle tree of height 2, which has four leaf nodes, and every leaf node is filled with certain deposit data, say
D4, respectively. Then, while the correct root hash of the tree is
get_deposit_root function returns
hash(hash(0,0),hash(0,0)), which is incorrect.
Due to the complex logic of the code, it is non-trivial to properly fix this bug without significantly rewriting the code, and thus we suggested a workaround that simply forces to never fill the last leaf node, i.e., accepting only
2^h - 1 deposits at most, where
h is the height of a tree. We note that, however, it is infeasible to trigger this buggy behavior in the current setting, since the minimum deposit amount is 1 Ether and the total supply of Ether is less than
130M which is much smaller than
2^32, thus it is not feasible to fill all the leafs of a tree of height 32. Nevertheless, this bug has been fixed by the contract developers as we suggested, since the contract may be used in other settings in which the buggy behavior can be triggered and an exploit may be possible. See here for more details.
We also want to note that this bug was quite subtle to catch. Indeed, we had initially thought that the original code was correct until we got stuck when writing down a formal proof of the correctness theorem. The failure of our initial attempt to prove the correctness led us to identify a missing premise (i.e., an additional pre-condition) that was needed for the theorem to hold, from which we could find the above buggy behavior scenario, and suggested the bugfix. This experience reconfirms the importance of formal verification. Although we were not "lucky" to find this bug when we had eyeball-reviewed the code, which is all traditional security auditors do, the formal verification process thoroughly guided and even "forced" us to find it in the end.
So, now that we are convinced that the incremental Merkle tree algorithm and its implementation of the deposit contract are correct at the source-code level, we are moving forward to formally verify that its compiled EVM bytecode behaves as expected. We will post the second part of the story upon its completion. Stay tuned!
We would like to warmly thank the Ethereum Foundation for funding this effort, and Danny Ryan, Carl Beekhuizen, and Nicholas Lin for their discussion and promptly addressing the reported issues.