With the upcoming major update to version 2.0 (code-named Serenity), Ethereum is transitioning into a sharded, proof-of-stake (PoS) consensus mechanism. It brings better energy-efficiency, security and scalability. The specific PoS protocol of Ethereum 2.0 is known as the Beacon Chain.
We are happy to report the first milestone in an ongoing collaboration between Runtime Verification and Ethereum Foundation, to build a formal framework for modeling and verifying the Beacon Chain. We have completed the executable formal model of Beacon Chain in the K Framework. The K specification, along with the technical report describing this work, are both available online.
So what’s the Beacon Chain? How was its model in K developed? Why is this development important?
The Beacon Chain is the PoS protocol layer of the upcoming Ethereum 2.0. The protocol is primarily responsible for achieving consensus on the state of the system among all nodes in the network participating in the protocol.
A participating node, called a validator in the protocol, contributes to the distributed operation of the system mainly by proposing new Beacon blocks or by voting for an existing Beacon block, depending on the current state of the node. A Beacon block encapsulates primarily a set of votes published to the network. The protocol manages how validators are chosen to propose and vote for blocks in a way that ensures that every validator is given a fair chance to contribute.
A vote for a Beacon block is called an attestation. Attestations constitute the fundamental building blocks of the consensus mechanism. Through an attestation for a Beacon block:
Finally, validators who follow the protocol and make sensible decisions, are rewarded with Ether, distributed as dividends, to encourage proper behavior. On the other hand, a validator that deviates from the protocol or behaves erratically can potentially be penalized by denying it dividends, or in some severe cases, by having its staked Ether slashed. This reward-penalty system helps make it economically infeasible for a malicious user to mount a successful attack on the system (see the Serenity Design Rationale notes).
The Beacon Chain is currently defined by its reference implementation in Python "Ethereum 2.0 Phase 0 – The Beacon Chain" developed by the Ethereum Foundation.
The main component of the implementation that defines the operation of the protocol is the Beacon Chain state transition function,
state_transition. An excerpt of the relevant part of the function's implementation is shown below:
def state_transition(state: BeaconState, block: BeaconBlock ...) -> BeaconState: # Process slots (including those with no blocks) since block process_slots(state, block.slot) # Process block process_block(state, block) ... # Return post-state return state
Processing begins at the genesis state (the state having the genesis Beacon block already processed). Given the next Beacon block to be processed, and assuming the block is valid, the Beacon Chain state transition function transforms a given Beacon Chain state (pre-state) into a new state (post-state). This post-state reflects the results of:
A more detailed description of the transition function and the operation of the Beacon Chain can be found here.
Our goal in this work is to build a formal model of the Beacon Chain that corresponds as closely as possible to the reference implementation given by “Ethereum 2.0 Phase 0 Specification”), so as to enable the following tasks:
K is a very suitable framework for this goal as it enables the development of a formal model of the Beacon Chain that is characterized by being:
The K model also paves the ground for more elaborate verification tasks, such as reachability analysis and deductive verification, but these are part of ongoing and future work, and will be described elsewhere.
Very generally, the Beacon Chain is modeled in K as a state transition system whose states are Beacon Chain states and whose transitions are defined by the main Beacon Chain state transition function.
In K, a Beacon Chain state is specified as a configuration consisting of (possibly nested) cells, where each cell represents a semantic element of the configuration that is needed to define the protocol. For example, the excerpt below shows two cells of the
<k> cell, which is a special cell that holds the program (computations) to be executed, and the
<state> cell, which contains all the structural elements of the Beacon Chain state (only three are shown below, the three dots denote omitted cells):
configuration <beacon-chain> <k> ... </k> <state> <slot> N:Int </slot> <eth1-data> E:Eth1Data </eth1-data> <validators> V:Map </validators> ... </state> ... </beacon-chain>
The Beacon Chain state transition function is specified by an operator in K that transforms a Beacon Chain pre-state, modeled by the current Beacon Chain configuration, into a Beacon Chain post-state, modeled by the resulting K configuration. The operator is declared as follows:
syntax KItem ::= "state_transition" "(" lock: BeaconBlock ")"
As described above, there are two main consecutive steps involved:
process_block. Sequencing of commands in K is naturally specified as stacking a computation on top of a continuation, using the operator
~>. For example, the state transition function is defined using the following K rule:
rule state_transition(BLOCK) => process_slots(BLOCK.slot) ~> process_block(BLOCK)
process_slots terminates successfully will the next computation defined by
process_block take place, which captures the intended semantics.
One of the challenges we faced while developing semantic specifications of the transition function was the mismatch between Python’s semantically rich expressions and mostly imperative programming style with K’s language-defining constructs and declarative specification style. During this development, we have identified a few patterns and how they may elegantly be specified in K, such as stacking computation structures for sequencing as described above. Other patterns are more involved and the mismatch is more pronounced, e.g. list comprehension expressions, which are quite heavily utilized in the Beacon Chain’s reference implementation. In these cases, this encoding is generally quite verbose, but is inescapable without defining an intermediate language construct in K. The advantage, however, is that the encoding readily enables a more detailed coverage analysis. The technical report details these patterns and how they are specified.
The Ethereum Foundation provides a rich test suite for the Beacon Chain. In addition to serving as a debugging tool for the Python implementation, the test suite is used by third-party production client developers to ensure conformance with the reference implementation. The test suite consists of over 3,000 different unit tests.
Given the level of abstraction at which the K model is developed, tests in the standard test suites of the Beacon Chain can be executed in the model directly without the need for any instrumentation. This proved invaluable as it provided a mechanism for validating the model and making sure that the model conforms with the reference implementation in the same way other production implementations are validated. All the tests can be run automatically by following the instructions given in the project’s repository.
The standard Beacon Chain test suite was designed so that code coverage, using available code coverage analysis tools for Python, is maximized. However, Python’s coverage is generally rather coarse-grained; it doesn't distinguish individual branches of execution of semantically rich Python constructs, such as list comprehension expressions.
Therefore, the plan was to evaluate how well the standard Python code coverage ensures semantic coverage and to see if rule-based coverage analysis could expose any non-covered functionality that is not detected by standard Python code coverage tools. Indeed, the analysis revealed execution paths that are not covered or not sufficiently covered by the tests, which are not detected by Python coverage. Most of these detections were in the specification of complex behaviors, like the ones typically encountered in list comprehension expressions and complex loops.
Below is a snapshot from the coverage analysis report generated by the K coverage tool showing a rule that was not applied in any of the tests in the test suite.
The executable K model of the Beacon Chain is a first but crucial step towards achieving the ultimate goal of formally verifying fundamental safety and liveness properties of the Beacon Chain and its reference implementation. Indeed, the problem of formal verification is meaningless without a trusted, formal model of the system to verify. Besides the ability to execute the state transition function, run tests and analyze test coverage, which were demonstrated in this work, the K formal model presented here can also be used to state and verify low-level invariants expressible at this low level of abstraction.
However, the Beacon Chain is a very complex protocol. Directly verifying high-level properties like safety and liveness at this low level of abstraction is usually infeasible. Instead, abstraction refinement techniques are commonly employed. Consequently, our plan is to follow the following approach:
We look forward to continuing our collaboration with the Ethereum Foundation on this work. For this engagement, we have had the great pleasure of working with the following experts at the Ethereum Foundation: Danny Ryan, Carl Beekhuizen and Martin Lundfall.