A Formal Model in K of the Beacon Chain: Ethereum 2.0’s Primary Proof-of-Stake Blockchain
With the upcoming major update to version 2.0 (code-named Serenity), Ethereum is transitioning into a sharded, proof-of-stake (PoS) consensus mechanism. It brings better energy-efficiency, security and scalability. The specific PoS protocol of Ethereum 2.0 is known as the Beacon Chain.
We are happy to report the first milestone in an ongoing collaboration between Runtime Verification and Ethereum Foundation, to build a formal framework for modeling and verifying the Beacon Chain. We have completed the executable formal model of Beacon Chain in the K Framework. The K specification, along with the technical report describing this work, are both available online.
So what’s the Beacon Chain? How was its model in K developed? Why is this development important?
The Beacon Chain in a Nutshell
The Beacon Chain is the PoS protocol layer of the upcoming Ethereum 2.0. The protocol is primarily responsible for achieving consensus on the state of the system among all nodes in the network participating in the protocol.
A participating node, called a validator in the protocol, contributes to the distributed operation of the system mainly by proposing new Beacon blocks or by voting for an existing Beacon block, depending on the current state of the node. A Beacon block encapsulates primarily a set of votes published to the network. The protocol manages how validators are chosen to propose and vote for blocks in a way that ensures that every validator is given a fair chance to contribute.
A vote for a Beacon block is called an attestation. Attestations constitute the fundamental building blocks of the consensus mechanism. Through an attestation for a Beacon block:
- A validator testifies that block is valid and that it should be appended to the chain;
- A validator votes for the “canonical chain of blocks” by identifying where the block should be appended, if the chain has forked into multiple branches (according to the fork choice rule);
- A validator contributes to deciding finality of blocks, which is a process that tells us when a Beacon block can be considered final and thus shouldn’t be reverted (according to Casper FFG);
- A validator votes to the block's shard, if the block does not belong to the main chain. Intuitively, a shard is a separate chain linked to the Beacon Chain that can be processed in parallel with other shards in the state by a subset of validators in the system, increasing significantly the system’s ability to process more transactions at a time, and hence its scalability (see sharding and crosslinks).
Finally, validators who follow the protocol and make sensible decisions, are rewarded with Ether, distributed as dividends, to encourage proper behavior. On the other hand, a validator that deviates from the protocol or behaves erratically can potentially be penalized by denying it dividends, or in some severe cases, by having its staked Ether slashed. This reward-penalty system helps make it economically infeasible for a malicious user to mount a successful attack on the system (see the Serenity Design Rationale notes).
The Beacon Chain is currently defined by its reference implementation in Python "Ethereum 2.0 Phase 0 – The Beacon Chain" developed by the Ethereum Foundation.
The main component of the implementation that defines the operation of the protocol is the Beacon Chain state transition function, state_transition
. An excerpt of the relevant part of the function's implementation is shown below:
def state_transition(state: BeaconState, block: BeaconBlock ...) -> BeaconState: # Process slots (including those with no blocks) since block process_slots(state, block.slot) # Process block process_block(state, block) ... # Return post-state return state
Processing begins at the genesis state (the state having the genesis Beacon block already processed). Given the next Beacon block to be processed, and assuming the block is valid, the Beacon Chain state transition function transforms a given Beacon Chain state (pre-state) into a new state (post-state). This post-state reflects the results of:
- Accounting for (potentially) missed blocks; (
process_slots
); - Processing the block's contents (
process_block
).
A more detailed description of the transition function and the operation of the Beacon Chain can be found here.
Modeling the Beacon Chain in K
Our goal in this work is to build a formal model of the Beacon Chain that corresponds as closely as possible to the reference implementation given by “Ethereum 2.0 Phase 0 Specification”), so as to enable the following tasks:
- Simulating or animating the execution of the Beacon Chain state transition function;
- Running existing tests from the Beacon Chain test-suites;
- Analyzing the code coverage of existing test-suites and improve it with new tests.
K is a very suitable framework for this goal as it enables the development of a formal model of the Beacon Chain that is characterized by being:
- Executable by pattern rewriting in the K tool, so that an interpreter for the Beacon Chain state transition function is obtained directly from the specification (at no extra cost);
- Concrete, in that its specification corresponds directly to the Python implementation of the system (modulo some specific abstractions, such as signature verification).
The K model also paves the ground for more elaborate verification tasks, such as reachability analysis and deductive verification, but these are part of ongoing and future work, and will be described elsewhere.
The full specification of the model in K is available at the project’s Github repository, along with a technical report describing this work in more detail.
How is the Beacon Chain Modeled in K?
Very generally, the Beacon Chain is modeled in K as a state transition system whose states are Beacon Chain states and whose transitions are defined by the main Beacon Chain state transition function.
In K, a Beacon Chain state is specified as a configuration consisting of (possibly nested) cells, where each cell represents a semantic element of the configuration that is needed to define the protocol. For example, the excerpt below shows two cells of the
configuration: The <k>
cell, which is a special cell that holds the program (computations) to be executed, and the <state>
cell, which contains all the structural elements of the Beacon Chain state (only three are shown below, the three dots denote omitted cells):
configuration <beacon-chain> <k> ... </k> <state> <slot> N:Int </slot> <eth1-data> E:Eth1Data </eth1-data> <validators> V:Map </validators> ... </state> ... </beacon-chain>
The Beacon Chain state transition function is specified by an operator in K that transforms a Beacon Chain pre-state, modeled by the current Beacon Chain configuration, into a Beacon Chain post-state, modeled by the resulting K configuration. The operator is declared as follows:
syntax KItem ::= "state_transition" "(" lock: BeaconBlock ")"
As described above, there are two main consecutive steps involved: process_slots
and process_block
. Sequencing of commands in K is naturally specified as stacking a computation on top of a continuation, using the operator ~>
. For example, the state transition function is defined using the following K rule:
rule state_transition(BLOCK) => process_slots(BLOCK.slot) ~> process_block(BLOCK)
Only when process_slots
terminates successfully will the next computation defined by process_block
take place, which captures the intended semantics.
One of the challenges we faced while developing semantic specifications of the transition function was the mismatch between Python’s semantically rich expressions and mostly imperative programming style with K’s language-defining constructs and declarative specification style. During this development, we have identified a few patterns and how they may elegantly be specified in K, such as stacking computation structures for sequencing as described above. Other patterns are more involved and the mismatch is more pronounced, e.g. list comprehension expressions, which are quite heavily utilized in the Beacon Chain’s reference implementation. In these cases, this encoding is generally quite verbose, but is inescapable without defining an intermediate language construct in K. The advantage, however, is that the encoding readily enables a more detailed coverage analysis. The technical report details these patterns and how they are specified.
Validating the Model
The Ethereum Foundation provides a rich test suite for the Beacon Chain. In addition to serving as a debugging tool for the Python implementation, the test suite is used by third-party production client developers to ensure conformance with the reference implementation. The test suite consists of over 3,000 different unit tests.
Given the level of abstraction at which the K model is developed, tests in the standard test suites of the Beacon Chain can be executed in the model directly without the need for any instrumentation. This proved invaluable as it provided a mechanism for validating the model and making sure that the model conforms with the reference implementation in the same way other production implementations are validated. All the tests can be run automatically by following the instructions given in the project’s repository.
Extending Test Coverage
The standard Beacon Chain test suite was designed so that code coverage, using available code coverage analysis tools for Python, is maximized. However, Python’s coverage is generally rather coarse-grained; it doesn't distinguish individual branches of execution of semantically rich Python constructs, such as list comprehension expressions.
K provides a different rule-based coverage analysis tool. It detects whether a rule is applied (and if so, how many times the rule is applied) in an execution. Semantic coverage using this K-based tool has already proved useful in the context of other languages, like JavaScript, where we found new bugs in all browsers (these results are described in this PLDI’15 paper).
Therefore, the plan was to evaluate how well the standard Python code coverage ensures semantic coverage and to see if rule-based coverage analysis could expose any non-covered functionality that is not detected by standard Python code coverage tools. Indeed, the analysis revealed execution paths that are not covered or not sufficiently covered by the tests, which are not detected by Python coverage. Most of these detections were in the specification of complex behaviors, like the ones typically encountered in list comprehension expressions and complex loops.
Below is a snapshot from the coverage analysis report generated by the K coverage tool showing a rule that was not applied in any of the tests in the test suite.
Moving Forward
The executable K model of the Beacon Chain is a first but crucial step towards achieving the ultimate goal of formally verifying fundamental safety and liveness properties of the Beacon Chain and its reference implementation. Indeed, the problem of formal verification is meaningless without a trusted, formal model of the system to verify. Besides the ability to execute the state transition function, run tests and analyze test coverage, which were demonstrated in this work, the K formal model presented here can also be used to state and verify low-level invariants expressible at this low level of abstraction.
However, the Beacon Chain is a very complex protocol. Directly verifying high-level properties like safety and liveness at this low level of abstraction is usually infeasible. Instead, abstraction refinement techniques are commonly employed. Consequently, our plan is to follow the following approach:
- Build an abstraction of this concrete model that retains the core consensus mechanism of the protocol;
- Prove the desired properties on the abstract model;
- Show that these properties are preserved in the concrete model.
We look forward to continuing our collaboration with the Ethereum Foundation on this work. For this engagement, we have had the great pleasure of working with the following experts at the Ethereum Foundation: Danny Ryan, Carl Beekhuizen and Martin Lundfall.