Earlier this year, Runtime Verification was engaged by Algorand to verify its consensus protocol. We are happy to report that the first part of the effort, namely modeling the protocol and proving its safety theorem, has been successfully completed. Specifically, we have used a proof assistant (Coq) to systematically identify assumptions under which the protocol is mathematically guaranteed to not fork.
Before we dive into a brief summary of the engagement, let’s begin with a short introduction of Algorand. Algorand was founded by MIT Professor and Turing Award Winner, Silvio Micali, and has built one of the strongest teams in the blockchain space. At the heart of Algorand is the Algorand consensus protocol, a pure Proof-of-Stake protocol that promises to provide efficient, secure and scalable operation while remaining truly decentralized. The basic idea in pure PoS is to make the security of the system dependent only on a supermajority of the entire stake being held by honest users, without having to rely on any specific small subset of nodes or having to lock up assets and penalize users. The three highlights of the Algorand consensus protocol are scalability, decentralization and security. The reader interested in the internals of Algorand is encouraged to review the informative articles published by the Algorand team on Medium.
The first step in the engagement was to build a formal model of the protocol and pin down the assumptions under which the protocol design satisfies its requirements. To achieve the highest levels of assurance, we selected deductive verification, in which systems are modeled and specified inside expressive formal logical systems, and verified in a similar way to how mathematicians prove theorems. The figure below depicts the various approaches to system analysis, as a function of effort required vs assurance.
The Algorand consensus protocol proceeds in a series of rounds. Each round, the nodes certify a block to add to the blockchain. If the network is partitioned, several attempts--called periods--may be necessary to certify a block. For more details, see the Algorand Blockchain Features Specification and this journal paper on Algorand. The protocol itself encompasses both node-level behavior and network-level behavior (including the behavior of the adversary), which meant that a fair amount of detail had to be modeled to be able to express and verify relevant properties of the protocol.
We developed our model of the Algorand consensus protocol in Coq, and proved a slew of its properties that we then used to ultimately show the asynchronous safety property: no two honest nodes certify two different blocks, even when the network is partitioned (i.e., message delays are arbitrarily large). At the heart of the model is the global state transition relation. This relation defines inductively how the protocol’s global state transitions into another state in one step.
Most statements in the model asserting that some kind of event has taken place are specified as propositions on traces, possibly satisfying certain conditions. A trace is a nonempty sequence of global states that is built up using the global transition relation, i.e., if state
g(i) is the i'th state in the sequence and
g(i+1) is the (i+1)'th state, then the ordered pair
g(i+1) belongs to the global transition relation. By specifying path properties as propositions on traces, we are able to define generically what the property is without having to assume a concrete initial state (or a set of initial states). Any conditions required for the property to hold can be specified as constraints on the states of the trace being considered.
A major result of the engagement is the rigorous formalization and proof of the asynchronous safety theorem. The statement of the theorem asserts that any two certificates from the same round must be for the same block; that is, no forks. Or, formally in Coq:
Note the preconditions
state_before_round r g0, which states that the trace goes back long enough in history when no user has yet entered round
is_trace g0 trace, which states that the trace is a valid sequence of global states beginning at
g0, and in which every pair of consecutive states satisfies the global transition relation. The proof of the theorem is structured into individual results that are themselves broken down into smaller lemmas (with a total of about 170 lemmas).
A second major result of the engagement is the precise specification of the assumptions needed and showing that these assumptions are sufficient for the safety property to hold. Assumptions are extremely important to consider when deploying an implementation of a design and for systems built on top of an implementation. Assumptions are specified either as model parameters, such as a finite set of nodes and a finite set of values used in message payloads, or as more elaborate hypotheses (or axioms), such as assumptions on how message delays are bounded and assumptions on honesty of the nodes holding a supermajority of the stake.
The model we developed as part of this effort is generic, in that it captures the dynamics of the Algorand consensus protocol in a way that is orthogonal to the properties that we verify about it. This means the model can be readily used to verify other properties of the system beyond asynchronous safety, including most importantly liveness (i.e., that a block is certified every round). In fact, we anticipate that many of the smaller results shown about the protocol and used in the proof of safety will also constitute essential ingredients of the liveness-proving effort.
For this engagement, we have had the great pleasure of working with the following experts at Algorand: Jing Chen, Nickolai Zeldovich and Victor Luchangco.