K-Michelson: a Case Study on Formal, Executable Language Specification (Part 2)

Posted on December 11th, 2020 by Stephen Skeirik

RV and Tezos

The Power of Reachability Logic

In a previous blog post, we introduced the K-Michelson project, a project developed by Runtime Verification and funded by a generous grant from the Tezos Foundation. So what is K-Michelson? Simply put, it is a formal verification framework for Michelson smart contracts; see our previous post for details. In this two-part series, we will investigate:

  1. What an ideal programming language specification looks like.
  2. How this ideal framework powers nextgen program testing.

In the first part of the series, we considered point how the K Framework is an example of point (1) using our K-Michelson tool as a concrete example. In this second part, we take a closer look at how K addresses point (2), again using K-Michelson as an example.

As a quick refresher, last time we discussed how K language specifications have data, stored using XML-like schemas called configurations composed of cells, and data transformation rules, which map a configuration with variables to a different configuration over those variables. A quick example we saw last time:

K configuration example

This rule describes the operation of the Michelson ADD instruction over the integers, i.e., consume the ADD instruction in the current continuation and the top two stack elements which have type int and then produce a new int value which is their sum. The ellipsis (...) represents the rest of the continuation/stack that is left untouched by this rule and variables V have optional type annotations of the form V:VarType which we add for clarity. As we argued last time, the use of such semantic rules lets us describe the intended operation of our programming language more conveniently and precisely than any human language, while still remaining fairly readable.

The question that we seek to answer here is: how does representing our programming language using this scheme (i.e., for the academically-inclined as a trusted operational semantics) empower us to do verification more easily? The answer lies in the power of our underlying verification which we call reachability logic.

The K-Framework Program Verification Approach

The traditional approach to program verification involved two or three separate specifications of the same programming language ― an operational semantics, a Hoare logic, and an axiomatic semantics ― which all served different purposes and which all had to be shown consistent with each other. Building each distinct representation (and the associated tooling to use them) required a lot of work.

Using reachability logic, the K Framework allows to take a K programming language specification as input (i.e. a trusted operational semantics) and derive a correct-by-construction program verifier with no additional effort. Thus, the three separate representations and associated tooling we had to build before are all subsumed by a single representation and tool that lets us "do it all," so to speak.

In addition to unifying these three separate representations, reachability logic goes further and unifies the typically distinct concepts of system representation (how I represent my system, e.g., a programming language) and property representation (how I represent what I want to know about my system, e.g., how a particular program behaves). Let us see an example. Here is a Michelson program called loopsum that adds up the numbers from 1 to 10:

Michelson loopsum program

This program is part of a property specification. It is incomplete, because we have not defined what the program is supposed to do. Similarly, the integer addition rule we saw above is part of a system specification for the Michelson smart contract language. It too is incomplete, since we will need many more rules to formally define the Michelson language. Let us show the other semantic rules needed to run this example:

Michelson semantics rule

The first rule describes how we can push an immediate constant onto the value stack. The second and third rules describe the action of the ITER instruction over a loop in two cases: when the iterated list is non-empty and when it is empty. Of course, this system specification is still incomplete, but for simplicity, we elide the other details.

Let us now revisit our loopsum program and see how we can finish our property specification, i.e., let us define a rule which describes how the loopsum program is supposed to behave. Looking at the program, we see that it is self-contained, in the sense that, this program can execute on any well-formed Michelson stack and it will produce the same Michelson stack with an additional integer pushed on top such that the entire resulting stack is well-formed. This can be expressed using the following well-formedness claim:

K loopsum

Note that, aside from the word claim, which we use purely for disambiguation, there is nothing different between a system specification (a set of reachability rules) and a property specification (a set of reachability claims). Reachability logic entirely unifies these two concepts. Note also that the above claim precisely states the stack transformation property that we wrote in words above.

But we can do better. How? The integer variable I:Int that we used in our claim above is somewhat imprecise. We can strengthen our property by replacing this integer variable with a constant, i.e., the sum of integers from 1 to 10 or 55. Thus, we write the following total correctness claim.

K loopsum 2

This claim now captures totally the intended effect of this program. As a side-effect, we can clearly see that this program is constant, so that an optimizing compiler could replace the entire above program with the program:

loopsum optimized

In any case, being able to write such claims without a program verifier that can prove them is not very useful. Fortunately, using the kprove tool, we can do just that. This is a core tool provided by the K Framework for program verification purposes. A tutorial on using the tool is out of scope for this blog post; for more details, see the K Framework website.

Verifying Michelson Programs with K-Michelson

K-Michelson is not just a semantics of the Michelson smart contract language, it is also a formal verification framework for the Michelson language.

In essence, we can view K-Michelson as a kind of wrapper on top of the K Framework that selectively packages some useful functionality in an interface that will be more comfortable for Michelson developers.

In fact, both properties that we expressed above as reachability logic claims can also be expressed using the .tzt unit test format (the input file format for K-Michelson). The advantage of this format is that it is much closer in spirit to standard Michelson programs. Here is how we would rewrite our well-formedness claim:

loopsum 2

Writing the total correctness claim is even more straightforward, since we don't need to introduce any variable syntax, e.g., the $I we saw above:

loopsum 3

We could then run our two tests using K-Michelson by downloading and building the tool and then doing:

kmich 1

And:

kmich 2

Here the command symbtest stands for symbolic test. It just means that we may want to use symbolic variables (like $I).

There is much to say about how we can build these .tzt input files and use them; we refer interested readers to the K-Michelson documentation site.

If you made it this far, thanks for staying with us to the end. Hopefully, this blog post provided a small taste of what verification looks like using reachability logic, the K Framework, and our tool K-Michelson. We plan to keep updating the tool to make it more powerful and user-friendly. If you have any thoughts on how to do that, please feel free to join the discussion and development at our GitHub repository.