K-Michelson: a Case Study on Formal, Executable Language Specification (Part 1)

Posted on November 2nd, 2020 by Stephen Skeirik

RV and Tezos

In a previous blog post, we introduced the K-Michelson project, a project developed by Runtime Verification and funded by a generous grant from the Tezos Foundation. So what is K-Michelson? Simply put, it is a formal verification framework for Michelson smart contracts; see our previous post for details. In this two-part series, we will investigate:

  1. What an ideal programming language specification looks like.
  2. How this ideal framework powers nextgen program testing.

In this first part, we limit our focus to question (1). So, what should an ideal programming language specification look like? Several properties come to mind:

  • Clarity: it should be unambiguous and straightforward.
  • Completeness: It should be self-contained and describe how all possible programs execute.
  • Testability: arbitrary programs should be executable against the specification to see whether their actual and intended results agree.

Most popular language specifications do not satisfy at least one of these criteria.

Completeness: At the time of writing, Perl and Rust have no complete, self-contained specification, aside from their reference implementation (but using this as specification is circular, due to bootstrapping, unless you trace their development back to their original non-bootstrapped implementation).

Testability: C++, C, Java, Go, and Haskell all have (assumed) complete English language specifications (some include their standard library specification as well), but since they are in English, none are testable in the sense above, since the English language is not executable in any meaningful sense. Instead, we require our specification language to be executable.

Clarity: More fundamentally, we argue any natural language specification of a programming language fails to satisfy the clarity criterion in the following sense: human languages are inherently ambiguous. Instead, we require our specification language to be as precise as possible.

Of course, this discussion is not very helpful unless we can suggest a replacement specification language that does satisfy our criteria.

We argue that a logical theory of computation (in other words, a kind of mathematical code) should serve as our specification language. Since it is a logical theory, it is as precise as possible¹. Since our theory represents an abstract computer, it has a natural notion of executability.

In fact, this approach is exactly the approach taken by the K Framework. The K Framework is a toolkit for designing programming languages based on matching logic.

Another way of saying all this is we demand a formal---rigorous and logical---definition underpins our programming language specifications. Of course, providing such a formal definition requires us to define what it means to compute. Let us first tackle this issue of defining a logical theory of computation and then come back to the issue of testability/execution.

Computation is about data transformation. So we have two abstract kinds of objects: data (representing the state of the program, such as the stack, program counter, and gas remaining) and data transformation rules, which define how data is transformed (representing the semantics of each program construct, for example Michelson’s ADD instruction) ². In the K Framework, we have the following equivalence:

Data = Logical Term
Data Transformation Rule = Logical Axiom

Since K is based on matching logic, we specifically mean matching logic terms and axioms. Syntactically, K uses a concept called configurations, a hierarchical, XML-like notation, to represent data. Here is an example from Michelson:

Michelson example 1

This configuration represents a Michelson program that consists of one instruction, ADD, on a stack of size two with the integers 2 and 3. This configuration can be transformed by the following rule (a configuration with arrows representing data transformations):

Michelson example 2

where (+Int) represents true integer addition. As an example, using rules like this, we have specified the semantics of the Michelson language unambiguously.

There is much more that could be said about this, but suffice it to say, this simple formalism of configurations and rules allows for remarkably precise and modular executable programming language specifications, with clear mathematical and logical semantics derived from K’s foundations in matching logic.

This leads back to the question of testability/execution. Using the K Framework's LLVM backend, we can derive efficient interpreters that let us test our language specification against concrete programs, unlike natural language (or even paper-based mathematical) specifications. This means we can freely experiment with different designs until we find something satisfactory.

There are other advantages as well which we will only highlight here. The K Framework has other backends that let us derive theorem provers for reasoning about program execution and even bounded model checkers.

In conclusion, it is our conviction that (1) formal, executable language specification that helps us clarify what programs mean, test language designs, test programs, and verify that programs are correct; (2) as an instantiation of this approach, the K Framework is an efficient and powerful tool for programming language specification.

Next time, we will zoom in how the K Framework approach empowers us to do formal verification.

¹ Note that all logical theories must be defined and taught using natural language, so we cannot escape these fundamental ambiguities completely; however, the simplicity of logical languages can reduce ambiguity because they require a much smaller set of concepts to define.
² Note that data transformation rules can always be encoded as data. However, data itself has no primitive notion of execution; we must bootstrap ourselves with some basic transformation rules.