Introducing KMIR: concrete and symbolic execution of Rust MIR

Posted on November 20th, 2023 by Daniel Cumming

Non-audit banners (1).png

Runtime Verification is excited to announce the open public alpha of KMIR. KMIR defines the formal operational semantics of the Middle Intermediate Representation (MIR) of Rust in K, giving developers access to the K framework tool suite for MIR programs. The K framework includes a parser, interpreter, and symbolic execution engine. This allows MIR programs to be directly executed and brings them a step closer to formal verification via the K framework. While this work has received funding through a grant from The Web3 Foundation, the development of the syntax and semantics was carried out independently. We want to thank the Web3 Foundation for its generous grant and its unwavering support throughout this project. In this blog, we will show how to extract MIR from the Rust compilation process, and how to execute MIR programs with KMIR.

The project source code can be viewed on the KMIR Github, however KMIR is easily accessible through our package manager kup. kup invokes nix to build the latest version of KMIR on your local machine, adding it to the PATH. To install kup, run the following in your terminal (this will also install nix if it is not already on your system):

bash <(curl https://kframework.org/install)

From here you can see all of the Runtime Verification tools available for download through:

kup list

To install the latest version of KMIR run:

kup install kmir

You can verify the install by running kup list again and you should see that kmir is installed. To see the list of commands for kmir, run:

kmir --help

Congratulations! You have now installed KMIR on your machine and are ready to work with some MIR. Let’s move onto the demo.

Demo

In this demo, we'll use KMIR to directly interpret and execute a MIR program. Note that this is different from using the compiled binary from rustc, because rustc first translates MIR into LLVM, which is used to the binary.

First we need an example Rust program, from which we will extract the MIR representation

sum-to-n.rs

The example we will use is a program that calculates the sum of natural numbers from zero to n. In particular we will choose n = 5 and assert that the result is 15. Copy the code below into a file named sum-to-n.rs.

fn main () {
    let mut sum = 0;
    let mut i = 0;

    while i <= 5 {
        sum += i;
        i += 1;
    }
    assert!(sum == 15);
}

To verify that this example is indeed correct, you can run the following and see that no error is produced (the assertion passed).

rustc sum-to-n.rs --o sum-to-n
./sum-to-n

Now that we have our source code, we need to extract the compiled MIR program.

rustc –emit mir

Before we begin, the current version of KMIR is pinned to a specific version of nightly rust (46514218f 2023-06-20), so we are going to need to install that. If you are using rustup, this can be done with the default command and using the date one day after the date you wish to install:

rustup default nightly-2023-06-21

To confirm you have the correct version you can check with:

rustc --version

Once you are using the correct version, we can extract the MIR by calling the Rust compiler again, but this time with some additional flags:

rustc --emit mir -C overflow-checks=off -Zmir-enable-passes=-ConstDebugInfo,-PromoteTemps -o sum_to_n.mir sum_to_n.rs

Here's a brief explanation of these additional flags:

Codegen flag -C overflow-checks=off will remove calls to functions that check for overflow from the MIR.
The MIR flag -Zmir-enable-passes provides options -ConstDebugInfo and -PromoteTemps however these will not affect this particular program. But -ConstDebugInfo will stop MIR expanding type information in debug scope for constants, and -PromoteTemps will stop the compiler from promoting temporary MIR places to have ’static lifetimes.

Now that we have extracted the MIR, let’s look into it and understand what it does.

sum_to_n.mir

At a high level, a MIR program can be thought of as a Control Flow Graph (CFG) representation of the original Rust source program. This MIR program is one main function that can be thought of as 3 distinct segments:

Place and Local Declarations - The first thing that happens in a function is the declaration of Places and Locals that will be used to store information during the life of the function with their type information. Places and Locals are always numbered with an underscore prefix. Note that _2 was skipped here because it will be declared in the Scope Tree. Places and Locals look the same in the output MIR, their distinction is that Places refer to data stored on the heap, and Locals as data stored on the stack. The distinction between them is not important for this example, so we will refer to both Locals and Places as Places. Place _0 indicates the return data location for the encapsulating function.

Scope Tree - After Places are declared, the scope blocks will use debug statements to tie the original Rust variables to Places. scope 1 links Place _1 to sum, and is followed by the declaration of _2. scope 2 is contained within scope 1, and links Place _2 to i.

Basic Blocks - These contain the logic of the function. Each Basic Block contains a non-branching sequence of instructions to be executed, with a terminator at the end. The terminator will indicate if the block terminates execution, or specify which other Basic Block should continue the execution. These Basic Blocks together with their terminators for the nodes and edges of the CFG.

Basic Block names have the form “bb” with bb0 being the function’s entry point. For our example:

bb0 initialises the _1 and _2 Places to 0.
The while loop from the Rust source translates to bb1 and bb2:
- bb1 is the guard of the loop, the terminator branches after the check, and can move to the body of the loop bb2, or exit the loop to bb3.
- bb2 performs the loop additions and returns to bb1.
bb3 translates the assert! in the Rust source, i.e., it checks that the result of the computation is equal to 15 and transfers the execution to either bb4 or bb5.
bb4 and bb5 are leaf nodes. If the assertion that sum == 15 fails, execution reaches bb4 which terminates in an error. Otherwise, bb5 terminates successfully.

Now that we have an understanding of the contents of sum_to_n.mir, let’s use KMIR to execute the program.

kmir run

Before executing kmir run sum-to-n.mir, let’s understand two flags: --output pretty will change the output from the default kast to a human readable representation. --depth will tell KMIR how many steps to take before stopping, which allows us to inspect the state throughout the execution. Let’s look at the configuration of the program before any execution has taken place, so --depth 0.

kmir run --output pretty --depth 0 sum_to_n.mir

The state of the program is printed in XML-like cells. The most interesting cell is the <k> cell, which currently contains the K representation of the entire sum_to_n.mir program! As the program is interpreted, the <k> cell will transform to show the list of internal functions that are queued for execution. The other cells are pretty empty at the moment, but a notable mention is the <returncode> cell which is currently 4 (our initial dummy value), and which will be updated to either 2 for an error such as a panic!, or 0 if termination ends successfully.

By looking at the phase cell, we can see that we are in the Initialisation phase. In this phase the entire sum_to_n program will be interpreted to fill out a <function> entry in the <functions> cell with the main function. Here is a truncated example of the main <function> entry:

The <localDecls> cells will record the Place declarations, and the <basicBlocks> cells will record the basic blocks. To have a look at this information all filled out, let’s skip to the start of the Execution phase by running again with --depth 22.

kmir run --output pretty --depth 22 sum_to_n.mir

The output is much larger now, but you should be able to go through and match up the configuration to the sum_to_n.mir file. The next thing to note is that the <k> cell is not about to execute to the main function.

It takes 160 steps to execute the whole function and terminate the program. By increasing the depth on the way to 160 you can see the state update as the basic blocks are executed. We will leave that for you, however let’s have a look at the final state of the program - we can omit the --depth flag entirely for this.

kmir run --output pretty sum_to_n.mir

We can see that the <k> cell is now empty, and the <returncode> was 0.

Success! We can further convince ourselves that the result is correct by examining Place _1 which if you recall, was linked to sum in our scope tree. We can see that _1 is 15 as we expect!

Conclusion

Thank you for your interest in KMIR, this project is still in active development as we push towards better coverage of the MIR syntax and semantics, with the end goal of having full formal verification. We aim to improve the experience by streamlining the process from extracting the MIR to verifying program properties. If you have any questions, or run into any problems during this tutorial, feel free to join the Runtime Verification discord to ask and chat anytime! We thank the Web3 Foundation for their support.

About Runtime Verification

Runtime Verification is a technology startup based in Champaign-Urbana, Illinois. The company uses formal methods to perform security audits on virtual machines and smart contracts on public blockchains. It also provides software testing, verification services and products to improve the safety, reliability, and correctness of software systems in the blockchain field.

About Web 3 Foundation

Web3 Foundation funds research and development teams building the technology stack of the decentralized web. It was established in Zug, Switzerland by Ethereum co-founder and former CTO Gavin Wood. Polkadot is the Foundation's flagship project.