Introducing KMIR: concrete and symbolic execution of Rust MIR
Runtime Verification is excited to announce the open public alpha of KMIR. KMIR defines the formal operational semantics of the Middle Intermediate Representation (MIR) of Rust in K, giving developers access to the K framework tool suite for MIR programs. The K framework includes a parser, interpreter, and symbolic execution engine. This allows MIR programs to be directly executed and brings them a step closer to formal verification via the K framework. While this work has received funding through a grant from The Web3 Foundation, the development of the syntax and semantics was carried out independently. We want to thank the Web3 Foundation for its generous grant and its unwavering support throughout this project. In this blog, we will show how to extract MIR from the Rust compilation process, and how to execute MIR programs with KMIR.
The project source code can be viewed on the KMIR Github, however KMIR is easily accessible through our package manager kup. kup invokes nix to build the latest version of KMIR on your local machine, adding it to the PATH. To install kup, run the following in your terminal (this will also install nix if it is not already on your system):
bash <(curl https://kframework.org/install)
From here you can see all of the Runtime Verification tools available for download through:
kup list
To install the latest version of KMIR run:
kup install kmir
You can verify the install by running kup list
again and you should see that kmir is installed. To see the list of commands for kmir, run:
kmir --help
Congratulations! You have now installed KMIR on your machine and are ready to work with some MIR. Let’s move onto the demo.
Demo
In this demo, we'll use KMIR to directly interpret and execute a MIR program. Note that this is different from using the compiled binary from rustc, because rustc first translates MIR into LLVM, which is used to the binary.
First we need an example Rust program, from which we will extract the MIR representation
sum-to-n.rs
The example we will use is a program that calculates the sum of natural numbers from zero to n
. In particular we will choose n = 5
and assert that the result is 15
. Copy the code below into a file named sum-to-n.rs
.
fn main () { let mut sum = 0; let mut i = 0; while i <= 5 { sum += i; i += 1; } assert!(sum == 15); }
To verify that this example is indeed correct, you can run the following and see that no error is produced (the assertion passed).
rustc sum-to-n.rs --o sum-to-n ./sum-to-n
Now that we have our source code, we need to extract the compiled MIR program.
rustc –emit mir
Before we begin, the current version of KMIR is pinned to a specific version of nightly rust (46514218f 2023-06-20
), so we are going to need to install that. If you are using rustup, this can be done with the default
command and using the date one day after the date you wish to install:
rustup default nightly-2023-06-21
To confirm you have the correct version you can check with:
rustc --version
Once you are using the correct version, we can extract the MIR by calling the Rust compiler again, but this time with some additional flags:
rustc --emit mir -C overflow-checks=off -Zmir-enable-passes=-ConstDebugInfo,-PromoteTemps -o sum_to_n.mir sum_to_n.rs
Here's a brief explanation of these additional flags:
- Codegen flag
-C overflow-checks=off
will remove calls to functions that check for overflow from the MIR. - The MIR flag
-Zmir-enable-passes
provides options-ConstDebugInfo
and-PromoteTemps
however these will not affect this particular program. But-ConstDebugInfo
will stop MIR expanding type information in debug scope for constants, and-PromoteTemps
will stop the compiler from promoting temporary MIR places to have’static
lifetimes.
Now that we have extracted the MIR, let’s look into it and understand what it does.
sum_to_n.mir
At a high level, a MIR program can be thought of as a Control Flow Graph (CFG) representation of the original Rust source program. This MIR program is one main
function that can be thought of as 3 distinct segments:
- Place and Local Declarations - The first thing that happens in a function is the declaration of
Places
andLocals
that will be used to store information during the life of the function with their type information.Places
andLocals
are always numbered with an underscore prefix. Note that_2
was skipped here because it will be declared in the Scope Tree.Places
andLocals
look the same in the output MIR, their distinction is thatPlaces
refer to data stored on the heap, andLocals
as data stored on the stack. The distinction between them is not important for this example, so we will refer to bothLocals
andPlaces
asPlaces
.Place
_0
indicates the return data location for the encapsulating function.
- Scope Tree - After
Places
are declared, thescope
blocks will usedebug
statements to tie the original Rust variables toPlaces
.scope 1
linksPlace _1
tosum
, and is followed by the declaration of_2
.scope 2
is contained withinscope 1
, and linksPlace _2
toi
.
- Basic Blocks - These contain the logic of the function. Each Basic Block contains a non-branching sequence of instructions to be executed, with a terminator at the end. The terminator will indicate if the block terminates execution, or specify which other Basic Block should continue the execution. These Basic Blocks together with their terminators for the nodes and edges of the CFG.
Basic Block names have the form “bbbb0
being the function’s entry point. For our example:
bb0
initialises the_1
and_2
Places
to0
.- The
while
loop from the Rust source translates tobb1
andbb2
:bb1
is the guard of the loop, the terminator branches after the check, and can move to the body of the loopbb2
, or exit the loop tobb3
.bb2
performs the loop additions and returns tobb1
.
bb3
translates theassert!
in the Rust source, i.e., it checks that the result of the computation is equal to15
and transfers the execution to eitherbb4
orbb5
.bb4
andbb5
are leaf nodes. If the assertion thatsum == 15
fails, execution reachesbb4
which terminates in an error. Otherwise,bb5
terminates successfully.
Now that we have an understanding of the contents of sum_to_n.mir
, let’s use KMIR to execute the program.
kmir run
Before executing kmir run sum-to-n.mir
, let’s understand two flags: --output pretty
will change the output from the default kast
to a human readable representation. --depth
will tell KMIR how many steps to take before stopping, which allows us to inspect the state throughout the execution. Let’s look at the configuration of the program before any execution has taken place, so --depth 0
.
kmir run --output pretty --depth 0 sum_to_n.mir
The state of the program is printed in XML-like cells. The most interesting cell is the <k>
cell, which currently contains the K representation of the entire sum_to_n.mir
program! As the program is interpreted, the <k>
cell will transform to show the list of internal functions that are queued for execution. The other cells are pretty empty at the moment, but a notable mention is the <returncode>
cell which is currently 4
(our initial dummy value), and which will be updated to either 2
for an error such as a panic!
, or 0
if termination ends successfully.
By looking at the phase cell, we can see that we are in the Initialisation
phase. In this phase the entire sum_to_n
program will be interpreted to fill out a <function>
entry in the <functions>
cell with the main
function. Here is a truncated example of the main
<function>
entry:
The <localDecls>
cells will record the Place
declarations, and the <basicBlocks>
cells will record the basic blocks. To have a look at this information all filled out, let’s skip to the start of the Execution
phase by running again with --depth 22
.
kmir run --output pretty --depth 22 sum_to_n.mir
The output is much larger now, but you should be able to go through and match up the configuration to the sum_to_n.mir
file. The next thing to note is that the <k>
cell is not about to execute to the main
function.
It takes 160 steps to execute the whole function and terminate the program. By increasing the depth on the way to 160 you can see the state update as the basic blocks are executed. We will leave that for you, however let’s have a look at the final state of the program - we can omit the --depth
flag entirely for this.
kmir run --output pretty sum_to_n.mir
We can see that the <k>
cell is now empty, and the <returncode>
was 0.
Success! We can further convince ourselves that the result is correct by examining Place _1
which if you recall, was linked to sum
in our scope tree. We can see that _1
is 15 as we expect!
Conclusion
Thank you for your interest in KMIR, this project is still in active development as we push towards better coverage of the MIR syntax and semantics, with the end goal of having full formal verification. We aim to improve the experience by streamlining the process from extracting the MIR to verifying program properties. If you have any questions, or run into any problems during this tutorial, feel free to join the Runtime Verification discord to ask and chat anytime! We thank the Web3 Foundation for their support.
About Runtime Verification
Runtime Verification is a technology startup based in Champaign-Urbana, Illinois. The company uses formal methods to perform security audits on virtual machines and smart contracts on public blockchains. It also provides software testing, verification services and products to improve the safety, reliability, and correctness of software systems in the blockchain field.
About Web 3 Foundation
Web3 Foundation funds research and development teams building the technology stack of the decentralized web. It was established in Zug, Switzerland by Ethereum co-founder and former CTO Gavin Wood. Polkadot is the Foundation's flagship project.