Formal Verification Practice of MesaTEE Security

Posted by fierce at 2020-03-07

Author: Wang Minghua, Feng Qian, Zhang Yulong, Wei Tao

TEEs of all kinds (such as TPM and Intel SGX) provide a strong hardware foundation for trusted computing, but they do not by themselves guarantee the security of the software running inside them. Memory safety issues such as buffer overflows give attackers an opening to break into a TEE, so the memory safety of the software is critical. MesaTEE (short for "Memory Safe TEE") not only follows the Hybrid Memory Safety model and rebuilds key components in memory-safe languages such as Rust, but also applies static scanning, dynamic testing, and formal verification to the remaining C/C++ code.

Static scanning can quickly report shallow vulnerabilities by analyzing coding patterns, but it suffers from relatively high rates of false positives and false negatives. Dynamic testing, represented by fuzzing, triggers problems by feeding randomly mutated inputs; its findings are highly accurate, but it is difficult to cover all program states completely, and therefore difficult to prove a program flawless. Static scanning and dynamic testing are thus insufficient in high-security scenarios. Formal verification is needed: using rigorous mathematical methods, it reasons over the complete state space of the program and proves that the program conforms to its security specification, i.e., that it cannot misbehave under the stated conditions. Formal verification is therefore an essential part of security practice.

Baidu Security Lab has applied formal verification in industrial scenarios such as cloud computing, blockchain, and autonomous driving. For example, in the MesaTEE TPM project, we formally verified the TPM core software stack (tpm2-tss, version 2.1.0), ensuring that roughly 67,000 lines of code in the project cannot cause security problems in the scenarios where MesaTEE calls them. During the verification, more than 140 security defects were found and fixed, including null pointer dereferences, out-of-bounds reads and writes, and memory leaks; the verified tpm2-tss version has been published on GitHub. In addition, in MesaPy, our memory-safe Python interpreter, formal verification ensures that the C backend code is memory safe in all usages of MesaPy. During that verification, 14 security defects, including null pointer dereferences and integer overflows, were found and fixed. The verification methods and results have also been published on GitHub.

1. Introduction to formal verification

Formal verification is the process of using rigorous mathematical methods to reason about whether the program under verification conforms to its security specification. Our method comprises source code preparation, value analysis based on abstract interpretation, weakest-precondition verification, and defect reporting, as shown in the figure below.

1.1 Source code preparation

Ideally, formal verification should be performed on the complete code of a program: not only the source of the target program itself, but also the source of its third-party libraries. In many cases, however, a third-party library's source is unavailable, or it is so large that analyzing it is impractical and irrelevant to the security of the program's main body.

The next step is to determine the target functions to be verified and to write test cases for them. In a formal verification task, function parameters in a test case are represented as intervals so as to cover all possible values; the test cases should likewise cover all scenarios in which the function is called, as shown in the following example:

A good starting point is the unit and integration tests that ship with the project, rewritten as interval versions. These tests generally cover the key functions and key functional logic of the project, so starting from them covers the verification targets and safeguards the quality of the verification.

1.2 Value analysis based on abstract interpretation

In this stage, the analyst annotates the source code and configures the analysis policy, then relies on abstract interpretation to perform value analysis. The analysis engine processes the annotated source, inserts runtime-error-check annotations, analyzes the program's control flow and data flow, computes all possible values of every variable, and tries to prove the annotated properties. Statements whose annotated properties cannot be proven true during the analysis are classified as undefined behavior. In practice, the analyst may need to add extra annotations to reduce false positives among these reports. The weakest-precondition engine then verifies these manually added annotations, and the remaining undefined behaviors are presented to the analyst for final confirmation.

a. Abstract Interpretation

Abstract interpretation approximates computation on a program's concrete domain [3] by computation on an abstract domain [3]. The abstract semantics covers all possible concrete behaviors, i.e., it is a superset of the concrete program semantics; therefore, if the abstract semantics is safe, the concrete semantics is safe as well. Each abstract domain defines a system of operation rules (a lattice [3]) over its abstract objects (variables). The most commonly used abstract domain is the interval domain [3], whose operation rules resemble ordinary numerical calculation (see the example below). Because the interval domain is easy to understand and computationally cheap, it is the domain we use for value analysis in real verification tasks.

Abstract interpretation may over-approximate [3] when computing variable values. To guarantee that the analysis terminates in reasonable time, the engine often widens a variable's value range to reduce computational complexity. Because of this widening, a runtime error check may report undefined behavior when the widened range falls outside the legal range, even though the variable's real values never do. Such false positives are reduced with the program annotations introduced later.

b. Analysis policy

Value analysis based on abstract interpretation records variable values as it computes. As the analysis goes deeper, variable values are split along program branches and computed per branch, so a single program point may end up with many possible variable states. In theory, keeping every value of every variable yields the most precise result for any program, but at an enormous cost in space and time, making it hard to guarantee that the analysis finishes in reasonable time. A sensible analysis policy must therefore balance analysis time against result precision.

We use a state to represent one set of possible variable values. In practice, a state level is set to limit the maximum number of variable states kept at any program point; if the number of states at some point exceeds the limit, states are merged. In addition, how much detail is recorded about the relationships between variables also affects the results. This mainly concerns how the association between multiple returned variables is recorded (state split) when functions call each other (a function with output parameters is considered to have multiple return values). The following example code shows the impact of the state level and state split settings on the analysis results.

If the state level limit is very small (for example, 1), then at L the value of x is x ∈ [1..9] and the value of y is y ∈ [6..10]. If the state space is large enough (state level ≥ 2), the values of x and y are refined into two separate states: y == 6 ⋀ x ∈ [1..4], or y == 10 ⋀ x ∈ [5..9]. The larger the state level, the finer the analysis granularity and the more accurate the result.

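A minimal reconstruction of the code discussed next (the names `hd` and `err` come from the text; everything else, including `parse` and `use`, is an illustrative assumption):

```c
#include <stdlib.h>

typedef struct { int value; } header;

/* Hypothetical parse step: returns nonzero on error. */
static int parse(const header *hd) {
    return hd->value < 0;
}

static int use(const header *hd) { return hd->value; }

int handle(int v) {
    header *hd = malloc(sizeof *hd);    /* (1) allocate                   */
    if (!hd) return -1;
    hd->value = v;
    int err = parse(hd);
    if (err) {                          /* (2) freed only when err != 0   */
        free(hd);
        return -1;
    }
    int r = use(hd);                    /* (3) reached only when err == 0 */
    free(hd);
    return r;
}
```

With a coarse state split, the analyzer forgets the correlation between `err` and `hd` after (2) and flags (3); with a finer split the correlation is retained and the report disappears.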

In the code above, when the state split granularity is low, an undefined behavior "use of dangling pointer hd" is reported at (3), because the analysis believes hd may have been freed at (2) and must not be used at (3). On closer inspection, however, the free at (2) executes only when err is true at (2), and under that condition the branch at (3) is never executed. When the state split granularity is increased, value analysis records the relationship between err and hd, and no false positive is generated.


Although a higher state level and a finer state split granularity yield more accurate results, setting these two policies too high on a large codebase with complex logic may prevent the analysis from finishing in reasonable time. In practice, state level and state split must be tuned to balance analysis quality against analysis time.

c. Program annotation

Program annotations use a formal specification language (such as ACSL [4]) to describe program semantics and help the analysis engine generate verification constraints. In practice, because of the analysis granularity settings, the value analysis stage sometimes cannot produce sufficiently precise results and reports false undefined behaviors. The analyst then adds annotations to guide the analysis toward more precise results. The correctness of these annotations must itself be verified, in the weakest-precondition stage described below.

In the following example, assuming a high analysis granularity is not configured, the size relation between x and y established at L1 is not recorded, and a false alarm "x may exceed the legal range of array t" is raised at L3. Adding the constraint //@ assert 0 < x < 10 at L3 guides the analysis and eliminates this false alarm.
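A sketch of that situation (the function `set_flag` and array size 10 are assumptions; the ACSL annotation is an ordinary C comment, so the code compiles unchanged):

```c
int t[10];

void set_flag(int x, int y) {
    if (0 < x && x < y && y < 10) {   /* L1: implies 0 < x < 10       */
        //@ assert 0 < x < 10;        /* L3: guides the analyzer       */
        t[x] = 1;                     /* in bounds, given the assert   */
    }
}
```

Without the assert, a coarse analysis that has forgotten the x < y < 10 relation can only bound x by its declared type and reports a possible out-of-bounds write.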


Besides assert, program annotations also include function contracts and loop invariants. A function contract describes the constraints on a function's inputs and outputs; a loop invariant describes the constraints a loop maintains, which helps value analysis understand the semantics of complex loops and reduces false positives generated during the analysis.
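A small sketch of both annotation kinds in ACSL (the `zero_fill` function is an illustrative assumption, not code from the project); the contract bounds the inputs and promises the output, and the loop invariant lets the analyzer reason about the loop without unrolling it:

```c
/*@ requires n >= 0;
  @ requires \valid(a + (0 .. n-1));
  @ assigns a[0 .. n-1];
  @ ensures \forall integer k; 0 <= k < n ==> a[k] == 0;
  @*/
void zero_fill(int *a, int n) {
    int i;
    /*@ loop invariant 0 <= i <= n;
      @ loop invariant \forall integer k; 0 <= k < i ==> a[k] == 0;
      @ loop assigns i, a[0 .. n-1];
      @ loop variant n - i;
      @*/
    for (i = 0; i < n; i++)
        a[i] = 0;   /* the invariant extends to index i each iteration */
}
```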

d. Runtime error check

The value analysis engine adds runtime error checks requiring the program to satisfy specific properties. For example, before a memory read, value analysis inserts an annotation such as /*@ assert rte: mem_access: \valid_read(p); */ to judge the validity of the accessed address. If the value range of the memory address is not entirely within the legal range (for example, it may be NULL), undefined behavior is reported.
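A sketch of where such a check lands (the function `read_first` is an assumption; the annotation mimics what the engine inserts automatically and, as a C comment, does not affect compilation):

```c
/* Before the dereference, the engine requires p to be a valid
 * readable address; if p may be NULL, undefined behavior is reported. */
int read_first(const int *p) {
    /*@ assert rte: mem_access: \valid_read(p); */
    return *p;
}
```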


1.3 Weakest precondition

The annotation properties related to function contracts and runtime error checks are verified during value analysis. However, some assert annotations, such as //@ assert 0 < x < 10, are added by the analyst to assist value analysis, and value analysis simply takes them as true. To make the overall argument self-contained, these constraints must be verified separately; the weakest precondition is used to verify them.


The weakest precondition [5] is formalized as wp(S, R): for a program location P, S is the statement executed at P, and R is the constraint (postcondition) that must hold after S executes. wp(S, R) denotes the weakest condition that must hold at P before executing S in order for R to hold afterwards. In other words, a WP solver derives, from S and R, the weakest precondition that must be satisfied at P.
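A standard worked example of the assignment rule (not from the original text): to guarantee x > 5 after executing x := x + 1, substitute the assigned expression into the postcondition:

```latex
wp(x := x + 1,\; x > 5) \;=\; (x + 1 > 5) \;=\; (x > 4)
```

Any state satisfying x > 4 before the assignment satisfies x > 5 after it, and x > 4 is the weakest such condition.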


In this way, each assert annotation supplied to value analysis yields a weakest precondition via WP. If every such precondition is implied by its context, the added annotations are self-consistent and the overall verification is sound.


2. Case analysis

2.1 TPM verification analysis

With the help of the formal verification tool TIS [6], we have analyzed several industrial-grade projects involving cloud computing, blockchain, autonomous driving, and more. Taking tpm2-tss, the core software stack of TPM trusted computing, as an example: we verified version 2.1.0 of the project, covering about 1,300 functions and about 67,000 lines of code, and found and fixed more than 140 program defects, including null pointer dereferences, memory leaks, and out-of-bounds reads and writes. Below are defects found during the verification (left figures) and the corresponding fixes (right figures). These defects were reported to the developers, confirmed, and fixed in the latest code.

Example 1: null pointer reference

Example 2: memory leak

Example 3: array read out of bounds due to loop condition error

Example 4: memory corruption due to free error

2.2 MesaPy verification analysis

MesaPy is a memory-safe Python interpreter based on PyPy. We used SMACK [7], SeaHorn [8], and TIS [6] to verify the C library functions of the MesaPy backend (5,044 lines of code in total), and found and fixed 14 program defects including null pointer dereferences and integer overflows. Below are examples of the security defects found during the verification.

Example 1: null pointer reference

Example 2: integer overflow

3. Experience sharing of formal verification

Although formal verification can detect and eliminate security defects, it also has limitations and practical considerations, mainly in the following aspects:

3.1 Choice of analysis time and accuracy

As noted repeatedly above, a verification task must both achieve high accuracy and finish in reasonable time. In practice we found that, for some verification targets, lowering the state level and state split policies has no impact on accuracy at all; for very time-consuming targets, tuning these two parameters brings the verification within an acceptable analysis time, supplemented by a small amount of manual confirmation.

In addition, if the verification time is known to be dominated by a specific subfunction, you can write a function contract for that subfunction and use the contract during verification, which also improves analysis efficiency. As mentioned earlier, though, writing a contract requires a complete understanding of the function's semantics; otherwise it backfires.

Time-consuming subfunctions can also be rewritten to improve analysis efficiency. For example:

If a function children, which returns two ints through output parameters, is very time-consuming to analyze, we can rewrite the call using intervals: if the caller can be verified correct for all possible values of y and z, it is certainly correct for their actual values.


3.2 Appropriate manual participation is required

Formal verification is difficult to complete fully automatically. Limited by the complexity of the constraints to be verified and by the solver's capability, some constraints cannot be discharged and are reported as undefined behavior; these must be confirmed manually as real defects or false alarms. The workload of this step depends on the analysis policy: the finer the analysis granularity, the less manual work is needed. In addition, over-approximation may introduce false positives, which also need to be ruled out manually.

3.3 Test cases affect verification quality

The effectiveness of verification also depends on the quality of the test cases provided. The guiding principle when writing them is to maximize branch coverage. In general, parameters can be assigned intervals to cover all possible values, which also helps improve the code coverage of the verification. For parameters with complex types (for example, nested structures containing other structures), it is sometimes necessary to consult the function's implementation logic to decide which fields need interval values and which can keep concrete values, because intervals do not always give the best verification effect: interval computation can increase the analysis time significantly.

3.4 Guarantees provided by verification

Finally, note that the formal verification used in practice does not guarantee that the verified program is free of all security defects. What it guarantees is that, for a specific verification object (a specific version of the target code and specific versions of its dependency libraries), the code reachable from the given test samples is free of the defect classes specified by the verification policy (such as buffer overflow checks, integer overflow checks, and so on). Any claim that "the software has been formally verified, so it can never have security problems" is exaggerated and not rigorous.


This article has explained the principles of formal verification and shared MesaTEE's experience applying it to real engineering projects, in particular the verification of the TPM core software stack and the MesaPy C backend. To help the industry avoid detours, we have listed the issues that must be considered when adopting formal verification. Practice shows that although ensuring code security by formal verification is expensive, it is feasible with careful design. Compared with static analysis or dynamic testing, formal verification uncovers security flaws more thoroughly and guarantees program security under the given constraints.

[1]. Abstract Interpretation:

[2]. A gentle introduction to formal verification of computer systems by abstract interpretation.

[3]. Abstract Interpretation and Abstract Domains.

[4]. ACSL Mini-Tutorial.

[5]. Weakest Precondition.

[6]. TIS.

[7]. Smack.

[8]. Seahorn.

For more information, please see:

tpm2-tss-verified project homepage:
