On Mining File-Parsing Vulnerabilities

Posted by trammel at 2020-04-09


1 - Preface

2 - Introduction to file fuzz

3 - File fuzz

4 - Outlook

5 - Conclusion

[1] - Preface

Since Google admitted in 2010 that it had suffered a serious attack, APT (advanced persistent threat) has become a buzzword in security circles. APTs are undoubtedly a nightmare for companies such as Google, RSA, and Comodo. In these attacks, file-parsing vulnerabilities serve as the heavy weaponry.

Whether the target is IE or Office, such programs have one thing in common: they take files as their main input. Many programmers habitually assume that the files they read strictly follow the data format the software defines. Attackers challenge that assumption. They make small changes to the expected data format and watch whether the software crashes or overflows when it parses the resulting "malformed file".

[2] - Introduction to file fuzz

File fuzz is a method of testing the robustness of software by feeding it "malformed files".

File fuzz can be divided into blind fuzz and smart fuzz. Blind fuzz, commonly called "blind testing", modifies data at random locations to generate abnormal files. However, file formats are becoming more and more complex, and blind fuzz has low code coverage, so it produces a large number of useless test cases. Smart fuzz was proposed to address these shortcomings and is being applied more and more widely. Smart fuzz, or intelligent fuzz, analyzes the file format and then generates abnormal files by mutating sample files. It can identify different data types and generate abnormal data for each type according to different rules. Compared with blind fuzz, smart fuzz greatly reduces the number of invalid malformed files.
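As a rough illustration of the blind approach, here is a minimal Python sketch that overwrites a few randomly chosen bytes of a sample file. The function name `blind_mutate` and its parameters are made up for this example; it is not taken from any particular tool.

```python
import random


def blind_mutate(sample: bytes, n_mutations: int = 4, seed=None) -> bytes:
    """Overwrite n_mutations randomly chosen bytes with random values.

    The mutator knows nothing about the file format, which is exactly
    why most of its output is rejected early by a real parser.
    """
    rng = random.Random(seed)
    data = bytearray(sample)
    if not data:
        return bytes(data)
    for _ in range(n_mutations):
        offset = rng.randrange(len(data))   # random location
        data[offset] = rng.randrange(256)   # random replacement byte
    return bytes(data)
```

Passing a fixed `seed` makes a test case reproducible, which helps when a crash has to be replayed later.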

[3] - File fuzz

Next, I will introduce some common approaches to file fuzz, based on my own experience.

[3.1] - Blind Fuzz

The representative blind fuzz tool is FileFuzz, produced by iDefense Labs.

FileFuzz has four mutation strategies: All Bytes, Range, Depth and Match. All Bytes walks through the whole file byte by byte, overwriting the data with a value we specify, such as four 0x00 bytes, four 0xFF bytes or three 0xFF bytes. Range does the same within a chosen byte range, which is convenient for targeted testing, such as testing only a certain file header. Depth tests a single offset in depth, for example by cycling the data at offset 0x2 through the whole range 0x00-0xFF. Match replaces data that matches a given pattern with the corresponding replacement data.
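The four strategies can be sketched in Python roughly as follows. This is my own reading of the description above, not FileFuzz's actual code; all function names are made up for the example.

```python
from typing import Iterator


def byte_range(sample: bytes, start: int, end: int, fill: bytes) -> Iterator[bytes]:
    """Range: write `fill` at every offset in [start, end), one case per offset."""
    for offset in range(start, min(end, len(sample))):
        data = bytearray(sample)
        data[offset:offset + len(fill)] = fill
        yield bytes(data[:len(sample)])  # keep the original file size


def all_bytes(sample: bytes, fill: bytes) -> Iterator[bytes]:
    """All Bytes: the same as Range, applied to the whole file."""
    return byte_range(sample, 0, len(sample), fill)


def depth(sample: bytes, offset: int) -> Iterator[bytes]:
    """Depth: try every value 0x00-0xFF at a single offset."""
    for value in range(256):
        data = bytearray(sample)
        data[offset] = value
        yield bytes(data)


def match(sample: bytes, pattern: bytes, replacement: bytes) -> Iterator[bytes]:
    """Match: replace each occurrence of `pattern`, one case per occurrence."""
    pos = sample.find(pattern)
    while pos != -1:
        yield sample[:pos] + replacement + sample[pos + len(pattern):]
        pos = sample.find(pattern, pos + 1)
```

For example, `byte_range(sample, 0, 16, b"\xff\xff\xff\xff")` would cover only a 16-byte header, one test case per offset.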

The original version of FileFuzz is a little crude. Fortunately, FileFuzz is open source, so we can modify its generation strategies to suit our own approach and build our own "intelligent data". The replacement data need not be limited to 0x00 and 0xFF; it can also be 0x3F, 0x7F, 0x01, 0x02, 0x80, 0xFE, 0x10, 0x20, 0x40, 0x60, and so on. To test for integer overflow, we can use boundary values such as 0xffffffff-1, 0xffffffff-2 and 0xffffffff-3.

In my opinion, FileFuzz is better suited to experienced people. FileFuzz is "fast, accurate and ruthless". As long as the strategy is set well and the weak spots in the file format are chosen well, digging for vulnerabilities with FileFuzz is very comfortable. "Simple is the most beautiful". Although FileFuzz is very simple, as long as the strategy is devious enough, it is a formidable weapon.
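A list of boundary values for integer fields might be generated as in the sketch below. The helper names are hypothetical, and the little-endian byte order in `patch_u32` is an assumption that has to match the format under test.

```python
import struct
from typing import List


def boundary_values(bits: int = 32) -> List[int]:
    """Values near the signed and unsigned limits of a `bits`-wide integer."""
    max_unsigned = (1 << bits) - 1
    max_signed = (1 << (bits - 1)) - 1
    candidates = [
        0, 1, 2,
        max_signed - 1, max_signed, max_signed + 1,
        max_unsigned - 3, max_unsigned - 2, max_unsigned - 1, max_unsigned,
    ]
    return sorted(set(candidates))


def patch_u32(sample: bytes, offset: int, value: int) -> bytes:
    """Overwrite the 4 bytes at `offset` with `value` (little-endian assumed)."""
    return sample[:offset] + struct.pack("<I", value) + sample[offset + 4:]
```

Each value from `boundary_values()` can then be written into every offset that is believed to hold a length, count or size field.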

[3.2] - File Format Based Smart Fuzz

The smart file fuzz described in this section is based on the file format specification, and the general process is shown in the figure above. Among file fuzz systems of this kind, the most famous is Peach. Peach uses XML to define the file format specification, in what is called a Peach Pit file.

The difficulty of using Peach lies in writing the Peach Pit file. For more complex file formats, XML turns out to be a rather weak way to define the format. "Peach has abused me thousands of times, yet I treat Peach like my first love." That is the attitude Peach demands: even though it abuses us thoroughly, we still love it.

Based on Peach, we implemented format specifications for 25 common file formats and tested both the company's own products and external products, finding more than n vulnerabilities in the company's products. The system also showed a good ability to rediscover historical vulnerabilities and to discover 0days.
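To make the idea of format-aware mutation concrete, here is a small Python sketch. It does not use Peach or the Pit syntax; the toy format (magic, version, length, payload) and every name in it are invented purely for illustration.

```python
import struct

# Hypothetical toy format, little-endian:
#   magic (4 bytes) | version (u16) | length (u32) | payload (`length` bytes)
HEADER = struct.Struct("<4sHI")


def parse(sample: bytes) -> dict:
    magic, version, length = HEADER.unpack_from(sample)
    payload = sample[HEADER.size:HEADER.size + length]
    return {"magic": magic, "version": version, "length": length, "payload": payload}


def build(fields: dict) -> bytes:
    header = HEADER.pack(fields["magic"], fields["version"], fields["length"])
    return header + fields["payload"]


def mutations(fields: dict):
    """Yield (description, file_bytes), changing one field per test case."""
    # Numeric fields get boundary values; the rest of the file stays valid,
    # so the parser gets past its early sanity checks.
    for value in (0, 0x7FFF, 0x8000, 0xFFFF):
        yield f"version={value:#x}", build({**fields, "version": value})
    for value in (0, len(fields["payload"]) + 1, 0x7FFFFFFF, 0xFFFFFFFF):
        yield f"length={value:#x}", build({**fields, "length": value})
    # Oversized payload with a length field that is kept consistent.
    big = fields["payload"] * 64
    yield "payload x64", build({**fields, "payload": big, "length": len(big)})
```

Because each test case leaves the magic and the other fields intact, far fewer files are rejected outright than with blind mutation.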

1) Known vulnerability test results

The main test targets were known vulnerabilities recently disclosed on Exploit-DB and other vulnerability databases. The ones we were able to rediscover are as follows:


2) 0day discovery test results

The main test targets were the products of large Internet companies at home and abroad. We found six high-risk 0days in products from mainstream Internet vendors such as Apple, 360, Baidu and Xunlei. The details have been reported to the vendors concerned and the issues have been fixed.

Generally speaking, Peach is reasonably effective. It is just that writing a Peach Pit is too cumbersome, which makes the tool hard to use.

In addition, I prefer to combine FileFuzz and Peach. I use Peach to generate the test cases and use FileFuzz to run them in parallel across multiple virtual machines, which reduces the fuzz time.
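The parallel-run part of this setup can be approximated on a single machine with a sketch like the one below, where local worker threads stand in for the virtual machines. The function names and the crash heuristic are assumptions made for this example, not part of FileFuzz or Peach.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_case(command, case_path, timeout=10.0):
    """Run the target on one test case and classify the outcome."""
    try:
        proc = subprocess.run(
            command + [case_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return case_path, "hang"
    # On POSIX a negative return code means the process was killed by a
    # signal (for example SIGSEGV). Windows reports crashes differently,
    # so this check would need adapting there.
    if proc.returncode < 0:
        return case_path, "crash"
    return case_path, "ok"


def run_all(command, case_paths, workers=4):
    """Run all test cases with `workers` targets in flight at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda path: run_case(command, path), case_paths))
```

Here `command` is the target's command line as a list, and each test case path is appended as the last argument. Threads are enough because the real work happens in the child processes.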

[3.3] - Smart Fuzz

Recently, more and more smart file fuzz systems have been proposed that improve code coverage through symbolic execution, path constraint solving and similar techniques. Among them, one of the better known is Fuzzgrind. Fuzzgrind is an open-source smart file fuzz tool that runs on Linux. It uses the Valgrind instrumentation tool and the STP solver. Its execution process is as follows:

This type of file fuzz system does not need to know the file format specification. It only needs sample files, and it improves code coverage through symbolic execution. At present, the bottleneck is efficiency. When the target program is highly complex, or even just slightly large, it cannot be tested at all. If the efficiency problem can be overcome, this type of system will lead file fuzz in the future.

[4] - Outlook

[4.1] - 010 Editor File Fuzz

Compared with writing a Peach Pit template, writing a file format parsing script for 010 Editor is relatively simple. A 010 Editor BT template can only parse the sample file and cannot modify it, while a 1SC script can access the data structures defined in the BT template and modify the data.

Idea: write a mutator for 010 Editor that accesses the BT template through a 1SC script to construct abnormal samples.

[4.2] - Beauty of Mathematics

Mathematics is the queen of the sciences. I believe that we can mine vulnerabilities by studying their causes and building mathematical models. Interested readers can give it a try.

[5] - Conclusion

This article introduced the common approaches to file fuzz, along with some ideas and prospects for it. If you have new ideas, please discuss them with me.

"However far our thinking reaches, that is how far we can go."


[1] 0day Security: Software Vulnerability Analysis Technology (Second Edition)

[2] Fuzzing: Brute Force Vulnerability Discovery