teach you how to stack overflow from the beginning to the end (2)

Posted by fierce at 2020-04-08

0x00 write in front

First of all, let's broadcast the final results of 2017 pwn2own competition. A total of 51 loopholes were found in this competition, and Changting security laboratory contributed 11, with a total score of 26 points, ranking the third in 11 teams! At the same time, I would like to congratulate the domestic security team on taking the top three of this competition!

0x10 review of previous period

In the previous article, we introduced the principle of stack overflow and two execution methods, both of which are to execute the input instruction fragment (shellcode) or the function in the dynamic library (return2libc) by overwriting the return address. This article will continue to introduce the other two implementation methods. One is to overwrite the return address to execute the existing code fragment (ROP) in memory, the other is to replace the address of one function with the address of another function (hijack got).

0x20 related knowledge

0x21 register

In the previous background, we mentioned three registers related to function states - ESP, EBP, EIP. The following will cover more registers, so let's briefly introduce the different uses of registers in executing program instructions.

Registers in 32-bit x86 architecture can be simply divided into general registers and special registers. General registers can be used arbitrarily in most assembly instructions (although some instructions specify the specific purpose of some registers), while special registers can only be used by specific assembly instructions, and can not be used to store data arbitrarily.

General registers in 32-bit x86 architecture include general registers (eax, ebx, ECX, EDX), index registers (ESI, EDI), and stack pointer registers (ESP, EBP).

General registers are used to store run-time data. They are the most commonly used registers for instructions. In addition to storing general data, each general register has its own fixed and unique purpose. Eax is called an accumulator, which is used to perform arithmetic operations and return function results. Ebx is called the base register, which is used to store the base address in memory addressing (such as array operation). ECX is called a counter, which is used to count during a loop. EDX is called data register (data), which is often used with eax to store operation results and other data.

Index register is usually used in string operation. ESI points to the data address to be processed (source index) and EDI points to the data address (destination index) where the processing result is stored.

Stack pointer registers (ESP, EBP) are used to store the state of a function in the call stack, which has been described in detail in the previous article.

Special registers in 32-bit x86 architecture include segment address register (SS, CS, DS, ES, FS, GS), flag bit register (EFLAGS), and instruction pointer register (EIP).

Modern operating system memory usually stores different types of information in the form of segments. The function call stack we talked about in the previous article is a block segment. Memory segmentation also includes heap segment, data segment, BSS segment, and code segment. Code snippets store executable code and read-only constants, such as constant strings, and properties are readable, executable, but generally not writable. The data segment stores the initialized global variables and static local variables whose initial value is not 0. The BSS segment stores the uninitialized global variables and static local variables whose initial value is 0. These two pieces of data have writeable attributes. Heap is used to store dynamically allocated memory during program running. For example, malloc() and free() functions in C language allocate and release memory on the heap. The arrangement of segments in memory is shown in the figure below.

Fig1. Typical layout of memory segmentation

Segment address register is used to store memory segment address, where register SS stores the address of function call stack segment, register CS stores the address of code segment, register DS stores the address of data segment, ES, FS, GS are additional registers to store the address of data segment.

Most OF the 32 bits of EFLAGS are used to mark the status of data or programs, such as Overflow Flag, IF Interrupt Flag, ZF (Zero Flag) and CF (Carry Flag).

The instruction pointer register (EIP) stores the address of the next running instruction, which has been described in detail in the previous article.

0x22 assembly instruction

In order to better understand the later content, a little knowledge of assembly language is also necessary. This section will introduce the meaning of some basic instructions. The assembly language in 32-bit x86 architecture has two formats, Intel and at & T. the assembly instructions used in this paper are all in Intel format. The main differences between the two are as follows.

Intel format, unsigned before register name and value:

At & T format, register name is preceded by "%", and value is preceded by "$":

Some of the most commonly used assembly instructions are as follows:

After introducing the above background knowledge, we can continue the way of stack overflow technology.

0x30 ROP ( Return Oriented Programming )

- modify the return address to point to an existing instruction in memory

According to the subtitle above, the tasks to be completed include: determining the address of an instruction in memory and overwriting the return address with it. However, since the return address can be overwritten and located to the memory address, why not use the return2libc mentioned above directly? Because sometimes the target function can not be found in memory, sometimes the target operation does not have a specific function to fit perfectly. At this time, we need to find multiple instruction fragments in memory, and make up a series of operations to achieve the goal. If you want to execute an instruction (we call it "gadget", which means gadget), the overflow data should be constructed in the following way (see the previous section for the determination of padding length and content):

Figure 2. Overflow data containing a single gadget

If you want to execute several pieces of instructions in succession, you need to finish each gadget and give control to the next gadget. So the last step of a gadget should be a RET instruction, so that the program's control (EIP) can be switched, so this technology is called return oriented programming. To execute multiple gadgets, the overflow data should be constructed as follows:

Under such a structure, when the called function returns, it will jump to execute gadget 1. After execution, the RET instruction of gadget 1 will pop up the stack top data (that is, the address of gadget 2) to EIP, and the program will continue to jump to execute gadget 2, and so on.

Figure 3. Overflow data containing multiple gadgets

Now the task can be divided into: according to the effect of program stack overflow, find some instruction fragments ending with RET, and fill their addresses into overflow data according to the above structure. So we have to solve the following problems.

First, what is the effect of stack overflow?

The common patching effect of ROP is to implement a system call. The corresponding assembly instruction in Linux system is int 0x80. When executing this instruction, the number of the called function shall be stored in eax, and the call parameters shall be stored in ebx, ECX, EDX, ESI, EDI in sequence. For example, number 125 corresponds to a function

mprotect (void *addr, size_t len, int prot)

, you can use this function to change the stack's properties to executable, so you can use shellcode. If we want to use system call to execute this function, eax, ebx, ECX and EDX should be "125", the segment address of memory stack (can be determined by debugging tool), "0x10000" (the space length to be modified, maybe longer), "7" (RWX permission).

Secondly, how to find the corresponding instruction fragment?

There are several open-source tools that can search for instruction fragments ending in RET, including ropgadget, RP + +, ropeme, etc., and even use grep and other text matching tools to search RET in assembly instructions for further filtering. The detailed process of the search will not be described here, and interested students can refer to the description documents of the above tools.

Finally, how do I pass in parameters for system calls?

For the above-mentioned mprotect function, we need to transfer the parameters to the register, so we can use pop instruction to pop the stack top data into the register. If you can find directly available data in memory, you can also use mov instructions for transmission, but it's easier to write data and pop than to search and mov first, right? If you want to use the pop instruction to transfer the call parameters, you need to include these parameters in the overflow data, so the above overflow data format needs a little modification. For a single gadget, the data transmitted by the pop should be after the gadget address, as shown in the following figure.

Fig 4. gadget “pop eax; ret;”

After calling mprotect() to open the executable permission for the stack, we want to execute a segment of shellcode, so we need to add shellcode to the overflow data, and add the start address of shellcode to the gadget of int 0x80. But it's hard to determine the exact address of shellcode in memory , we can use push esp as a gadget.

Fig 5. gadget “push esp; ret;”

Let's assume that the following instructions can be found in memory:

pop eax; ret; # pop stack top into eax pop ebx; ret; # pop stack top into ebx pop ecx; ret; # pop stack top into ecx pop edx; ret; # pop stack top into edx int 0x80; ret; # system call push esp; ret; # push address of shellcode

For all the gadgets containing pop instructions, the transmission data of pop should be added after their addresses, and at the same time, a piece of shellcode should be included at the end of all the gadgets. Finally, the overflow data structure should be changed to the following format.

Figure 6. Overflow data containing multiple gadgets (after modification)

For simplicity, it is assumed that the input overflow data is not affected by the "\ X00" character, so the payload can directly contain "\ x7d \ X00 \ X00 \ X00" (parameter 125 passed to eax). If you want to achieve more realistic operation, you can use multiple gadgets to get the above parameters through operation. For example, you can pass parameters to eax through the following three gadgets.

pop eax; ret; # pop stack top 0x1111118e into eax pop ebx; ret; # pop stack top 0x11111111 into ebx sub eax, ebx; ret; # eax -= ebx

After solving the above problems, we can splice the overflow data and input it into the program to open the executable permissions for the program call stack and execute shellcode. At the same time, because of the flexibility of the ROP method, it is no longer necessary to painfully probe the shellcode start address. Looking back at the whole input data, only the segment address of the stack needs to be determined. If we use gadget to read the EBP value and add a proper value, we can ensure that the overflow data has executable rights, so that we no longer need to get the exact address, and it is possible to bypass the memory randomization.

For the purposes of the demonstration, we assumed (almost exactly) the existence of all the required gadgets. In the actual search and splicing of gadgets, it will not be as smooth as the above, there are two aspects to pay attention to.

First, many times, it is not possible to gather all the ideal instruction fragments at once. At this time, it is necessary to "save the nation by curve" by means of data address offset, data transmission between registers, etc. For example, suppose you can't find the following gadget

But if you can find the gadget below

So we can combine it with

Combined to realize the function of data transmission to ebx. As mentioned above, using multiple gadgets to avoid typing "\ X00" is also an instance application.

Second, be careful whether the gadget will break the implemented parts of the previous gadgets, such as modifying a register that has written a value. In addition, special attention should be paid to the operation of the gadget on EBP and ESP, because their changes will change the location of the return address, so that subsequent gadgets cannot be executed.

0x40 Hijack GOT

- modify the address of a called function to point to another function

According to the subtitle above, the tasks to be completed include: modifying the address of a function in memory to point to another function. For ease of understanding, let's assume that you change the address of the printf() function to point to system(), so that the call to printf() in the program executes the system() function after the modification. To implement this process, we need to find out how the program "finds" the called function when a function call occurs.

When a program calls an external function, it needs to link the external function to the program when it generates an executable file. The way of linking is divided into static linking and dynamic linking. The executable file obtained by static link contains all the code of external functions. The executable file obtained by dynamic link does not contain the code of external functions, but loads the dynamic link library (a collection of several external functions) to a certain location of memory at runtime, and then links the library to locate the required functions at the time of making the call.

How does the program navigate to the required function in the linked library? This process uses two tables - got and PLT. The full name of got is the global offset table, which is used to store the exact address of external functions in memory. The got is stored in the data segment and can be modified during program operation. The full name of PLT is the procedure linkage table, which is used to store the entry point of external functions. In other words, the program will always find the address of external functions in PLT. PLT is stored in the code segment, which has been determined and will not be modified before running, so PLT does not know the exact location where the DLL is loaded when the program is running. So what are the entry points stored in the PLT table? Is the address of the corresponding entry in the got table.

Figure 7. PLT and got table

Wait, we seem to find an unreasonable place. The memory address of the external function is stored in the got table instead of the PLT table, and the entry point of the PLT storage points to the corresponding entry of the got. Why does the program choose PLT instead of the got as the entry point of the call? Is it not more convenient to determine the memory address of all external functions and write it to the got table when the program starts, and then only use the got table? This design is for the efficiency of the program. The initial values of the got table all point to a segment in the corresponding entry of the PLT table, which is used to call a function address resolution function. When the program needs to call an external function, first find the corresponding entry point in the PLT table and jump to the got table. If this is the first time to call this function, the program will jump back to the PLT table through the got table again, run the address resolver to determine the exact address of the function, and use it to cover the initial value of the got table, and then execute the function call. When the function is called again, the program still jumps to the got table through the PLT table first. At this time, the got table already has the memory address to get the function, so it will jump to the address where the function is to execute the function directly. The whole process is shown in the following two figures.

Figure 8. When the function is called for the first time, the address of the function is parsed and stored in the got table

Figure 9. Read the address in the got directly when calling the function again

The above implementation follows a design idea called lazy, which leaves the operations to be completed (parsing the memory address of the external function) until the call actually occurs, rather than resolving all the function addresses at the beginning of the program. This process also enlightens us how to realize the function camouflage, that is to change the address of function a to the address of function B in the got table. In this way, all subsequent calls to function a will execute function B.

Then our goal can be divided into the following parts: determine the entry position of function a in the got table, determine the address of function B in memory, and write the address of function B into the entry of function a in the got table.

First, how to determine the entry position of function a in the got table?

When a program calls a function, it jumps to the corresponding entry of the got table through the PLT table, so the entry point position of the function in the PLT table can be found in the assembly instruction of the function call, so as to locate the entry of the function in the got.

for example

call 0x08048430 <[email protected]>

It means that the entry point of printf in PLT table is 0x08048430, so the entry address of printf in got table is stored at 0x08048430.

Secondly, how to determine the address of function B in memory?

If the system turns on Memory Layout Randomization, the loading position of dynamic link library is random every time the program runs, it is difficult to determine the function address directly through debugging tools. If function B has been called before the stack overflow, we can certainly get the address through the answer to the previous question. But our favorite attack function often does not meet the requirements of being called, that is, there is no real memory address in the got table. Fortunately, the relative position of the function in the DLL is fixed, which is determined when the DLL is packaged and generated. So if we know the runtime address of function a (read the contents of got table) and the relative position of function a and function B in the DLL, we can calculate the runtime address of function B.

Finally, how to modify the data in the got table?

It's hard to find the right function to do this, but we have a powerful ROP. If we can find the following several gadgets, it is not difficult to rewrite the data in the got table, so as to realize the camouflage of functions. Please refer back to the previous chapter for the specific implementation of ROP, which will not be covered here.

pop eax; ret; # [email protected] -> eax mov ebx [eax]; ret; # [email protected] -> ebx pop ecx; ret; # addr_diff = system - printf -> ecx add [ebx] ecx; ret; # [email protected] += addr_diff

As can be seen from the process of modifying the got table, this method can also bypass memory randomization to some extent.

0x50 defense measures

After introducing several basic methods of stack overflow, let's add some common measures to prevent stack overflow in the operating system. First of all, the program usually cancels the executable permission of the data on the stack under the default compilation setting, so that the simple shellcode overflow attack can not be realized. Secondly, memory layout randomization (ASLR) can be turned on in the operating system, which makes it more difficult to determine the memory address of data in the stack and functions in the dynamic library. When compiling a program, you can also set some compiling options so that the program will generate a special value between the EBP address and the return address on the function stack at runtime. This value is called "Canary" (about this allusion, please Google yourself). In this way, once the stack overflow occurs and the return address is covered, the value will be rewritten, thus realizing the function stack overrun check. Finally, it should be emphasized that safe and reliable code should be written as much as possible, and the possibility of writing out of bounds should not be provided for stack overflow.

0x60 summary

This paper briefly introduces stack overflow, an old and classic technology field, and describes four entry-level implementation methods. So far, the first lecture of our column is over. Next, more articles related to computer security will be published in the column, which will also cover network security, web penetration, cryptography, operating system, CTF competition solution, etc

Finally, thank you for your attention. Let's learn and make progress together!


《Hacking: Art of Exploitation》

2015 | sploitF-U-N