Reverse analysis of Ethereum smart contracts (Part 1)

Posted by deaguero at 2020-03-07

Part 2 of this series, Reverse analysis of Ethereum smart contracts (Part 2), is available at: https://www.anquanke.com/post/id/106984

I. Preface

In this article, I introduce how the Ethereum virtual machine (EVM) works and how to reverse engineer a smart contract.

To disassemble smart contracts, I use the Ethersplay plug-in for Binary Ninja, developed by Trail of Bits.

II. Ethereum virtual machine

The Ethereum virtual machine (EVM) is a stack-based, quasi-Turing-complete virtual machine.

1) Stack-based: the EVM does not rely on registers; every operation is carried out on the stack. Operands, operators, and function calls are all handled through the stack, and the EVM knows how to process that data and execute the smart contract.

Ethereum uses postfix notation to implement its stack-based operation. In short, the operator comes last and acts on the data pushed before it.

For example, consider the operation 2 + 2. We read the operator (+) in the middle as meaning that the two operands should be added. Putting + between the two operands is one way to write this; we can also put it after them, that is, 2 2 +, which is postfix notation.
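To make postfix evaluation concrete, here is a minimal Python sketch of a stack-based evaluator. It is a toy illustration rather than EVM code; the function name eval_postfix and the set of supported operators are assumptions made for this example.

```python
import operator

# Binary operators supported by this toy evaluator (an arbitrary choice).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_postfix(expr: str) -> int:
    """Evaluate a space-separated postfix expression using a stack."""
    stack = []
    for token in expr.split():
        if token in OPS:
            right = stack.pop()  # operand pushed last
            left = stack.pop()   # operand pushed before it
            stack.append(OPS[token](left, right))
        else:
            stack.append(int(token))
    return stack.pop()


print(eval_postfix("2 2 +"))      # 4
print(eval_postfix("2 3 4 * +"))  # 14
```

Operands are pushed as they are read, and each operator consumes the values already on the stack, which is the same discipline the EVM follows.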

2) Quasi-Turing-complete: a programming language or code execution engine is called Turing complete if it can compute every computable problem. The concept does not care how long a solution takes, only that the problem can be solved in theory. The Bitcoin scripting language is not Turing complete because what it can express is deliberately limited (for example, it has no loops).

In the EVM we can, in theory, solve any such problem. We still call it "quasi-Turing-complete", mainly because of cost. Gas is the unit the EVM uses to measure the cost of an operation. When someone initiates a transaction on the blockchain, the transaction code and any code it subsequently triggers must be executed on the miner's machine. This execution consumes the miner's resources, such as electricity, memory, and CPU time.

To give miners an incentive to process the transaction, the sender must declare a gas price, that is, the price they are willing to pay per unit of computation. Very complex computations need a very large amount of gas, and since every unit of gas has to be paid for, complex transactions quickly become uneconomical on Ethereum.
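As a rough illustration of how gas turns into a cost, the sketch below multiplies the gas consumed by the declared gas price. The 20 gwei price is a made-up value and the function name is hypothetical; 21,000 gas is the base cost of a plain ether transfer.

```python
WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18


def transaction_fee_wei(gas_used: int, gas_price_gwei: int) -> int:
    """Fee in wei = units of gas consumed * price paid per unit of gas."""
    return gas_used * gas_price_gwei * WEI_PER_GWEI


# Assumed example values: a plain transfer (21,000 gas) at a gas price of 20 gwei.
fee = transaction_fee_wei(21_000, 20)
print(fee)                  # 420000000000000 (wei)
print(fee / WEI_PER_ETHER)  # the same fee expressed in ether
```

Because the fee grows linearly with the gas consumed, a computation that needs millions of gas units costs proportionally more, which is the economic limit described above.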

III. Bytecode and runtime bytecode

When compiling a contract, we can obtain either the contract bytecode or the runtime bytecode.

Contract bytecode contains the bytecode that ultimately ends up stored on the blockchain, plus the bytecode needed to place it there and initialize the smart contract (run the constructor).

Runtime bytecode is only the bytecode that ends up stored on the blockchain; it does not include the code for initializing and deploying the contract.

Let's take the Greeter.sol contract as an example to see the difference between the two.

    contract mortal {
        /* Define variable owner of the type address */
        address owner;

        /* This function is executed at initialization and sets the owner of the contract */
        function mortal() { owner = msg.sender; }

        /* Function to recover the funds on the contract */
        function kill() { if (msg.sender == owner) selfdestruct(owner); }
    }

    contract greeter is mortal {
        /* Define variable greeting of the type string */
        string greeting;

        /* This runs when the contract is executed */
        function greeter(string _greeting) public {
            greeting = _greeting;
        }

        /* Main function */
        function greet() constant returns (string) {
            return greeting;
        }
    }

When we compile the contract with the command solc --bin Greeter.sol, we get the contract bytecode:

solc --bin Greeter.sol 6060604052341561000f57600080fd5b6040516103a93803806103a983398101604052808051820191905050336000806101000a81548173ffffffffffffffffffffffffffffffffffffffff021916908373ffffffffffffffffffffffffffffffffffffffff1602179055508060019080519060200190610081929190610088565b505061012d565b828054600181600116156101000203166002900490600052602060002090601f016020900481019282601f106100c957805160ff19168380011785556100f7565b828001600101855582156100f7579182015b828111156100f65782518255916020019190600101906100db565b5b5090506101049190610108565b5090565b61012a91905b8082111561012657600081600090555060010161010e565b5090565b90565b61026d8061013c6000396000f30060606040526004361061004c576000357c0100000000000000000000000000000000000000000000000000000000900463ffffffff16806341c0e1b514610051578063cfae321714610066575b600080fd5b341561005c57600080fd5b6100646100f4565b005b341561007157600080fd5b610079610185565b6040518080602001828103825283818151815260200191508051906020019080838360005b838110156100b957808201518184015260208101905061009e565b50505050905090810190601f1680156100e65780820380516001836020036101000a031916815260200191505b509250505060405180910390f35b6000809054906101000a900473ffffffffffffffffffffffffffffffffffffffff1673ffffffffffffffffffffffffffffffffffffffff163373ffffffffffffffffffffffffffffffffffffffff161415610183576000809054906101000a900473ffffffffffffffffffffffffffffffffffffffff1673ffffffffffffffffffffffffffffffffffffffff16ff5b565b61018d61022d565b60018054600181600116156101000203166002900480601f0160208091040260200160405190810160405280929190818152602001828054600181600116156101000203166002900480156102235780601f106101f857610100808354040283529160200191610223565b820191906000526020600020905b81548152906001019060200180831161020657829003601f168201915b5050505050905090565b6020604051908101604052806000815250905600a165627a7a723058204138c228602c9c0426658c0d46685e1d9c157ff1f92cb6e28acb9124230493210029

If we compile with solc --bin-runtime Greeter.sol instead, we get the runtime bytecode:

solc --bin-runtime Greeter.sol 60606040526004361061004c576000357c0100000000000000000000000000000000000000000000000000000000900463ffffffff16806341c0e1b514610051578063cfae321714610066575b600080fd5b341561005c57600080fd5b6100646100f4565b005b341561007157600080fd5b610079610185565b6040518080602001828103825283818151815260200191508051906020019080838360005b838110156100b957808201518184015260208101905061009e565b50505050905090810190601f1680156100e65780820380516001836020036101000a031916815260200191505b509250505060405180910390f35b6000809054906101000a900473ffffffffffffffffffffffffffffffffffffffff1673ffffffffffffffffffffffffffffffffffffffff163373ffffffffffffffffffffffffffffffffffffffff161415610183576000809054906101000a900473ffffffffffffffffffffffffffffffffffffffff1673ffffffffffffffffffffffffffffffffffffffff16ff5b565b61018d61022d565b60018054600181600116156101000203166002900480601f0160208091040260200160405190810160405280929190818152602001828054600181600116156101000203166002900480156102235780601f106101f857610100808354040283529160200191610223565b820191906000526020600020905b81548152906001019060200180831161020657829003601f168201915b5050505050905090565b6020604051908101604052806000815250905600a165627a7a723058204138c228602c9c0426658c0d46685e1d9c157ff1f92cb6e28acb9124230493210029

As shown above, the runtime bytecode is contained in the contract bytecode: it is the trailing portion that follows the initialization code.
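One way to see where the runtime bytecode begins is to look at the short stub that sits immediately before it in the solc --bin output (61026d8061013c6000396000f300). The Python sketch below decodes the two PUSH2 operands in that stub. Reading them as the size and start offset of the runtime bytecode is my interpretation of the CODECOPY and RETURN sequence, and the variable names are only illustrative.

```python
# The 14 bytes that precede the runtime bytecode in the `solc --bin` output:
# PUSH2 0x026d, DUP1, PUSH2 0x013c, PUSH1 0x00, CODECOPY, PUSH1 0x00, RETURN, STOP
STUB = bytes.fromhex("61026d8061013c6000396000f300")

runtime_size = int.from_bytes(STUB[1:3], "big")    # operand of the first PUSH2
runtime_offset = int.from_bytes(STUB[5:7], "big")  # operand of the second PUSH2

print(hex(runtime_size))                   # 0x26d
print(hex(runtime_offset))                 # 0x13c
print(hex(runtime_offset + runtime_size))  # 0x3a9
```

The sum, 0x3a9, matches the PUSH2 0x03a9 operand near the start of the contract bytecode, which is consistent with the runtime bytecode being the final 0x26d bytes of the compiler output.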

IV. Reverse analysis

In this article, we use the Ethersplay plug-in for Binary Ninja, developed by Trail of Bits, to disassemble Ethereum bytecode.

The contract we analyze is the Greeter.sol contract provided by ethereum.org.

First, follow the tutorial to add the Ethersplay plug-in to Binary Ninja. As a reminder, we only reverse the runtime bytecode, because it is enough to tell us what the contract does.

Overview of tools

The Ethersplay plug-in identifies all functions in the runtime bytecode and separates them logically. For this contract, Ethersplay found two functions: kill() and greet(). We'll show how these functions are extracted later.

First instructions

When we send a transaction to a smart contract, the first thing we encounter is the contract's dispatcher. The dispatcher processes the transaction data and determines which function we want to interact with.

The first instructions we see in the dispatcher are:

    PUSH1 0x60      // argument 2 of mstore: the value to store in memory
    PUSH1 0x40      // argument 1 of mstore: where to store that value in memory
    MSTORE          // mstore(0x40, 0x60)
    PUSH1 0x4
    CALLDATASIZE
    LT
    PUSH2 0x4c
    JUMPI

The PUSH instruction comes in 32 versions (PUSH1 through PUSH32). The number tells the EVM how many bytes the instruction pushes onto the stack.
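The relationship between the PUSH opcode byte and the number of bytes pushed can be sketched with a tiny disassembler. This is a simplified illustration, not how Ethersplay works internally; the function name disassemble and the small opcode table (which only covers the instructions of the first block) are assumptions for this example.

```python
# Names for the non-PUSH opcodes that appear in the first block of the dispatcher.
OPCODE_NAMES = {0x10: "LT", 0x36: "CALLDATASIZE", 0x52: "MSTORE", 0x57: "JUMPI"}


def disassemble(code: bytes):
    """Return a list of (offset, mnemonic, operand) tuples for `code`."""
    result = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if 0x60 <= opcode <= 0x7F:
            # PUSH1 is 0x60 and PUSH32 is 0x7f, so the operand width is opcode - 0x5f.
            width = opcode - 0x5F
            operand = code[pc + 1:pc + 1 + width]
            result.append((pc, f"PUSH{width}", "0x" + operand.hex()))
            pc += 1 + width
        else:
            name = OPCODE_NAMES.get(opcode, f"UNKNOWN(0x{opcode:02x})")
            result.append((pc, name, None))
            pc += 1
    return result


# The first 13 bytes of the runtime bytecode shown earlier.
first_block = bytes.fromhex("60606040526004361061004c57")
for offset, mnemonic, operand in disassemble(first_block):
    print(f"{offset:#06x}  {mnemonic}" + (f" {operand}" if operand else ""))
```

The operand bytes follow the opcode directly in the bytecode, so a disassembler has to skip them before decoding the next instruction.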

The first two instructions (PUSH1 0x60 and PUSH1 0x40) push 0x60 and 0x40 onto the stack, respectively. After they execute, the runtime stack looks like this (the highest index is the top of the stack):

    1: 0x40
    0: 0x60

According to the official Solidity documentation, MSTORE is defined as follows:

    mstore(p, v)    mem[p...(p+32)) := v

The EVM reads instruction arguments from the top of the stack downward, so it executes mstore(0x40, 0x60). This has the same effect as mem[0x40...0x40+32] := 0x60.
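The effect of these three instructions can be sketched in Python, modelling the stack as a list and memory as a bytearray. This is a simplified model under those assumptions (the helper name mstore mirrors the opcode, but its implementation here is my own), not the EVM's actual implementation.

```python
def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Write `value` as a 32-byte big-endian word at memory[offset:offset+32]."""
    end = offset + 32
    if len(memory) < end:
        # In this model, memory grows on demand and new bytes are zero.
        memory.extend(bytes(end - len(memory)))
    memory[offset:end] = value.to_bytes(32, "big")


stack = [0x60, 0x40]  # state after PUSH1 0x60, PUSH1 0x40 (the last item is the top)
memory = bytearray()

offset = stack.pop()  # first argument, read from the top of the stack: 0x40
value = stack.pop()   # second argument: 0x60
mstore(memory, offset, value)

print(memory[0x40:0x60].hex())  # 31 zero bytes followed by 0x60
print(stack)                    # []
```

Both arguments are consumed, which is why the stack is empty once MSTORE has run.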

MSTORE pops two elements off the stack, so the stack is now empty. The next instructions are:

    PUSH1 0x4
    CALLDATASIZE
    LT
    PUSH2 0x4c
    JUMPI

After PUSH1 0x4 executes, the stack holds a single element:

    0: 0x4

The CALLDATASIZE instruction pushes the size of the calldata (equivalent to msg.data) onto the stack. We can send arbitrary data to any smart contract, and CALLDATASIZE reports the size of that data.

After CALLDATASIZE executes, the stack layout is as follows:

    1: (however long the msg.data or calldata is)
    0: 0x4

The next instruction is LT ("less than"), which is defined as follows:

    lt(x, y)    1 if x < y, 0 otherwise

If the first argument is smaller than the second, LT pushes 1 onto the stack; otherwise it pushes 0. Given the current stack layout, the instruction evaluates the expression below, that is, it checks whether the size of msg.data (the calldata) is less than 0x4 bytes:

    lt((however long the msg.data or calldata is), 0x4)

Why does the contract check whether the calldata we provide is at least 4 bytes long? The answer lies in how functions are identified.

The EVM identifies a function by the first four bytes of the keccak256 hash of its signature. In other words, the function prototype (function name and parameter types) is passed to the keccak256 hash function. For this contract, we get the following results:

    keccak256("greet()") = cfae3217...
    keccak256("kill()") = 41c0e1b5...

Therefore, the function identifier of greet() is cfae3217 and that of kill() is 41c0e1b5. The dispatcher checks that the calldata (or message data) we send to the contract is at least 4 bytes long to make sure we actually intend to interact with a function.

The function identifier size is always 4 bytes, so if the data we send to the smart contract is less than 4 bytes, we cannot interact with any function.
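The identifiers above can be reproduced with a short script. Python's standard hashlib does not provide the original Keccak-256 (its sha3_256 uses different padding), so the sketch below carries its own compact, unoptimized Keccak-256 written from the public algorithm description. The helper names keccak256 and selector are mine, and this is an illustration rather than production-grade hashing code.

```python
MASK = (1 << 64) - 1
RATE = 136  # bytes absorbed per block for Keccak-256

ROUND_CONSTANTS = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

# Rotation offsets, indexed as ROTATIONS[x][y].
ROTATIONS = [
    [0, 36, 3, 41, 18],
    [1, 44, 10, 45, 2],
    [62, 6, 43, 15, 61],
    [28, 55, 25, 21, 56],
    [27, 20, 39, 8, 14],
]


def _rol(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left by `shift` bits."""
    return ((value << shift) | (value >> (64 - shift))) & MASK


def _keccak_f(a):
    """Apply the 24-round Keccak-f[1600] permutation to the 5x5 state a[x][y]."""
    for rc in ROUND_CONSTANTS:
        # theta
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = _rol(a[x][y], ROTATIONS[x][y])
        # chi
        a = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y]) for y in range(5)]
             for x in range(5)]
        # iota
        a[0][0] ^= rc
    return a


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by Ethereum (original 0x01 padding, not NIST SHA-3)."""
    padded = bytearray(data)
    padded.append(0x01)
    while len(padded) % RATE:
        padded.append(0x00)
    padded[-1] |= 0x80
    state = [[0] * 5 for _ in range(5)]
    for start in range(0, len(padded), RATE):
        block = padded[start:start + RATE]
        for i in range(RATE // 8):
            state[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        state = _keccak_f(state)
    # The first four lanes form the 32-byte digest.
    return b"".join(state[i][0].to_bytes(8, "little") for i in range(4))


def selector(signature: str) -> str:
    """Return the 4-byte function identifier of `signature` as a hex string."""
    return keccak256(signature.encode("ascii"))[:4].hex()


print(selector("greet()"))
print(selector("kill()"))
```

The two printed values should be compared against the identifiers quoted above (cfae3217 and 41c0e1b5). For a function that takes arguments, the signature would include the parameter types, for example greeter(string).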

In fact, the assembly code shows how the smart contract rejects such input. If CALLDATASIZE is less than 4 bytes, execution immediately jumps to the exit code at the end of the dispatcher and the contract stops executing.

Let's see the judgment process in detail.

If lt((however long the msg.data or calldata is), 0x4) evaluates to 1 (true, meaning the calldata is shorter than 4 bytes), LT pops its two arguments off the stack and pushes 1:

    0: 1

The next instructions are PUSH2 0x4c and JUMPI. After PUSH2 0x4c executes, the stack layout is as follows:

    1: 0x4c
    0: 1

JUMPI stands for "jump if": it jumps to a given label (a code location) if a condition is met:

    jumpi(label, cond)    jump to label if cond is nonzero

In this example, label is the code offset 0x4c and cond is 1, so execution jumps to offset 0x4c.
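The length check walked through above can be replayed step by step in Python, with a list standing in for the EVM stack. The function name run_length_check is hypothetical, and the sketch only models the five instructions discussed here.

```python
def run_length_check(calldata: bytes):
    """Replay PUSH1 0x4, CALLDATASIZE, LT, PUSH2 0x4c, JUMPI on a list-based stack.

    Returns the jump destination if the jump is taken, otherwise None.
    """
    stack = []
    stack.append(0x4)                # PUSH1 0x4
    stack.append(len(calldata))      # CALLDATASIZE
    x, y = stack.pop(), stack.pop()  # LT reads its arguments from the top down
    stack.append(1 if x < y else 0)  # lt(size, 0x4)
    stack.append(0x4C)               # PUSH2 0x4c
    label, cond = stack.pop(), stack.pop()  # JUMPI pops the label, then the condition
    return label if cond != 0 else None


print(run_length_check(b""))                         # 76, i.e. 0x4c: jump taken
print(run_length_check(bytes.fromhex("41c0e1")))     # 76: still shorter than 4 bytes
print(run_length_check(bytes.fromhex("41c0e1b5")))   # None: execution continues
```

With at least 4 bytes of calldata the condition is 0, the jump is not taken, and execution continues with the next instruction.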

Function dispatching

Now let's look at how the requested function is extracted from the calldata. After the previous JUMPI instruction executes (without jumping, because the calldata is at least 4 bytes long), the stack is empty.

The instructions in the second code block are as follows:

    PUSH1 0x0
    CALLDATALOAD
    PUSH29 0x100000000....
    SWAP1
    DIV
    PUSH4 0xffffffff
    AND
    DUP1
    PUSH4 0x41c0e1b5
    EQ
    PUSH2 0x51
    JUMPI

PUSH1 0x0 pushes 0 onto the top of the stack.

    0: 0

The CALLDATALOAD instruction takes one argument, an index into the calldata sent to the smart contract, and reads 32 bytes starting at that index. It is defined as follows:

    calldataload(p)    calldata starting from position p (32 bytes)

CALLDATALOAD pushes the 32 bytes it reads onto the top of the stack. Since the index it receives is 0 (from the PUSH1 0x0 instruction), CALLDATALOAD pops the 0x0, reads 32 bytes of calldata starting at byte 0, and pushes them onto the stack. The new stack layout is:

    0: 32 bytes of calldata starting at byte 0
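A minimal Python model of CALLDATALOAD is sketched below. The function name mirrors the opcode, but the implementation is my own; it relies on the EVM rule that bytes read past the end of the calldata are treated as zero.

```python
def calldataload(calldata: bytes, index: int) -> int:
    """Return the 32 bytes of calldata starting at `index` as a big-endian integer.

    Bytes beyond the end of the calldata read as zero.
    """
    word = calldata[index:index + 32]
    word = word + bytes(32 - len(word))  # right-pad with zero bytes
    return int.from_bytes(word, "big")


# Calldata consisting only of the greet() identifier.
calldata = bytes.fromhex("cfae3217")
word = calldataload(calldata, 0)
print(f"0x{word:064x}")  # cfae3217 followed by 28 zero bytes
```

The identifier ends up in the most significant 4 bytes of the 32-byte word, which is why the following instructions have to shift it down.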

The next instruction is PUSH29 0x100000000...., after which the stack looks like this:

    1: 0x100000000....
    0: 32 bytes of calldata starting at byte 0

The SWAPi instruction exchanges the top element of the stack with the i-th element below it. Here the instruction is SWAP1, so it exchanges the top element with the element directly below it.

    1: 32 bytes of calldata starting at byte 0
    0: 0x100000000....

The next instruction is DIV, that is, div(x, y), or x / y. In this example, x is the 32 bytes of calldata starting at byte 0 and y is 0x100000000....

The value 0x100000000.... is 29 bytes long: the first byte is 0x01 and the remaining 28 bytes are 0x00. Dividing the 32 bytes we read from the calldata by this value leaves the first 4 bytes of the calldata starting at index 0. These four bytes are the function identifier.

If this is unclear, compare it with division in decimal: 123456000 / 1000 = 123456. Dividing by a 1 followed by three zeros drops the last three digits, and hexadecimal works the same way. Dividing a 32-byte value by a 1 followed by 28 zero bytes drops the last 28 bytes, so only the first 4 bytes remain.

The result of the DIV operation is pushed onto the stack as well.

    0: function identifier from calldata

The next instructions are PUSH4 0xffffffff and AND, which bitwise-AND 0xffffffff with the function identifier taken from the calldata. This clears the upper 28 bytes of the 32-byte stack word and keeps only the 4-byte function identifier.
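The DIV and AND steps are plain integer arithmetic, so they can be checked directly in Python. The helper name extract_selector and the sample calldata (the kill() identifier followed by 28 arbitrary filler bytes) are assumptions for this sketch.

```python
DIVISOR = 1 << 224  # the PUSH29 operand: 0x01 followed by 28 zero bytes
MASK = 0xFFFFFFFF   # the PUSH4 operand


def extract_selector(word: int) -> int:
    """Apply DIV and then AND to a 32-byte calldata word."""
    shifted = word // DIVISOR  # DIV: drops the last 28 bytes
    return shifted & MASK      # AND: keeps only the low 4 bytes


# The PUSH29 operand really is 29 bytes long: 0x01 followed by 28 zero bytes.
assert DIVISOR.to_bytes(29, "big") == b"\x01" + bytes(28)

# The word CALLDATALOAD would push for calldata that is exactly cfae3217 (greet()).
greet_word = 0xCFAE3217 << 224
print(hex(extract_selector(greet_word)))  # 0xcfae3217

# A word whose first 4 bytes are the kill() identifier, followed by filler bytes.
kill_word = int.from_bytes(bytes.fromhex("41c0e1b5") + b"\xaa" * 28, "big")
print(hex(extract_selector(kill_word)))   # 0x41c0e1b5
```

In both cases everything after the first 4 bytes of the calldata word is discarded, regardless of what it contains.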

This is followed by a DUP1 instruction, which duplicates the top element of the stack (here, the function identifier) and pushes the copy onto the stack.

    1: function identifier from calldata
    0: function identifier from calldata

Next is the PUSH4 0x41c0e1b5 instruction; 0x41c0e1b5 is the function identifier of kill(). It is pushed onto the stack so that it can be compared with the function identifier from the calldata.

    2: 0x41c0e1b5
    1: function identifier from calldata
    0: function identifier from calldata

The next instruction is EQ, that is, eq(x, y). It pops x and y off the stack and pushes 1 if they are equal, otherwise 0. This is the dispatcher's "dispatching" step: the function identifier from the calldata is compared with each function identifier in the smart contract.

    1: (1 if calldata function identifier matched kill() function identifier, 0 otherwise)
    0: function identifier from calldata

After PUSH2 0x51 executes, the stack layout is as follows:

    2: 0x51
    1: (1 if calldata function identifier matched kill() function identifier, 0 otherwise)
    0: function identifier from calldata

0x51 is pushed so that JUMPI can jump to this offset when the condition is met. In other words, if the function identifier sent in the calldata matches kill(), execution jumps to offset 0x51 in the code, which is where the kill() function is located.

After JUMPI executes, we have either jumped to offset 0x51 or continued with the current flow.

Now there is only one element on the stack:

    0: function identifier from calldata

Careful readers will notice that if we do not jump to the kill() function, the dispatcher uses the same logic to compare the function identifier from the calldata with the identifier of greet(). The dispatcher checks every function in the smart contract, and if none matches, execution falls through to the contract's exit code.
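The whole dispatcher can be replayed with a very small interpreter. The sketch below runs the first 0x4c bytes of the runtime bytecode shown earlier and reports the code offset at which execution leaves the dispatcher. It implements only the opcodes that appear in this block and ignores gas, so it is a teaching aid rather than a faithful EVM; the names DISPATCHER and dispatch are mine, and the greet() destination 0x66 comes from the PUSH2 0x0066 that follows the cfae3217 comparison in the bytecode.

```python
DISPATCHER = bytes.fromhex(
    "6060604052"            # PUSH1 0x60, PUSH1 0x40, MSTORE
    "6004361061004c57"      # PUSH1 0x4, CALLDATASIZE, LT, PUSH2 0x4c, JUMPI
    "600035"                # PUSH1 0x0, CALLDATALOAD
    "7c01" + "00" * 28 +    # PUSH29 0x0100...00
    "9004"                  # SWAP1, DIV
    "63ffffffff16"          # PUSH4 0xffffffff, AND
    "806341c0e1b51461005157"  # DUP1, PUSH4 kill(), EQ, PUSH2 0x51, JUMPI
    "8063cfae32171461006657"  # DUP1, PUSH4 greet(), EQ, PUSH2 0x66, JUMPI
)


def dispatch(calldata: bytes) -> int:
    """Run the dispatcher and return the code offset where execution ends up."""
    stack, memory, pc = [], {}, 0
    while pc < len(DISPATCHER):
        op = DISPATCHER[pc]
        pc += 1
        if 0x60 <= op <= 0x7F:                      # PUSH1..PUSH32
            width = op - 0x5F
            stack.append(int.from_bytes(DISPATCHER[pc:pc + width], "big"))
            pc += width
        elif op == 0x52:                            # MSTORE
            offset, value = stack.pop(), stack.pop()
            memory[offset] = value
        elif op == 0x36:                            # CALLDATASIZE
            stack.append(len(calldata))
        elif op == 0x35:                            # CALLDATALOAD
            index = stack.pop()
            word = calldata[index:index + 32]
            stack.append(int.from_bytes(word + bytes(32 - len(word)), "big"))
        elif op in (0x10, 0x14, 0x04, 0x16):        # LT, EQ, DIV, AND
            x, y = stack.pop(), stack.pop()
            if op == 0x10:
                stack.append(int(x < y))
            elif op == 0x14:
                stack.append(int(x == y))
            elif op == 0x04:
                stack.append(x // y if y else 0)
            else:
                stack.append(x & y)
        elif op == 0x80:                            # DUP1
            stack.append(stack[-1])
        elif op == 0x90:                            # SWAP1
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == 0x57:                            # JUMPI
            label, cond = stack.pop(), stack.pop()
            if cond:
                return label
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return pc  # fell through to the code right after the dispatcher


print(hex(dispatch(bytes.fromhex("41c0e1b5"))))  # 0x51: kill()
print(hex(dispatch(bytes.fromhex("cfae3217"))))  # 0x66: greet()
print(hex(dispatch(bytes.fromhex("deadbeef"))))  # 0x4c: no match, exit code
print(hex(dispatch(b"\x01")))                    # 0x4c: calldata too short
```

Both the "too short" case and the "no matching function" case end at offset 0x4c, which is the exit code described above.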

V. Summary

This has been a brief introduction to how the Ethereum virtual machine works. If you want to learn more about Ethereum or blockchain security, follow me on Twitter.