Android reverse engineering -- .so decompilation analysis, from shallow to deep (reply reward)

Posted by lipsius at 2020-03-02

If you can, write your own .so library and then decompile it yourself to learn. That is the fastest way to make progress.

This analysis is just one example. I still have to work hard.



1. When I first learned Android reverse engineering, I rushed. I swallowed a lot of things whole without chewing them. I seemed to make some progress, but I quickly hit a bottleneck. So it is better to slow down and let things settle.

2. Most analysis articles online go straight to F5 (the decompiler). That solves problems quickly, but it is not good for learning. Relying on the tool is not a good way to learn; save that efficiency for actual work.

3. Solid fundamentals let you climb higher.

4. Don't back down when you see assembly. It's all paper tigers. It just takes a small detour.


1. The timeless HelloWorld

2. The interesting + - * /

3. The complexity of how ARM handles %

4. Does replacing % improve efficiency?

5. The EOR calculation implemented in Python

How do you get a .so file? Just briefly: in Android Studio, choose C++ when you create the project. I won't walk through it here; a Baidu search covers it, and some things are not worth repeating.

Many people start learning programming with HelloWorld, and reversing can start there too. Because it is so simple, this part is kept short.

1. Program code

No explanation needed.

void Hello() {
    printf("HelloWorld");
}

2. Reversing the .so file

This is our ARM assembly.

PUSH            {R7,LR}
MOV             R7, SP
SUB             SP, SP, #8

This is the function prologue (the setup stage). It can usually be skipped.

ADD             SP, SP, #8
POP             {R7,PC}

This is the function epilogue (the cleanup stage). It can usually be skipped as well.

When you see these two, you can simply read them as: first reserve stack memory, then release it after use.

What is left:

LDR             R0, =(aHelloworld - 0x430C)
ADD             R0, PC  ; "HelloWorld"
BLX             printf
STR             R0, [SP,#0x10+var_C]

LDR             R0, =(aHelloworld - 0x430C)

LDR loads a value from the literal pool. Here it is the PC-relative offset of our HelloWorld string.

ADD             R0, PC  ; "HelloWorld"

Adding PC turns that offset into the actual address of the string. R0 now holds the first argument passed to printf.
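To see why the LDR/ADD pair ends up with the string's address, here is a minimal sketch of the arithmetic. It assumes Thumb code, where reading PC yields the address of the current instruction plus 4. The string address 0x5000 is a made-up value for illustration and does not come from the binary.

```python
# Sketch of PC-relative addressing in Thumb code.
# Assumption: reading PC in Thumb yields the current instruction address + 4.

THUMB_PC_BIAS = 4


def literal_for(string_addr, add_insn_addr):
    """Value stored in the literal pool: string address minus PC at the ADD."""
    return string_addr - (add_insn_addr + THUMB_PC_BIAS)


def resolve(literal, add_insn_addr):
    """What 'ADD R0, PC' leaves in R0 after 'LDR R0, =literal'."""
    return (literal + add_insn_addr + THUMB_PC_BIAS) & 0xFFFFFFFF


# 0x430C in the listing is the PC value seen by the ADD, so under the
# assumption above the ADD itself would sit at 0x4308.
add_addr = 0x430C - THUMB_PC_BIAS
hello_addr = 0x5000  # hypothetical address of "HelloWorld"

lit = literal_for(hello_addr, add_addr)
print(hex(resolve(lit, add_addr)))
```

The point is only that the constant subtracted in the LDR operand cancels out the PC value added one instruction later.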

BLX             printf

BL: branch with link. The address of the instruction after the current one is saved in the LR register, and then execution jumps to the label. It is usually used to call subroutines; the subroutine can return by ending with MOV PC, LR. BX: branch with state switch. When the lowest bit of the target address is 1, execution switches to Thumb instructions; when it is 0, the target is executed as ARM instructions. BLX is a combination of the two.
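The link and state-switch behaviour described here can be modelled in a few lines. This is only a sketch: the function names bl and bx are mine, and the addresses are borrowed from the % listing later in this post (the BL at 0x42DE, which is 4 bytes long). I am also assuming the caller runs in Thumb state, where the saved return address has bit 0 set.

```python
def bl(insn_addr, insn_size, target, thumb=True):
    """Model BL: save the return address in LR, then jump to target."""
    lr = insn_addr + insn_size
    if thumb:
        lr |= 1  # bit 0 records that the caller runs in Thumb state
    return lr, target


def bx(target):
    """Model BX: bit 0 of the target selects the instruction set."""
    state = "Thumb" if target & 1 else "ARM"
    return state, target & ~1


lr, pc = bl(0x42DE, 4, 0x13F38)
print(hex(lr), hex(pc))   # LR and the new PC after the call
state, ret = bx(lr)       # what 'BX LR' would do on return
print(state, hex(ret))
```

Returning through BX LR lands on the instruction right after the BL and keeps the caller's instruction set.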

STR             R0, [SP,#0x10+var_C]

Store R0 at stack offset 4. R0 generally holds the return value, so this saves the value returned by printf.

3. Summary

There is a big gap between C code and arm. There

1. Program code

void youqu() {
    int a,b,c,d;
    a=1+1;
    b=2-1;
    c=2*2;
    d=4/2;
}

Just a simple one.

2. Reversing the .so file

Here we see something interesting.

SUB             SP, SP, #0x10
MOVS            R0, #2
STR             R0, [SP,#0x10+var_4]
MOVS            R1, #1
STR             R1, [SP,#0x10+var_8]
MOVS            R1, #4
STR             R1, [SP,#0x10+var_C]
STR             R0, [SP,#0x10+var_10]
ADD             SP, SP, #0x10
BX              LR

SUB             SP, SP, #0x10

This allocates 0x10 of stack space, that is, 16 bytes (not bits), giving a 16-byte stack frame.

MOVS            R0, #2

Move the immediate 2 into R0.

STR             R0, [SP,#0x10+var_4]

Store the contents of R0 at stack offset 0xC.

MOVS            R1, #1

Move the immediate 1 into R1.

STR             R1, [SP,#0x10+var_8]

Store R1 at stack offset 8.

MOVS            R1, #4

Move the immediate 4 into R1.

STR             R1, [SP,#0x10+var_C]

Store R1 (the value 4) at stack offset 4.

STR             R0, [SP,#0x10+var_10]

Store R0 (still the value 2) at stack offset 0.

ADD             SP, SP, #0x10

Release the stack space.
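The walkthrough above can be replayed with a tiny simulation. This is a sketch under one assumption: I read IDA's var_X names as "X bytes below the top of the frame", so [SP,#0x10+var_4] means SP + 0x10 - 4. The helper sp_offset and the tuple format of program are my own, made up for illustration.

```python
FRAME_SIZE = 0x10  # from 'SUB SP, SP, #0x10'


def sp_offset(var_name, frame_size=FRAME_SIZE):
    """Turn an IDA-style local name such as 'var_C' into an offset from SP."""
    below_top = int(var_name.split("_")[1], 16)  # var_C -> 0xC below frame top
    return frame_size - below_top


regs = {"R0": 0, "R1": 0}
stack = {}

# The instructions between the SUB and the ADD, in listing order.
program = [
    ("MOVS", "R0", 2),
    ("STR", "R0", "var_4"),
    ("MOVS", "R1", 1),
    ("STR", "R1", "var_8"),
    ("MOVS", "R1", 4),
    ("STR", "R1", "var_C"),
    ("STR", "R0", "var_10"),
]

for op, reg, arg in program:
    if op == "MOVS":
        regs[reg] = arg                    # load an immediate into a register
    else:
        stack[sp_offset(arg)] = regs[reg]  # STR: register -> stack slot

for off in sorted(stack, reverse=True):
    print("SP+" + hex(off), "=", stack[off])
```

Read from the highest offset down, the four slots hold the values of a, b, c and d from the C code: 1+1, 2-1, 2*2 and 4/2.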

3. Summary

Do you see anything here? Why is it interesting? You may notice that the + - * / operations never appear in the ARM assembly; only the final results do. Why is it handled like this? The compiler optimizes the program and computes the constant expressions ahead of time, so the operations are never emitted as assembly instructions.

1. Program code

    int a,b;
    a=10;
    b=5;
    a=a%b;

2. Reversing the .so file

.text:000042CC                 PUSH            {R7,LR}
.text:000042CE                 MOV             R7, SP
.text:000042D0                 SUB             SP, SP, #0x10
.text:000042D2                 MOVS            R0, #0xA
.text:000042D4                 STR             R0, [SP,#0x18+var_C]
.text:000042D6                 MOVS            R0, #5
.text:000042D8                 STR             R0, [SP,#0x18+var_10]
.text:000042DA                 LDR             R0, [SP,#0x18+var_C]
.text:000042DC                 LDR             R1, [SP,#0x18+var_10]
.text:000042DE                 BL              sub_13F38
.text:000042E2                 STR             R1, [SP,#0x18+var_C]
.text:000042E4                 STR             R0, [SP,#0x18+var_14]
.text:000042E6                 ADD             SP, SP, #0x10
.text:000042E8                 POP             {R7,PC}

.text:000042CC                 PUSH            {R7,LR}
.text:000042CE                 MOV             R7, SP
.text:000042D0                 SUB             SP, SP, #0x10

The function prologue.

MOVS            R0, #0xA

R0 is assigned 0xA, that is, 10.

STR             R0, [SP,#0x18+var_C]

Store the contents of R0 at stack offset 0xC.

MOVS            R0, #5

Assign the immediate 5 to R0.

STR             R0, [SP,#0x18+var_10]

Store the contents of R0 at stack offset 8.

LDR             R0, [SP,#0x18+var_C]

Load the contents of stack offset 0xC into R0.

LDR             R1, [SP,#0x18+var_10]

Load the contents of stack offset 8 into R1.

BL              sub_13F38

Jump to sub_13F38 and save the return address in LR.

Let's take a look at sub_13F38, keeping in mind that R0 = 10 and R1 = 5.

.text:00013F38                 CMP             R1, #0
.text:00013F3A                 BEQ             loc_13F26
.text:00013F3C                 PUSH.W          {R0,R1,LR}
.text:00013F40                 BL              sub_13E78
.text:00013F44                 POP.W           {R1,R2,LR}
.text:00013F48                 MUL.W           R3, R2, R0
.text:00013F4C                 SUB.W           R1, R1, R3
.text:00013F50                 BX              LR

CMP             R1, #0

Compare R1 with the immediate 0.

BEQ             loc_13F26

BEQ jumps when the compared values are equal (the result is 0). Here, it jumps if R1 = 0.

PUSH.W          {R0,R1,LR}

Push R0, R1 and LR onto the stack. R0 = 10, R1 = 5.

BL              sub_13E78

Jump to sub_13E78 and save the return address in LR.

Let's take a look at sub_13E78.

EOR.W           R12, R0, R1

This is the exclusive or of R0 and R1, with the result placed in R12. Use our script for the calculation. The result is 15.
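The same result can be checked quickly with Python's built-in ^ operator. The sketch below also shows the top bit of the result, because my reading (an assumption, not something the listing states) is that a division routine XORs the two operands to learn whether their signs differ. The helper names eor32 and n_flag are mine.

```python
def eor32(a, b):
    """32-bit exclusive or, as EOR.W would compute it."""
    return (a ^ b) & 0xFFFFFFFF


def n_flag(value):
    """Bit 31 of a 32-bit value, i.e. its sign bit."""
    return (value >> 31) & 1


r12 = eor32(10, 5)
print(format(10, "08b"), format(5, "08b"), format(r12, "08b"), r12)

# Sign bit of the XOR for operands with equal and with different signs.
print(n_flag(eor32(10, 5)), n_flag(eor32(10, -5)), n_flag(eor32(-10, -5)))
```

The sign bit of the XOR is 1 only when exactly one operand is negative, which is also when a quotient would be negative.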


N == 1 means the result is negative.

NEGMI           R1, R1

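Negate R1 (take its two's complement). Because of the MI suffix, the negation only executes when N == 1, that is, when the value is negative; for example, -5 would be negated to 5.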
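A conditional negate like this is easy to model on 32-bit values. This is a sketch with made-up helper names (to_u32, to_s32, negmi); I am assuming the N flag was set earlier from the sign of R1, which the excerpt does not show.

```python
MASK = 0xFFFFFFFF


def to_u32(x):
    """Wrap a Python int into an unsigned 32-bit register value."""
    return x & MASK


def to_s32(x):
    """Interpret a 32-bit register value as a signed integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def negmi(r1, n):
    """Model 'NEGMI R1, R1': negate R1 only when the N flag is 1."""
    return to_u32(-r1) if n else r1


r1 = to_u32(-5)            # 0xFFFFFFFB, the register image of -5
n = (r1 >> 31) & 1         # assumed source of the N flag: the sign of R1
print(hex(r1), n, to_s32(negmi(r1, n)))

print(negmi(5, 0))         # positive input, N = 0: left unchanged
```

With R1 = 5, as in our example, N is 0 and the instruction does nothing.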

BEQ             loc_13EF6

The same kind of jump as the earlier BEQ: it jumps if equal.


3. Summary

That is the % logic, which involves three if branches and one loop. Although % is only a simple operation in C, the ARM machine has to run a lot of instructions to handle it. More instructions raise a problem: overhead.
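To tie the sub_13F38 listing together, here is a sketch that follows it instruction by instruction. Two things are assumptions on my part: sub_13E78 is treated as a signed division that rounds toward zero (as C does), and since loc_13F26 is not shown, the divide-by-zero path simply raises an exception as a placeholder. The names sdiv_trunc and idivmod are mine.

```python
def sdiv_trunc(a, b):
    """Stand-in for sub_13E78: signed division rounding toward zero."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def idivmod(r0, r1):
    """Follow sub_13F38 step by step; returns (R0, R1) at the final BX LR."""
    if r1 == 0:                    # CMP R1, #0 / BEQ loc_13F26
        raise ZeroDivisionError("placeholder for loc_13F26")
    saved = (r0, r1)               # PUSH.W {R0,R1,LR}
    r0 = sdiv_trunc(r0, r1)        # BL sub_13E78, quotient comes back in R0
    r1, r2 = saved                 # POP.W {R1,R2,LR}: R1 = dividend, R2 = divisor
    r3 = r2 * r0                   # MUL.W R3, R2, R0
    r1 = r1 - r3                   # SUB.W R1, R1, R3
    return r0, r1                  # BX LR


print(idivmod(10, 5))
print(idivmod(7, 2))
print(idivmod(-7, 2))
```

On return, R1 holds the remainder, which matches the caller storing R1 back into the slot of a right after the BL.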

First we write code like this:

    int a,b;
    a=10;
    b=5;
    a=a/b;
    a=a*b;
    a=10-a;

This is an alternative way of taking the remainder.

I thought it would improve efficiency. But.

This is the logic of division. Doesn't it look even more complex than the remainder algorithm? So our earlier guess was wrong.

Here is the code, pasted directly. There is no input validation; this is an early version. Improve it yourself if you need to.

import sys


def eor(a, b):
    # XOR of two single bits given as characters
    if a == b:
        return str(0)
    else:
        return str(1)


# Pad to 8 bits
def eight(a):
    c = ""
    d = 8
    if len(a) < d:
        while 1:
            c = c + str(0)
            d = d - 1
            if d == len(a):
                break
        a = c + a
    return a


a = bin(int(sys.argv[1]))[2:]
b = bin(int(sys.argv[2]))[2:]
a = eight(a)
b = eight(b)
c = ""
for i in range(0, 8):
    # compare the two strings bit by bit, not as whole strings
    c = c + eor(a[i], b[i])
print(c)