How can I analyze the authenticity of website operation data?

Posted by lipsius at 2020-03-01

About the author

Zhan long, a core member of the research and Defense Laboratory of Weishitong, is a security researcher.

He currently works on communication security and password technology; his research covers password cracking, password strength verification, and system security and hardening.

He has worked at many security companies and has extensive experience in password cracking.


Data leaks were a constant occurrence in 2016. In May, 117 million LinkedIn login credentials and 427 million MySpace user credentials were put up for sale by hackers, at 5 and 6 bitcoin respectively.

In July 2016, someone leaked the database of an overseas website, containing data from before March 18, 2016, and published it on the Internet.

The data published on the Internet is quite messy. To analyze users' password habits, we cleaned up the data, deleted the sensitive information and kept only the password field.

The database of this overseas website comes from the Internet. This paper analyzes password behavior on this website only for the purpose of security research, so as to improve everyone's security awareness and password habits. We solemnly declare that we oppose any illegal or unauthorized penetration testing activity, especially database dumping.

Password preprocessing

First, we use the awk text-processing tool to preprocess the data: extract the password hash (the second comma-separated field) and count how many times each hash occurs.

awk -F ',' '{a[$2]++} END {for (i in a) print i, a[i]}' /private/database.sql > passwords.log

Then, sort by number of occurrences, most frequent first (the count is the second column, so it must be sorted numerically)

sort -k 2,2nr passwords.log > passwords.sort

Analysis of encryption method

First, we need to determine which algorithm was used to protect the passwords.

Here, we compute the MD5 value of the weak password "123456" and compare it with the hashes in the sorted list. It matches the 19th hash value, so we can be fairly confident that each password is stored as a single MD5 digest, without salt and without a second MD5 pass. As a result, the passwords are easy to recover to plaintext.
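The comparison described above can be sketched in shell as follows. This is a minimal sketch, not the exact procedure used for the analysis: it assumes GNU coreutils `md5sum` is installed, the helper names `md5_hex` and `hash_rank` are hypothetical, and `passwords.sort` is assumed to contain one unquoted "hash count" pair per line.

```shell
# md5_hex: print the lowercase hex MD5 digest of the string in $1.
# printf '%s' is used so that no trailing newline gets hashed.
md5_hex() {
    printf '%s' "$1" | md5sum | awk '{print $1}'
}

# hash_rank: print the line number(s) at which hash $1 appears in file $2
# (assumed format: "<hash> <count>" per line); prints nothing if absent.
hash_rank() {
    awk -v h="$1" 'tolower($1) == tolower(h) { print NR }' "$2"
}

single=$(md5_hex '123456')     # MD5(password)
double=$(md5_hex "$single")    # MD5(MD5(password)), the "MD5 twice" variant

echo "single MD5: $single"
echo "double MD5: $double"

# Look up the single-MD5 candidate, only if the sorted file exists.
if [ -f passwords.sort ]; then
    hash_rank "$single" passwords.sort
fi
```

If the single-MD5 digest shows up in the list and the double-MD5 digest does not, that supports the unsalted, single-pass conclusion.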


From the above table, we can see that 850000 users have an empty password, and 270000 users use a common weak password (888888, 17178899, 123456).

In addition, most of the remaining user passwords start with fwe or qwe and end with 123456. The counts for these different passwords are roughly equal.

Although these are all weak passwords, they differ markedly from the weak passwords we commonly see.


Common weak passwords in China

000000, 111111, 11111111, 112233, 123123, 123321, 123456, 12345678, 654321, 666666, 888888, abcdef, abcabc, abc123, a1b2c3, aaa111, 123qwe, qwerty, qweasd, admin, password, [email protected], passwd, iloveyou, 5201314

Common weak passwords abroad


Consider the above analysis together with the fact that this app climbed quickly, within a short period, to the top of the free app charts in the app stores of the United States, Hong Kong, Indonesia, Taiwan, Singapore and other regions.

We therefore have reason to doubt whether these users are real. They are probably zombie accounts created to inflate the rankings.

Password length analysis

The total number of passwords is 20463553 (about 20 million). Most of them are 8 characters long: there are 14276613 such passwords (about 14 million), accounting for 67.77% of all passwords.

The password policy of this app requires a length of 8-20 characters, so it is not surprising that most people choose the minimum of 8: a long password with no meaning is hard for users to remember.


① Passwords of 8, 9, 10 and 11 characters account for 84% of all passwords.

② Ranked by share from high to low, the password lengths are: 8 > 10 > 9 > 6 > 7.

PS: Although the password policy requires at least eight characters, some passwords are shorter than eight characters.

③ 8-character passwords account for 67.77% of the total.
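A length distribution like the one summarized above can be computed with awk. The sketch below assumes a file with one recovered plaintext password per line; the function name `length_histogram` and the sample passwords are made up for illustration and are not taken from the leaked data.

```shell
# length_histogram: for a file with one plaintext password per line ($1),
# print "<length> <count> <percent>" sorted by count, highest first
# (ties are ordered by ascending length).
length_histogram() {
    awk '{ n[length($0)]++; total++ }
         END {
             for (len in n)
                 printf "%d %d %.2f\n", len, n[len], 100 * n[len] / total
         }' "$1" | sort -k2,2nr -k1,1n
}

# Tiny made-up sample, written to a temporary file for the demo.
sample=$(mktemp)
printf '%s\n' 'qwe12345' 'fwe12345' 'abc123456' '123456' > "$sample"
length_histogram "$sample"
rm -f "$sample"
```

For the four sample passwords this prints one row per length, with the 8-character row first because it has the highest count.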

Analysis of password composition

From the figure above, it can be seen that among 8-character passwords, few consist purely of digits; most mix digits and letters.

Using L for letters and D for digits, we analyze the structure of mixed passwords of various lengths.
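The L/D notation can be produced mechanically by replacing every letter with L and every digit with D, then counting identical patterns. The following is a rough sketch under stated assumptions: the input is a file with one plaintext password per line, the function name `structure_counts` is hypothetical, and mapping any other character to S is an addition for illustration that the original notation does not define.

```shell
# structure_counts: map each password in file $1 to a mask
# (letter -> L, digit -> D, anything else -> S, an assumed extension)
# and print "<mask> <count>" sorted by count, highest first.
structure_counts() {
    # Letters must be replaced before digits are turned into "D";
    # otherwise the letter rule would rewrite those D's as L's.
    LC_ALL=C sed -e 's/[A-Za-z]/L/g' -e 's/[0-9]/D/g' -e 's/[^LD]/S/g' "$1" |
    awk 'length($0) > 0 { c[$0]++ } END { for (m in c) print m, c[m] }' |
    sort -k2,2nr -k1,1
}

# Tiny made-up sample, written to a temporary file for the demo.
sample=$(mktemp)
printf '%s\n' 'qwe12345' 'abc12345' 'fwe123456' '12345678' > "$sample"
structure_counts "$sample"
rm -f "$sample"
```

In this sample, qwe12345 and abc12345 share the mask LLLDDDDD, so that pattern is listed first with a count of 2.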

Lessons learned

The user passwords of this website are stored in the database after only a single MD5 computation, which is very insecure. A more secure hash function should be used, with a salt added to each user password, to strengthen the website's data protection.
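As one possible illustration of salted hashing, the sketch below uses `openssl passwd -6`, which produces a SHA-512 crypt string with a random per-password salt. It assumes OpenSSL 1.1.1 or later; the function names `hash_password` and `verify_password` are hypothetical. This is only one option, not a statement of what the website should have used: purpose-built password hashes such as bcrypt, scrypt or Argon2 are generally preferred where available.

```shell
# hash_password: print a salted SHA-512 crypt string for the password in $1.
# openssl generates a fresh random salt on every call; the password is fed
# on stdin so that it does not appear in the process list.
hash_password() {
    printf '%s\n' "$1" | openssl passwd -6 -stdin
}

# verify_password: succeed if candidate $1 matches the stored string $2.
# The stored string has the form $6$<salt>$<digest>, so the salt is the
# third "$"-separated field; re-hashing with it must reproduce the string.
verify_password() {
    local salt
    salt=$(printf '%s' "$2" | cut -d '$' -f 3)
    [ "$(printf '%s\n' "$1" | openssl passwd -6 -salt "$salt" -stdin)" = "$2" ]
}

stored=$(hash_password '123456')
echo "$stored"
verify_password '123456' "$stored" && echo "match"
```

Because each call draws a new salt, two users with the same password get different stored strings, so a single precomputed MD5 lookup like the "123456" comparison no longer works.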

Disclaimer: the purpose of this paper is to carry out security research and improve the security awareness of users. The database comes from the open data on the network.
