Recently I have been studying how CAPTCHAs (verification codes) are broken, and I have written down what I learned. This serves three purposes: it summarizes what I have learned over the past few days and helps me understand it better; I hope it helps others who are studying this technique; and I hope it gets the attention of website administrators, so that they think more carefully when they deploy a CAPTCHA. I have only just started in this field, so my understanding is still shallow and mistakes are inevitable. Criticism is welcome.

What a CAPTCHA is for: it can effectively stop an attacker from using a program to brute-force the login of a specific registered user with continuous attempts. In practice, modern CAPTCHAs are mostly meant to prevent automated bulk registration and automated bulk posting and replying. To stop robots from automatically registering, logging in, and flooding forums, many websites have adopted CAPTCHA technology. A CAPTCHA is an image generated from a string of random digits or symbols, with some interfering pixels (noise) added to defeat OCR. The user reads the code from the image by eye, types it into the form, and submits it to the website for verification. Only after verification succeeds can the protected function be used.

The most common types of CAPTCHA are:

1. A four-digit random numeric string. This is the most primitive CAPTCHA, and its protective effect is almost zero.
2. A random-digit image. The characters in the image are quite regular; some versions add random noise, and some use random character colors. It protects better than the previous type, and people with no basic knowledge of graphics and image processing cannot break it.
3. Images in various formats with random digits + random uppercase English letters + random interfering pixels + random positions.
4. Chinese-character CAPTCHAs, the newest kind used for registration. The characters are randomly generated and harder to type, which hurts the user experience, so they are not widely used.

For simplicity, this article focuses on type 2. Let's first look at some CAPTCHAs of this kind that are common on the Internet.
(For some reason CSDN will not let me upload the pictures again, so I have put the four pictures in the download package; you can download it to compare and check.)

These four pictures basically cover the CAPTCHA variants described under type 2. At first glance, the first picture looks the easiest to crack, the second and third look harder, and the fourth looks the hardest. Is that really so? In fact, the first three are equally difficult.

The first picture is the simplest: the background and the digits each keep a single color, the characters are regular, and their positions are uniform.

The second picture does not look easy, but careful study reveals the pattern. However the background color and the noise change, the characters are regular and always the same color, so the noise is easy to eliminate: just discard every pixel that is not the character color.

The third picture looks more complex still. Besides the changing background color and noise, the color of the characters also changes, and each character has a different color. It looks as if this CAPTCHA cannot be broken. This article uses one of these CAPTCHAs as its example; the fourth picture is left for readers to work out themselves.

The fourth picture, in addition to the features of the third, has two straight lines drawn across the text. It looks difficult, but the lines are actually easy to remove.

CAPTCHA recognition generally consists of the following steps:

1. Extract the glyphs
2. Binarize
3. Compute the features
4. Compare against the samples

Step 1: Extract the glyphs. CAPTCHA recognition is, after all, not professional OCR, and every website's CAPTCHA is different, so the most common approach is to build a glyph library for the CAPTCHA in question. To extract the glyphs, download enough images that together they contain every character. The images here contain only digits, so we only need images that cover 0-9.

Step 2: Binarize. Binarization means representing every pixel that belongs to a digit as 1 and every other pixel as 0. This gives a bit pattern for each digit; record these patterns and use them as keys.

Step 3: Compute the features. Binarize the image to be recognized to obtain its feature code.

Step 4: Compare against the samples. Compare the feature code from step 3 with the recorded glyph patterns to get the digits shown in the image.

With this method, the recognition rate for this kind of CAPTCHA is essentially 100%.

Having read these steps, you may object that nothing has been said about removing the noise. In fact, removing it is very simple. An important property of the noise is that it must not interfere with how the CAPTCHA is displayed, so when the noise is generated its RGB values are kept below or above some specific value. For example, in the pictures I provided, the RGB values of the noise never exceed 125, so the noise is easy to remove.

The example code is written in PHP.
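The steps above (threshold out the noise, binarize, then compare each character against recorded patterns) can be sketched roughly as follows. This is a minimal Python illustration, not the PHP code mentioned above. The function names, the toy 3x3 glyphs, the equal-width character split, and the closest-match comparison are all assumptions made for the sketch. It also assumes, following the figure of 125 quoted in the text, that noise and background never have a channel above 125, so any pixel with a channel above 125 counts as part of a character; flip the comparison if your images are the other way round.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]          # (red, green, blue), each 0-255
Image = Sequence[Sequence[Pixel]]     # rows of pixels

THRESHOLD = 125  # value quoted in the text for the sample pictures


def binarize(image: Image, threshold: int = THRESHOLD) -> List[str]:
    """Turn each pixel into '1' (character) or '0' (background or noise).

    Assumption: a pixel is part of a character if any channel exceeds
    the threshold; everything else is discarded as noise/background.
    """
    return ["".join("1" if max(px) > threshold else "0" for px in row)
            for row in image]


def split_glyphs(bits: List[str], count: int) -> List[str]:
    """Cut the bitmap into `count` equal-width cells.

    Assumption: characters are regular and evenly positioned, as
    described for the type 2 pictures.
    """
    width = len(bits[0]) // count
    return ["\n".join(row[i * width:(i + 1) * width] for row in bits)
            for i in range(count)]


def distance(a: str, b: str) -> int:
    """Number of positions where two equally sized patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def recognize(image: Image, templates: Dict[str, str], count: int) -> str:
    """Binarize, split, and map each glyph to its closest recorded pattern.

    `templates` uses the bit pattern as the key and the character as the
    value, mirroring the "use the patterns as keys" step.
    """
    result = []
    for glyph in split_glyphs(binarize(image), count):
        best = min(templates, key=lambda pattern: distance(glyph, pattern))
        result.append(templates[best])
    return "".join(result)


# Toy glyph library (hypothetical 3x3 patterns, not taken from a real site).
TEMPLATES = {
    "010\n010\n010": "1",
    "111\n001\n001": "7",
}

# Toy picture: '#' = character pixel, '.' = background, 'n' = noise.
LEGEND = {"#": (220, 40, 40), ".": (20, 20, 20), "n": (125, 90, 110)}
ART = [".#.###",
       "n#...#",
       ".#.n.#"]
image = [[LEGEND[c] for c in row] for row in ART]

print(recognize(image, TEMPLATES, 2))  # prints 17
```

With a real picture you would first read the pixels with an image library and build the pattern table from several downloaded samples. The closest-match comparison is there only so that a stray pixel does not break an otherwise exact lookup.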
I have put together an example that cracks the CAPTCHAs above; you can download it from here. After that, we can use Snoopy (it is lighter than curl, which is why I like it) to simulate a browser and visit the website.