Special forces hacker Pan Shaohua: how to use AI to turn the "routine" back on scammers

Posted by punzalan at 2020-02-27

Editor's note

Why are the plots of Chinese films so bland?

Because China's best screenwriters are all busy writing scripts for telecom fraud gangs.

If you have been "lucky" enough to experience every kind of telecom fraud, you will have been won over by their everyday thinking and their scripts.

However, there is always a deeper routine to counter theirs. The hacker gurus found that, in front of artificial intelligence and machine learning, a swindler's routine is about as sophisticated as a naughty three-year-old's trick.

Knownsec, known as the "special forces of the Internet world", is the coolest security company in many people's minds. Pan Shaohua is the "chief of staff" of these special forces. He led the hackers in developing a formidable anti-fraud system that can rescue victims at the critical moment when the swindler swings his hammer, leaving swindlers running in tears through the cold wind and resolving, in the dark night, to go straight.

In this Hard Innovation open class, Leiphone invited Pan Shaohua, director of the Threat Intelligence Center and director of the Beijing R&D center, a cyber security special forces soldier with both looks and brains.

He explains how hackers and special forces use artificial intelligence to outwit fraudsters.

[Pan Shaohua]



Hello, I'm Pan Shaohua from Beijing Zhichuangyu Information Technology Co., Ltd. Our team's main research direction is telecommunication network security, focusing on telecom anti-fraud and business anti-fraud.

I have been interested in network security since 2001, and since joining the company in 2008 I have been committed to making the Internet better and more secure.

I appreciate Einstein's saying:

The world is a dangerous place, not because of those who do evil, but because of those who look on and do nothing.

It is also what drives me to fight on the front line against the black-market industry.

An 80-billion-yuan "business": fraud

The Internet brings a lot of convenience, but it is also exploited by bad people. Industries that make illegal profits on the Internet are called black or gray industry chains. In earlier years, online payment was less widespread and personal data leaks were less common, so these criminal operations were rarer. But in the past two years, the number of reported fraud cases has clearly been rising.

For example:

XX car owner, you have traffic violations at XX intersection today. Click on the link for details.

If you carelessly tap the link on your phone, it opens a Trojan and you are hit immediately.

Here are some rough statistics. In 2015 alone, reported phone fraud cases such as "guess who I am" and impersonation of police, prosecutors, and courts caused nationwide losses of about 22 billion yuan. Adding website-based fraud and theft by mobile phone viruses, the actual total loss should exceed 80 billion yuan. At a rough estimate, millions of people across the country provide technical capabilities behind the fraudsters.

[Examples of fraudulent text messages]

Many people blame the operators for these scams: they charge so much money, yet let bad people run rampant. Operators do have their own problems, but their options are limited.

For example, real-name registration of phone cards was introduced mainly to combat telecom fraud, but fraudsters soon found workarounds. As a result, real-name registration has so far had limited effect against SMS fraud.

Why is that?

Behind the fraudsters stands a complete black-market industry chain, including:

Malware production

Malicious website production

Supply of unregistered ("black") mobile SIM cards

Pseudo base station equipment (which has to be produced by specialized factories)

SMS bulk-sending platforms (bulk SMS sending is itself a gray area, and many black-market technicians also provide technical support directly to fraudsters)

Money laundering (with one million yuan, they quickly split the money into many small transactions, launder it into legitimate-looking amounts, and transfer it out)

Each team handles only one link in this professional black-market chain. This lets each team limit its own legal risk and concentrate on one specific "black technology" niche.

This whole industry chain is hard for operators to crack down on alone, and even the resources the police can deploy are limited. Fighting it therefore requires the participation of all parts of civil society.

We have also done a lot of non-technical work. For example, together with Tencent and Baidu we launched the public-welfare Security Alliance. We shared a database of 800 million malicious URLs and exchange 50 million malicious URL records every day. All data is first screened automatically by machine and then passes through a manual review platform to make sure nothing is wrongly blacklisted.

In addition, we use machine learning, backed by large amounts of computing resources, to detect and identify malicious data on the Internet.

Comparing two approaches to anti-fraud

In some regions, netizens may notice that when they visit a website, a security alert pops up warning them not to continue. After receiving a fraud call, you may get a text message from the operator or the police warning that the call may be fraudulent and that you should not believe it. This may be our technology at work.

Anti-fraud has gone through an evolution.

1. After-the-fact handling

Operator blacklist systems

In the past, interception relied on operator blacklists. For example, when a call was suspected of being fraudulent, it was first verified manually, and only then, a few days later, added to the blacklist.

Operators also have some technical means to fight fraud, such as blocking at the international gateway. Operators can block international calls whose numbers start with specific prefixes. For example, a caller number starting with "0002" is in itself an irregular international call. Likewise, caller numbers that start with 0057 or 0058 but are shorter than 10 digits are also likely to be problematic.
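The two prefix rules above can be written as a simple static filter. This is a minimal sketch in Python that encodes only the examples given in the text (a "0002" prefix, or a "0057"/"0058" prefix with fewer than 10 digits); the function name `flag_international_caller` is hypothetical, and real gateway rule sets are not described in the source.

```python
# Hypothetical static rules, taken only from the examples in the text.
IRREGULAR_PREFIXES = ("0002",)             # irregular international call in itself
SHORT_SUSPECT_PREFIXES = ("0057", "0058")  # suspicious only when the number is too short
MIN_LENGTH = 10


def flag_international_caller(caller: str) -> bool:
    """Return True if a caller number matches one of the static blocking rules."""
    digits = caller.strip()
    if digits.startswith(IRREGULAR_PREFIXES):
        return True
    if digits.startswith(SHORT_SUSPECT_PREFIXES) and len(digits) < MIN_LENGTH:
        return True
    return False


if __name__ == "__main__":
    for number in ["00021234567", "005712345", "0057123456789", "13800138000"]:
        print(number, flag_international_caller(number))
```

Because the rules are fixed, a fraudster only needs to find one number format that passes them, which is the weakness described next.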

But fraudsters have ways around rigid rules:

For example, if operators set five detection rules, fraudsters will try new tactics to break through, such as making their calls land locally. Once they find a method that works, they can keep using it to bypass interception.

The biggest problem is that blacklist-based interception cannot update fraud and harassment numbers in real time.

A cumbersome case-reporting process

First, the victim reports the case, and then the police carry out technical consultation and investigation. Once a problem is confirmed, the police coordinate with the bank to freeze the funds, and finally the case is solved.

However, this approach has many drawbacks:

Once a victim has been cheated, the criminals transfer the money immediately, so by the time the bank actually freezes the account, it is already empty. In addition, there are hundreds of thousands of telecom fraud cases every year, far more than the police can handle. Last year, Xu Yuyu's case became a national story, so it was solved quickly. But typically, if you are defrauded of ten thousand yuan, solving the case may cost the police several hundred thousand yuan. Realistically, it is hard to solve every individual case.

2. Real-time blocking

We analyze the types of fraud that have been popular recently, as shown in the following picture:

[popular fraud types]

When users make or receive calls, machine learning can determine in real time that a call is likely fraudulent, so we can issue a real-time alert immediately.

When a user goes online and we detect that they are visiting a phishing or fraud website, we can block the site immediately. The overall approach is to cut off the fraud process before the loss actually occurs.
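Blocking a website amounts to checking each requested URL against the malicious-URL database before the page loads. A minimal sketch, assuming the blacklist is an in-memory set of hostnames (the source does not describe the lookup structure; `MALICIOUS_HOSTS`, its entries, and `should_block` are hypothetical):

```python
from urllib.parse import urlsplit

# Hypothetical sample entries; the shared database described above is far larger.
MALICIOUS_HOSTS = {"phish-bank-login.example", "fake-court-notice.example"}


def should_block(url: str) -> bool:
    """Block if the URL's host, or any parent domain of it, is blacklisted."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Check "a.b.c", then "b.c", then "c", so subdomains of a bad domain are caught too.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in MALICIOUS_HOSTS:
            return True
    return False
```

Matching parent domains as well as the exact host is a design choice made here so that a blacklisted domain cannot be evaded simply by adding a new subdomain.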

Next, I will elaborate on how real-time blocking is technically implemented.

We deploy a real-time monitoring system in the operator's network.

1. Call record collection. We collect call detail records in real time from the call-recording equipment.

2. Desensitization of call records. Because who-called-whom information is sensitive, we desensitize it with a specific encryption algorithm. From the desensitized data, specific call records cannot be recovered.

[Call records after desensitization]

We hash the called (receiving) numbers. The number on the other side of the call is kept in clear text, because it may be a fraud number.

3. Machine learning input. The machine learning system does not need to know which specific number placed a call; it only needs to judge whether this kind of behavior is fraudulent.

We extract features from the data and feed them into the machine learning system, where an event model judges which calling behavior is fraudulent. Throughout this process, we continually use cloud data and tune parameters to keep the results accurate: detecting as many fraudulent calls as possible while keeping the false-alarm rate low.

4. Data decryption. The processed results are fed back into the operator's data and decrypted symmetrically.

5. Warning prompt. Once a call is judged to be fraudulent, the operator can choose how to notify the user through its own work-order system:

SMS reminder: "You just received a fraud call; do not be fooled."

Flash reminder: a pop-up message on the phone warns the user of fraud.

Call reminder: the user receives a phone call.

Caller-display ("color print") reminder: a color-print message associated with the number is pushed to the user.
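Steps 2 to 5 can be sketched end to end. The source mentions both hashing and symmetric decryption without naming an algorithm, so this sketch substitutes a keyed hash (HMAC-SHA256) for the called number plus an operator-side token table for re-identification; the analysis side only ever sees tokens. The `OperatorVault` class, the record fields, and the toy `is_fraud` rule are all hypothetical stand-ins.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class CallRecord:
    caller: str       # kept in clear text: it may be the fraud number
    callee: str       # sensitive: replaced by a token before analysis
    duration_s: int


class OperatorVault:
    """Hypothetical component running inside the operator: issues and resolves tokens."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)   # the key never leaves the operator
        self._lookup: dict[str, str] = {}

    def tokenize(self, number: str) -> str:
        # Keyed hash: without the key, the analysis side cannot map tokens back to numbers.
        token = hmac.new(self._key, number.encode(), hashlib.sha256).hexdigest()
        self._lookup[token] = number
        return token

    def resolve(self, token: str) -> str:
        return self._lookup[token]


def desensitize(record: CallRecord, vault: OperatorVault) -> dict:
    # Step 2: only the callee is hidden; the model sees behaviour, not identity.
    return {"caller": record.caller,
            "callee_token": vault.tokenize(record.callee),
            "duration_s": record.duration_s}


def is_fraud(event: dict) -> bool:
    # Step 3 stand-in: the real system uses trained models, not this toy prefix rule.
    return event["caller"].startswith("0002")


def process(records: list[CallRecord], vault: OperatorVault) -> list[str]:
    alerts = []
    for rec in records:
        event = desensitize(rec, vault)
        if is_fraud(event):
            # Step 4: only the operator can turn the token back into a phone number.
            victim = vault.resolve(event["callee_token"])
            # Step 5: the operator picks the channel; SMS is used here.
            alerts.append(f"SMS to {victim}: you just received a fraud call, do not be fooled.")
    return alerts
```

Keeping the key and lookup table inside the operator reflects the split the text describes: the detection system judges behavior, and only the operator can tie a verdict back to a real subscriber.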

The core technology of machine learning

1. Data

For a machine learning system, data matters most. Our data comes from a cloud database of 2.5 million active fraud numbers, including reports from Internet users and historical case data. These serve as training samples that teach the machine to recognize when a call being placed is a fraud call.

Because much of this data comes from mobile clients, it is updated promptly, so the 2.5 million records are current.

2. Machine learning system

Big data and machine learning used to be elite technologies, but they are now applied in many fields. For our work in the specific domain of anti-fraud, machine learning is also a ready-to-use approach.

We have built more than 50 models of fraudulent calling behavior. They draw on several elements, including:

Distribution of called numbers

Distribution of call durations

Distribution of call times (morning, evening, midnight)

User characteristics


We don't know in advance which of these factors are most relevant to fraud, so we feed the data into the machine learning system for supervised or semi-supervised learning and let it find the correlations automatically.
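The idea of letting supervised learning discover which factors matter can be illustrated with logistic regression trained by gradient descent. This is a from-scratch sketch, not the team's actual model (the source does not name an algorithm). In practice the labels would come from the fraud-number database described above; the three features and the toy samples here are hypothetical.

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs: list[list[float]], ys: list[int],
                   lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Batch gradient descent on log-loss; returns [bias, w1, w2, ...]."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def predict_proba(w: list[float], x: list[float]) -> float:
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


# Hypothetical toy samples: [callees per day / 100, stranger ratio, share of night calls]
X = [[1.5, 0.95, 0.4], [2.0, 0.90, 0.5], [1.2, 0.98, 0.3],   # labelled fraud (1)
     [0.1, 0.20, 0.05], [0.05, 0.10, 0.0], [0.2, 0.30, 0.1]]  # labelled normal (0)
Y = [1, 1, 1, 0, 0, 0]
weights = train_logistic(X, Y)
```

After training, the learned weights give a rough indication of how strongly each feature is associated with the fraud label, which is the "find the correlations automatically" step; correlated features can share or swap weight, so this is only a rough guide.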

Number features fall roughly into six categories.

1. Number activity features

For example, daily call count, average call duration, earliest and latest call times, and other basic statistics. A normal number usually has roughly balanced incoming and outgoing calls and does not dial continuously every day.

2. Social network of the number

For example, the number of contacts, the proportion of strangers, and how many people the number has called. The social network also covers the numbers that call it, who its contacts are, whether those contacts are related to the numbers it calls, and so on.

3. Behavioral event flow of the number

We analyze what a number does before and after each call as an event flow. For example, it made a call five minutes ago and another one four minutes later: how many of these calls are normal and how many are abnormal? A call hung up within three seconds, or one lasting ten minutes, can be extremely abnormal.

4. Behavioral characteristics of the number

For example, how often the number calls overseas numbers, or landline or short numbers. Some swindlers call landlines to cheat teachers, while others call mobile numbers. Across large volumes of data, these statistical characteristics remain very clear.

5. Number credit

Once enough data has accumulated, we can build a credit rating for each number. Numbers that behave like normal users are whitelisted, and numbers that do not match this behavior pattern can be treated as low-credit.

6. Number anomalies

For example, a number's abnormal behavior and its abnormal calls are recorded in an anomaly file. Numbers we consider problematic are placed under focused monitoring and analysis.
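Several of the activity and social-network features above can be computed directly from call records. A minimal sketch, assuming each record is a (caller, callee, start hour, duration in seconds) tuple and treating "contacts" as numbers with calls in both directions and "strangers" as numbers that were called but never called back; the function and feature names are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def number_features(number: str, records: list[tuple[str, str, int, int]], days: int) -> dict:
    """records: (caller, callee, start_hour, duration_s) tuples covering `days` days."""
    outgoing = [r for r in records if r[0] == number]
    incoming = [r for r in records if r[1] == number]
    called = {r[1] for r in outgoing}
    callers = {r[0] for r in incoming}
    contacts = called & callers      # two-way relationships
    strangers = called - callers     # called, but never called back
    return {
        "calls_per_day": len(outgoing) / days,
        "avg_duration_s": mean(r[3] for r in outgoing) if outgoing else 0.0,
        "earliest_hour": min((r[2] for r in outgoing), default=None),
        "latest_hour": max((r[2] for r in outgoing), default=None),
        "out_in_ratio": len(outgoing) / max(len(incoming), 1),
        "distinct_callees": len(called),
        "stranger_ratio": len(strangers) / len(called) if called else 0.0,
        "contacts": len(contacts),
    }
```

Features like these would then be fed into the models, alongside the credit and anomaly records, rather than judged one by one.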

Cross-validating the event model and the machine learning model

1. Sudden-increase model

For example, a fraud number suddenly goes into heavy use and may disappear after a while (because it is a new number).

[Model of a sudden increase in a fraud number's call volume]

This is a fraud number we detected. On January 12, 2015, it had almost no dialing records. The next day it made more than 100 calls, and on the third day it reached 1,000. About a week later, its outgoing calls dropped straight to zero. This pattern is very distinctive.
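The sudden-increase pattern can be checked on a number's daily outgoing-call counts. This sketch uses illustrative thresholds (a quiet day of at most 5 calls, a peak of at least 100 within the following 10 days, then a drop to zero); the source gives only the example trajectory, not the actual thresholds, and `burst_then_vanish` is a hypothetical name.

```python
from __future__ import annotations


def burst_then_vanish(daily_calls: list[int], quiet_max: int = 5,
                      peak_min: int = 100, window_days: int = 10) -> bool:
    """True if a quiet day is followed within `window_days` by a peak and then by silence."""
    for start, base in enumerate(daily_calls):
        if base > quiet_max:
            continue
        window = daily_calls[start + 1:start + 1 + window_days]
        if not window:
            continue
        # Index of the busiest day in the window (first one if there is a tie).
        peak_idx = max(range(len(window)), key=window.__getitem__)
        if window[peak_idx] >= peak_min and 0 in window[peak_idx + 1:]:
            return True
    return False


# Roughly the trajectory described above; the values after day 3 are illustrative.
print(burst_then_vanish([0, 120, 1000, 900, 700, 400, 100, 20, 0]))  # True
```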

2. Event model

Fraudsters also have time costs. They need to dial as many numbers as possible in the shortest time to catch as many fish as possible. So a number is never used to cheat just one person and then thrown away, which means we can always summarize and analyze their routine.

Let's take a look at a classic scam script:

Five or six scammers sit in one room and start the fraud process:

1. First, an automated voice system using a +185 number calls to tell you that a document could not be delivered and asks you to press 9 for a human agent. If you respond, the "service process" continues. If you don't answer, or hang up within two seconds, the next steps are cancelled.

2. A few minutes later, someone posing as a police officer calls. His purpose is to supply "evidence" that convinces you the scam is real. He guides you to an "official website" to search for information, so that you seem to have found it yourself online.

3. An hour later, the user receives a call from the fake Public Security Bureau.

4. Following the Public Security Bureau's instructions, the user dials 114 to confirm the procuratorate's phone number.

5. The "confirmed" procuratorate calls.

[What are the rules behind a classic fraud routine?]

The later the step, the more experienced the scammer who handles it: the "old drivers", that is, the team leaders. With such an event model, seemingly independent behaviors can be strung together.
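The event model can be sketched as matching one victim's call timeline against the scripted order above, linking calls from different numbers into one chain. This assumes each call has already been labelled with a script stage upstream; the stage names, the maximum gaps between stages, and `link_script` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stage labels for the five-step script, in order, with an assumed
# maximum gap (minutes) allowed since the previous stage.
SCRIPT = [("auto_voice_notice", 0), ("fake_police", 30),
          ("fake_public_security", 180), ("user_dials_114", 60),
          ("fake_procuratorate", 60)]


@dataclass
class Event:
    minute: int   # minutes since the first event seen for this victim
    stage: str    # stage label assigned upstream
    number: str   # the other party's number


def link_script(events: list[Event], min_stages: int = 3) -> list[str]:
    """Return the numbers that together follow the script, or [] if too few steps match."""
    chain: list[Event] = []
    for ev in sorted(events, key=lambda e: e.minute):
        if len(chain) == len(SCRIPT):
            break
        name, max_gap = SCRIPT[len(chain)]
        if ev.stage != name:
            continue                      # unrelated call; keep waiting for the next stage
        if chain and ev.minute - chain[-1].minute > max_gap:
            break                         # too slow to be the same operation
        chain.append(ev)
    return [e.number for e in chain] if len(chain) >= min_stages else []
```

Numbers returned together can then be treated as one operation, even though each number looks unremarkable on its own.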

3. Intelligent pattern analysis based on calling behavior

Telecom fraud can be compared to a criminal case.

For example, when a homicide is discovered, investigators narrow down the suspects along different dimensions: an eyewitness saw that the perpetrator was a man; the incident happened at 9:00 a.m.; surveillance cameras reveal the means of transportation; and finally, the victim's social relationships show who had a conflict with the victim.

We can use a similar method to tighten the net.

Suppose a number calls continuously, rarely receives calls and only dials out, has very long call durations, and often calls large numbers of scattered strangers. Each time a rule is triggered, we add to the score; if all of them trigger, the score is higher still.

Logically speaking, it is very hard for a normal number to trigger so many abnormal events at the same time.

[The probability that so many accumulated abnormal events are a "black swan" coincidence is very low]

In this way, we can distinguish "shallow fraud" from "deep fraud". Shallow fraud means calling at random and trying one's luck: whoever takes the bait gets cheated. Deep fraud is what was just described: several people working together to cheat you with "a whole set of services".
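The cumulative scoring described above can be sketched as a list of rules, each adding points when triggered. The thresholds and point values below are illustrative assumptions, not values from the source, and `RULES` and `suspicion_score` are hypothetical names; the feature keys are the kind of per-number statistics listed earlier.

```python
from __future__ import annotations

# Each rule: (description, predicate over a feature dict, points). All values are assumptions.
RULES = [
    ("dials continuously", lambda f: f["calls_per_day"] >= 200, 25),
    ("almost never receives calls", lambda f: f["out_in_ratio"] >= 20, 25),
    ("long average call duration", lambda f: f["avg_duration_s"] >= 300, 20),
    ("mostly scattered strangers", lambda f: f["stranger_ratio"] >= 0.9, 30),
]


def suspicion_score(features: dict) -> tuple[int, list[str]]:
    """Sum the points of every triggered rule and report which rules fired."""
    fired = [(name, pts) for name, rule, pts in RULES if rule(features)]
    return sum(pts for _, pts in fired), [name for name, _ in fired]
```

A normal number may trip one rule by chance, but tripping all of them at once is exactly the low-probability coincidence the caption refers to, so a high combined score is a strong signal.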

False positives and difficulties in anti-fraud technology

We verify false positives in two ways.

1. Historical detection results. We send historical results to the cloud and check them against third-party data, such as Tencent Mobile Manager. Because the two systems use different judgment logic, each can verify the other.

2. Latest detection results. The police and the operator carry out spot checks. For example, of 100 users who were sent alert messages, 30 are selected for follow-up calls to confirm whether they really received a call like "I am your leader" or "guess who I am".

[Feedback to customer service from people who received fraud calls]
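The spot check amounts to estimating the share of correct alerts from a random sample of alerted users. A minimal sketch, assuming each follow-up call yields a yes/no answer; the Wilson score interval is a standard choice for a sample proportion that is swapped in here (the source does not say how results are summarized), and the function names are hypothetical.

```python
from __future__ import annotations

import math
import random


def pick_for_followup(alerted_users: list[str], k: int = 30, seed: int | None = None) -> list[str]:
    """Randomly choose k distinct alerted users to call back."""
    return random.Random(seed).sample(alerted_users, k)


def precision_estimate(confirmed: int, sampled: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and Wilson score interval (about 95% for z=1.96) for the true-alert share."""
    p = confirmed / sampled
    denom = 1 + z * z / sampled
    centre = (p + z * z / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z * z / (4 * sampled * sampled)) / denom
    return p, centre - half, centre + half
```

With only 30 calls per batch the interval is fairly wide, which is one reason to keep repeating the spot checks rather than rely on a single sample.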

With the techniques described above, actual detection accuracy is as follows:

For scams impersonating police, prosecutors, and courts, accuracy is about 99%, because this kind of scam follows a complete procedure, which makes it easier to identify.

For scams impersonating acquaintances, accuracy is slightly lower, at 97%.

For fake customer service scams, accuracy reaches 99%.

In a half-year trial in one city, the amount defrauded from users fell by more than 70%.

However, our system still has some problems. For example, we cannot fully cover scams impersonating police, prosecutors, and courts, because the first swindler brainwashes the victim, telling them not to answer any other calls and to communicate only with the "police" one-on-one, or simply keeps the victim's phone busy the whole time. So sometimes when we call back, we can't get through at all, and by the time we do, the victim's money has already been transferred.

Some time ago, a Tsinghua University professor was cheated. The Beijing Public Security Bureau had already detected the situation, and police officers called the professor three times. But the scammer had manipulated the professor too thoroughly, telling them not to answer anyone else's calls. The professor believed the swindler was the real police and was ultimately cheated.

I would also like to remind you not to provoke fraudsters lightly. Why? If you ignore them, you are just one of their countless sunk costs. If you engage with them, they will see you as a possible target and keep their eye on you. If you annoy them, they have the energy to toy with you.

A few days ago, a user teased a swindler. Two days later, his phone number was suddenly flagged by the major security companies and blacklisted by his operator. The reason: in retaliation, the swindler spoofed his phone number and sent out large volumes of spam messages.

Finally, I would like to say that, from a bystander's perspective, people who get cheated may seem a bit foolish. But when you are inside a scam, it can be really hard to see through it. Whenever I see these technologies actually stopping fraud, I feel the team's efforts have been worth it.

