Common Features and Disposal Options in Risk Control Confrontation

Posted by millikan at 2020-03-06

After much hesitation, I finally decided to sit down and share what I have learned over the years about selecting features and punishments in real-world risk control confrontation. I was reluctant to write this down because it is not only risk control practitioners who follow the industry closely: many black industry practitioners quietly watch it too, which is why many companies treat their detection methods as core secrets. On the other hand, apart from a small number of unicorn companies, most companies are still at an early stage in risk control and lack a core methodology. Weighing these two points, I decided to share some basic ways of thinking about risk control confrontation, to help newcomers become effective quickly.

This article focuses on feature selection and punishment selection in confrontation.

1、 Feature selection in risk control

In real-world risk control, the black industry is after profit, so it drives its costs as low as possible. To keep costs under control, attackers prefer the simple over the complicated and the automated over the manual. In short, they have a single goal: to harvest profit. For a detailed introduction, please refer to another article of mine:

Sun Yajian: Practical Guide to risk control of Internet business

So the basic idea of risk control is to look for behavior that is unusual and behavior that is aggregated. Where can we look for such unusual or aggregated behavior?

1. Search for features through the legitimacy of basic information

Basic information is the information the system collects automatically while a user operates. Take information publishing as an example: when we inspect a post, we can obtain the posting IP, phone number, address, email, company, real name, and so on. This is the basic information. When we examine the data, we find that some users' basic information does not match the habits of a normal person. We can then treat that information as not legitimate and, in turn, flag the user as abnormal.

Let's take two examples:

For example, we obtain a user's mobile number but find, when dialing it, that the number is unassigned. Normal users do not post with an unassigned number, so we consider the number itself not legitimate, and a user who supplies such basic information can generally be classified as abnormal.

As another example, we obtain the address the user filled in but find that no such place exists, so we can likewise consider this information problematic.

2. Search for features through content representation

In UGC risk control scenarios, to achieve conversion while evading conventional risk control strategies, the black industry usually embeds contact information in the content (image, text, video, audio) to reach users and commit fraud, or publishes objectionable content to attract traffic. Content feature mining is therefore the easiest starting point for newcomers to risk control.

Broadly speaking, content features boil down to one question: does a given data field contain certain content? Common features fall into the following classes:

A. Keywords: keywords are the most commonly used content feature. They take effect quickly and stop working quickly, so they are generally used to stop the bleeding fast and keep harmful content from spreading. When using keywords, we also need to consider exemption conditions. For example, we may treat "fake machine" as a violation, yet some users write "reject fake machine" in their content. If we ignore keyword exemptions, such posts will be misjudged.
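As a rough illustration of keyword exemption, here is a minimal Python sketch. The function name `keyword_hits` and the two lists are hypothetical, and blanking out exempt phrases before matching is only one possible way to implement an exemption, not a description of any production system.

```python
def keyword_hits(text, keywords, exemptions):
    """Return the keywords found in text, ignoring occurrences inside exempt phrases."""
    lowered = text.lower()
    # Blank out every exempt phrase first, so a keyword inside it cannot match.
    for phrase in exemptions:
        lowered = lowered.replace(phrase.lower(), " ")
    return [kw for kw in keywords if kw.lower() in lowered]


# Illustrative lists taken from the example in the text.
KEYWORDS = ["fake machine"]
EXEMPTIONS = ["reject fake machine"]

print(keyword_hits("Selling fake machine, cheap", KEYWORDS, EXEMPTIONS))
print(keyword_hits("We reject fake machine sellers", KEYWORDS, EXEMPTIONS))
```

The first post is hit by the keyword, while the second is exempt because the keyword only appears inside the exempt phrase.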

B. Lists: we use lists when we need to intercept users, or exempt them from punishment, before they act. Common lists include user blacklists, device blacklists, gray lists, and white lists. Blacklists are typically used for direct interception, to prevent repeat offenses. Gray lists are generally used to challenge users, for example by asking them to enter a verification code or complete identity authentication. White lists are generally used to exempt users: if a user is misjudged by a certain policy, adding the user to the white list exempts them from detection. A list is not built once and left alone. It must be maintained over the long term, with attention to both entry and exit; once maintenance stops, its accuracy drops sharply and misjudgments rise steeply. Take the blacklist as an example: the logic that adds users should be as close to 100% accurate as possible, and a user who was added by mistake should be removed to avoid harming them repeatedly.
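The way the three lists interact can be sketched in a few lines of Python. The class `RiskLists`, the `Action` names, and the precedence order (white list checked first) are assumptions made for illustration; a real system would also need expiry, audit logs, and persistent storage.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"          # the lists raise no objection
    CHALLENGE = "challenge"  # ask for a verification code or identity check
    BLOCK = "block"          # intercept directly


class RiskLists:
    """In-memory black/gray/white lists keyed by user id, device id, etc."""

    def __init__(self):
        self.black = set()
        self.gray = set()
        self.white = set()

    def decide(self, key):
        # Assumed precedence: the white list exempts before anything else applies.
        if key in self.white:
            return Action.ALLOW
        if key in self.black:
            return Action.BLOCK
        if key in self.gray:
            return Action.CHALLENGE
        return Action.ALLOW

    def correct_misjudgment(self, key):
        # List exit: remove a wrongly listed key and exempt it from now on.
        self.black.discard(key)
        self.white.add(key)


lists = RiskLists()
lists.black.add("device-123")
print(lists.decide("device-123"))
lists.correct_misjudgment("device-123")
print(lists.decide("device-123"))
```

The `correct_misjudgment` method is the "exit" half of list maintenance: without it, a wrongly blacklisted user would be intercepted again on every visit.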

C. Algorithms: when content requires subjective judgment, fully manual review is too expensive, so algorithms are needed to assist. Algorithms commonly used in content anti-fraud include pornography detection, violence and terrorism detection, politically sensitive content detection, advertisement recognition, QR code recognition, watermark recognition, OCR, spam content detection, fake face detection, and so on. With these algorithms we can effectively intercept spam content generated by users.

When using algorithm features, we should keep watching for misjudgments and feed them back as bad cases for updating and iterating the algorithm.

D. Other: the remaining content features do not fit a single type, so we call them "other". Broadly, any method that finds anomalies from all or part of a field's content belongs here. Some common examples:

For example, if we find that users whose mobile numbers begin with 189 are very likely to be problematic, then "mobile number begins with 189" is a useful feature.

As another example, we find that a field is empty for some user even though the front end marks it as required. This usually happens because the back end does not validate the field. A normal user cannot submit the field empty; it can only happen when the black industry bypasses the front end, for example by exploiting a vulnerability. "The field is empty" is therefore a strong feature that lets us directly classify the behavior as abnormal.

Summary: features found directly in content take effect quickly and fail quickly, because changing content costs the black industry almost nothing. Adding random characters, variant Chinese characters, or format variations is usually enough to bypass content strategies. Content strategies therefore mostly serve as the business's baseline defense, and the main battlefield of truly fierce confrontation is generally behavioral features.

3. Search for features through user behavior:

We call every action a user generates (login, registration, posting, and so on) behavior. Mining abnormal behavior is the core of risk control, and also the part that is relatively hard to get started with. Confronting attackers through behavioral features tests the analytical ability of strategy and operations staff, especially their sensitivity to data.

The discovery of behavior characteristics can also be considered from the following aspects:

A. Frequency features: normal users act to achieve a specific purpose and usually stop once they have achieved it, so their actions are discrete and sparse. For black industry users, high-frequency action is the key to cutting costs and maximizing profit, so their actions tend to be continuous and dense. For this reason, frequency strategies play an important role in risk control. A frequency feature usually involves four factors: time window, resource, operation, and threshold.

For example: the same user has used more than 15 IPs within 7 days. "Within 7 days" is the time window, the user and the IP are the resources, counting is the operation, and "more than 15" is the threshold.

These four core factors are adjusted continually as we analyze black industry behavior.

The time window can be: second, minute, hour, day, month, quarter, year.

Resources can be: userid, infoid, IP, UA, cookie, device, mobile number, email address, and so on: any entity that can be uniquely identified.

The operation can be: count, sum, average, maximum, minimum, and so on.

The threshold is chosen by finding the best trade-off between precision and recall.
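The example rule above ("more than 15 IPs used by the same user within 7 days") can be sketched as follows. This is a minimal in-memory version; the event tuple layout and the function names are assumptions, and a production system would typically compute such counters in a streaming engine or cache rather than by scanning a list.

```python
from datetime import datetime, timedelta


def distinct_ip_count(events, user_id, now, window=timedelta(days=7)):
    """Count distinct IPs used by user_id in the window ending at `now`.

    events: iterable of (user_id, ip, timestamp) tuples (hypothetical layout).
    """
    start = now - window
    ips = {ip for uid, ip, ts in events if uid == user_id and start < ts <= now}
    return len(ips)


def exceeds_ip_threshold(events, user_id, now, threshold=15):
    # Resource pair: user and IP; operation: distinct count; threshold: > 15.
    return distinct_ip_count(events, user_id, now) > threshold


now = datetime(2020, 3, 6)
events = [("u1", f"10.0.0.{i}", now - timedelta(hours=i)) for i in range(16)]
print(exceeds_ip_threshold(events, "u1", now))
```

Each of the four factors is a separate parameter here, which makes it easy to adjust the window, the resource pair, the operation, or the threshold as the analysis of attacker behavior evolves.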

B. Find features through abnormal operation behavior:

Beyond frequency features, during confrontation we also find that many user actions defy common sense. Such anomalies help us find effective countermeasures. Some common abnormal behaviors:

Abnormal change of geographic location: for example, the IP's home location is far from the device's location, or the geographic location changes too quickly. The basic idea behind location features is that a user cannot move very far in a short period of time. A change that is too large or too fast signals risk: the location may have been spoofed, or the user may be using a proxy IP.
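One way to turn "the location changes too fast" into a feature is to compute the implied travel speed between two consecutive locations. The sketch below uses the standard haversine great-circle formula; the speed limit of 1000 km/h and the 50 km jitter tolerance are illustrative assumptions, not recommended values.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def is_impossible_travel(prev, curr, max_speed_kmh=1000.0, min_distance_km=50.0):
    """prev and curr are (lat, lon, unix_seconds) tuples for consecutive actions."""
    dist = haversine_km(prev[0], prev[1], curr[0], curr[1])
    if dist < min_distance_km:
        return False  # ignore small jumps caused by positioning jitter
    hours = (curr[2] - prev[2]) / 3600.0
    if hours <= 0:
        return True  # far apart at the same moment
    return dist / hours > max_speed_kmh


beijing = (39.9042, 116.4074, 0)
shanghai_10_min_later = (31.2304, 121.4737, 600)
print(is_impossible_travel(beijing, shanghai_10_min_later))
```

The same distance function can be used to compare the IP's home location with the device's reported location at a single point in time.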

Abnormal front-end data: for example, a normal user filling in a form inevitably leaves mouse movements on PC, or swipe gestures in an app. If a form is submitted on PC but the mouse never moved, or the movement trajectories are nearly identical across submissions, this conflicts with what we expect. Such conflicts often lead to many clues about the black industry.

There are many kinds of abnormal operation behavior and I cannot list them all here; these simple examples are only meant to broaden your thinking. As long as we are good at analysis, we will find all kinds of actions that contradict our expectations, and each of them is a potential feature.

4. Device features: device features could in fact be folded into the three feature types above. However, in the current environment the black industry depends more and more on devices, so device features have become very important in risk control confrontation, which is why I cover them separately. Some core device features:

Virtual machines: to minimize cost, when attacking an app or mobile site the black industry at first usually does not buy real mobile devices but attacks through an Android emulator or iOS simulator on a PC. So once we find that a user is on an emulated device, that user is most likely a black industry user.

Device-spoofing software: if virtual machines are blocked by the risk control strategy, the black industry usually buys real mobile devices. To make one phone look like many devices, it uses device-spoofing software to modify or erase some of the phone's real hardware information. By changing this information continually, it makes us mistake one phone for multiple devices and thereby evades frequency limits, hardware limits, and similar strategies.

Multi-instance software: many apps now limit the number of accounts that can be bound or logged in on one device. Multi-instance software opens many copies of the same app, so the login state of N users can be maintained on a single device.

Group control: as the means above kept failing, black industry practitioners began to buy large numbers of relatively cheap phones and, combined with the means above, to attack using real mobile devices. Because these devices show aggregation characteristics, the industry calls this approach group control. Identifying group control is very important, but I cannot disclose specific identification methods for confidentiality reasons. If you think about the characteristics of group-controlled devices, the problem is not hard to solve.

Cloud control: if group control means the black industry buys the equipment itself, cloud control means a supplier buys the equipment in bulk and leases it on a time-sharing basis to the black industry groups that want to use it. The cloud control team provides the various attack capabilities the black industry needs, so equipment utilization rises greatly and the black industry's costs fall sharply. With cloud control, even very small gangs can be highly aggressive.

The above covers the most important device features. There are many more dimensions to draw on, such as whether the device is one the user commonly uses, whether the environment is a usual one, and whether the device is rooted. Device features require considerable accumulated risk control capability: identifying group control or virtual machines takes a long time to build. A start-up risk control team is therefore advised to buy third-party security services directly to deal with device-related cheating.

5. Clustering features: the risk control industry has a saying that good people are good in all kinds of ways, while bad people are all bad in the same way. Normal users differ greatly from one another and their behavior is widely scattered, whereas black industry users, in order to minimize cost, behave in highly consistent ways. This is the theoretical basis for applying clustering. Clustering features generally come in several types:

A. Content clustering: as the name implies, content clustering uses algorithms to group highly similar or identical content across all users, then finds clues about the black industry by analyzing cluster sizes and the content within each cluster. Common algorithms include text similarity and image similarity. The black industry typically tries to prevent clusters from forming by adding random characters or interfering pixels to the content, so the algorithms must be iterated continually.
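To make the idea concrete, here is a small Python sketch of text clustering that tolerates inserted punctuation. It swaps in one simple technique (character shingles compared by Jaccard similarity, with greedy assignment to the first matching cluster); the function names and the 0.6 threshold are assumptions, and large-scale systems usually rely on hashing schemes such as MinHash or SimHash instead of pairwise comparison.

```python
import re


def normalize(text):
    # Drop punctuation, whitespace, and underscores that attackers insert as noise.
    return re.sub(r"[\W_]+", "", text.lower())


def shingles(text, k=3):
    """Set of k-character substrings of the normalized text."""
    s = normalize(text)
    return {s[i:i + k] for i in range(max(len(s) - k + 1, 1))}


def jaccard(a, b):
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_texts(texts, threshold=0.6):
    """Greedy clustering: each text joins the first cluster whose representative
    (the cluster's first member) is similar enough, otherwise starts a new one."""
    clusters = []  # list of (representative_shingles, member_indices)
    for idx, text in enumerate(texts):
        sh = shingles(text)
        for rep, members in clusters:
            if jaccard(rep, sh) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((sh, [idx]))
    return [members for _, members in clusters]


posts = [
    "Add me on WeChat abc12345 for cheap phones",
    "Add me on We.Chat abc12345 for cheap phones!!",
    "Looking for a two-bedroom apartment near the park",
]
print(cluster_texts(posts))
```

The first two posts land in the same cluster even though one contains extra punctuation; an unusually large cluster is then a candidate for manual review or batch handling.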

B. Behavior clustering: behavior clustering complements content clustering. What content clustering misses can be recalled through behavior clustering. We usually turn each user's behavior into a complete behavior sequence that records both the actions and how they were produced, such as mouse trajectories, button click frequency, and password typing speed. Once the behavior sequences of all users are extracted, an algorithm finds highly similar sequences to delimit the set of black industry users.

C. Relationship clustering: relationship clustering focuses not on actions or behavior but on user attributes and relationships, such as which IP a user uses, which mobile number, and which other users that mobile number corresponds to. Each entity is linked to another entity through an attribute or relationship, and these triples together form a relationship network among entities. By analyzing how dense the relationships in the network are, we can find groups of users whose relationships are highly aggregated. When such a group is relatively large, its users usually carry very high risk.
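A very simple form of relationship clustering is to treat the triples as edges of a graph and group users that are connected through shared resources. The sketch below does this with a union-find structure; the triple layout `(user, relation, value)`, the function names, and the `min_size` cutoff are assumptions. Connected components are a coarse stand-in for the density analysis described above, which in practice may use community detection instead.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def related_groups(triples, min_size=3):
    """Group users linked through shared attribute values.

    triples: iterable of (user, relation, value), e.g. ("u1", "ip", "1.1.1.1").
    Returns sorted lists of users for every group with at least min_size users.
    Assumes no relation is literally named "user".
    """
    parent = {}
    for user, relation, value in triples:
        u, r = ("user", user), (relation, value)
        parent.setdefault(u, u)
        parent.setdefault(r, r)
        ru, rr = find(parent, u), find(parent, r)
        if ru != rr:
            parent[ru] = rr  # merge the two components
    groups = defaultdict(set)
    for node in parent:
        if node[0] == "user":
            groups[find(parent, node)].add(node[1])
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


triples = [
    ("u1", "ip", "1.1.1.1"),
    ("u2", "ip", "1.1.1.1"),
    ("u2", "phone", "p-001"),
    ("u3", "phone", "p-001"),
    ("u4", "ip", "2.2.2.2"),
]
print(related_groups(triples))
```

Here u1 and u3 never share a resource directly, but both are linked to u2, so all three end up in one group.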

Clustering features are valuable not only in real-time confrontation but also for risk early warning and rapid batch handling. When large-scale clustering appears, we can judge whether some part of the business is under attack by observing the number of clusters and the number of members in each cluster. Likewise, once clusters are discovered we can dispose of them in batches, achieving rapid response and handling.

6. Third-party capabilities: in risk control it is unrealistic to build every capability in-house. Many of the features above require heavy investment to build, so cooperation within the industry is particularly important. Third-party security service providers generally offer data products in the form of labels or profiles. Common ones are IP labels (proxy IP, black DNS, etc.), mobile number labels (SIM pool or "cat pool" numbers, secondary numbers, etc.), personal credit, enterprise credit, and so on. On the one hand, cooperating with a third party compensates for gaps in our own business data and lets us intercept resources that other businesses have already clearly marked as black. On the other hand, it keeps us in touch with the latest security trends in the industry, so we can pick up new security capabilities promptly and improve our own risk control system.

These are the common ways of selecting features in risk control confrontation. They are only meant to give you some ideas. Risk control is a war over profit that will never end, and the form of confrontation changes as both sides improve their technical capabilities. But as long as you master an effective methodology, you can do the work well no matter how fierce the confrontation becomes.

Having covered how to select features, let's turn to how to choose a punishment once the risk is clear.

2、 Choice of punishment

Alibaba has a nine-character security policy: "light control, heavy detection, fast response". Light control means doing our best to keep security from harming the user experience and minimizing the impact on normal users. This sets the tone for punishment selection.

1. Design of punishment system:

To design a punishment system we first need to define punishment: punishment is the process of restricting rights that a user would otherwise have.

So to know how to punish, we must first be clear about what rights a user has. Once the set of rights is defined, punishments can be defined against it.

Take the 58 Tongcheng (58.com) business as an example. A user has the following rights: registration, login, search, browsing, posting, promotion, membership, IM chat, and so on. Suppose this list is the complete set of a 58 user's rights. The punishment system can then be designed as follows: prohibit registration, prohibit login, restrict browsing and search, restrict posting, delete posts, reduce ranking weight, restrict promotion, prohibit membership sign-up, ban IM, and so on.

Applying these punishments singly or in combination, according to the severity of the user's violation, achieves effective risk control.
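The "rights first, punishments second" idea can be modeled directly in code. The sketch below represents the set of rights as a Python `enum.Flag` and a punishment as the removal of one or more rights; the member names mirror the list above, while `punish` and `ALL_RIGHTS` are hypothetical helpers introduced only for illustration.

```python
from enum import Flag, auto


class Right(Flag):
    REGISTER = auto()
    LOGIN = auto()
    SEARCH = auto()
    BROWSE = auto()
    POST = auto()
    PROMOTE = auto()
    MEMBERSHIP = auto()
    IM_CHAT = auto()


# The complete set of rights a normal user starts with.
ALL_RIGHTS = Right(0)
for _right in Right:
    ALL_RIGHTS |= _right


def punish(rights, *revoked):
    """Return the rights left after revoking each right in `revoked`."""
    for r in revoked:
        rights &= ~r
    return rights


# A combined punishment: restrict posting and ban IM, but keep login.
remaining = punish(ALL_RIGHTS, Right.POST, Right.IM_CHAT)
print(Right.POST in remaining, Right.LOGIN in remaining)
```

Because every punishment is expressed against the same set of rights, single and combined punishments use one mechanism, and restoring a right after an appeal is just the inverse operation.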

2. Application method of punishment

By nature, all punishments fall roughly into three categories: tagging, challenge, and punishment proper.

A. Tag: tagging means labeling a user or resource without taking any actual action. It is generally used in scenarios where the risk control path is short. If a scenario sits early in the business flow, the available data is limited; and the shorter the path, the more direct handling lowers the black industry's cost of trial and error, which makes risk control harder.

For example, in the common registration scenario, if we directly block a problematic user from completing registration, we are in effect telling the black industry that its current method does not work. The black industry can then keep trying, and a single successful attempt is enough to break through our defense completely. So in this scenario we usually do not affect the user at all but tag the user's behavior. The tag follows the user through the entire business flow as a feature. We observe tagged users' subsequent behavior in real time, and once they show further abnormal behavior in later steps, we punish them directly. This makes it hard for the black industry to work out what we detected and where, which raises its costs.
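The tag-then-punish flow can be sketched as follows. Everything here is hypothetical: the `TagStore` class, the tag name, the two handler functions, and the rule that one later abnormal signal is enough to punish a tagged user are assumptions chosen to keep the example short.

```python
from collections import defaultdict


class TagStore:
    """Keeps risk tags per user so later steps in the flow can read them."""

    def __init__(self):
        self._tags = defaultdict(set)

    def tag(self, user_id, label):
        self._tags[user_id].add(label)

    def tags(self, user_id):
        return set(self._tags.get(user_id, ()))


def on_register(store, user_id, looks_suspicious):
    # Registration always succeeds; a suspicious user is only tagged,
    # so the attacker gets no feedback about what was detected.
    if looks_suspicious:
        store.tag(user_id, "suspicious_registration")
    return "allow"


def on_post(store, user_id, abnormal_signals):
    # Later in the flow, the tag is combined with fresh abnormal behavior.
    tagged = "suspicious_registration" in store.tags(user_id)
    if tagged and abnormal_signals >= 1:
        return "punish"
    return "allow"


store = TagStore()
print(on_register(store, "u1", looks_suspicious=True))
print(on_post(store, "u1", abnormal_signals=1))
```

The attacker sees a successful registration and only later a punishment at a different step, which makes it harder to tell which signal gave the account away.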

B. Challenge: challenge-type handling is generally applied when a strategy is not accurate enough to punish on, yet ignoring the hit would leave a relatively large risk. As the name implies, it challenges users to prove that they are not robots, that they own the account, and so on. A challenge does not actually take away the user's rights: once users pass the challenge, they can use the function normally. A challenge is therefore a relatively experience-friendly form of handling. However, the strength of each challenge directly determines how well it intercepts black industry users. Common forms of challenge are as follows:

1) Challenges based on verification codes: graphic verification codes are mostly used to tell people from robots and to prevent machine posting in bulk. SMS and voice verification codes are mainly used to prove that the owner of the number is the one operating.

The main forms of verification code are: character, slider, click-to-select, frictionless, SMS, and voice verification codes. The applicability and effectiveness of each are introduced below:

Character verification code: the most basic verification code, which asks the user to type the characters shown in an image or the result of a calculation. This kind of verification code is very easy to crack: the black industry has achieved a pass rate of more than 98%, which exceeds that of humans. It is therefore not recommended in any scenario.

Slider verification code: this verification method originated with Geetest.

It verifies the user by having them drag a slider along the x-axis. At the same time, the back end analyzes the front-end browser environment and the sliding trajectory to intercept abnormal machine behavior. The experience is much better than that of a conventional character verification code. However, this kind of verification code can still be cracked through CAPTCHA-solving platforms or by simulating sliding trajectories.

Click-to-select verification code: the best-known example is the 12306 verification code. It exploits the gap between human and machine cognitive ability. For example, a human looking at an animal can easily tell whether it is a kitten or a puppy, which is extremely difficult for a machine. This kind of verification code is relatively hard for a machine to crack directly, but it can still be bypassed at low cost through CAPTCHA-solving platforms.

Frictionless verification code: a frictionless verification code in effect puts a whole set of risk control strategies behind a single button. Geetest offers a verification of this form.

When the user clicks the button, the data collected by the front end is sent to the back-end risk control engine. As long as detection is fast enough, the user completes verification almost instantly. The experience seems excellent, but the button merely fronts the back end's detection capability, so whether the button exists makes no substantive difference. How hard this verification is to crack depends directly on the back end's risk control capability, so we cannot say outright whether it is easy or hard to crack.

SMS verification code: this is the most common form of verification code. Its advantages are security and user experience: when the user enters the code, we can be reasonably sure that the owner of the number is operating. However, when SMS verification is first required in steps such as registration, it can be passed using mobile numbers from a SIM pool (maochi, literally "cat pool"). These SIM cards differ from ordinary ones: they cannot take calls but can receive SMS, so the black industry uses them to complete registrations in bulk.

Voice verification code: voice verification codes emerged to counter SIM pool ("cat pool") numbers that can receive SMS codes. The code is delivered through a phone call; the user memorizes the digits and enters them. The experience is relatively poor, so voice verification is generally not recommended.

2) Challenges based on identity verification: in many cases we find that cheating users have never completed real-name verification or proved their identity in any way. In such cases we usually ask users who are high-risk, but cannot be conclusively judged black, to complete real-name authentication or enterprise authentication. This requirement may also come from hard rules of the business. Common methods of personal real-name verification are: two-factor real-name authentication, handheld ID card authentication, bank card authentication, three-factor mobile number authentication, face authentication, and so on. Enterprise authentication usually relies on business license verification and corporate account payment verification. We suggest that the business side pay attention to authentication requirements when launching a new business. On the one hand, authentication acts as a threshold that greatly raises the black industry's costs; on the other hand, it also provides a unique feature for risk control identification.

3) Challenges based on private information: these challenges are mostly used for account security or password recovery. When we judge that the current operator may not be the account owner, we can challenge in this way. Common questions are "What is my father's name?" and "Which bank card number do I use most often?" The operator is challenged with information only the account owner knows; an operator who cannot pass is not the owner.

This kind of challenge gives a good experience and prevents risk effectively through information asymmetry, so when the business already holds some key information about the user, this approach is worth considering.

These are the common forms of challenge. Different forms suit different scenarios and conditions, so understanding what each can do is a prerequisite for risk control.

C. Punishment: when we can clearly classify a user's behavior as malicious, the action we take is usually a punishment.

Malicious behavior should be considered along two dimensions. In the first, the user really is a normal user who broke the platform rules only to gain some short-term benefit. In the second, the user is a black industry user acting maliciously in bulk, out to grab traffic, commit fraud, and cash out.

With the first kind of user, our attitude should be that guidance outweighs punishment, and punishment can be made flexible. Taobao's violation points system for merchants is a relatively successful example. After a first violation, once the accumulated points reach a certain level, the merchant's search ranking may be reduced; after a second offense, the shop may be closed temporarily; finally, it may be removed permanently. Relatively flexible punishment not only penalizes the bad behavior itself but also gives users room and opportunity to improve. Through active operational guidance, we can turn these users into high-quality users.
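A graduated points system of this kind can be sketched in a few lines. The thresholds (12, 24, 48) and the class name `ViolationLedger` are invented for illustration and are not Taobao's actual rules; only the escalation order comes from the description above.

```python
from collections import defaultdict

# (threshold, action) pairs in ascending order. The numbers are illustrative only.
TIERS = [
    (12, "reduce search ranking"),
    (24, "temporary closure"),
    (48, "permanent removal"),
]


def penalty_for(points):
    """Return the most severe action whose threshold has been reached, or None."""
    action = None
    for threshold, name in TIERS:
        if points >= threshold:
            action = name
    return action


class ViolationLedger:
    """Accumulates violation points per user and reports the current penalty."""

    def __init__(self):
        self.points = defaultdict(int)

    def record(self, user_id, points):
        self.points[user_id] += points
        return penalty_for(self.points[user_id])


ledger = ViolationLedger()
print(ledger.record("merchant-1", 6))
print(ledger.record("merchant-1", 6))
print(ledger.record("merchant-1", 12))
```

Because the penalty depends on accumulated points rather than on a single event, a first minor violation has no immediate effect, which leaves room for guidance before punishment escalates.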

With clearly identified black industry users, our attitude should be unambiguous. Once the accuracy of the judgment is assured, we need to restrict not only the account's rights but also every resource tied to the account, such as mobile number, email address, and ID card number, so that the same resources cannot be used to do harm in the business a second time. The faster the punishment takes effect and the more thoroughly the resources are burned, the less likely the risk is to reappear.

Overall, then, punishment should leave normal users as unaffected as possible, burn black industry resources efficiently, and give active guidance to users in the gray area so that they change of their own accord and the ecosystem develops healthily.

3、 Conclusion:

These are some experiences with feature selection and punishment selection in risk control confrontation. Given the nature of the risk control industry, this article cannot lay out every possibility, but it reflects most of the main ways of thinking used in real risk control practice. I hope it helps newcomers to the risk control industry, and I also hope that senior colleagues in the security industry will offer guidance and fill in what is missing.