Analysis of an internal threat detection system based on user (role) profiling

Posted by barello at 2020-03-14

*Original author of this article: Mu Qianzhi. This article belongs to the FreeBuf original award program and may not be reproduced without permission.

Let's start with "Jason Bourne"

Jason Bourne recently returned to the screen, fighting Dewey, the all-knowing top CIA executive, and the Asset, the agency's number one killer. To avoid spoilers, interested readers can watch the movie for themselves :). Today we are concerned with the origin of the whole story: Nicky, a former CIA agent, uses a discarded computer to break into the CIA network. In the post-Snowden era, internal threat prevention has long been a focus of attention at home and abroad. Even so, a security fortress like the CIA cannot stop Nicky, an insider, from using her knowledge of internal security mechanisms to break into the system and steal black operations data.

First, we can make a preliminary analysis of Nicky's intrusion via an old computer from a security perspective:

Movies draw on reality, and this one reflects how seriously the United States takes internal threats in the post-Snowden era. Insiders like Nicky (a former employee) are a potential threat to any organization. In practice, however, internal threat detection systems remain shrouded in mystery: there is no unified design standard and no universally recognized commercial product. Today we introduce one internal threat detection system architecture, in the hope of helping you understand this field.

Requirements for an internal threat detection system in enterprises

The prerequisite for deploying an internal threat detection system in an enterprise is an internal security audit: the computer operations and network activity of internal employees should be recorded in detail. Whichever commercial audit software is used, monitoring of insider behavior should cover at least the following categories:

Of the audit data requirements above, login events, file events, network events and some device events are generally covered by existing commercial audit software; the only category that needs special consideration is mail data. Because email data is sensitive, user privacy must be weighed against internal security. Generally speaking, email headers and related metadata can be partially audited, while email content is generally not required. However, email communication is very important for profiling and monitoring user behavior without the user being aware of it.

On the basis of the internal security audit, we can build an internal threat detection system, which should meet several basic requirements:

Next, we propose a detection framework that meets the above requirements to some extent, and then analyze it step by step.

Three-layer detection framework

The current approach to internal threat detection is to build a behavior model from the user's computer and network behavior, and then use an anomaly detection algorithm to detect user anomalies. Applying anomaly detection directly to user data as an attack classifier does not work well, so we build a three-layer detection framework, shown in Figure 3:

The initial input in the figure above is the audit log. The data analysis engine builds the required user (role) behavior structure tree from it, and the first layer of detection is performed by comparing tree structures. Features are then extracted from the user behavior structure tree, and the second layer of detection applies an anomaly detection algorithm to them. Finally, the third layer of detection computes an offset over the feature matrix. Next, we start with data input and analyze the corresponding modules in turn.

Data input

The raw input is the internal audit log, which records user behavior in terms of user ID, timestamp, device MAC address, activity and other elements. A raw data record looks like this:

After the raw data is input, it must be preprocessed: the data analysis engine extracts the key elements needed to build the user/role behavior tree. From each raw audit record we must parse key elements such as user ID, device ID, activity name, activity attribute and timestamp. Activity names such as login, mail and read file identify the category of behavior, while activity attributes supplement the description of an activity; for example, the file name is an attribute of the read file activity.
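As a minimal sketch of this preprocessing step, the snippet below parses one audit record into the key elements listed above. The article's actual record layout is not reproduced here, so the comma-separated `timestamp,user,device,activity,attribute` format, the `AuditEvent` class and the `parse_record` function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    user_id: str
    device_id: str
    activity: str        # e.g. "login", "mail", "read_file"
    attribute: str       # e.g. the file name for a read_file activity
    timestamp: datetime


def parse_record(line: str) -> AuditEvent:
    """Parse one record in the assumed 'timestamp,user,device,activity,attribute' layout."""
    # Split at most 4 times so that commas inside the attribute are preserved.
    ts, user_id, device_id, activity, attribute = line.strip().split(",", 4)
    return AuditEvent(
        user_id=user_id,
        device_id=device_id,
        activity=activity,
        attribute=attribute,
        timestamp=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
    )


event = parse_record("2020-03-14 09:02:11,u1001,pc-17,read_file,budget.xlsx")
print(event.activity, event.attribute)  # read_file budget.xlsx
```

A real audit product will have its own log schema; only the parsing function would need to change, while the extracted elements stay the same.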

The first step of data analysis is content parsing, which can be deployed partially, depending on actual deployment requirements. Content parsing covers two types: mail content and website content. Website content is handled much as in naive Bayes text classification: the plain text of an HTTP page is converted into a bag-of-words vector. Mail content is more closely tied to the user's personality and psychological state, so it is analyzed against the LIWC (Linguistic Inquiry and Word Count) dictionary to obtain a LIWC representation of the mail. Through LIWC, we can describe the characteristics of users from a linguistic perspective.
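To illustrate the website half of content parsing, here is a small bag-of-words sketch. The vocabulary, the sample page text and the `bag_of_words` helper are hypothetical; the LIWC side is not sketched because it depends on the LIWC dictionary itself.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


vocabulary = ["job", "resume", "salary", "upload"]  # hypothetical vocabulary
page_text = "Upload your resume and find a new job. Job alerts by email."
vector = bag_of_words(page_text, vocabulary)
print(vector)  # [2, 1, 0, 1]
```

Each position in the vector corresponds to one vocabulary word, so vectors from different pages can be compared directly.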

After the bag-of-words and LIWC features are obtained, key elements such as user, device, activity and activity attribute are extracted together with the other data, and the user/role behavior tree is constructed.

First layer

Traditional detection methods focus on describing the behavior of individual users, while current research adds a comparison with the behavior of the user's working group or job role, in order to reduce the impact on anomaly detection of behavior changes caused by a changed working environment. After the data analysis above, we can draw the user/role behavior structure tree, as shown in Figure 4:

The overall user/role tree structure is shown in the figure above. Each user node is a root from which three branches extend: daily (the current day's data), normal (existing normal data) and attack (attack data). Each branch then splits by device, followed in turn by activity and activity attribute.

At the start, the normal and attack branches are both empty. The system reads one day of the user's records and generates the daily branch for that day. During the training period, each daily branch is added to the normal branch, and this repeats until the training period ends.
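The tree and its training loop can be sketched with nested dictionaries. This is only one possible representation, assuming each event has already been reduced to a `(device, activity, attribute)` tuple; the helper names and the sample events are hypothetical.

```python
from collections import defaultdict


def new_branch():
    """A branch maps device -> activity -> set of activity attributes."""
    return defaultdict(lambda: defaultdict(set))


def new_user_tree():
    return {"daily": new_branch(), "normal": new_branch(), "attack": new_branch()}


def build_daily(events):
    """events: iterable of (device, activity, attribute) tuples for one user and one day."""
    branch = new_branch()
    for device, activity, attribute in events:
        branch[device][activity].add(attribute)
    return branch


def merge(target, source):
    """Add every node of `source` into `target` in place."""
    for device, activities in source.items():
        for activity, attributes in activities.items():
            target[device][activity] |= attributes


tree = new_user_tree()
training_days = [
    [("pc-17", "login", "08:55"), ("pc-17", "read_file", "budget.xlsx")],
    [("pc-17", "login", "09:01"), ("pc-17", "mail", "alice@example.com")],
]
for day in training_days:
    tree["daily"] = build_daily(day)      # regenerate the daily branch
    merge(tree["normal"], tree["daily"])  # training period: fold it into normal

print(sorted(tree["normal"]["pc-17"]))  # ['login', 'mail', 'read_file']
```

After training, the normal branch holds every device, activity and attribute seen so far, while the attack branch stays empty until attack data is recorded.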

Once the user/role tree has been built, each new day of user data can be matched against existing security policies, such as "logging in to a computer to copy files outside working hours", or against the branches of the existing attack tree, which gives more real-time detection of user/role behavior. The role tree is built in the same way as the user tree, except that a device node in the role tree is the collection of device nodes used by all users in that role.
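A rough sketch of the branch comparison follows. It assumes branches are plain nested dictionaries of the form device -> activity -> set of attributes, and it treats a coarse label such as `after_hours` as the activity attribute; both choices, and the function names, are illustrative assumptions rather than the article's exact matching rules.

```python
def flatten(branch):
    """Turn a device -> activity -> attributes branch into a set of paths."""
    return {
        (device, activity, attribute)
        for device, activities in branch.items()
        for activity, attributes in activities.items()
        for attribute in attributes
    }


def first_layer_check(daily, normal, attack):
    """Return paths that match known attacks and paths never seen in normal data."""
    daily_paths = flatten(daily)
    known_attack = daily_paths & flatten(attack)
    unseen = daily_paths - flatten(normal) - known_attack
    return known_attack, unseen


normal = {"pc-17": {"login": {"work_hours"}, "read_file": {"budget.xlsx"}}}
attack = {"pc-17": {"copy_file": {"after_hours"}}}
daily = {
    "pc-17": {"login": {"work_hours"}, "copy_file": {"after_hours"}},
    "pc-99": {"login": {"after_hours"}},
}

known_attack, unseen = first_layer_check(daily, normal, attack)
print(known_attack)  # {('pc-17', 'copy_file', 'after_hours')}
print(unseen)        # {('pc-99', 'login', 'after_hours')}
```

Paths matching the attack branch can raise an alert immediately, while the number of unseen paths can be compared with a mismatch threshold.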

Second layer

Existing internal attacks show up in two aspects of user behavior: either the user uses a device never used before or opens a file never seen before, or the user has used the device or opened the file before but the frequency of use changes sharply. From these two perspectives, "new" and "degree", we can extract features that reflect them, such as:
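As a rough illustration of the "new" and "degree" perspectives, the sketch below derives three features from file accesses. The feature names, the `(device, file)` event shape and the use of the average daily access count as the baseline are assumptions made for this example, not the article's feature list.

```python
from collections import Counter


def extract_features(history, today):
    """history: list of past days, each a list of (device, file) accesses; today: one such list."""
    seen_devices = {device for day in history for device, _ in day}
    seen_files = {file for day in history for _, file in day}
    past_counts = Counter(file for day in history for _, file in day)

    today_counts = Counter(file for _, file in today)
    new_devices = {device for device, _ in today} - seen_devices
    new_files = set(today_counts) - seen_files

    # "Degree": largest gap between today's count and the past daily average,
    # taken over files that were already seen before.
    days = max(len(history), 1)
    max_freq_change = max(
        (abs(today_counts[f] - past_counts[f] / days) for f in today_counts if f in seen_files),
        default=0.0,
    )
    return {
        "new_device_count": len(new_devices),
        "new_file_count": len(new_files),
        "max_file_freq_change": max_freq_change,
    }


history = [
    [("pc-17", "budget.xlsx"), ("pc-17", "notes.txt")],
    [("pc-17", "budget.xlsx")],
]
today = [("pc-17", "budget.xlsx")] * 5 + [("pc-99", "payroll.db")]
features = extract_features(history, today)
print(features)
# {'new_device_count': 1, 'new_file_count': 1, 'max_file_freq_change': 4.0}
```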

After extracting features from the data, you need to specify anomaly indicators against which the degree of user behavior is compared. Commonly used anomaly indicators are:

Each of the 13 anomaly indicators above corresponds to a subset of the original features, that is, it contains multiple original features. For example, the file anomaly indicator contains specific features such as open anomalies, write anomalies and create anomalies. In general, a user's normal feature values cluster together, and abnormal behavior lies far from the cluster. For each anomaly indicator we can calculate the distance between the user's new behavior and that cluster, assign a weight to each indicator, and use the weighted sum of these anomaly measurements to determine how abnormal the new behavior is.
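The weighted sum can be sketched as follows. Using the centroid of historical rows as the cluster and Euclidean distance as the per-indicator measure is an assumption of this example, as are the two indicator names, the sample values and the weights.

```python
import math


def centroid(rows):
    """Mean of each feature column over the historical rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def indicator_distance(history_rows, today_row):
    """Euclidean distance between today's features and the historical centroid."""
    return math.dist(centroid(history_rows), today_row)


def weighted_anomaly_score(history, today, weights):
    """Weighted sum of the per-indicator distances."""
    return sum(
        weights[name] * indicator_distance(history[name], today[name])
        for name in weights
    )


history = {
    "file": [[10, 2], [12, 2], [11, 2]],  # e.g. file opens and writes per day
    "logon": [[1], [1], [1]],             # e.g. logons per day
}
today = {"file": [14, 6], "logon": [1]}
weights = {"file": 0.7, "logon": 0.3}     # hypothetical weights

score = weighted_anomaly_score(history, today, weights)
print(round(score, 2))  # 3.5
```

Here only the file indicator deviates (distance 5 from its centroid), so the score is 0.7 × 5 = 3.5; the score is then compared with a threshold.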

Third layer

If the training period is M-1 days in total and there are n user behavior features (columns), then adding the new day's user behavior record yields an M × n feature matrix:

Our next question is: how do we calculate the offset of the last row from the other M-1 rows?

There are many ways to calculate it. One is to compute the Euclidean distance between the last row and each of the previous M-1 row vectors in turn, and take the maximum distance as the offset value. Alternatively, you can compute the Mahalanobis distance between the last row and the previous rows, or work directly with the covariance matrix. Which method is used matters less than the comparison itself: the last row against the previous M-1 rows. The resulting offset serves as the judgment value for user behavior, which is classified as normal or abnormal by comparing it with a chosen threshold.
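The first method, the maximum Euclidean distance, can be sketched in a few lines. The feature values and the threshold of 10.0 are made up for illustration; the Mahalanobis variant is not shown because it additionally requires inverting the covariance matrix of the training rows.

```python
import math


def max_euclidean_offset(matrix):
    """Largest Euclidean distance between the last row and each earlier row."""
    *training_rows, last_row = matrix
    return max(math.dist(row, last_row) for row in training_rows)


def is_anomalous(matrix, threshold):
    """Flag the new day when its offset exceeds the chosen threshold."""
    return max_euclidean_offset(matrix) > threshold


# Rows are days and columns are features; the last row is the new day.
feature_matrix = [
    [10, 2, 1],
    [12, 2, 1],
    [11, 3, 1],
    [40, 9, 4],
]
offset = max_euclidean_offset(feature_matrix)
print(round(offset, 2))                              # 30.95
print(is_anomalous(feature_matrix, threshold=10.0))  # True
```

In practice the columns would usually be scaled first, since otherwise a feature with a large numeric range dominates the distance.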

Mahalanobis distance is a statistical distance that, unlike Euclidean distance, takes the covariance between features into account; interested readers can look it up for details.

As said above, the particular method matters less than the goal :)

Instant feedback

The first layer relies on branch comparison in the user's tree structure and needs a threshold for mismatches; the second layer depends on distances after the user's behavior is projected onto multiple anomaly indicators, so it also needs a threshold; the third layer relies on the offset computed from the user's feature matrix and needs a distance threshold as well. Accordingly, user behavior flagged as abnormal by the three layers is analyzed manually, and once the outcome is determined it is fed back into the detection thresholds, so that the sensitivity of the whole detection system can be adjusted flexibly.
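The article does not specify how analyst verdicts change the thresholds, so the update rule below is purely a hypothetical sketch: a false positive moves the threshold a fraction of the way toward the score that triggered the alert, and a confirmed alert leaves it unchanged. The function name and the `step` parameter are assumptions.

```python
def update_threshold(threshold, score, analyst_confirmed, step=0.25):
    """Nudge a detection threshold after an analyst reviews one alert."""
    if score <= threshold:
        return threshold  # no alert was raised, so there is nothing to review
    if analyst_confirmed:
        return threshold  # true positive: keep the current sensitivity
    # False positive: raise the threshold part of the way toward the score.
    return threshold + step * (score - threshold)


threshold = 10.0
threshold = update_threshold(threshold, score=14.0, analyst_confirmed=False)
print(threshold)  # 11.0
```

The same function could be applied separately to the mismatch, weighted-sum and offset thresholds of the three layers.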

Operation test

The system has so far only been deployed and tested on a small scale with the CERT-CMU data. The focus was on plotting the degree of anomaly of user behavior against the set of anomaly indicators: the horizontal axis lists the anomaly indicators, and the graph shows the degree of anomaly of four users on each indicator. From the graph you can see how abnormal each user is, which helps the security analyst make a further determination.


With the development of information technology, the potential harm of internal threats is becoming more and more serious, so a practical internal threat detection system has become an urgent need. Today we introduced a three-layer internal threat detection framework based on user/role behavior. It has three layers of detection so that three dimensions complement one another: real-time detection, multi-indicator anomaly measurement and feature matrix offset analysis. Traditional anomaly detection focuses on feature matrix analysis and neglects real-time detection and multi-indicator anomaly analysis, even though multi-indicator anomaly detection is an effective way to achieve multi-level internal threat detection; the three-layer system makes up for these shortcomings to some extent. In addition, because analysts can feed results back into the thresholds, the system is flexible and can be updated in real time.

An internal threat detection system needs to be optimized continuously in practice, so learning and training on the audit records of the enterprise's own employees is particularly important. The three-layer detection framework introduced today is one feasible implementation, and we hope it is helpful to interested readers.


[1] J.R.C et al., Understanding insider threat: A framework for characterising attacks, 2011

[2] P.A. Legg et al., Automated Insider Threat Detection System, 2015

[3] P.A. Legg et al., Caught in the act of an insider attack: Detection and assessment of insider threat, 2015
