# Several graph partitioning algorithms for anomaly detection

In the security field, graph analysis is widely used in scenarios such as detecting anomalous account transactions and correlating disparate events. Compared with other classes of machine learning algorithms, its unique advantage is that the analysis mirrors the way people naturally think, and the analysis process can be visualized intuitively.

[This is an original article by HanSight. Please credit the source when reprinting!]

For example, the figure below is a combined analysis of several types of security events at a HanSight customer enterprise: logins, USB drive usage, virus detections, machine IPs, and which users use which machines.

An "edge" in the figure indicates that an event occurred; the size of a node (machine, user, IP, virus, USB drive) reflects the number of events. From a single picture we can quickly locate the virus with the most outbreaks, which users log into the same machine illegitimately, and which machines have used the same USB drive.

The figure below is another example, in which HanSight helps a bank customer analyze transaction anomalies: the size of a node is proportional to its out-degree, and its color varies with its in-degree from blue ⇒ white ⇒ red. In financial jargon, a node with far more outgoing than incoming flow is called a volcano, and one with far more incoming than outgoing flow is called a black hole. Such cases are often related to fraud and money laundering.
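As a toy illustration of this idea (the function name and the threshold `ratio` below are assumptions for the sketch, not HanSight's actual detection rules), one can screen a directed transaction matrix for nodes whose out-flow or in-flow dominates:

```python
import numpy as np

def flag_volcanoes_black_holes(A, ratio=5.0):
    """Toy screen for 'volcano' / 'black hole' accounts on a directed
    transaction matrix A, where A[i, j] is the amount i sent to j.
    A volcano sends far more than it receives; a black hole the reverse."""
    out_flow = A.sum(axis=1)   # total amount each node sends
    in_flow = A.sum(axis=0)    # total amount each node receives
    eps = 1e-9                 # avoid flagging isolated nodes via 0/0
    volcanoes = np.nonzero(out_flow > ratio * (in_flow + eps))[0]
    black_holes = np.nonzero(in_flow > ratio * (out_flow + eps))[0]
    return volcanoes, black_holes

# Node 0 only sends money; node 3 only receives it
A = np.zeros((4, 4))
A[0, 1] = A[0, 2] = A[0, 3] = 100.0
A[1, 3] = A[2, 3] = 50.0
v, b = flag_volcanoes_black_holes(A)
```

In practice the threshold would be tuned per customer and combined with other evidence; the point is only that extreme in/out-degree asymmetry is cheap to compute on the whole graph.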

However, once a graph grows large, analysis slows down: the number of edges to examine is usually much larger than the number of nodes N, reaching N(N-1)/2 in the worst case of a complete graph. Given the limits of screen size and readability, it is impractical to put thousands of nodes and their edges on a single picture.

In this case we adopt a divide-and-conquer strategy: exploiting the community structure that real-world graphs exhibit in practice, we partition the graph into several densely connected regions and then analyze and visualize each region separately.

A good graph partitioning algorithm has three additional requirements in practical applications:

1. High speed; ideally it can be parallelized or accelerated on a GPU.

2. It can handle small-world networks (i.e., node degrees follow a fat-tailed distribution).

3. It is not sensitive to its parameters.

Many algorithms fail to satisfy 2 and 3. Most textbook algorithms split the graph into equal parts and assume we already know how many parts to split it into.

Based on the above, HanSight has helped customers solve anomalous-behavior detection problems with graph computing in practice. This article focuses on three families of graph partitioning algorithms that are widely used, efficient, and applicable to anomaly detection.

## Spectral partitioning

The spectral partitioning algorithm was the first to address the graph partitioning problem, and its idea comes from spectral graph theory. The spectrum of a matrix is its set of eigenvalues and eigenvectors. Finding the optimal solution of a graph partitioning criterion is NP-hard; a good workaround is to consider a continuous relaxation of the problem, which transforms the original problem into a spectral decomposition of the Laplacian matrix, hence the name spectral partitioning.

Suppose each data sample is regarded as a vertex v of a graph, and the edge e between two vertices is weighted by the similarity of the corresponding samples; this yields a similarity-based undirected weighted graph G = (V, E). The similarity matrix, usually denoted W or A and sometimes called the affinity matrix, is often computed with a Gaussian kernel.

Adding up the elements of each row of the similarity matrix gives the degree of the corresponding vertex. The diagonal matrix whose diagonal elements are these degree values is called the degree matrix, usually denoted D. With the similarity matrix W and the degree matrix D defined, we obtain the Laplacian matrix:

L = D - W

Depending on the criterion function and the spectral mapping method, many concrete variants of the spectral partitioning algorithm have been developed; they can all be summarized in the following three main steps:

1. For the given graph G = (V, E), compute its Laplacian matrix L;

2. Compute the eigendecomposition of L and select the eigenvectors corresponding to the k smallest eigenvalues to build the eigenvector matrix Q;

3. Run k-means or another classical clustering algorithm on the rows of Q; each row represents one sample point, so its cluster label is the category of the corresponding vertex of the original graph.
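The three steps above can be sketched in Python. This is a minimal illustration using the unnormalized Laplacian and a Gaussian-kernel similarity matrix; the function name and the choice of `sigma` are assumptions for the sketch, not part of any specific production implementation:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_partition(points, k, sigma=1.0, seed=0):
    """Basic spectral clustering: build a Gaussian-kernel similarity
    graph, form the Laplacian L = D - W, and cluster the rows of the
    matrix Q of the k smallest eigenvectors with k-means."""
    # Step 0: similarity (affinity) matrix W from a Gaussian kernel
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Step 1: degree matrix D and Laplacian L = D - W
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Step 2: eigenvectors of the k smallest eigenvalues form Q
    _, vecs = np.linalg.eigh(L)        # eigh returns ascending eigenvalues
    Q = vecs[:, :k]
    # Step 3: each row of Q is one vertex; cluster the rows with k-means
    _, labels = kmeans2(Q, k, minit="++", seed=seed)
    return labels

# Two well-separated blobs should land in different clusters
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
labels = spectral_partition(pts, k=2)
```

With two well-separated blobs the Laplacian has two near-zero eigenvalues whose eigenvectors act as cluster indicators, so k-means on Q trivially recovers the two groups.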

The steps above are only a framework for spectral partitioning. Concrete implementations differ in their partitioning criteria, such as minimum cut, ratio cut, and normalized cut.

The spectral partitioning algorithm first introduces the Laplacian matrix and uses Laplacian eigenmaps to reduce dimensionality, then applies a clustering algorithm to the low-dimensional data, greatly reducing the amount of computation. The following figure shows a result produced by the spectral partitioning algorithm:

However, the spectral partitioning algorithm has several shortcomings:

1) Building the eigenvector matrix Q is by far the most time-consuming part of the algorithm. In high dimensions, computing the eigenvalues alone is already expensive, let alone the eigenvectors;

2) A recursion termination condition must be defined from prior knowledge; that is, the algorithm cannot by itself determine the total number of categories in the graph;

3) Real-world complex networks often contain many classes, and the recursive bisection strategy cannot guarantee an optimal partition.

## Multilevel partitioning

The second family of graph partitioning algorithms is called *multilevel partitioning* (Karypis, 1995).

It is famous for its efficiency: it runs 10%-50% faster than spectral partitioning and can partition graphs with tens of millions of edges in seconds. The implementation is generally divided into three phases: a coarsening phase, an initial partitioning phase, and an uncoarsening phase.

In short, as the figure below shows, the algorithm first compresses the original graph layer by layer in the coarsening phase until it becomes "small", i.e., a graph with a sufficiently small number of vertices; it then partitions this small graph in the initial partitioning phase and expands it back layer by layer in the uncoarsening phase until the original graph is restored, completing the partition.

The main purpose of the coarsening phase is to reduce the complexity of the original graph and build a multilevel hierarchy of graphs. It merges vertices and edges of the original graph to construct a sequence of successively smaller graphs, finally compressing the original graph into one with a sufficiently small number of vertices. This idea of compression (see the figure below) can be formalized as matching: a matching of a graph is a set of edges no two of which share a vertex, and among all matchings of a graph, one with the most edges is called a maximum matching.
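To make the matching idea concrete, here is a sketch of a simple greedy maximal matching; it is a simplification for illustration, since METIS-style coarseners typically use heavy-edge matching on weighted graphs. Each matched pair of vertices would be contracted into a single coarse vertex:

```python
import random

def greedy_matching(adj, seed=0):
    """Greedy maximal matching: visit vertices in random order and pair
    each unmatched vertex with the first unmatched neighbour found.
    `adj` maps each vertex to the set of its neighbours.
    Returns a dict where matched[v] is v's partner."""
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)           # randomization avoids degenerate orders
    matched = {}
    for v in order:
        if v in matched:
            continue
        for u in adj[v]:
            if u not in matched and u != v:
                matched[v] = u   # record the pair in both directions
                matched[u] = v
                break
    return matched

# A 4-cycle 0-1-2-3-0: greedy always finds a perfect matching here
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
m = greedy_matching(adj)
```

Contracting each matched pair roughly halves the number of vertices per level, which is why the coarsening hierarchy has only O(log n) levels.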

During the whole coarsening phase, all vertex and edge weights of the original graph are accumulated, so they are preserved in the smallest-scale graph. Producing a simple partition of this smallest graph is called the initial partitioning phase; because the graph has few nodes, this phase is fast and inexpensive. It is not the core of the multilevel algorithm, and its methods are similar to those used in the refinement phase described next.

The refinement (uncoarsening) phase, also known as the restoration and optimization phase, expands the graph back to the original graph layer by layer following the coarsening hierarchy. During the restoration, refinement heuristics optimize the partition level by level until a partition of the original graph is obtained.

Common refinement algorithms include spectral bisection (SB), the graph growing algorithm (GGP), greedy refinement (GR), and Kernighan-Lin refinement (KLR), of which the Kernighan-Lin algorithm is the best known.

The *Kernighan-Lin partitioning algorithm*, KL for short, was proposed by Kernighan and Lin in 1970. It is a local search optimization algorithm whose objective is to minimize the sum of the weights of the edges connecting different classes.

As a simple example, in the figure below the purple points belong to one class and the black points to another. The KL algorithm is the process that converts figure (a) into figure (b).

Which point in the purple class to exchange with which point in the black class is decided by computing the gain of the swap: the cut weight between the classes before the exchange (as in figure (a) below) minus the cut weight after the exchange. The swap is performed if and only if this gain is positive, and the process repeats until no swap with positive gain remains.
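In the usual formulation, the gain of swapping vertices a and b across the cut is D(a) + D(b) - 2w(a, b), where D(v) is v's external cost (weight of edges to the other class) minus its internal cost. A minimal sketch of the gain computation (the example graph and partition are illustrative):

```python
import numpy as np

def kl_gain(W, part, a, b):
    """Kernighan-Lin swap gain for exchanging vertex a with vertex b,
    which lie in different classes of the partition `part`.
    W is a symmetric weight matrix with zero diagonal.
    A positive gain means the swap reduces the total cut weight."""
    same_a = part == part[a]
    same_b = part == part[b]
    # D(v) = external cost - internal cost of v
    Da = W[a][~same_a].sum() - W[a][same_a].sum()
    Db = W[b][~same_b].sum() - W[b][same_b].sum()
    return Da + Db - 2 * W[a, b]

# Path graph 0-1-2-3 with unit weights, badly partitioned as {0,2} vs {1,3}:
# the cut weight is 3; swapping vertices 2 and 1 lowers it to 1 (gain 2).
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
part = np.array([0, 1, 0, 1])
gain = kl_gain(W, part, 2, 1)
```

The full KL pass greedily picks the best swap, locks the two vertices, and repeats; the -2w(a, b) term corrects for the edge between a and b, which stays cut after the swap.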

The KL algorithm is easy to understand, but its solution is usually only locally optimal. The following figure is an example of graph partitioning with the multilevel algorithm:

The biggest limitation of the multilevel partitioning algorithm is that prior knowledge is needed to produce a good initial partition.

## MCL

Finally, the Markov cluster algorithm (Stijn van Dongen, 2000), MCL for short, is a fast and scalable unsupervised graph clustering algorithm that can sometimes also be used for graph partitioning. Its idea is very simple, resting on random walks and Markov chains; let us briefly review these two concepts.

A random walk starting from some vertex of the graph will tend to wander within one subgraph rather than hop back and forth between subgraphs. The random walk is computed via a Markov chain: a random sequence that satisfies the "memoryless" property, meaning the future state depends only on the current state and not on past states.

The key idea of MCL is that "a random walker who arrives in a dense cluster will not easily leave it". The walking itself is a random walk process, and the "not easily leaving" rests on the memoryless property of Markov chains. The random walk in MCL is realized as a process of repeatedly modifying the transition probability matrix, alternating two operations: expansion and inflation.

Expansion corresponds to taking powers of the Markov chain's transition matrix: the transition probability matrix is multiplied by itself, which spreads the flow and connects different regions of the graph. Inflation raises every element to a power and renormalizes every column; its purpose is to strengthen connections to strong neighbors and weaken those to weak neighbors, i.e., high probabilities in the transition matrix become higher and low probabilities become lower. The two operations alternate until the transition probability matrix converges, and the clusters can then be read off the final matrix.
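The expansion/inflation loop can be sketched as follows. This is an illustrative reimplementation under simplifying assumptions (dense matrices, self-loops added, clusters read off by each column's attractor), not van Dongen's optimized MCL code:

```python
import numpy as np

def mcl(A, expand=2, inflate=2.0, iters=100, tol=1e-9):
    """Minimal MCL sketch on an adjacency matrix A.
    Alternates expansion (matrix power) and inflation (entrywise power
    plus column renormalization) until the column-stochastic flow
    matrix converges, then labels each vertex by the attractor row
    holding most of its column's mass."""
    M = A.astype(float) + np.eye(len(A))   # self-loops stabilize the walk
    M /= M.sum(axis=0)                     # column-stochastic transition matrix
    for _ in range(iters):
        M_next = np.linalg.matrix_power(M, expand)  # expansion: spread flow
        M_next = M_next ** inflate                  # inflation: sharpen flow
        M_next /= M_next.sum(axis=0)                # renormalize columns
        if np.abs(M_next - M).max() < tol:
            M = M_next
            break
        M = M_next
    return M.argmax(axis=0)                # attractor index per vertex

# Two triangles {0,1,2} and {3,4,5} joined by a single bridge edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
labels = mcl(A)
```

Note that the number of clusters is never specified: the inflation exponent indirectly controls granularity, which is exactly the parameter behavior described below.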

MCL works on both weighted and unweighted graphs, and the number of subgraphs does not need to be set in advance, which is the algorithm's biggest advantage. The resulting subgraphs are non-uniform in size, which suits data with long-tailed distributions. The following figure shows the result of partitioning a graph with MCL:

However, MCL is not well suited to graphs with a large diameter. (The diameter of a graph is the maximum distance between any two vertices, where distance is the minimum length over all paths between the two vertices.)

About HanSight: driven by big-data collection, processing, and analysis technology, HanSight is committed to helping enterprises detect, in real time and automatically, internal and external security threats that have occurred or are about to occur, improving the efficiency of security incident handling and protecting enterprise information assets to the greatest extent.

Official website: http://www.hansight.com