
The influence of three kinds of feature vectors on deep learning attack detection

Posted by santillano at 2020-03-25

By Manning @ Skylab

0x00 introduction

The combination of deep learning and network security is a major trend for the future of network security. Today, we use mainstream deep-learning algorithms to detect SQL injection behavior, in order to examine the influence of three feature vectors on the detection performance of deep learning models.

0x01 introduction to deep learning

Deep learning is a branch of machine learning that attempts to abstract data at a high level using multiple processing layers composed of complex structures or multiple nonlinear transformations. It is a representation learning method based on data. Its advantage is that it replaces manual feature engineering with unsupervised or semi-supervised feature learning and hierarchical feature extraction. In our experiment, we used the Python deep learning library TensorFlow. The models used are:

Multilayer perceptron

A multilayer perceptron (MLP) is a feed-forward artificial neural network that maps a set of input vectors to a set of output vectors. An MLP can be regarded as a directed graph consisting of multiple node layers, each fully connected to the next. Apart from the input nodes, every node is a neuron (or processing unit) with a nonlinear activation function.

Convolutional neural network

A convolutional neural network (CNN) is a kind of feed-forward neural network whose artificial neurons respond to units within a local receptive field, giving excellent performance on large-scale image processing. A CNN consists of one or more convolutional layers topped by fully connected layers (corresponding to a classical neural network), together with shared weights and pooling layers. This structure lets the CNN exploit the two-dimensional structure of the input data. Compared with other deep learning architectures, CNNs give better results in image and speech recognition. The model can also be trained with the backpropagation algorithm, and it needs fewer parameters to be estimated than other deep feed-forward networks, which makes it an attractive deep learning architecture.

Recurrent neural network

Recurrent neural network (RNN) is the general term for two kinds of artificial neural networks: the time-recurrent neural network and the structurally recursive neural network. In the time-recurrent network, the connections among neurons form a directed cycle, while the structurally recursive network applies a similar neural network structure recursively to build a deeper network. RNN usually refers to the time-recurrent variety. Because plain recurrent networks suffer from exploding or vanishing gradients under recursion, they have difficulty capturing long-term temporal dependencies; this can be addressed by incorporating LSTM units.

The network structure used in the experiment

Multilayer perceptron

The structure of the neural network is as follows:

Input layer

Hidden layer L1

Hidden layer L2

Hidden layer L3

Output layer

Each hidden layer uses 128 neurons, and the activation function is ReLU.

The figure above shows the structure as rendered by TensorBoard.
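The layer stack above can be sketched as a plain NumPy forward pass. This is an illustrative sketch, not the experiment's TensorFlow code: the input width of 250 (matching the bag-of-words size), the output width of 2, and the random weights are all assumptions.

```python
import numpy as np

# Forward-pass sketch of the experiment's MLP: three hidden layers of
# 128 ReLU units. Input/output widths and weights are illustrative only.
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def mlp_forward(x, sizes=(250, 128, 128, 128, 2)):
    h = x
    for i in range(len(sizes) - 1):
        W = rng.standard_normal((sizes[i], sizes[i + 1])) * 0.01
        b = np.zeros(sizes[i + 1])
        h = h @ W + b
        if i < len(sizes) - 2:      # ReLU on hidden layers, not the output
            h = relu(h)
    return h

logits = mlp_forward(rng.standard_normal((4, 250)))   # batch of 4 samples
```

In the real model the output layer would feed a softmax over the two classes (injection / benign).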


Convolutional neural network

The structure of the neural network is as follows:

Input layer

Convolution layer

Pooling layer

Convolution layer

Pooling layer

Fully connected layer

Output layer
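The conv-pool-conv-pool-dense stack can be sketched as 1-D operations over a token-embedding sequence. The filter counts, kernel width, and embedding size below are assumptions for illustration, not the experiment's actual hyperparameters.

```python
import numpy as np

# Sketch of the CNN stack: two (convolution + max pooling) stages
# followed by a fully connected output. All sizes are illustrative.
def conv1d(x, W):
    """Valid 1-D convolution with ReLU: x is (length, in_ch), W is (k, in_ch, out_ch)."""
    k = W.shape[0]
    out = np.zeros((x.shape[0] - k + 1, W.shape[2]))
    for t in range(out.shape[0]):
        out[t] = np.tensordot(x[t:t + k], W, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling along the sequence axis."""
    trimmed = x[: (x.shape[0] // size) * size]
    return trimmed.reshape(-1, size, x.shape[1]).max(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 8))                       # 32 tokens, 8-dim embeddings
h = max_pool(conv1d(x, rng.standard_normal((3, 8, 16)) * 0.1))
h = max_pool(conv1d(h, rng.standard_normal((3, 16, 32)) * 0.1))
logits = h.reshape(-1) @ (rng.standard_normal((h.size, 2)) * 0.1)   # fully connected
```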

Recurrent neural network

The structure of the neural network is as follows:

Input layer

Forward layer

Backward layer

Output layer
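The forward and backward layers form a bidirectional recurrent structure: one pass reads the sequence left to right, the other right to left, and their hidden states are concatenated. A minimal NumPy sketch with a simple tanh cell (the experiment would more likely use LSTM cells; the hidden size here is an assumption):

```python
import numpy as np

# Sketch of a bidirectional recurrent layer with simple tanh cells.
# Hidden size and random weights are illustrative assumptions.
rng = np.random.default_rng(0)

def rnn_pass(xs, Wx, Wh):
    """Run one recurrent pass over the sequence, returning all hidden states."""
    h = np.zeros(Wh.shape[0])
    outs = []
    for x in xs:
        h = np.tanh(x @ Wx + h @ Wh)
        outs.append(h)
    return np.stack(outs)

def bidirectional(xs, hidden=16):
    dim = xs.shape[1]
    Wx_f = rng.standard_normal((dim, hidden)) * 0.1     # forward-layer weights
    Wh_f = rng.standard_normal((hidden, hidden)) * 0.1
    Wx_b = rng.standard_normal((dim, hidden)) * 0.1     # backward-layer weights
    Wh_b = rng.standard_normal((hidden, hidden)) * 0.1
    fwd = rnn_pass(xs, Wx_f, Wh_f)
    bwd = rnn_pass(xs[::-1], Wx_b, Wh_b)[::-1]          # read right-to-left, realign
    return np.concatenate([fwd, bwd], axis=1)

out = bidirectional(rng.standard_normal((10, 8)))        # 10 time steps, 8-dim inputs
```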

PS: the training and test sets come from the 360 Enterprise Security SkyEye (Tianyan) big-data platform, and the purity of the samples is good.

0x02 introduction to feature vectors

We use three methods to build feature vectors, all of which are currently popular choices for handling strings:

Feature vector based on word2vec

Feature vector based on bag of words

Feature vector based on FOFE

Feature vector based on word2vec

Word2vec maps each word to a multi-dimensional feature vector according to the trained model. When constructing sentence features, we use brute-force vector addition: the sentence vector is simply the sum of its word vectors.

In natural language experiments, word2vec expresses the relationships between words well. For details, see the exploration of word similarity on a Wikipedia corpus.
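The brute-force addition can be sketched as follows. The tiny 4-dimensional embeddings are made up for illustration; the experiment used vectors trained with word2vec on real traffic.

```python
import numpy as np

# Toy word embeddings standing in for trained word2vec vectors.
embeddings = {
    "select": np.array([0.1, 0.2, 0.0, 0.3]),
    "union":  np.array([0.0, 0.1, 0.4, 0.1]),
    "from":   np.array([0.2, 0.0, 0.1, 0.0]),
}

def sentence_vector(tokens, embeddings, dim=4):
    """Brute-force vector addition: sum the embeddings of known tokens."""
    vec = np.zeros(dim)
    for tok in tokens:
        if tok in embeddings:
            vec += embeddings[tok]
    return vec

vec = sentence_vector(["select", "union", "from"], embeddings)
```

Note that addition discards word order entirely, which is one plausible reason this representation underperforms on the real-world set (see 0x03).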

Feature vector based on bag of words

On the attack platform of the SkyEye lab, we select the 250 tokens that appear most frequently in SQL injection attacks to build a bag-of-words model.
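A minimal bag-of-words sketch. The real model uses the 250 most frequent SQL-injection tokens; here a tiny toy vocabulary stands in.

```python
from collections import Counter
import numpy as np

# Toy vocabulary standing in for the experiment's 250-token list.
vocab = ["select", "union", "or", "and", "--"]

def bag_of_words(tokens, vocab):
    """Count how often each vocabulary token occurs in the input."""
    counts = Counter(tokens)
    return np.array([counts[w] for w in vocab])

vec = bag_of_words("select 1 union select pass or 1 = 1".split(), vocab)
```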


Feature vector based on FOFE

FOFE is a simple but subtle rule-based encoding method. Roughly speaking, it builds on one-hot encoding and uses the magnitude of the values to encode word-position information. We apply the FOFE algorithm on top of the bag-of-words model above.

The FOFE algorithm comes from the paper by Hui Jiang's group:

The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models

0x03 analysis of experimental results

We used 50,000 training samples and 500,000 test samples.

All three feature vectors achieve very good accuracy.

As the figure above shows, there is no obvious difference between the FOFE-based and bag-of-words-based feature vectors: incorporating positional information does not noticeably improve the detection performance of the FOFE feature vector. The word2vec vector performs poorly on the real-world set, because our crude vector-addition method of building sentence features fails to preserve the sentence-level properties that word2vec provides.

As can be seen from the figure above, inference with the word2vec-based feature vector is significantly slower than with the other two methods, and the bag-of-words vector is faster than the FOFE vector. The essential reason is that the FOFE algorithm introduces a certain amount of extra computation, so the slowdown matches expectations.

0x04 summary

In my opinion, this round of cross experiments, pairing three ways of building vectors with three neural network architectures to explore the relationship between vector form and network structure, is a good example. The most surprising result is that the combination of CNN and word2vec performs best on the real-world set. The FOFE feature vector carries a notion of order, but it does not yield better detection results than the plain bag-of-words model.

In security detection, deep neural networks can bring us toward the ability to detect "unknown unknowns", which is the direction we must keep working on. Step by step, we will continue down this path.

0x05 reference

https://zh.wikipedia.org/wiki/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0

https://yq.aliyun.com/articles/118686?spm=5176.100239.0.0.g2XnLx

http://www.52nlp.cn/tag/word2vec

http://blog.csdn.net/u010213393/article/details/40987945