By Manning @ Skylab
0x00 introduction
Combining deep learning with network security is a major trend for the future of the field. In this post we use mainstream deep learning models to detect SQL injection behavior and examine how three different feature-vector representations affect the detection performance of the models.
0x01 introduction to deep learning
Deep learning is a branch of machine learning that attempts to abstract data at a high level using multiple processing layers built from complex structures or multiple nonlinear transformations. It is a representation-learning approach: its advantage is that unsupervised or semi-supervised feature learning and hierarchical feature extraction replace manual feature engineering. In our experiment we used the Python deep learning library TensorFlow. The models used are:
Multilayer perceptron
A multilayer perceptron (MLP) is a feed-forward artificial neural network that maps a set of input vectors to a set of output vectors. An MLP can be regarded as a directed graph consisting of multiple node layers, each fully connected to the next. Apart from the input nodes, every node is a neuron (processing unit) with a nonlinear activation function.
Convolutional neural network
A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to units within a local receptive field, which makes it perform very well on large-scale image processing. A CNN consists of one or more convolution layers topped by fully connected layers (corresponding to a classical neural network), and also includes shared weights and pooling layers. This structure lets a CNN exploit the two-dimensional structure of the input data. Compared with other deep learning architectures, CNNs give better results in image and speech recognition. The model can also be trained with the backpropagation algorithm, and it needs far fewer parameters to estimate than other deep feed-forward networks, which makes it an attractive deep learning architecture.
Recurrent neural network
Recurrent neural network (RNN) is the umbrella term for two kinds of artificial neural networks: temporally recurrent networks and structurally recursive networks. In a temporally recurrent network the connections between neurons form a directed cycle, while a structurally recursive network reuses a similar network structure recursively to build a deeper network. RNN usually refers to the temporally recurrent kind. Because a plain recurrent network suffers from exploding or vanishing gradients as the recursion deepens, it has difficulty capturing long-range temporal dependencies; this can be addressed by using LSTM units.
The network structure used in the experiment
Multilayer perceptron
The structure of neural network is as follows:
Input layer
Hidden layer L1
Hidden layer L2
Hidden layer L3
Output layer
Each hidden layer uses 128 neurons, and the activation function is ReLU.
The figure above shows the structure as exported by TensorBoard.
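For reference, a minimal sketch of this MLP in tf.keras follows; the input dimensionality `input_dim`, the loss, and the optimizer are assumptions on our side, not the exact configuration of the original experiment.

```python
import tensorflow as tf

# Minimal sketch of the MLP described above: three hidden layers of
# 128 ReLU units and a binary output (SQL injection vs. benign).
# `input_dim` is a placeholder for the feature-vector length.
def build_mlp(input_dim: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),              # input layer
        tf.keras.layers.Dense(128, activation="relu"),   # hidden layer L1
        tf.keras.layers.Dense(128, activation="relu"),   # hidden layer L2
        tf.keras.layers.Dense(128, activation="relu"),   # hidden layer L3
        tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```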
The network structure used in the experiment
Convolutional neural network
The structure of neural network is as follows:
Input layer
Convolution layer
Pooling layer
Convolution layer
Pooling layer
Fully connected layer
Output layer
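A rough tf.keras sketch of this convolution/pooling stack is below; the filter counts, kernel sizes, and the (seq_len, embed_dim) input shape are illustrative assumptions rather than the exact values used in the experiment.

```python
import tensorflow as tf

# Sketch of the conv -> pool -> conv -> pool -> fully connected stack,
# applied to a payload represented as a (seq_len, embed_dim) matrix.
def build_cnn(seq_len: int, embed_dim: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len, embed_dim)),         # input layer
        tf.keras.layers.Conv1D(64, 3, activation="relu"),   # convolution layer
        tf.keras.layers.MaxPooling1D(2),                    # pooling layer
        tf.keras.layers.Conv1D(64, 3, activation="relu"),   # convolution layer
        tf.keras.layers.MaxPooling1D(2),                    # pooling layer
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),      # fully connected layer
        tf.keras.layers.Dense(1, activation="sigmoid"),     # output layer
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```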
Recurrent neural network
The structure of neural network is as follows:
Input layer
Forward layer
Backward layer
Output layer
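The forward and backward layers correspond to a bidirectional recurrent layer. A minimal tf.keras sketch using LSTM cells is below; the unit count and input shape are assumptions.

```python
import tensorflow as tf

# Sketch of the forward/backward (bidirectional) recurrent structure,
# with LSTM cells to mitigate vanishing/exploding gradients.
def build_birnn(seq_len: int, embed_dim: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len, embed_dim)),      # input layer
        tf.keras.layers.Bidirectional(                   # forward + backward layers
            tf.keras.layers.LSTM(128)),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```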
PS: the training and test sets come from the 360 Enterprise Security Tianyan big-data platform, and the purity of the samples is good.
0x02 introduction to feature vectors
We use three methods to build the feature vectors; all three are currently good choices for handling strings.
Feature vector based on word2vec
Feature vector based on bag of words
Feature vector based on FOFE
Feature vector based on word2vec
word2vec maps each word to a multi-dimensional feature vector according to the trained model. When constructing the feature of a whole sentence, we simply add the word vectors together (brute-force summation).
In natural language experiments, word2vec expresses the relationships between words and phrases well. For details, please refer to the exploration of word similarity on a Wikipedia corpus.
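As a sketch of the "sum the word vectors" construction, assuming gensim ≥ 4 for word2vec and that payloads have already been split into token lists by some hypothetical tokenizer:

```python
import numpy as np
from gensim.models import Word2Vec

# Train word2vec on tokenized payloads; `token_lists` is a list of token
# lists, and the dimensionality is an illustrative assumption.
def train_w2v(token_lists, dim=128):
    return Word2Vec(sentences=token_lists, vector_size=dim, min_count=1)

# Build a sentence vector by brute-force addition of the word vectors.
def sentence_vector(tokens, w2v, dim=128):
    vec = np.zeros(dim)
    for tok in tokens:
        if tok in w2v.wv:        # skip out-of-vocabulary tokens
            vec += w2v.wv[tok]   # element-wise addition
    return vec
```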
Feature vector based on bag of words
On the attack platform of the SkyEye lab, we select the 250 tokens that occur most frequently in SQL injection to build a bag-of-words model.
A detailed introduction to the bag-of-words model
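A minimal sketch of the 250-token bag-of-words features follows; the `token_lists` corpus and the tokenization are assumptions, and only the top-250 selection and counting follow the description above.

```python
from collections import Counter
import numpy as np

# Pick the 250 most frequent tokens from the injection corpus.
def build_vocab(token_lists, size=250):
    counts = Counter(tok for toks in token_lists for tok in toks)
    return [tok for tok, _ in counts.most_common(size)]

# Bag-of-words vector: count how often each vocabulary token appears
# in a single payload.
def bow_vector(tokens, vocab):
    index = {tok: i for i, tok in enumerate(vocab)}
    vec = np.zeros(len(vocab))
    for tok in tokens:
        if tok in index:
            vec[index[tok]] += 1
    return vec
```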
Feature vector based on FOFE
FOFE is a simple yet elegant rule-based encoding method. Roughly speaking, on top of one-hot encoding it uses the magnitude of the values to encode each word's positional information. We apply the FOFE algorithm on top of the bag-of-words model above.
The FOFE algorithm comes from Hui Jiang's paper:
The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models
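FOFE scans the token sequence and accumulates z_t = alpha * z_{t-1} + e_t, where e_t is the one-hot vector of the t-th token and 0 < alpha < 1 is the forgetting factor. A minimal sketch on top of the same vocabulary is below; the alpha value here is an assumption.

```python
import numpy as np

# Fixed-Size Ordinally-Forgetting Encoding over a bag-of-words vocabulary.
def fofe_vector(tokens, vocab, alpha=0.7):
    index = {tok: i for i, tok in enumerate(vocab)}
    z = np.zeros(len(vocab))
    for tok in tokens:
        z *= alpha                  # decay the contribution of earlier positions
        if tok in index:
            z[index[tok]] += 1.0    # add the one-hot of the current token
    return z
```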
0x03 analysis of experimental results
We have 50,000 training samples and 500,000 test samples.
All three kinds of feature vectors achieve very good accuracy.
As the figure above shows, there is no significant difference between the FOFE-based feature vector and the bag-of-words feature vector; adding positional information does not bring a significant improvement to the FOFE vector's detection performance. The word2vec vector does not perform well on the real-world set. The reason is that we build sentence vectors by crude vector addition, which fails to carry word2vec's word-level properties over to sentences.
As the figure above shows, prediction with the word2vec-based feature vector is significantly slower than with the other two methods, and the bag-of-words vector is faster than the FOFE vector. The underlying reason is that the FOFE algorithm introduces extra computation, so the slowdown matches expectations.
0x04 summary
In my opinion, crossing three ways of building vectors with three neural network structures to explore how the vector form interacts with the network structure makes this a good example experiment. The most surprising result is that the combination of CNN and word2vec performs best on the real-world set. The FOFE-based feature vector carries a notion of order, yet it does not yield better detection than the plain bag-of-words model.
In security detection, deep neural networks can take us toward the ability to detect the "unknown unknowns", which is also the direction we have to keep working on. Step by step, we will continue in this direction.
0x05 reference
https://zh.wikipedia.org/wiki/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0
https://yq.aliyun.com/articles/118686?spm=5176.100239.0.0.g2XnLx
http://www.52nlp.cn/tag/word2vec
http://blog.csdn.net/u010213393/article/details/40987945