Principle analysis of adversarial examples

Posted by tzul at 2020-03-11

In recent years, the accumulation of massive data, growth in computing power, and continuous innovation in machine learning methods and systems have driven significant breakthroughs in artificial intelligence, which has been applied successfully in image processing, natural language processing, speech recognition and other fields. In some pattern recognition tasks, such as image classification and speech recognition, the accuracy of machine learning is even higher than that of humans.

Artificial intelligence has great potential to change human life, but it also carries serious security risks. The root cause is that security threats are generally not considered when machine learning algorithms are designed, so their decisions are easily influenced by malicious attackers, causing the AI system to make wrong judgments. In 2013, Szegedy et al. first discovered a counterintuitive phenomenon in image classification: by adding a slight, carefully constructed perturbation to an input sample, an attacker can make an image recognition system based on a deep neural network (DNN) output whatever incorrect result the attacker wants. Researchers call such malicious input samples adversarial examples. Later studies found that, beyond DNN models, adversarial examples can also successfully attack other machine learning models such as reinforcement learning models and recurrent neural network (RNN) models, as well as deep learning applications such as speech recognition, image recognition, text processing and malware detection.

From a mathematical standpoint, adversarial attacks exploit an inherent weakness of AI models: what the algorithm learns is only the statistical characteristics of the data or the correlations within it, not the features that reflect the nature of the data or the causal relationships within it, so it does not achieve "intelligence" in the true sense. This article uses a fully connected neural network as an example to explain how adversarial examples affect an AI model.

Neural networks are the most widely used models in AI systems, and they are typically trained by supervised learning. Although models such as generative adversarial networks (GANs) are described as unsupervised algorithms, they are essentially combinations of neural networks, and each network within the model is still trained in a supervised manner. The training process of the most basic neural network is shown in Figure 1.

Figure 1. Neural network training process

The neural network model can be regarded as a mapping from the data set to the label set, written y = f(x), where x is the input to the network, y is its output, and y_x is the label corresponding to x. During training, for each input x, the output y is compared with the label y_x, and the parameters of the model y = f(x), namely the weights and biases, are updated according to the difference between them. The trained model can then be used for classification. The effect of adversarial examples on the model y = f(x) is shown in Figure 2.
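The training loop described above can be sketched for the simplest possible case, a single sigmoid neuron on two-dimensional inputs. This is only an illustration under stated assumptions: the toy data, the learning rate, the number of epochs and the helper names `predict` and `train` are hypothetical and do not come from the article, whose figures use a full neural network.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Model output y = f(x) for a single sigmoid neuron."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def train(samples, labels, lr=0.5, epochs=200):
    """Gradient descent on the cross-entropy loss (assumed loss function)."""
    w, b = [0.0, 0.0], 0.0
    n = len(samples)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y_x in zip(samples, labels):
            diff = predict(w, b, x) - y_x  # difference between output y and label y_x
            gw[0] += diff * x[0]
            gw[1] += diff * x[1]
            gb += diff
        # Update the weights and the bias according to the averaged difference.
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b

# Hypothetical toy data: two well-separated Gaussian clusters.
rng = random.Random(0)
samples = [(rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)) for _ in range(50)]
samples += [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(50)]
labels = [0.0] * 50 + [1.0] * 50

w, b = train(samples, labels)
correct = sum((predict(w, b, x) > 0.5) == (y_x == 1.0)
              for x, y_x in zip(samples, labels))
accuracy = correct / len(samples)
print("training accuracy:", accuracy)
```

A real network stacks many such units and computes the parameter updates by backpropagation, but the compare-and-update cycle is the same.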

Figure 2. Influence of adversarial examples on the neural network model

The output of the model y = f(x) can change greatly when a perturbation (Δx1 and Δx2 in Figure 2) is added to the input x. In general, the norms ‖Δx1‖ and ‖Δx2‖ are very small. In image recognition, for example, changing only a few pixels of the input image, or even the value of a single pixel, may be enough to make the classifier give a wrong answer. For a trained model y = f(x), however, not every input x can be perturbed so that the output changes greatly. Moreover, for a given input x, a specific type of perturbation is often needed to change the model output. In other words, adversarial examples can only be generated under certain conditions. So what is the relationship between adversarial examples and the model? The following cases illustrate it.

1 Ideal binary classification problem

For ease of visualization, the inputs to the neural network model are two-dimensional vectors, so that they can be plotted, and the model classifies points in the plane. The ideal case for such a problem is shown in Figure 3.

Figure 3. Ideal binary classification problem

Figure 3 shows a linear binary classification problem. The two classes are far apart, so a classification boundary is easy to find. In practice, however, the distribution of the data to be classified is often unknown, and because of the complexity of real problems, noise and other factors, the different classes in the data set are not clearly separated. The following sections discuss binary classification on several typical data sets.

2 Linear binary classification problem

Consider the data set shown in Figure 4. Both classes in the data set follow Gaussian distributions. Because of feature extraction, data noise and other factors, the two classes lie closer together than in Figure 3 and partly overlap; that is, the two classes are not clearly distinguishable. A neural network model is trained to classify this data set, and the contour map of the model is shown in Figure 5.

Figure 4. Linear classification data set

Figure 5. Contour map of linear classification

In Figure 5, the lines are the contours (level curves) of the neural network model y = f(x). According to the density of the contour lines, the two-dimensional plane can be divided into unstable regions and stable regions.

Unstable region: a region with dense contours. In an unstable region, the magnitude of the gradient of y = f(x) is large; that is, y changes rapidly with x, so a small change in x has a large effect on y.

Stable region: a region with sparse contours. In a stable region, y changes slowly with x, so a small change in x does not have a large effect on y.
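The distinction between the two kinds of region can be checked numerically by evaluating the gradient magnitude at different points. The sketch below assumes a one-neuron model y = sigmoid(w · x + b) with hypothetical parameters `W` and `B`; it is not the network behind Figure 5, only a stand-in whose gradient has a simple closed form.

```python
import math

# Hypothetical parameters of a one-neuron model y = sigmoid(w . x + b);
# they are illustrative and not taken from the article.
W = (3.0, 3.0)
B = 0.0

def f(x):
    z = W[0] * x[0] + W[1] * x[1] + B
    return 1.0 / (1.0 + math.exp(-z))

def gradient_norm(x):
    """Magnitude of the gradient of f at x, which equals y * (1 - y) * |w|."""
    y = f(x)
    return y * (1.0 - y) * math.hypot(W[0], W[1])

near_boundary = (0.05, -0.05)    # close to the decision boundary w . x + b = 0
far_from_boundary = (2.0, 2.0)   # deep inside one class

unstable = gradient_norm(near_boundary)
stable = gradient_norm(far_from_boundary)
print("gradient norm near the boundary:", unstable)
print("gradient norm far from the boundary:", stable)
```

For this stand-in model the gradient near the boundary is several orders of magnitude larger than the gradient far from it, which is what dense versus sparse contours express graphically.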

As shown in Figure 5, if an input falls in an unstable region of the neural network model, the model is easily fooled by adversarial examples built from that input. If the input falls in a stable region, the model is not easily fooled. This explains why, in real neural network models such as image recognition networks, some input images cause a classification error after a minor change, while other images are still classified correctly even after major changes.

In addition, by a basic property of the gradient, the gradient vector is orthogonal to the contour line. The function value changes fastest along the gradient direction and does not change along the contour line. Therefore, for an input x in an unstable region, its sensitivity to a perturbation Δx depends on the angle between Δx and the gradient vector (or the contour line). If Δx points along the gradient, the output y of the model changes greatly even when Δx is small. If Δx follows the contour line, the output y barely changes even when Δx is large. This explains why, in real neural network models such as image recognition networks, some images cause classification errors only under a specific perturbation, not under an arbitrary one.
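The dependence on direction can be illustrated with the same kind of stand-in model: one perturbation is taken along the gradient and a ten times larger one along the contour. The parameters `W` and `B`, the starting point `x0` and the step sizes are hypothetical choices for illustration only. Because this stand-in has straight contours, a step along the contour leaves the output unchanged exactly; for the curved contours of a real network this holds only approximately and for small steps.

```python
import math

# Hypothetical one-neuron model y = sigmoid(w . x + b), used only for illustration.
W = (3.0, 3.0)
B = 0.0

def f(x):
    z = W[0] * x[0] + W[1] * x[1] + B
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x):
    """Gradient of f at x: y * (1 - y) * w."""
    y = f(x)
    s = y * (1.0 - y)
    return (s * W[0], s * W[1])

def unit(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def step(x, direction, size):
    return (x[0] + size * direction[0], x[1] + size * direction[1])

x0 = (0.1, 0.1)                 # a point in the unstable region
g = unit(gradient(x0))          # unit vector along the gradient
t = (-g[1], g[0])               # unit vector along the contour (orthogonal to g)

y0 = f(x0)
y_along_gradient = f(step(x0, g, -0.3))   # small step against the gradient
y_along_contour = f(step(x0, t, 3.0))     # ten times larger step along the contour

print("original output:", y0)
print("after small step along the gradient:", y_along_gradient)
print("after large step along the contour:", y_along_contour)
```

With a 0.5 decision threshold, the small step against the gradient moves the point to the other class, while the much larger step along the contour leaves the output essentially where it was.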

3 Binary classification of the two-moons dataset

The previous section explained the mechanism of adversarial examples using contour maps. This section demonstrates it further on a more complex data set, the two-moons dataset. The data set and the contour map of the neural network are shown in Figure 6 and Figure 7 respectively.

Figure 6. Two-moons dataset

Figure 7. Contour map of the two-moons dataset

For the two-moons dataset, the contour distribution of the classification model is more complex. As Figure 6 shows, the two classes are close together and partly overlap, so the contour lines near the decision boundary are relatively dense. As in the linear case, if the input x changes slightly along the gradient direction, the output y of the model changes greatly.
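For readers who want to reproduce a data set of this shape, the sketch below generates two interleaved half circles with Gaussian noise. The article does not say how its data was produced, so the geometry, the noise level and the function name `make_two_moons` are assumptions made for illustration.

```python
import math
import random

def make_two_moons(n_per_class, noise, seed=0):
    """Return (points, labels) for two interleaved, noisy half circles.

    n_per_class must be at least 2. The layout (unit half circles, the second
    one mirrored and shifted by (1, 0.5)) is an assumed, commonly used one.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for i in range(n_per_class):
        t = math.pi * i / (n_per_class - 1)
        # Upper moon: half circle centred at the origin.
        points.append((math.cos(t) + rng.gauss(0, noise),
                       math.sin(t) + rng.gauss(0, noise)))
        labels.append(0)
        # Lower moon: mirrored half circle, shifted right and up.
        points.append((1.0 - math.cos(t) + rng.gauss(0, noise),
                       0.5 - math.sin(t) + rng.gauss(0, noise)))
        labels.append(1)
    return points, labels

points, labels = make_two_moons(n_per_class=100, noise=0.15)
print(len(points), "points,", labels.count(0), "in class 0,", labels.count(1), "in class 1")
```

Raising `noise` makes the two moons overlap more, which corresponds to the situation in Figure 6 where the classes are close and partly intersect.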

4 Ring dataset

The ring dataset and the contour map of its neural network are shown in Figure 8 and Figure 9 respectively.

Figure 8. Ring dataset

Figure 9. Contour map of the ring dataset

The two classes in the data set shown in Figure 8 are also close together and partly overlap. As Figure 9 shows, the contour lines are more complex and there are more unstable regions. Because the two classes are close, the decision boundary lies in an unstable region. At the same time, regions with dense contour lines also appear away from the boundary; that is, unstable regions exist away from the decision boundary as well, as shown in Figure 9. In these unstable regions, the model is easily fooled by adversarial examples. For data sets with a complex distribution, then, the model has more unstable regions, and they are distributed in a more complex way.
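A ring-shaped data set of this kind can be sketched as two concentric noisy circles. As before, the radii, the noise level and the function name `make_rings` are assumptions for illustration; the article does not describe how the data in Figure 8 was generated.

```python
import math
import random

def make_rings(n_per_class, inner_radius=0.5, outer_radius=1.0, noise=0.1, seed=0):
    """Return (points, labels): label 0 on the outer ring, label 1 on the inner ring."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, radius in ((0, outer_radius), (1, inner_radius)):
        for i in range(n_per_class):
            angle = 2.0 * math.pi * i / n_per_class
            r = radius + rng.gauss(0, noise)   # radial noise blurs the ring
            points.append((r * math.cos(angle), r * math.sin(angle)))
            labels.append(label)
    return points, labels

points, labels = make_rings(n_per_class=100)

# Average distance from the origin for each class, as a quick sanity check.
mean_radius = {
    c: sum(math.hypot(p[0], p[1]) for p, l in zip(points, labels) if l == c) / 100
    for c in (0, 1)
}
print(mean_radius)
```

Bringing the two radii closer together, or increasing `noise`, makes the rings overlap in the way the section describes.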

The data sets above illustrate how a neural network model is fooled by adversarial examples. For convenience, the data in these sets is two-dimensional, so it can be shown directly in plots. As the data distribution becomes more complex, the model has more unstable regions, and their locations become harder to predict. In practice, the dimension of the data is often very high: each sample of the MNIST data set has 784 features, that is, 784 dimensions, and each sample of the CIFAR data set has 3072 features, that is, 3072 dimensions. In a high-dimensional data space, the distribution of the data is often unknown and the decision boundary of the model is more complex and cannot be plotted directly, so the distribution of the model's unstable regions cannot be known accurately.

Although the academic community has proposed some methods to defend against adversarial attacks, their effect is limited. The main reason is that existing work mostly moves the unstable regions by various means but does not eliminate them. For high-dimensional data sets and more complex classification models, the distribution of the unstable regions is unpredictable. Defense against adversarial examples has therefore not been fundamentally solved, and more in-depth research on the underlying mathematical principles is needed.

Content editor: Wu Zijian, Tianshu laboratory; responsible editor: Xiao Qing


Original articles on this official account represent only the authors' views and do not represent the position of Green Alliance. Copyright in all original content belongs to Green Alliance Technology Research Newsletter. Without authorization, no media outlet or WeChat official account may copy, reproduce, excerpt or otherwise use it. Reprints must credit Green Alliance Technology Research Newsletter and include a link to the original.
