LEMNA: A Black-Box Interpretation Model for Deep Learning in Security Applications - ArkTeam

Posted by tetley at 2020-04-16

Author: {wjn} @ arkteam

Original title: LEMNA: Explaining Deep Learning based Security Applications

Original authors: Wenbo Guo, Dongliang Mu, Jun Xu, Purui Su, Gang Wang, Xinyu Xing

Source: CCS'2018 (best paper)

Original link: DOI 10.1145/3243734.3243792

The opacity of deep learning models greatly limits their adoption in security applications. In black-box scenarios especially, developers cannot tell why a model fails, what it has learned, or how to fix it. Many researchers have therefore studied deep learning interpretability, and numerous explanation methods already exist for CNNs (common in image recognition). However, there is little work on interpreting the RNN (sequence) and MLP (multilayer perceptron) models commonly used in the security field. Because security features are highly interdependent and the required explanation fidelity is high, existing explanation methods struggle to explain deep learning models in this domain.

In this context, the authors propose LEMNA (Local Explanation Method using Nonlinear Approximation), a high-fidelity black-box explanation method for security applications.

1、 Model explanation in the black-box setting

The task of model explanation is to explain why the classifier assigns sample x to class y, i.e., which features, with what weights, drive that classification. The approach is to fit a linear regression g(x) that approximates the local classification boundary near sample x in feature space. The coefficients of g(x) on the individual features can then be regarded as feature weights: they indicate each feature's influence on the model's decision, yielding an explanation of the classifier.
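The local-approximation idea above can be sketched as follows. Everything here is a toy placeholder: the `black_box` function, the Gaussian perturbation, and the Ridge fit stand in for whatever sampling and fitting procedure a real explanation method uses.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(black_box, x, n_samples=500, noise=0.1, seed=0):
    """Approximate the classifier's local decision boundary around x
    with a linear model; its coefficients act as feature weights."""
    rng = np.random.default_rng(seed)
    # Perturb the instance to probe the black box in x's neighborhood.
    X_pert = x + noise * rng.standard_normal((n_samples, x.shape[0]))
    y_pert = black_box(X_pert)          # black-box output probabilities
    g = Ridge(alpha=1.0).fit(X_pert, y_pert)
    return g.coef_                      # one weight per feature

# Toy black box: probability driven mainly by feature 0, not at all by feature 2.
black_box = lambda X: 1 / (1 + np.exp(-(3 * X[:, 0] - 0.5 * X[:, 1])))
w = explain_locally(black_box, np.array([0.2, 0.4, 0.0]))
```

On this toy model the recovered weight for feature 0 dominates, feature 1 gets a smaller (negative) weight, and feature 2 stays near zero, matching the intuition that the coefficients reflect each feature's influence on the decision.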

Figure 1: Explanation method for a black-box model

2、 Two techniques for explaining RNN/MLP models with high fidelity

To handle RNN/MLP models and achieve high explanation fidelity, LEMNA combines two techniques: fused lasso and a mixture regression model.

Fused lasso addresses feature dependency by constraining the coefficients of adjacent features: minimize the loss L(f(x), y) subject to

Σ_{j=2..M} ||β_j − β_{j−1}|| ≤ S

where L(f(x), y) is the loss function, β is the parameter vector of the linear regression model, and S is a threshold (hyperparameter). The constraint pushes neighboring, dependent features toward similar weights.

The mixture regression model approximates a locally nonlinear decision boundary as a combination of K linear components:

y = Σ_{k=1..K} π_k (β_k · x + ε_k)

where π_k is the weight of each linear regression component.
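A minimal numeric sketch of the two components above (the noise terms ε_k and the actual EM-style fitting procedure are omitted; the weights and coefficient vectors are made up for illustration):

```python
import numpy as np

def mixture_predict(x, pi, betas):
    """Mixture regression prediction: y = sum_k pi_k * (beta_k . x).
    The per-component noise terms epsilon_k are omitted in this sketch."""
    return sum(p * b @ x for p, b in zip(pi, betas))

def fused_lasso_penalty(beta):
    """Fused lasso term sum_j |beta_j - beta_{j-1}|: it penalizes
    differences between coefficients of *adjacent* features, so
    dependent neighboring features end up with similar weights."""
    return float(np.sum(np.abs(np.diff(beta))))

pi = [0.7, 0.3]                                      # component weights pi_k
betas = [np.array([1.0, 1.0, 0.0]),                  # beta_1
         np.array([0.0, 2.0, 2.0])]                  # beta_2
x = np.array([1.0, 1.0, 1.0])

y = mixture_predict(x, pi, betas)    # 0.7*2 + 0.3*4 = 2.6
pen = fused_lasso_penalty(betas[0])  # |1-1| + |0-1| = 1.0
```

In the real method the constraint Σ||β_j − β_{j−1}|| ≤ S is enforced during fitting; the function here only evaluates the penalty term for a given coefficient vector.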

3、 Example of LEMNA application

Figure 2: Using LEMNA to explain a classifier that detects the starting point of a binary function

LEMNA is applied to explain a classifier that detects the starting point of a function in binary code. Here, the byte 83 is the true function start, and 0.99 is the RNN classifier's output probability. Feeding the hex sequence into LEMNA, the system explains the classification decision by color-coding the most important hex bytes (feature importance decreasing from red to yellow). The figure shows that LEMNA identifies the hex code "90" immediately before the function start as the most important reason for the RNN classifier's decision.
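Selecting the bytes to color-code amounts to ranking features by the magnitude of their local-model weights. A sketch of that step; the hex sequence and the weights below are invented for illustration, not taken from the paper:

```python
import numpy as np

# Hypothetical hex sequence around a function start and made-up
# per-byte weights from a local explanation model.
hex_seq = ["55", "89", "e5", "90", "83", "ec", "10"]
weights = np.array([0.02, 0.05, 0.01, 0.61, 0.18, 0.04, 0.03])

# Rank bytes by weight; the top ones form the color-coded explanation
# (red = most important, fading toward yellow).
top = np.argsort(weights)[::-1][:3]
explanation = [hex_seq[i] for i in top]   # ["90", "83", "89"]
```

With these toy weights the byte "90" ranks first, mirroring the figure's finding that the byte before the function start dominates the decision.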

4、 Comparative experiments

In the experiments, LEMNA is deployed on two common deep learning applications in the security field: binary reverse engineering (RNN model) and malicious PDF detection (MLP model). The results are as follows.

(1) The accuracy of fitting local decision boundary

Figure 3: Comparison of the accuracy of LIME and LEMNA in fitting the local decision boundary

The RMSE (root mean square error) of LIME [1] was 0.1532, nearly 10 times that of LEMNA (0.0196). The results show that the authors' mixture regression model builds a far more accurate local approximation than a simple linear model.
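RMSE here measures how closely the local model's outputs track the black box's probabilities on the sampled instances. A minimal sketch with made-up probability vectors:

```python
import numpy as np

def rmse(p_approx, p_true):
    """Root mean square error between a local model's outputs and the
    black box's probabilities on the same sampled instances."""
    p_approx = np.asarray(p_approx, dtype=float)
    p_true = np.asarray(p_true, dtype=float)
    return float(np.sqrt(np.mean((p_approx - p_true) ** 2)))

# Toy check: a perfect fit gives 0; small deviations give a small RMSE.
err = rmse([0.9, 0.1, 0.5], [1.0, 0.0, 0.5])   # sqrt(0.02 / 3) ~ 0.0816
```

A lower RMSE means the fitted local model reproduces the classifier's behavior more faithfully in the neighborhood of the explained sample.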

(2) Interpretation accuracy estimation

Figure 4: Comparison of explanation accuracy between LIME and LEMNA, where:

(a) Feature deduction test: construct sample t(x)1 by nullifying the features f_x selected for instance x;

(b) Feature augmentation test: randomly select an instance r from the opposite class (i.e., any instance whose label is not y), replace r's feature values at f_x with instance x's values, and construct t(x)2;

(c) Synthetic feature test: keep the values of the selected features f_x and randomly assign values to the remaining features to construct t(x)3.
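The three test constructions above can be sketched as follows. The function names and toy vectors are my own, and `null_value=0.0` is just one possible nulling choice; what "nullifying" means in practice depends on the feature encoding.

```python
import numpy as np

def feature_deduction(x, idx, null_value=0.0):
    """t(x)1: nullify the selected features f_x in instance x."""
    t = x.copy()
    t[idx] = null_value
    return t

def feature_augmentation(r, x, idx):
    """t(x)2: take an instance r from the opposite class and
    overwrite its values at f_x with x's values."""
    t = r.copy()
    t[idx] = x[idx]
    return t

def synthetic_test(x, idx, rng):
    """t(x)3: keep x's values at f_x, randomize every other feature."""
    t = rng.random(x.shape)
    t[idx] = x[idx]
    return t

rng = np.random.default_rng(0)
x = np.array([0.9, 0.2, 0.7])        # instance being explained
r = np.array([0.1, 0.8, 0.3])        # instance from the opposite class
sel = np.array([0])                  # suppose feature 0 was selected as f_x

t1 = feature_deduction(x, sel)       # [0.0, 0.2, 0.7]
t2 = feature_augmentation(r, x, sel) # [0.9, 0.8, 0.3]
t3 = synthetic_test(x, sel, rng)     # [0.9, random, random]
```

If the selected features truly drive the decision, the classifier's output should flip (or drop) on t(x)1, and follow x's label on t(x)2 and t(x)3; that is what the fidelity comparison measures.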


[1] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).