Fingerprint Identification Technology for Industrial Control Systems

Posted by fierce on 2020-02-27

Fingerprint recognition technology is now widely used in ICT (information and communication technology) systems. Attackers scan networks to obtain device fingerprints and map them to known vulnerabilities in order to break into systems, while defenders use the same fingerprints to find vulnerabilities in their own systems and to detect network anomalies. In industrial control systems (ICS), fingerprint recognition is likewise used to find vulnerabilities and to detect attacks. This article focuses on fingerprint identification technology in the field of industrial control security.

What is fingerprint recognition

In the field of ICT, fingerprint recognition is a technology that uses distinguishing information to describe the devices or software running on a network. The best-known form is the device fingerprint, which remotely identifies a device's hardware, operating system, and running software (including version numbers and configuration parameters), among other information.

Common application scenarios of fingerprint recognition

Cyberspace search: Shodan is currently the most popular cyberspace search engine. It scans services such as HTTP, FTP, SSH, Telnet, SNMP and SIP, and identifies devices across the entire Internet by analyzing the information exchanged between client and server. Many popular threat-perception systems also use device fingerprinting to discover devices in cyberspace.

Asset management: in practice, system administrators rarely know everything about their assets, or what they know is wrong. This may be because the information was not updated in time, because system maintenance is outsourced, or because the equipment supplier provided incorrect configuration information. Fingerprinting an ICS therefore supplies the information needed for efficient site inspections and lets the system administrator accurately understand the system's configuration.

Intrusion detection: in principle, attackers can compromise an ICS network by injecting commands or false data, potentially causing catastrophic consequences such as a large-scale power outage. Some devices cannot be upgraded because they are too old, and some suppliers do not even provide online upgrades or patches. Detecting intrusions early is therefore very important for security staff.

Types of fingerprint extraction methods

There are two kinds of fingerprint extraction methods: active and passive. Active fingerprint extraction uses tools that scan the network to obtain information, while passive extraction gathers information by monitoring the network, which is less intrusive. Active recognition generally has a higher probability of successfully identifying a system, because it can collect all the information needed to build a fingerprint, whereas passive recognition can only collect what appears in the observed sessions. However, active recognition is not always usable. Probing scans can congest the network and are easy to detect. In a SCADA system, for example, active scanning can overload the system: active probing increases the number of frames a device must process, and PLCs and RTUs may be unable to handle the extra traffic, so that normal requests go unanswered. Passive monitoring, on the other hand, suffers from lower fingerprint accuracy because collecting enough information passively is difficult.

ICS / SCADA environment characteristics

Applying fingerprint identification in the ICS field brings both advantages and challenges compared with traditional networks, because ICS components have inherent characteristics and weaknesses that distinguish them from the ordinary Internet and corporate LANs. On the one hand, compared with traditional IT systems, industrial control equipment has a long life cycle and a stable network topology and communication pattern. On the other hand, information collection faces problems such as choosing between active and passive methods, diverse equipment, and long-lived TCP connections. Vendor-customized protocols are a double-edged sword: open protocols let a fingerprinting tool locate devices in an ICS, and proprietary protocols can identify specific equipment, but proprietary messages are much harder to analyze because they are undocumented.

General process of fingerprint extraction

Although the objects being identified (operating systems, hardware, specific software, and so on) and the information sources (packets, network traffic, timing, and so on) differ, all identification tools and methods share common functional tasks. The working modules of a recognition tool can therefore be abstracted; the overall processing flow of a fingerprint extraction tool is shown in Figure 1.

Data sources: every fingerprint recognition method or tool depends on one or more information sources. In an ICS environment, many kinds of information can serve as fingerprinting sources, falling mainly into two types: TCP/IP protocol features and network session (communication traffic) features. TCP/IP protocol features are widely used in traditional ICT fingerprinting, but they have a major problem in the ICS environment described above: sessions are long-lived TCP connections, so new TCP three-way handshakes are rare and a capture typically contains no SYN or SYN-ACK packets. Because the information needed for identification is carried in these initial packets, methods based on TCP/IP protocol features are very inefficient there. There are exceptions: some ICS protocols (such as Modbus) provide device query functions that make fingerprinting easier. Network sessions in an ICS environment are also notably stable and regular, so the research community considers data sources such as timing, traffic characteristics, and interaction patterns well suited to ICS environments, since they avoid the problems of TCP/IP protocol features.
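To see why handshake-based features fail on long-lived ICS sessions, here is a minimal sketch of p0f-style passive feature extraction. The `Packet` record and the choice of TTL and TCP window size as the only signature fields are illustrative assumptions, not the format of any particular tool; a real implementation would parse packets from a pcap file.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record (normally parsed from a pcap file)."""
    src: str
    dst: str
    flags: str   # TCP flags as letters, e.g. "S", "SA", "PA", "A"
    ttl: int
    window: int

def syn_signatures(packets):
    """Collect (TTL, window size) pairs per sender from SYN and SYN-ACK packets.

    These values are only observable while a connection is being set up.
    """
    sigs = {}
    for p in packets:
        if "S" in p.flags:  # SYN or SYN-ACK
            sigs.setdefault(p.src, set()).add((p.ttl, p.window))
    return sigs

# A new connection exposes its handshake...
handshake = [Packet("10.0.0.1", "10.0.0.20", "S", 128, 8192),
             Packet("10.0.0.20", "10.0.0.1", "SA", 64, 2048)]
print(syn_signatures(handshake))
# ...but a capture taken from a long-lived polling session does not.
steady = [Packet("10.0.0.1", "10.0.0.20", "PA", 128, 8192),
          Packet("10.0.0.20", "10.0.0.1", "A", 64, 2048)] * 100
print(syn_signatures(steady))  # {}
```

Two hundred packets of steady-state polling yield nothing to match, which is the inefficiency described above.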

Gathering: the gathering module implements information collection. Here the emphasis is on passive recognition, to avoid any possible interference with the system; the best solution is transparent network sniffing of ICS components. Passive recognition analyzes pcap files produced by Wireshark and similar tools, or data from sniffers attached directly to the production network. It injects no traffic and does not respond to incoming messages, which ensures that ICS operation is not disturbed. PLCScan, modbuspatrol and similar tools are an exception: because Modbus provides query functions, these tools actively query ICS devices, which has little impact on the system. Moreover, not all network traffic is valuable, so the gathering module filters out data unrelated to ICS sessions as well as dirty data (such as TCP retransmissions and duplicate ACK packets).
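The dirty-data filtering step can be sketched as follows. The `Segment` record is hypothetical, and the two rules (a data segment whose sequence number was already seen in the same flow is a retransmission; an empty segment repeating the previous acknowledgment number of the same flow is a duplicate ACK) are deliberately simplified assumptions. Wireshark's own TCP analysis is considerably more involved.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """Hypothetical TCP segment record."""
    src: str
    sport: int
    dst: str
    dport: int
    seq: int
    ack: int
    payload_len: int

def drop_dirty(segments):
    """Remove retransmitted data segments and duplicate pure ACKs (simplified rules)."""
    seen_data = set()       # (flow, seq) pairs of data segments already kept
    last_pure_ack = {}      # flow -> ack number of the last kept pure ACK
    clean = []
    for s in segments:
        flow = (s.src, s.sport, s.dst, s.dport)
        if s.payload_len > 0:
            if (flow, s.seq) in seen_data:
                continue  # retransmission
            seen_data.add((flow, s.seq))
        else:
            if last_pure_ack.get(flow) == s.ack:
                continue  # duplicate ACK
            last_pure_ack[flow] = s.ack
        clean.append(s)
    return clean

req = Segment("10.0.0.1", 50000, "10.0.0.20", 502, 1, 1, 12)
ack = Segment("10.0.0.20", 502, "10.0.0.1", 50000, 1, 13, 0)
print(len(drop_dirty([req, req, ack, ack])))  # 2
```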

Model generation: this module organizes and stores the data. Active recognition obtains fixed data structures or detailed information lists by scanning and querying, and needs no further specification. Identification methods based on timing, network traffic, session interaction patterns and similar data sources, however, must handle the complete behavioral feature set of an ICS, which cannot be reduced to a simple signature. The chosen feature set therefore needs a systematic and complete data structure that defines, describes and generalizes the architecture, attributes and operating trends of the ICS environment.

Decision models: the output of the model generation module. To provide the data structure definition required by model generation, a context model is introduced to describe the operational behavior of the system and of its devices. It can also reveal each device's role in the system from its feature values and operational relationships. After collection, the data is processed by the context model to obtain higher-level analysis information.

Preprocessing: after the gathering module, session information is refined further. The processing depends on the data structures and classification algorithm of the decision model. In the simplest case, the preprocessing module extracts data from an unknown ICS environment and builds signature information for later comparison. In other cases, it extracts higher-level information through the context model. The preprocessing module must also filter out useless information and flag incomplete information.

Classification: the classification module determines the fingerprint. In standard TCP/IP stack analysis, a series of comparison algorithms identifies the operating system and other information; some methods also identify software applications and hardware components. When an ICS protocol provides ways to scan and query device information, a comprehensive fingerprint analysis is possible. In an ICS, fingerprint identification targets also include the vendor, the hardware (such as the device model), the software, the component type (such as SCADA servers and PLCs), the component role (master PLC versus ordinary PLCs), the network topology (interactions between PLCs and a single SCADA server, or among PLCs), and the industrial process (energy and water-treatment processes, for example, have different network characteristics).
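Modbus is one protocol that provides such a query: function code 43 (0x2B) with MEI type 14 (0x0E), Read Device Identification, returns objects such as VendorName (0x00), ProductCode (0x01) and MajorMinorRevision (0x02). The sketch below only builds the Modbus/TCP request frame and parses a response; it performs no network I/O, and whether a given device supports this function varies by vendor. Sending the request would be an active query. The "ACME" response is invented for the example.

```python
import struct

# Object IDs of the "basic" category defined by the Modbus specification.
BASIC_OBJECTS = {0x00: "VendorName", 0x01: "ProductCode", 0x02: "MajorMinorRevision"}

def build_read_device_id(transaction_id: int, unit_id: int = 0) -> bytes:
    """Modbus/TCP request: FC 0x2B, MEI 0x0E, basic category (0x01), from object 0x00."""
    pdu = bytes([0x2B, 0x0E, 0x01, 0x00])
    # MBAP header: transaction id, protocol id (0), length of unit id + PDU, unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

def parse_read_device_id(frame: bytes) -> dict:
    """Return {object name: value} from a Read Device Identification response."""
    pdu = frame[7:]  # skip the 7-byte MBAP header
    if len(pdu) < 7 or pdu[0] != 0x2B or pdu[1] != 0x0E:
        raise ValueError("not a Read Device Identification response")
    # pdu[2..5]: read code, conformity level, more-follows flag, next object id
    count, pos, objects = pdu[6], 7, {}
    for _ in range(count):
        obj_id, length = pdu[pos], pdu[pos + 1]
        value = pdu[pos + 2:pos + 2 + length].decode("ascii", errors="replace")
        objects[BASIC_OBJECTS.get(obj_id, hex(obj_id))] = value
        pos += 2 + length
    return objects

print(build_read_device_id(1).hex())  # 000100000005002b0e0100

# Hypothetical response: vendor "ACME", product code "PLC-1".
objs = bytes([0x00, 4]) + b"ACME" + bytes([0x01, 5]) + b"PLC-1"
pdu = bytes([0x2B, 0x0E, 0x01, 0x01, 0x00, 0x00, 0x02]) + objs
reply = struct.pack(">HHHB", 1, 0, len(pdu) + 1, 1) + pdu
print(parse_read_device_id(reply))  # {'VendorName': 'ACME', 'ProductCode': 'PLC-1'}
```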

Decision: the output of the classification algorithm, i.e., the resulting fingerprint. Traditional matching methods involve updating the result set. When the data is modeled with machine learning, supervised or unsupervised training on the output results is used to refine the generation model.
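A minimal sketch of the traditional matching approach, including the result-set update: an observed signature is assigned to the nearest known signature, and if nothing is close enough it is registered as a new entry. The signature vectors, the Euclidean distance, the `tolerance` threshold and the `unknown-N` labels are all assumptions for illustration; in practice, features on different scales would need normalizing first.

```python
import math

def decide(observed, known, tolerance):
    """Return the label of the nearest known signature, or register a new one.

    known: {label: signature tuple}; it is updated in place when no match is found.
    """
    best_label, best_dist = None, math.inf
    for label, sig in known.items():
        d = math.dist(observed, sig)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist <= tolerance:
        return best_label
    new_label = f"unknown-{len(known) + 1}"  # hypothetical naming scheme
    known[new_label] = tuple(observed)
    return new_label

# Hypothetical signatures: (mean response time in s, polling period in s).
known = {"vendor-A PLC": (0.012, 1.0), "vendor-B RTU": (0.045, 2.0)}
print(decide((0.013, 1.0), known, 0.05))   # vendor-A PLC
print(decide((0.2, 10.0), known, 0.05))    # unknown-3
```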

Method introduction

During identification, manual scanning can be time-consuming, and port scanning may disrupt devices with low computing power or outdated devices. In ICS environments, passive monitoring alone is therefore preferred to minimize potential risk. Several fairly mature passive identification tools exist (such as Ettercap, p0f, Satori and NetworkMiner), which fingerprint hosts by monitoring and analyzing TCP/IP stack behavior on the network. Researchers, however, prefer transparent deep packet inspection to identify ICS network behavior. Below, three recent, representative identification methods from the industrial control security research community are introduced. All are passive methods based on analyzing network traffic characteristics. We examine each from several angles: underlying principle, data source, classification algorithm, accuracy, robustness and generality.

Eigenvalue coefficient rating: as noted above, network sessions in an ICS environment are periodic, and field devices communicate with SCADA servers over relatively fixed ports. The authors of [2] therefore extract five basic session features from which feature coefficients are computed and scored. The five basic session features are:

1. Source IP (s-ip)

2. Source port

3. Destination IP (d-ip)

4. Destination port

5. Segment size per unit interval (1 s) (segsize)
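Grouping captured packets by the four address features and totaling bytes per 1-second interval might look like this. Reading segsize as total bytes per one-second bin is an assumption, as is the tuple layout of the input records; see [2] for the authors' exact definitions.

```python
from collections import defaultdict

def session_features(packets, interval=1.0):
    """packets: iterable of (timestamp, s_ip, s_port, d_ip, d_port, size) tuples.

    Returns {(s_ip, s_port, d_ip, d_port): {bin_index: total_bytes}}, where
    bin_index = floor(timestamp / interval).
    """
    sessions = defaultdict(lambda: defaultdict(int))
    for ts, s_ip, s_port, d_ip, d_port, size in packets:
        key = (s_ip, s_port, d_ip, d_port)
        sessions[key][int(ts // interval)] += size
    return {k: dict(v) for k, v in sessions.items()}

capture = [(0.10, "10.0.0.1", 50000, "10.0.0.20", 502, 12),
           (0.60, "10.0.0.1", 50000, "10.0.0.20", 502, 12),
           (1.20, "10.0.0.1", 50000, "10.0.0.20", 502, 12)]
print(session_features(capture))
# {('10.0.0.1', 50000, '10.0.0.20', 502): {0: 24, 1: 12}}
```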

From these basic features, five coefficients describing network traffic characteristics are calculated (see [2] for the specific algorithm):

A. Periodicity coefficient PR

B. Communication durability coefficient DR

C. Device complexity gap coefficient CR

D. Network service popularity coefficient UR

E. Segment size coefficient SR
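The exact coefficient formulas are in [2] and are not reproduced here. As a stand-in that conveys the idea behind a periodicity measure, the sketch below computes the coefficient of variation of packet inter-arrival times within one session: values near 0 indicate strictly periodic polling, and larger values indicate irregular traffic. This is a substitute technique chosen for illustration, not the PR coefficient of [2].

```python
import statistics

def interarrival_cv(timestamps):
    """Coefficient of variation (pstdev / mean) of inter-arrival times.

    A stand-in periodicity score, NOT the PR formula of [2].
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean = statistics.mean(gaps)
    return statistics.pstdev(gaps) / mean if mean > 0 else float("inf")

print(interarrival_cv([0, 1, 2, 3, 4]))           # 0.0 (polling every second)
print(interarrival_cv([0, 0.1, 2.5, 2.7, 6.0]))   # irregular traffic scores higher
```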

The experimental results are shown in Figure 2. For each row of feature values, the authors' algorithm produces an F value and a rank value. The scores differ markedly across device IP/port associations, so the SCADA port can be found from the scores, and the SCADA devices can then be found from the SCADA port. The method's accuracy is evaluated by F-score, as shown in Figure 3. The scores show that, on the one hand, the method accurately identifies field devices and the master (precision on dataset 1 = 1) and fully identifies the SCADA protocols customized by different vendors; on the other hand, on dataset 1 it does not identify the system HMI completely (recall on dataset 1 = 0.9434). The overall F-scores are 0.9709 on dataset 1 and 1 on dataset 2. The method therefore performs well in terms of usability and generality.
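The dataset 1 figure follows from the reported precision and recall through the standard F1 formula F = 2PR / (P + R); the snippet below reproduces the arithmetic. Since F1 is a harmonic mean, an F-score of 1 is only possible when precision and recall are both 1.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f_score(1.0, 0.9434), 4))  # 0.9709 (dataset 1)
print(f_score(1.0, 1.0))               # 1.0
```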

Cross-layer response times: the authors of [1] designed a cross-layer response time (CLRT) identification method based on characteristics of the ICS environment such as low computing power, fixed CPU load, and simple periodic network traffic. Cross-layer response time can reveal whether an attacker is forging responses from their own machine, and whether the CPU load or application configuration in the ICS environment has changed. The method takes pairs of packets with the same source and the same direction within a time slice (such as a day) as basic elements (Figure 4) to form a sample data set, which is processed by an artificial neural network (ANN) and a naive Bayes classifier to identify devices and software. The experimental data comes from two operating substations over a period of 5 months. In some settings (depending on the sampling period, such as one day or one month), the method achieves 99% fingerprinting accuracy. In terms of robustness, the naive Bayes classifier reliably detects device forgery attacks (Figure 6, CLRT part). On the one hand, the method requires the SCADA protocol to use "read" and "response" messages, which not all SCADA protocols implement; on the other hand, how quickly a fingerprint can be generated depends on the number of ACKs the system sends in quick-ACK mode.
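Assuming the two packets are the device's transport-layer ACK and its application-layer response to the same request, CLRT samples could be collected as below. The event tuple format and the `"ack"`/`"response"` labels are hypothetical; a real implementation would derive them from parsed TCP and SCADA protocol headers, and [1] should be consulted for the exact definition.

```python
def clrt_samples(events):
    """events: time-ordered (timestamp, device, kind) tuples, kind "ack" or "response".

    Pairs each device's TCP ACK with the next application-layer response from
    the same device and returns {device: [response_time - ack_time, ...]}.
    """
    pending = {}   # device -> timestamp of its latest unpaired ACK
    samples = {}
    for ts, device, kind in events:
        if kind == "ack":
            pending[device] = ts
        elif kind == "response" and device in pending:
            samples.setdefault(device, []).append(ts - pending.pop(device))
    return samples

events = [(0.000, "plc1", "ack"), (0.004, "plc1", "response"),
          (1.000, "plc1", "ack"), (1.005, "plc1", "response")]
print({d: [round(x, 3) for x in v] for d, v in clrt_samples(events).items()})
# {'plc1': [0.004, 0.005]}
```

Responses with no preceding ACK from the same device are ignored rather than guessed at.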

Physical fingerprinting: a device's mechanical and physical characteristics determine how long it takes to execute a specific operation command, so these timings can serve as a device fingerprint. The authors of [1] give two ways to measure the response time of an operation command. The first uses automatic response timestamps: the time between observing the operation command and observing the response at the Ethernet layer. The second uses sequence-of-events (SOE) record timestamps: the time between observing the operation command at the Ethernet layer and the event occurring at the application layer, as shown in Figure 5. Using the same experimental data as the cross-layer response time method [1], the experiments show that:

① devices from different suppliers respond to some specific commands with different timings (for latching relays, the opening time is similar across vendors, but the closing time differs clearly);

② the same device responds to different operation commands with different response times.
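Either timestamp method yields, for each command, a pair of times whose difference is the response time. A minimal sketch that averages these per (device, command) pair, which makes differences like the latching-relay open/close example visible; the record layout, device name and millisecond values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def operation_times(records):
    """records: (device, command, t_command_ms, t_observed_ms) tuples, where
    t_observed is the Ethernet-layer response or the SOE event timestamp.
    Returns {(device, command): mean response time in ms}."""
    groups = defaultdict(list)
    for device, command, t_cmd, t_obs in records:
        groups[(device, command)].append(t_obs - t_cmd)
    return {key: mean(vals) for key, vals in groups.items()}

# Hypothetical measurements for one relay.
records = [("relay-A", "open", 0, 20), ("relay-A", "open", 500, 521),
           ("relay-A", "close", 100, 160), ("relay-A", "close", 200, 262)]
print(operation_times(records))
```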

Processing the collected timing data set with an FF-ANN (feed-forward artificial neural network) and a naive Bayes classifier identifies devices with 92% accuracy. The physical fingerprinting method also performs well in terms of robustness, as shown in the physical fingerprint part of Figure 6. However, it likewise depends on the SCADA protocol supporting timed responses to operations, which not all SCADA protocols do.
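To show the classification step in its simplest form, here is a small Gaussian naive Bayes classifier written from scratch over timing features. It is neither the FF-ANN nor necessarily the exact naive Bayes variant used in [1]; the labels and millisecond training values are invented for the example.

```python
import math
from statistics import mean, pstdev

def fit(samples_by_label):
    """samples_by_label: {label: [feature tuple, ...]} -> {label: (prior, means, stds)}."""
    total = sum(len(rows) for rows in samples_by_label.values())
    model = {}
    for label, rows in samples_by_label.items():
        cols = list(zip(*rows))  # one tuple per feature
        model[label] = (
            len(rows) / total,
            [mean(c) for c in cols],
            [max(pstdev(c), 1e-9) for c in cols],  # floor avoids zero variance
        )
    return model

def log_gauss(x, mu, sigma):
    """Log of the normal density N(x; mu, sigma^2)."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)

def predict(model, x):
    """Pick the label with the highest log prior + sum of per-feature log likelihoods."""
    def score(label):
        prior, mus, sigmas = model[label]
        return math.log(prior) + sum(log_gauss(xi, m, s) for xi, m, s in zip(x, mus, sigmas))
    return max(model, key=score)

# Hypothetical response times in ms.
model = fit({"vendor-A": [(4.1,), (4.3,), (3.9,)],
             "vendor-B": [(9.8,), (10.4,), (10.1,)]})
print(predict(model, (4.0,)), predict(model, (10.0,)))  # vendor-A vendor-B
```

The "naive" independence assumption is what lets the per-feature log likelihoods simply be summed.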


Passive recognition has been more widely accepted by the industrial control industry because it interferes as little as possible with ICS production environments and automation, and researchers have devoted more effort to it. Although there are mature passive identification tools based on the TCP/IP protocol stack, researchers prefer network traffic characteristic analysis to avoid interfering with normal industrial production. In the current research community, however, passive recognition based on network traffic characteristics still needs improvement in generality and accuracy, as well as standardization.


[1] David Formby, Preethi Srinivasan, Andrew Leonard, Jonathan Rogers and Raheem Beyah, “Who’s in Control of Your Control System? Device Fingerprinting for Cyber-Physical Systems”, Available at [Online]:

[2] Sungho Jeon, Jeong-Han Yun, Seungoh Choi and Woo-Nyon Kim, “Passive Fingerprinting of SCADA in Critical Infrastructure Network without Deep Packet Inspection”, cs.CR, 2016.

[3] Marco Caselli, Frank Kargl and Valentin Tudor, “Device fingerprinting”, Available at [Online]:

[4] Peng Yong, Xiang Xiang Xiang, Zhang Miao, Chen Dongqing, Gao Haihui, Xie Feng, Dai Zhonghua, “Scene fingerprint and anomaly detection of industrial control system”, Journal of Tsinghua University (Natural Science Edition), Vol. 56, No. 1, 2016.

[5] Lighthouse Lab, “Organizational behavior analysis report on intelligence collection of key infrastructure in cyberspace”, /.

[6] Erik Hjelmvik, “Passive OS Fingerprinting”, , Saturday, 05 November 2011.