Vulnerability Analysis of Industrial Distributed Control Systems

Posted by santillano on 2020-03-17

The distributed control system (DCS) is one type of industrial control system. It is mainly used to control key infrastructure in the petroleum, chemical, metallurgy, cement and water industries, and it is the "brain" of process operations. Its importance is self-evident.

Microprocessor-based distributed control systems appeared in the mid-1970s. Over more than 30 years of development, DCS has integrated the "4C" technologies (computer, communication, display and control), and as its functions have matured it has faced more and more security risks.

Let's review the highlights of the talk "Vulnerability Analysis of Industrial Distributed Control Systems" from the Kanxue 2019 Security Developer Conference (SDC).

Editor's Comment

Crownless: Now that industrial production has entered the information and digital era, people pay more and more attention to its security. If a system is intruded, production may be forced to stop, and accidents such as explosions may even occur. Industrial control security is not technically difficult, but the systems are very fragile, which is why we need to invest more resources in it.

Guest introduction

Jian Siting holds a master's degree in software engineering from Fudan University. He is a senior industrial control security researcher on the Shadow security team, a permanent member of the Chinese Association of Automation, and a KCon 2018 speaker, and he developed a device-sniffing tool for the EtherNet/IP protocol.

In his talk, the speaker covered the structure of industrial distributed control systems, industrial network topology and vulnerability analysis, offering a new perspective on risk control.

Content of the talk

The following is the full transcript:

Hello everyone! I currently work at an industrial automation vendor, where I am responsible for industrial control penetration testing and defense. I also do industrial control research with the Shadow, Dawn and Mister security teams.

Today I will share a talk titled "Vulnerability Analysis of Industrial Distributed Control Systems".

Industrial control systems come in many kinds. Besides PLCs, another large category is the DCS, the industrial distributed control system, which differs from PLCs in both architecture and application. What vulnerabilities in this area can leave the system in an unstable state? Let me walk you through them.

A customer of ours runs one of these DCS installations. Someone plugged in an unauthorized USB drive brought in by one of the plant owner's suppliers, which infected the upper computer (the supervisory host) with a WannaCry variant and made it blue-screen. The customer asked our team to help clean up the virus.

What came out of this? After we finished protecting the HMI, we went on to check whether the DCS on the industrial control network had similar vulnerabilities. That work is the subject of today's talk.

What you see here is the standard DCS architecture we usually encounter. The gray part at the bottom is the controller, which in DCS terminology is called the "unit controller". What does it control? Valves and sensors in the chemical, petroleum and water industries.

Automation was originally designed to replace manual work, so that industrial production needs no human participation, and this layer does exactly that with various kinds of controllers. One level up, connected over Ethernet, is the monitoring system.

The controller is a box-like device wired to the field sensors and actuators through signal lines. How does an operator in the main control room learn the field status immediately, and through what interface? Through the layer above, which we call the monitoring system.

The level above that is the management system. It extracts the monitoring data and management information and analyzes production status. The lower layer does only monitoring and control, while the upper layer adds analysis. That is the standard DCS architecture.

How does this differ from a PLC-based control system? A PLC system is built around the gray controller at the bottom, but the monitoring software above and the controller below can come from two different vendors, be two different software products, or even be an application developed on top of the PLC. That never happens with a DCS: the whole system comes from a single vendor, and the protocol is proprietary.

We mainly tested the part in the red box, the safety instrumented system (SIS). Since many of you don't work in manufacturing and may find this hard to picture, let me describe a scenario.

When you drive, where does the gasoline in your car come from? Crude oil is extracted from underground by the three major oil companies and becomes usable fuel after refining and cracking. Along the way it goes through high-temperature heating and high-speed centrifugation, and both are very dangerous.

What is the red system for? It defines trip limits: for example, a very high temperature such as 1200 ℃, a tank level below 10%, or a centrifuge speed of tens of thousands of revolutions. This controller enforces those limits, and once any of them is exceeded it shuts the process down immediately to protect equipment and people.
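The trip rule just described can be sketched as a simple limit check. This is a minimal illustration, not any vendor's SIS logic: the 1200 ℃ and 10% limits come from the talk, while the 20,000 rpm speed limit is an assumed stand-in for "tens of thousands of revolutions", and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Trip limits. Temperature and tank level are the figures quoted in the talk;
# the speed limit is an assumed value.
MAX_TEMPERATURE_C = 1200.0
MIN_TANK_LEVEL_PCT = 10.0
MAX_SPEED_RPM = 20_000.0  # assumption


@dataclass
class ProcessReading:
    temperature_c: float
    tank_level_pct: float
    speed_rpm: float


def trip_reasons(r: ProcessReading) -> list[str]:
    """Return every violated limit; an empty list means the process may keep running."""
    reasons = []
    if r.temperature_c > MAX_TEMPERATURE_C:
        reasons.append("high temperature")
    if r.tank_level_pct < MIN_TANK_LEVEL_PCT:
        reasons.append("low tank level")
    if r.speed_rpm > MAX_SPEED_RPM:
        reasons.append("overspeed")
    return reasons


def should_trip(r: ProcessReading) -> bool:
    # A single violated limit is enough to shut the process down.
    return bool(trip_reasons(r))
```

For example, `should_trip(ProcessReading(1250.0, 50.0, 10_000.0))` returns `True` because of the temperature alone.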

This is the essential difference between traditional information security and industrial security. Traditional information security mainly protects knowledge assets and information assets. Industrial control security is different: there are plenty of information assets, but what matters more is people and equipment.

Here is a real case. In northern China, boilers are used for heating in winter: the heating company runs boilers dedicated to producing steam and delivers it to customers. If the water level in a boiler drops too low, the boiler can explode, and this controller exists to make sure the water never boils dry.

It's like boiling water at home. Once a boiler runs dry, it can blow apart, because the hot gas inside has no way to escape. This controller ensures the safety of both equipment and personnel.

What did the site consist of? Two Cisco 2960 Layer 2 switches formed the network, plus two DCS controllers, two servers running Windows Server 2003, and four clients running Windows XP SP3. Why are these marked gray? Because they were what we needed to patch and secure. The customer asked us to test these two devices, and we used a Kali Linux machine.

This figure shows the network architecture. The two switches in the middle are the 2960s, the two devices below are redundant controllers, the two above are the Windows Server 2003 machines, and there are four clients. Each machine has two cables, one yellow and one green, so that the failure of one network does not interrupt communication; only losing both networks would bring it down.

For full network redundancy, the design goes beyond dual networks: it also ensures that the failure of a switch does not affect operation. This forms a Spanning Tree Protocol (STP) environment.

If you look carefully, you will see a loop here. How do you make sure the loop causes no problems? The DCS vendor solves this communication problem itself: the links must be redundant without forming a loop. It's like a residential compound with many paths: how do you keep traffic from going around in circles? That is up to the vendor.

This is the interface on the back of each computer: one network card with two ports. The vendor has put them into bridge mode. Traditionally the two ports would be on two different network segments, but here they are on the same segment, and in this mode they form a loop.

How do you break the loop and keep it from disrupting traffic? The vendor developed its own Layer 2 fault-tolerant Ethernet protocol about 10 years ago, and it is a very good Ethernet redundancy protocol.

The figure on the left shows that the vendor's own driver, added at the protocol layer of the dual-port network card, turns the two ports into a bridge. In bridge mode it also hides the loop from STP loop detection, which is a clever design. At the same time, it enables routing here.

Above that is the protocol layer, which runs the vendor's own industrial protocol; you will see later which protocol it is. This figure shows the Layer 2 high-reliability network, and the other one shows whether each node is healthy. Any node can be reached over four links, and the failure of any one link has no impact, but only one link is used at a time.

In the figure below, although there are four links, two of them end up blocked. The protocol works out by itself which link is best.
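One way to picture "several links, only one in use, the best one chosen automatically" is a lowest-cost selection among healthy links. This is a generic sketch under assumed link costs and hypothetical link names; the vendor's redundancy protocol is proprietary and may decide differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str      # hypothetical label, e.g. "yellow-yellow"
    cost: int      # assumed metric; lower is preferred
    healthy: bool


def select_active_link(links: list[Link]) -> Link | None:
    """Return the single link to use; every other healthy link stays on standby."""
    candidates = [link for link in links if link.healthy]
    if not candidates:
        return None  # every redundant link has failed
    # Break cost ties by name so the choice is deterministic.
    return min(candidates, key=lambda link: (link.cost, link.name))
```

Re-running the selection after a link fails models the failover: the next-best healthy link takes over without any change to the others.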

When our team received the penetration testing task, the first thing we looked at was the 2960 switch. We searched for information about the 2960 and the vendor, and found the vendor's configuration file for it. What do the lines marked in red do? They prevent loops from forming in the DCS network by enabling MSTP (Multiple Spanning Tree Protocol), so that traffic uses only a single link while multi-path redundancy is preserved.

So what are the characteristics of STP? It is designed to break loops in a network. A switch port moves through five states: disabled, blocking (which is what prevents loops), listening, learning and forwarding. The port state cycles through these as the topology changes.

Each state transition takes 5 to 15 seconds, and no data is forwarded during that time, which means traffic is interrupted. The other figure shows that when switches form a triangle, that is, a loop, STP blocks one port, so although the topology looks like a loop it is logically a line.
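The port states and the resulting outage can be modeled as below. The speaker quotes roughly 5 to 15 seconds per transition; the sketch instead assumes the classic IEEE 802.1D default timers (max age 20 s, forward delay 15 s), which may not match how these switches were configured. MSTP, which the vendor's configuration enables, builds on the rapid spanning tree mechanism and normally converges faster, so treat these figures as a classic-STP worst case.

```python
from __future__ import annotations

from enum import Enum


class PortState(Enum):
    DISABLED = "disabled"      # administratively down; not part of the path below
    BLOCKING = "blocking"
    LISTENING = "listening"
    LEARNING = "learning"
    FORWARDING = "forwarding"


# Classic IEEE 802.1D default timers, in seconds (assumed configuration).
MAX_AGE = 20
FORWARD_DELAY = 15

# States a blocked port passes through before it forwards again, with durations.
TRANSITIONS = [
    (PortState.BLOCKING, MAX_AGE),         # wait for stale root information to age out
    (PortState.LISTENING, FORWARD_DELAY),  # exchange BPDUs, no MAC learning
    (PortState.LEARNING, FORWARD_DELAY),   # learn MACs, still no user traffic
]


def _steps(indirect_failure: bool):
    # A failure seen directly on a local link skips the max-age wait.
    return TRANSITIONS if indirect_failure else TRANSITIONS[1:]


def outage_seconds(indirect_failure: bool = True) -> int:
    """Seconds during which the port forwards no user data."""
    return sum(seconds for _, seconds in _steps(indirect_failure))


def state_at(t: float, indirect_failure: bool = True) -> PortState:
    """Port state t seconds after the topology change."""
    elapsed = 0
    for state, seconds in _steps(indirect_failure):
        elapsed += seconds
        if t < elapsed:
            return state
    return PortState.FORWARDING
```

With these defaults a single topology change costs up to 50 seconds of silence, which is why repeatedly forcing topology changes can starve an HMI of data.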

What tool did we use for this destructive experiment? We wanted to see what happens when STP is attacked, so we used a tool built specifically for attacking Layer 2 protocols on industrial Ethernet. It ships with Kali releases before 2007, so everyone has it.

We chose to send BPDUs (bridge protocol data units) continuously. Why? When the three switches we just saw form a ring, one of them always acts as the manager, the root bridge, which determines which ports are blocked and which are open. How is the root chosen? The switch with the lowest bridge ID, that is, the lowest bridge priority followed by the lowest MAC address, becomes the root.
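Root election compares bridge IDs, priority first and MAC address second, so the rule fits in a few lines. The sketch is illustrative: the MAC addresses are made up, and 32768 is the common default priority.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeId:
    priority: int  # commonly 32768 by default
    mac: str       # e.g. "00:1a:2b:3c:4d:5e"

    def key(self) -> tuple[int, bytes]:
        # Lower priority wins; the MAC address (compared as raw bytes) breaks ties.
        return (self.priority, bytes.fromhex(self.mac.replace(":", "")))


def elect_root(bridges: list[BridgeId]) -> BridgeId:
    """Return the bridge that becomes the spanning tree root."""
    return min(bridges, key=BridgeId.key)
```

Because the comparison is purely numeric, any device that advertises a lower bridge ID in its BPDUs wins the election, which is the property the experiment relies on.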

So we imitate it. If we send the switch BPDUs carrying a large number of different MAC addresses, some are bound to be lower than the current root's. Will that make the links oscillate? Here is a video we made. The effect of the attack can be seen on the DCS upper computer: we log in with the user name and password and open the online monitoring chart, which shows the customer's live production status. Then I switch to the attack system targeting the 2960 and send the spoofed MAC addresses once. As we understood it, the target MAC has to be the switch's own MAC rather than a port MAC, because at Layer 2 the ports have no MAC of their own.

We first send one BPDU to trigger a single oscillation, and the ports start learning again. Then we send BPDUs continuously to keep the topology oscillating, with MAC addresses being sent frantically. Now we switch back to the monitoring platform. The screens are still displayed, but the data is no longer refreshing. Why are the screens still there? The HMI has a reconnection timeout with its controller, with three retries defined, so for the moment everything still looks fine.

The data stays frozen. When we go back to the interface, it has reported a "timeout" error: the controller is completely disconnected. In this demonstration we cut the connection between the DCS upper system and its controller.

DCS behaves quite differently here. Once the upper computer is disconnected, the controller below automatically enters a protective state instead of carrying on by itself. In effect, the controller goes into hold, and all outputs stay in their last state without any change.
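The hold behavior the speaker describes can be pictured with a toy watchdog model. Everything here is an assumption for illustration: the class, the timeout value, and the choice to leave hold as soon as the upper computer is heard again are hypothetical, and real DCS controllers implement this in vendor-specific ways.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Controller:
    """Toy model: outputs freeze at their last values when the upper computer goes silent."""
    comm_timeout_s: float                       # hypothetical watchdog timeout
    outputs: dict[str, int] = field(default_factory=dict)
    last_heard_s: float = 0.0
    hold: bool = False

    def heartbeat(self, now_s: float) -> None:
        """Record traffic from the upper computer (assumption: this also releases hold)."""
        self.last_heard_s = now_s
        self.hold = False

    def tick(self, now_s: float) -> None:
        """Enter hold once the upper computer has been silent longer than the timeout."""
        if now_s - self.last_heard_s > self.comm_timeout_s:
            self.hold = True

    def apply_logic(self, computed: dict[str, int]) -> None:
        """Apply outputs computed by the control program, unless the controller is in hold."""
        if not self.hold:
            self.outputs.update(computed)
```

In this model a disconnected controller does not crash; it simply stops changing its outputs, which matches the "last state, without any change" behavior in the talk.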

Next, we wanted to find out which protocol the controller runs. How? You may know CVE-2018-0171, a vulnerability in Cisco's Smart Install feature disclosed by Cisco in 2018: sending a malformed packet to TCP port 4786 can crash the switch or even give the attacker full control of it.

We used this CVE against the switch so that we could configure port mirroring and copy the traffic between the upper computer and the controller. A switch, unlike a hub, does not forward every frame out of every port, so otherwise you cannot see the traffic between them at all.

This is our attack code; the part at the bottom is what gets sent. With special characters placed in TV1 and TV2, the 2960 switch crashes outright. We were hoping to get root, but we did not succeed; unfortunately, we simply crashed the switch.

The customer gave us only one day, so there was no time to debug the code and get root. We took a cruder approach instead: MAC flooding. The MAC address table of the 2960 holds about 4K entries, and in our tests, once that limit is exceeded the switch behaves like a hub.

The flood of fake entries pushes the real MAC addresses out of the table, so the switch can no longer tell which port each destination is on. What does a switch do when it has no entry for a destination MAC? It floods the frame to all ports. That lets us capture the cleartext traffic from the upper computer to the controller.
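The flooding effect can be reproduced with a toy learning switch whose MAC table has a fixed size. For simplicity the model evicts the oldest entry when the table is full; real switches typically stop learning new addresses and let old ones age out instead, but the result, flooding of frames to unknown destinations, is the same. Port numbers and MAC labels are hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict


class LearningSwitch:
    """Toy Layer 2 switch with a bounded MAC address table."""

    def __init__(self, num_ports: int, table_size: int) -> None:
        self.ports = list(range(num_ports))
        self.table_size = table_size
        self.table: OrderedDict[str, int] = OrderedDict()  # MAC -> port, oldest first

    def learn(self, src_mac: str, in_port: int) -> None:
        if src_mac in self.table:
            self.table.move_to_end(src_mac)
        elif len(self.table) >= self.table_size:
            # Modeling assumption: a full table drops its oldest entry.
            self.table.popitem(last=False)
        self.table[src_mac] = in_port

    def forward(self, src_mac: str, dst_mac: str, in_port: int) -> list[int]:
        """Learn the source, then return the ports the frame is sent out on."""
        self.learn(src_mac, in_port)
        out = self.table.get(dst_mac)
        if out is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        return [] if out == in_port else [out]
```

Once enough fake source addresses arrive on one port, traffic between two legitimate stations is flooded again and reaches that port too.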

The capture showed that the system runs Modbus TCP, a very common protocol on industrial Ethernet, but not only Modbus TCP: it also runs a multicast protocol. The communication services run over multicast in the 224.x range, while the data runs over TCP. That is how it is designed. We captured this traffic through MAC flooding as well.

Modbus TCP uses port 502. Who uses Modbus TCP? DCS systems on the market generally provide this interface, and the protocol has inherent weaknesses.

So why is Modbus TCP so widely used? It is an industrial Ethernet protocol created at a time when industrial environments cared only about functionality and business and nobody talked about security. As a first-generation industrial Ethernet protocol, it focuses on getting the job done and does not consider security at all.

This is its message format. It is an application-layer (Layer 7) protocol. Near the bottom is the Modbus TCP station address (the unit identifier). This is not a TCP/IP address: stations are numbered starting from 0, and each device at a station gets its own address. After it comes the function code, which defines what the request does.

Next comes the address, followed by the data, and there is no checksum. The two checksum bytes found in serial Modbus are absent because Modbus TCP relies on TCP's own checksum. Common function codes include 1, 2, 3, 4, 5 and 6, and the commonly used codes go up to 16, covering read, write, batch read and batch write. But there is no authentication, so replayed or arbitrary requests are all accepted. That is a fundamental defect of the Modbus TCP protocol.
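The frame layout described above can be parsed as follows. Field sizes follow the published Modbus TCP specification (a 7-byte MBAP header with transaction ID, protocol ID, length and unit identifier, then the function code and data); the parser itself is a minimal sketch for reading captured traffic, not a complete implementation.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

# Public function codes mentioned in the talk (1 to 6, plus the batch writes 15 and 16).
FUNCTION_NAMES = {
    1: "read coils",
    2: "read discrete inputs",
    3: "read holding registers",
    4: "read input registers",
    5: "write single coil",
    6: "write single register",
    15: "write multiple coils",
    16: "write multiple registers",
}


@dataclass
class ModbusTcpFrame:
    transaction_id: int
    protocol_id: int
    unit_id: int
    function_code: int
    data: bytes


def parse_frame(raw: bytes) -> ModbusTcpFrame:
    """Split one Modbus TCP frame into its MBAP header fields and PDU."""
    if len(raw) < 8:
        raise ValueError("frame shorter than MBAP header plus function code")
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", raw[:7])
    if protocol_id != 0:
        raise ValueError("protocol identifier must be 0 for Modbus")
    # 'length' counts the unit id plus the PDU. There is no checksum field and
    # nothing that identifies or authenticates the sender.
    if len(raw) != 6 + length:
        raise ValueError("length field does not match frame size")
    return ModbusTcpFrame(transaction_id, protocol_id, unit_id, raw[7], raw[8:])
```

For example, the bytes `00 01 00 00 00 06 01 03 00 00 00 02` decode to a "read holding registers" request from unit 1 for two registers starting at address 0.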

We wrote code to exploit this defect in the protocol, targeting port 502. It writes once to address "000", making the value flip from 0 to 1. To us information security people, such a flip is just a change in a number and means nothing.

In industry, it is a different story. Suppose that address drives a 24 V steam valve that is currently closed. One flip opens it and then closes it again, and in that instant, say, 40 tons of steam shoot out of the pipeline. Anyone nearby who is hit by it will die. The steam is invisible; you might see only a layer of white mist, and the victim's body would be unrecognizable. This really happened at one of our sites: a program error opened a 40-ton steam line, and the on-site maintenance engineer's clothes were completely stripped away.

The whole code works like this. The first few bytes are a standard, fixed header, followed by the computed length field, and then the function code for writing a single coil. How is the data written? The value field FF (FF00) sets the coil to 1, and 00 (0000) resets it to 0. With a delay in between, the flag bit flips from 0 to 1 and back. That is what I demonstrated to the customer.
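For anyone reviewing captured traffic, the two write requests just described are easy to recognize. Below is a minimal decoder for a write-single-coil request PDU (function code 5, a 2-byte coil address, and a 2-byte value where FF00 means on and 0000 means off, per the Modbus specification); the function name is hypothetical.

```python
from __future__ import annotations

import struct


def decode_write_single_coil(pdu: bytes) -> tuple[int, int]:
    """Decode a function-code-5 request PDU into (coil address, new state)."""
    if len(pdu) != 5 or pdu[0] != 5:
        raise ValueError("not a write-single-coil request PDU")
    address, value = struct.unpack(">HH", pdu[1:])
    if value == 0xFF00:
        return address, 1
    if value == 0x0000:
        return address, 0
    raise ValueError(f"illegal coil value 0x{value:04X}")
```

The on and off requests for address 0 differ only in the value field, which is why a captured pair can be replayed unchanged against a protocol without authentication.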

This video shows the program running. On the other side you can see a simulated field controller. Watch the first "0": as soon as the program runs, it flips.

The code is actually very simple: all we need is the address of the switch point we want to flip. Where is it? At address "1". Look at that spot: it changes from 1 to 0, and that is all. You can do this to a single bit in the controller or to every bit, and then none of the controller's states are what the original program intended.

After all this, the most important slide is this one: faced with these problems, how do we protect the system?

First, add port security policies on the switches. Industrial users often leave switch ports open or at their default settings, which should not be allowed. Disable every port that does not need a cable plugged in, and the ports are secured.
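The port security idea can be sketched as a per-port rule: unused ports admit nothing, and a used port accepts only a small, fixed number of source MACs before it shuts itself down. This is a toy model with hypothetical names and an assumed limit of one MAC per port, not any switch vendor's actual feature or syntax.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SecurePort:
    """Toy model of a port security policy."""
    enabled: bool = True           # unused ports should be disabled outright
    max_macs: int = 1              # assumed limit per port
    allowed: set[str] = field(default_factory=set)
    shut_down: bool = False

    def admit(self, src_mac: str) -> bool:
        """Return True if a frame from src_mac may enter this port."""
        if not self.enabled or self.shut_down:
            return False
        if src_mac in self.allowed:
            return True
        if len(self.allowed) < self.max_macs:
            self.allowed.add(src_mac)  # learn addresses up to the limit
            return True
        self.shut_down = True          # violation: disable the port
        return False
```

Under such a policy, a flood of fake source MACs ends at the first extra address instead of filling the switch's MAC table.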

Second, add isolation with industrial firewalls. Since the protocol itself is insecure, how can you tell an attack from legitimate traffic? Firewall filtering keeps the original, legitimate MAC addresses and kicks out the attack traffic. In addition, both the network and the hosts must be monitored.
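A Modbus-aware filter of the kind described might combine a MAC allowlist with function-code rules. The sketch below is an assumption-laden illustration: the policy fields, the rule that only designated engineering hosts may send write codes, and the example addresses are all hypothetical, and because MAC and IP addresses can be spoofed this is one layer of defense, not a complete one.

```python
from __future__ import annotations

from dataclasses import dataclass

READ_CODES = {1, 2, 3, 4}
WRITE_CODES = {5, 6, 15, 16}


@dataclass(frozen=True)
class Policy:
    known_macs: frozenset[str]          # MACs of legitimate DCS nodes
    engineering_hosts: frozenset[str]   # hypothetical: IPs allowed to send writes


def allow(policy: Policy, src_mac: str, src_ip: str, function_code: int) -> bool:
    """Decide whether a Modbus TCP request may pass the firewall."""
    if src_mac not in policy.known_macs:
        return False                    # unknown Layer 2 source: drop
    if function_code in READ_CODES:
        return True
    if function_code in WRITE_CODES:
        return src_ip in policy.engineering_hosts
    return False                        # any other function code is denied by default
```

Denying unknown function codes by default keeps the rule set small and avoids having to enumerate every possible abuse.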

Third, the customer had no firewall or antivirus software installed on its computers. We helped them install antivirus software, enable the firewall and apply a security baseline. We also installed application whitelisting to stop other malicious Trojans from being executed. Together, these measures provide overall protection.
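The core of application whitelisting can be sketched as a hash check before execution. This is a minimal illustration that assumes allowlisting by SHA-256 digest; the talk does not name the product the team installed, and commercial tools may also use signatures, paths or publisher rules.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO) -> str:
    """Hash a binary stream in chunks so large executables need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(65536), b""):
        digest.update(chunk)
    return digest.hexdigest()


def may_execute(path: Path, allowlist: set[str]) -> bool:
    """Allow a program to run only if its SHA-256 digest is on the allowlist."""
    with path.open("rb") as f:
        return sha256_of_stream(f) in allowlist
```

Anything not explicitly approved, including a ransomware variant arriving on a USB drive, fails the check even if no antivirus signature exists for it yet.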

Thank you very much. Here is my contact information, including a QR code, my email and my three teams: Shadow, Dawn and Mister. If you are interested in industrial control security, feel free to contact me privately.

Industrial control security is not technically difficult, but the systems are very fragile. The reason the state has raised infrastructure security to such a high priority is that it involves three things: people, equipment and assets. An incident is not simply a loss of information; it can damage equipment and cause casualties.

