Using machine learning to track hackers: Georgia Institute of Technology wins $17 million traceability project

Posted by millikan at 2020-03-08


The Georgia Institute of Technology has just won a $17.3 million contract to develop technical means for the attribution, or traceability, of cyber attacks.

The researchers aim to locate the sources of network attacks using machine learning. They named the project Rhamnousia, an epithet of Nemesis, the ancient Greek goddess of retribution, to signal their commitment to "just revenge".

"Whenever we think of people suffering from attacks on their systems, theft of intellectual property and tampering with data, we feel duty-bound: we can no longer let these attackers go unpunished," said Manos Antonakakis, assistant professor of electrical and computer engineering at the Georgia Institute of Technology.

Michael Farrell, chief scientist at the Cyber Technology and Information Security Laboratory, added: "If you can't identify your adversary, you can't deter him. Attribution is the key to cyber deterrence, and the U.S. government needs to develop formal, repeatable means of achieving it."

But attribution is a highly controversial and consequential issue. Opinions on its feasibility differ, and no consensus has been reached. A wrong result could trigger a major international incident, such as a "cyber war". Accurate attribution, however, would most likely act as a powerful deterrent against violations by other countries, in the same way as the threat of cyber, economic or military sanctions.

Is accurate traceability possible?

"When we feed what little information we have into a learning system, it can find indicators we did not provide, signs scattered across thousands of documents," said Louis Columbus, technical director of PandaLabs. He believes machine learning will help find the "fingerprints" that attackers leave behind.

Ely Kahn, a co-founder of Sqrrl and a former director of cybersecurity at the White House, agreed. He believes attribution can be achieved through any combination of three methods: offensive action ("hacking a C2 server and observing its data transfers and flows, which in the U.S. only the U.S. government can legally do"); attacker mistakes ("sometimes the attacker's mistakes leave traceable traces"); and probabilistic inference ("studying the attacker's code or third parties", looking for patterns or markers from which the attacker can be inferred with a degree of certainty). The last approach will be the cornerstone of the Georgia Institute of Technology project.
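Kahn's third approach, probabilistic inference, can be illustrated with a minimal sketch. It assumes that each known group is summarized as a set of indicators (encryption keys, C2 ports, compile-time zones and so on) and ranks candidate groups by Jaccard similarity with the indicators found in a new sample. Jaccard similarity is a stand-in technique chosen for illustration, not the Georgia Tech project's method, and every group name and indicator below is hypothetical. The scores measure overlap; they are not calibrated probabilities.

```python
# Hypothetical sketch: rank known attacker profiles by how many indicators
# they share with a new sample. Group names and indicators are invented.


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared indicators divided by all distinct indicators (0.0 to 1.0)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_candidates(
    sample: set[str], profiles: dict[str, set[str]]
) -> list[tuple[str, float]]:
    """Return (group, score) pairs sorted from most to least similar."""
    scores = [(group, jaccard(sample, indicators)) for group, indicators in profiles.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


PROFILES = {
    "group_a": {"xor_key_0x5f", "c2_port_8443", "compile_tz_utc+3", "mutex_abc"},
    "group_b": {"rc4_config", "c2_port_443", "compile_tz_utc+8", "pdb_path_x"},
}

sample = {"xor_key_0x5f", "c2_port_8443", "pdb_path_x"}
ranking = rank_candidates(sample, PROFILES)
for group, score in ranking:
    print(f"{group}: {score:.2f}")
# group_a: 0.40
# group_b: 0.17
```

A high score only says that the sample shares many traits with a profile; such traits can be copied or deliberately planted by attackers.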

Morey Haber, vice president of technology at BeyondTrust, agrees: "Attribution by studying code samples and attack patterns is statistically feasible."

But Elijah Kroshenko, president of High-Tech Bridge, still has doubts: "Tracing the origin of Internet attacks is a great idea, but I doubt it can be achieved with only $17 million. This year we have invested billions of dollars in hundreds of innovative projects, yet we still haven't found a way to identify cyber criminals."

Brian Bartholomew, senior security researcher at Kaspersky Lab, warned: "Attribution is a big problem, and many factors make it hard for us to simply trust its results."

Generally speaking, few network security experts think attribution is impossible, but many consider it ultimately unreliable. Notably, two researchers, David Harley of ESET and Sean Sullivan of F-Secure, have independently argued that attribution is an art, not a science.

"Internet attribution is less a science than an art," Sullivan said. "You often make general inferences based on known behaviors and clues. There is no science in this process, and for lack of evidence the analysis can easily go wrong. I don't believe in attribution the way I believe in the laws of physics."

Caution is also clearly needed, because attribution often leads to punishment. Scott Fulton, a technologist at BeyondTrust, said: "While it is scientifically feasible to make trained machines increasingly reliable at identifying common patterns, their results are not legally admissible."

Is machine learning accurate?

Machine learning does not output a yes/no answer but a probability score. These scores come from the interaction between the algorithm and the data: the algorithm searches the data for patterns and relationships, and the machine learns from the results by repeating this process.
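To make the probability-score point concrete, here is a minimal sketch of logistic regression trained with stochastic gradient descent. Logistic regression is simply an illustrative choice; the article does not say which algorithms the project uses. The two binary features and all labels are invented. Each pass over the data nudges the weights, which mirrors "learning by repeating the process", and the model returns a score between 0 and 1 rather than a yes/no verdict.

```python
import math

# Minimal sketch: logistic regression trained by stochastic gradient descent.
# Features and labels are invented; label 1 means "attributed to group A".
# Feature order: [shares code with group A, reuses group A infrastructure]
SAMPLES = [[1, 1], [1, 1], [1, 0], [1, 0], [0, 1], [0, 0], [0, 0]]
LABELS = [1, 1, 1, 0, 0, 0, 0]


def sigmoid(z: float) -> float:
    """Squash a raw score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: list[float], bias: float, x: list[int]) -> float:
    """Probability score for one sample, not a yes/no verdict."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, labels, lr: float = 0.1, epochs: int = 500):
    """Repeat passes over the data, nudging weights to reduce log-loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y  # d(log-loss)/d(score)
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


weights, bias = train(SAMPLES, LABELS)
for x in ([1, 1], [1, 0], [0, 0]):
    print(x, round(predict_proba(weights, bias, x), 3))
```

Flipping a few entries in `LABELS` and retraining moves the scores accordingly, which is why the quality of the training data matters so much.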

The effectiveness of this process depends on the quality of the algorithm, and the accuracy of its output depends on the accuracy of the training data. Both are subject to human intervention and human error. In fact, algorithms are widely believed not to be fully objective, since they can carry their developers' unconscious biases. More worrying still, if the data is wrong, the output will be wrong.

"This is the most critical part of the attribution process. If the data used to build the model is wrong, its predictions will naturally be unreliable," said Clarence.

Kroshenko of High-Tech Bridge warned: "The quality of machine learning depends on the skill of the people who design the algorithms and select the datasets. Moreover, professional black hats are already using machine learning and big data to build sophisticated deception or smokescreen systems for their criminal activities."

"New threat actors will keep appearing, which means new correlations will have to be established over time," said Haber of BeyondTrust, adding that existing statistical matching will not be effective against them.

Kaspersky's Bartholomew commented: "I think the biggest problem is where the system's initial data comes from. Attribution is very complex, and in many cases there is no single agreed name for an attacker. What organization A calls attacker X may end up being called attacker Y and attacker Z by organization B. This kind of data is very difficult to handle unless everyone assumes the groups share a common origin (such as a government). But if we take that approach, we narrow our vision from the outset and keep circling around that assumption."
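The naming problem Bartholomew describes can be sketched with a union-find (disjoint-set) structure that merges aliases whenever someone claims two names refer to the same actor. Union-find is an illustrative choice, not a method attributed to any of the organizations quoted, and all vendor and group names below are hypothetical. The example also shows his warning in miniature: a single wrong "same origin" link silently merges two separate clusters.

```python
# Hypothetical sketch: merge attacker aliases reported by different vendors
# using a union-find (disjoint-set) structure. All names are invented.


class AliasGraph:
    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def _find(self, name: str) -> str:
        """Return the representative alias for name's cluster."""
        self.parent.setdefault(name, name)
        root = name
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every alias on the path straight at the root.
        while name != root:
            next_name = self.parent[name]
            self.parent[name] = root
            name = next_name
        return root

    def link(self, a: str, b: str) -> None:
        """Record a claim that aliases a and b refer to the same actor."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def cluster(self, name: str) -> list[str]:
        """All aliases currently treated as the same actor, sorted."""
        root = self._find(name)
        return sorted(n for n in list(self.parent) if self._find(n) == root)


graph = AliasGraph()
graph.link("OrgA:X", "OrgB:Y")  # vendor B says its "Y" is vendor A's "X"
graph.link("OrgA:X", "OrgB:Z")  # ...and so is its "Z"
graph.link("OrgC:V", "OrgC:W")  # a separate actor tracked by vendor C
before = graph.cluster("OrgB:Z")

# A single mistaken "same origin" assumption merges the two clusters.
graph.link("OrgB:Y", "OrgC:W")
after = graph.cluster("OrgB:Z")
print(before)  # ['OrgA:X', 'OrgB:Y', 'OrgB:Z']
print(after)   # ['OrgA:X', 'OrgB:Y', 'OrgB:Z', 'OrgC:V', 'OrgC:W']
```

Because merges are never undone, every later query inherits the mistaken link, which is the kind of narrowed vision Bartholomew cautions against.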

Mike Anders, an Internet intelligence researcher at Shadow Blade Technologies, is more optimistic. After all, no intelligence is ever 100% accurate. "Until a hypothesis is confirmed, no intelligence work is 100% accurate, and even confirmed findings can turn out to be wrong! Waiting for 100%-accurate attribution is just a lame excuse, and incomplete intelligence is always what gets used as that excuse. Whether we can make a judgment while knowing our information may be wrong or our data insufficient is what separates real leaders and decision makers from the rest."

Misinformation, misdirection and false flags

Some experts fear that attackers will deliberately try to mislead attribution systems.

"Many attackers actively use deception to mislead or confuse investigators," Bartholomew warned. "In theory we can identify false flags case by case, but some attackers are very good at planting clues for people to follow. This trend has grown in recent years, and I think it will only get stronger."

Haber added: "Some false-flag attacks are designed specifically to produce a wrong attribution. Many attacks also try to forge coding details, timelines and other information."

Kroshenko warned: "Black hats can easily route attacks through dozens of VPNs in multiple countries to fool the FBI's machines, or even launch attacks from the FBI's own internal IP addresses. Such cases are hard to investigate at both the technical and the political level. We may have a good guess about who is behind an attack, but we have no hard technical evidence unless the attacker makes a mistake that exposes them."

Columbus believes that a good attribution engine will make false flags harder to use, though it will not eliminate them, "unless the attacker knows the specific model used for attribution, for example all the information the Department of Defense receives, so that they can plant false flags to fool the system, for instance convincing President X to attack us."

Cyber war

One possible consequence of accepting attribution results is that they could make cyber war unavoidable. If a destructive cyber attack is pinned on a specific government, the government that was attacked will have to retaliate publicly.

Bartholomew said: "The key point is that while it is encouraging to see new technologies that help with attribution, we must not rely too heavily on such 'evidence', and should treat it simply as one tool for analyzing intelligence."

Clarence believes that although cyber war is inevitable, accurate attribution will reduce it; this is attribution's deterrent effect. "Cyber war is inevitable because attacks are cheap and easy and their origin is hard to trace. If we can 'solve' the attribution problem, even partially, we can make any country think twice before attacking (for fear of being identified)."

Sqrrl's Kahn points out that retaliation is not limited to cyber counterattacks or to open war. "If U.S. interests are seriously damaged by a cyber attack, the U.S. government will do everything it can to confront the adversary head-on, which may include diplomatic condemnation, targeted measures and (covert or overt) cyber operations."

Michael Anders also pointed out that the government will never make decisions based on a single source of information like this. "Cyber war is a possible option. But, like all decisions about war, it will never be based on just one thing or one event. Or at least, it shouldn't be decided so sloppily. We need to keep in mind that real intelligence work means extracting information from data and analyzing it."

"Cyber attribution is part of intelligence analysis. It is one component, but it is not the last step before reaching a conclusion. No decision to go to war will rest on the output of a black box alone; it requires comprehensive analysis of all kinds of intelligence to be accurate. That's why human cyber analysts are so important to the decision-making process. Machines can be weapons of war, but humans still need to control them, and the same reasoning applies to cyber intelligence analysis."

Do enterprises need attribution?

The current consensus is that accurate, automated attribution can be achieved to a certain extent, but never with 100% accuracy, and the constant risk of erroneous input and weak analysis has to be taken into account. Given that, should enterprises (rather than governments) have expectations of attribution? Louis Columbus believes they should, although they show little interest at present.

"Businesses really should be concerned," Anders said. "However, the government needs to step in and clearly define what attacked companies are allowed to do, within a recognized framework of 'active cyber defense'. Official recognition of active defense would mean a lot: it is not offensive attack in the traditional 'hacker' sense, but cyber defense in the strict sense. This needs to be done by the Department of Justice, the FBI and Congress. After that, there won't be so much blame when attribution falls short of 100% accuracy."

"There is still a long way to go, but it has become much easier for us to get to the heart of the problem. Attribution is an effort in support of decision-making, and obviously the more effort, the better. It can't be 100% accurate; that's the nature of the network, and sometimes you just have to get used to it."