Hierarchical analysis of threat intelligence

Posted by lipsius at 2020-03-06

Threat intelligence, a security hot topic in recent years, has gradually moved from concept to technology to platform. Security companies old and new are exploring the area, including feasible technical solutions, exchange standards and business models. This article introduces the levels of threat intelligence as I understand them and proposes a pyramid model based on my own practice in recent years, in the hope of improving the industry's understanding of threat intelligence.

The concept of Threat Intelligence

Currently, Gartner's definition of threat intelligence is widely cited:

“Threat intelligence is evidence-based knowledge, including context, mechanisms, indicators, implications and actionable advice, about an existing or emerging menace or hazard to assets that can be used to inform decisions regarding the subject's response to that menace or hazard.”

This is an idealized definition that sets clear requirements for the information an intelligence product should contain. It can be regarded as threat intelligence in the narrow sense: a complete intelligence picture that high-end users can base decisions on.

In practice, most organizations have no access to such accurate and comprehensive intelligence services, and even if they did, they could not act on them. Suppose a security vendor could tell a large company which organization, national background and even individuals are behind a security threat (these are necessary components of high-end threat intelligence). What could the company do with it? An ordinary organization is not a law enforcement agency and cannot take threat mitigation measures based on this information.

Indicators of compromise (IOCs) may be more practical for ordinary organizations. They consist of data that perimeter security devices and host protection software can consume directly. Typical indicators include file hashes, IP addresses, domain names, program execution paths and registry keys, which are analyzed in detail below.

The role of Threat Intelligence

Exchanging and sharing threat intelligence pools the strength of all parties in the security industry and integrates information resources, so that a rapid response can cover a wider range and keep up with evolving security threats. The figure on the Webroot souvenir below illustrates the role of threat intelligence vividly.

Threat intelligence is currently a hot keyword in security, but expectations of it are clearly too high. There is no single silver-bullet technology in security. Accurate and timely labeled data helps users deal quickly with threats that have occurred or are occurring: given the hashes of malicious samples, or the IPs and domain names of the C&C and downloader servers that infected hosts connect out to, network perimeter devices or agents running on hosts can detect them through simple matching and apply automatic countermeasures. When users analyze a suspicious event, threat intelligence also provides reference material for judging whether it is malicious, such as whether an IP involved in the event is on a known blacklist or whether a related domain name has been used in known APT activity.
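The simple matching described above can be sketched in a few lines of Python. This is only an illustration: the indicator values are made up (the IP comes from a reserved documentation range), the event fields and the function name match_event are my own assumptions, and a real deployment would load indicators from a feed rather than hard-code them.

```python
# Made-up indicators for illustration only; not taken from any real feed.
IOCS = {
    "md5": {"00112233445566778899aabbccddeeff"},
    "ip": {"203.0.113.7"},
    "domain": {"c2.example.com"},
}


def match_event(event: dict) -> list:
    """Return (indicator type, value) pairs from the event that appear in IOCS."""
    hits = []
    for ioc_type, known_bad in IOCS.items():
        value = event.get(ioc_type)
        # Hashes and domain names are compared case-insensitively.
        if value is not None and value.lower() in known_bad:
            hits.append((ioc_type, value.lower()))
    return hits


# Hypothetical event record, e.g. from a proxy log or a host agent.
event = {"host": "ws-042", "ip": "203.0.113.7", "domain": "cdn.example.org"}
print(match_event(event))
```

Exact set membership keeps the check cheap enough to run on every event, which is what makes automatic countermeasures at the perimeter or on the host practical.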

The level of Threat Intelligence

The figure above is a pyramid diagram of the threat intelligence levels. Working from the bottom to the top, we explain what information each level contains, what role it can play, and how it is analyzed and obtained.


The bottom level of threat intelligence consists of files, mainly the malicious code involved in malicious network activity: Trojans, backdoors, downloaders, droppers and so on. The file sample is generally the starting point and basic data for the whole event analysis, and its importance matches that of the key physical evidence in a criminal investigation, such as the murder weapon and the body. The hash that identifies a file is the most basic threat information and can easily be used to search a target system. If a known Trojan file is found on a system, that system is very likely infected. The following figure lists some file hashes related to the Butterfly attack campaign, released by Symantec.

The main problem with the vast majority of file hashes is that they are too specific. Whether MD5, SHA-1 or SHA-256, a change of a single bit in the file produces a completely different hash value. This property avoids false positives, but it also lets attackers evade detection with the simplest content modification, so a hash loses most of its value almost as soon as it is published. As an intrusion indicator, a file hash can therefore only be used to discover events that have already occurred. Defenders need automated search and matching to shorten the window between occurrence and discovery as much as possible and so limit the loss.
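The sensitivity of hashes to a one-bit change is easy to demonstrate with Python's standard hashlib module. The byte string below is a harmless stand-in that I made up for a file's contents, and digests is a hypothetical helper name.

```python
import hashlib


def digests(data: bytes) -> dict:
    """Compute the three hash types mentioned in the text for one byte string."""
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("md5", "sha1", "sha256")}


# Stand-in for a file's contents; not real malware.
original = b"illustrative sample bytes, not real malware"
# Flip the lowest bit of the last byte: the smallest possible modification.
modified = original[:-1] + bytes([original[-1] ^ 0x01])

before, after = digests(original), digests(modified)
for name in before:
    print(f"{name:7} {before[name][:16]}...  ->  {after[name][:16]}...")
```

The two inputs differ in exactly one bit, yet none of the three digests match, which is why a published hash stops matching as soon as the attacker rebuilds or pads the file.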

Host and network features

Above file hashes are the host and network features obtained directly by analyzing file samples. These data can also serve as intrusion indicators. In short, host features are distinguishing data generated when malicious code runs on a machine, such as mutexes, written registry keys and file paths, while network features include the IPs and domain names of the C&C servers the sample connects out to or downloads components from, the URLs accessed, the communication protocol and so on. The figure below is an example of the host and network feature data built into a Trojan.

Most of these data can be obtained by running samples in a controlled environment (a sandbox or virtual machine). Samples with strong anti-analysis protection, or samples that cannot run, may require manual reverse engineering and debugging, which takes a lot of time and effort. Compared with file hashes, features obtained from static or dynamic analysis are relatively stable, but they are still cheap for the attacker to change, and once exposed publicly their value as intrusion indicators soon disappears.
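As a rough sketch of how network features might be pulled out of sandbox output, the snippet below extracts IPv4 addresses and URLs from a text log with regular expressions. The log format, the sample values and the function name extract_network_indicators are assumptions for illustration; real sandboxes usually emit structured reports that need no text scraping.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_network_indicators(text: str) -> dict:
    """Collect valid IPv4 addresses and http(s) URLs found in free text."""
    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            # ipaddress rejects strings such as 999.1.1.1 that only look like IPs.
            ips.add(str(ipaddress.ip_address(candidate)))
        except ValueError:
            pass
    return {"ips": sorted(ips), "urls": sorted(set(URL_RE.findall(text)))}


# Hypothetical sandbox log lines.
log = """connect 203.0.113.7:443
GET http://c2.example.com/gate.php
file version 999.1.1.1 ignored"""
print(extract_network_indicators(log))
```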

Event level intelligence

Above the information about a single sample is threat intelligence at the event level. Once we have details for a large number of file samples, we can group samples into families by analyzing their similarity along each dimension. The following figure shows a homology analysis based on the characteristics of three samples of covert surveillance software suspected to originate in Europe. Some key features are identical, suggesting a common source.

By analyzing the upstream and downstream relationships between samples, we can infer how malicious code entered when the attack occurred, and so understand the adversary's tactics: whether vulnerabilities were exploited, what social engineering techniques were used, and whether delivery was by spear-phishing email, watering hole attack, USB drive or active penetration. Understanding the attack methods matters greatly to defenders, who can adjust the protection scheme, close loopholes and blind spots, make protection more targeted and reduce cost. The figure below shows the malicious code infection method used by the attackers in the OceanLotus APT event disclosed by 360 in 2015.


If the collected data includes behavior on the victim's internal network and hosts, we can also learn how the attacker tried to extend control inside the victim's systems. Accurate event-level threat intelligence therefore requires more input data and more analysis resources, and in return the results contain more information.
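One simple way to measure similarity between samples is the Jaccard index over their feature sets. This is a sketch under my own assumptions: the feature strings, the sample names and the 0.5 threshold are all made up, and real family classification usually weights dimensions differently rather than treating every feature equally.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_pairs(samples: dict, threshold: float = 0.5) -> list:
    """Return (name, name, score) for every pair whose similarity reaches the threshold."""
    pairs = []
    for x, y in combinations(sorted(samples), 2):
        score = jaccard(samples[x], samples[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs


# Hypothetical per-sample features taken from static and dynamic analysis.
samples = {
    "sample_a": {"mutex:QX-7731", "c2:update.example.net", "packer:upx", "strenc:xor-0x5a"},
    "sample_b": {"mutex:QX-7731", "c2:cdn.example.org", "packer:upx", "strenc:xor-0x5a"},
    "sample_c": {"mutex:zz-01", "c2:files.example.com", "packer:none", "strenc:rc4"},
}
print(related_pairs(samples))
```

Here sample_a and sample_b share three of five distinct features, so they are reported as a likely family, while sample_c shares nothing with either.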

In the following, we have sorted out the dimensions that can be used for correlation analysis at each stage of the Lockheed Martin kill chain model.

Reconnaissance: target country; target individual; industry involved

Weaponization: specific mutex; execution flow; encryption and decryption method; specific functional modules; anti-analysis measures; source project path; specific data strings; language and compilation environment; specific digital signature; component organization structure; distinctive mistakes

Delivery: malicious code entry method; spear-phishing email; watering hole attack; USB drive; active penetration

Exploitation: vulnerability exploitation; social engineering

Installation: initial startup path; persistence method; method of disguising as normal

Command and control: domain name registration information; domain name preference; domain naming preference; ASN of the IP; backdoor tool; tool type; tool configuration; communication protocol; authentication credentials; SSL certificate

Actions on objectives: target data; packing method; transmission method; destructive function
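One way to use these dimensions is to record, for each intrusion, the values observed at each kill chain stage and then look for values shared between intrusions. The sketch below does this with made-up events and attribute values; the stage keys and the function name shared_attributes are my own illustrative choices, not part of the Lockheed Martin model.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical observations: event -> kill chain stage -> observed values.
events = {
    "intrusion_1": {"delivery": {"spear-phishing email"},
                    "weaponization": {"xor key 0x5a"},
                    "c2": {"203.0.113.7"}},
    "intrusion_2": {"delivery": {"spear-phishing email"},
                    "weaponization": {"xor key 0x5a"},
                    "c2": {"198.51.100.20"}},
    "intrusion_3": {"delivery": {"watering hole"},
                    "weaponization": {"rc4"},
                    "c2": {"198.51.100.20"}},
}


def shared_attributes(events: dict) -> dict:
    """Map each pair of events to the (stage, value) attributes they have in common."""
    seen = defaultdict(set)  # (stage, value) -> names of events showing it
    for name, stages in events.items():
        for stage, values in stages.items():
            for value in values:
                seen[(stage, value)].add(name)
    links = defaultdict(set)
    for attribute, names in seen.items():
        for a, b in combinations(sorted(names), 2):
            links[(a, b)].add(attribute)
    return dict(links)


for pair, attributes in sorted(shared_attributes(events).items()):
    print(pair, sorted(attributes))
```

With this data, intrusion_1 and intrusion_2 are linked by delivery and weaponization traits, and intrusion_2 and intrusion_3 by shared C&C infrastructure, so all three can be chained into one candidate campaign even though the first and third share nothing directly.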

An example from Lockheed Martin of correlating several actual intrusion events through shared elements:

Organizational intelligence

Based on the collection and analysis of event-level facts, we may be able to identify the same organization behind multiple attacks and determine its origin, division of labor, resources, personnel composition, objectives and other elements.

By analyzing the development and maintenance status of the tools, the intrusion techniques and the communication infrastructure the adversary uses, we can infer its resources. Generally speaking, if the adversary uses a series of self-developed exploitation and control tools, its communication network uses many IPs, domain names and servers, and payloads are delivered with professional, targeted techniques, we can judge that the adversary is highly capable, has ample resource support, and probably has a clear internal division of labor.

The origin of the organization can be inferred from language-related features in the sample files involved in the event, such as the language of strings, the language version of development or packaging tools, and the default configuration of non-executable samples. With a large number of samples, we can infer the adversary's daily working hours, and even holidays, from the samples' creation times and match them against the holidays of specific countries, which can become an effective clue to the adversary's origin. If we also have the resources to learn the distribution of affected countries, industries and individuals, the type of attack the adversary launches (espionage or financial gain) and the geographical location of the infrastructure used, we can infer the adversary's origin with considerable confidence.

The following is an analysis of the indicators of origin for an APT campaign named Operation Cleaver:
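The working-hours inference can be sketched as follows, assuming the creation timestamps have already been extracted from the samples as Unix times in UTC. The timestamps are made up, the 09:00 to 18:00 office window and the function names are my own assumptions, and since such timestamps can be forged the result is only a clue, never proof.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_histogram(timestamps) -> Counter:
    """Count samples by the UTC hour of their creation time."""
    return Counter(datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps)


def best_utc_offset(timestamps, start: int = 9, end: int = 18) -> int:
    """Return the UTC offset (in hours) that puts the most samples inside office hours."""
    hist = hour_histogram(timestamps)

    def in_office(offset: int) -> int:
        return sum(n for hour, n in hist.items() if start <= (hour + offset) % 24 < end)

    return max(range(-12, 15), key=in_office)


# Made-up creation times: one sample per hour from 01:30 to 09:30 UTC.
stamps = [int(datetime(2015, 3, 2, hour, 30, tzinfo=timezone.utc).timestamp())
          for hour in range(1, 10)]
print(best_utc_offset(stamps))
```

A candidate offset only narrows the search to a band of time zones. It becomes persuasive when it agrees with the other clues in the text, such as language features and gaps that line up with a country's holidays.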


Personnel intelligence

Above the organization is threat intelligence about people. This is the last step of threat analysis: mapping a virtual identity to a real identity.

People sit at the top of the threat intelligence pyramid because they are the most stable part of the whole threat system. Locating the people means locating the root of the threat, and solving the problem of the people is the real solution. Vernor Vinge wrote a wonderful novella called True Names, in which the biggest problem in a future virtual space is how to map characters to real people; once the mapping is complete, the battle is over. The reason is that once we are dealing with people, the means of solving the problem are no longer limited by what is technically possible. For all types of threat intelligence below the personnel level, we can generally only take technical measures. For example, when we know some IOCs (samples, IPs or domain names) related to an attack, the effective responses are mainly limited to manual or automatic identification, isolation and blocking. When the fight is against people, far more means are available. Pressure can be applied not only to the targets themselves but also to their environment, and by exploiting the weaknesses of human nature we can achieve something like a dimension-reduction attack.

Obtaining such information takes more than technical analysis. It may require other data sources and unconventional means of collecting evidence, such as real-name registration information, social account association data, transaction data, honeypots and countermeasures. The following figure demonstrates how 360's interactive data association system traced the initiator of the XcodeGhost event. Starting from a C&C domain name used by the attacker (registered with privacy protection), pivoting layer by layer finally led to an associated domain name that the attacker had registered under a real name.
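Layer-by-layer pivoting of this kind can be modeled as a breadth-first search over an association graph. The sketch below is not 360's system: the graph, the node labels and the function name pivot_path are all hypothetical, and the data would in practice come from registration records, passive DNS and similar sources.

```python
from collections import deque

# Hypothetical association data: each node maps to the nodes it is linked to.
links = {
    "domain:c2.example.com": {"email:admin@example.net"},
    "email:admin@example.net": {"domain:c2.example.com", "domain:blog.example.org"},
    "domain:blog.example.org": {"email:admin@example.net", "registrant:REAL-NAME-PLACEHOLDER"},
}


def pivot_path(graph: dict, start: str, is_target):
    """Breadth-first search from start; return the shortest chain to a target node, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if is_target(path[-1]):
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = pivot_path(links, "domain:c2.example.com", lambda n: n.startswith("registrant:"))
print(" -> ".join(path))
```

Returning the whole chain rather than just the final node matters here, because each hop is a piece of evidence that an analyst has to verify before the attribution can be trusted.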


This article divides threat intelligence into levels according to the stability of the threat data, information and intelligence involved, describes the composition of each level and gives corresponding examples. A future article will analyze in more depth the elements of threat intelligence as defined by Gartner.

Reference links


Butterfly: Corporate spies out for financial gain


OceanLotus (APT-C-00): Digital Ocean Hunter

Intelligence-Driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains

Operation Cleaver