Dong Yicheng: Situation Awareness, from Pitfalls to Rebirth

Posted by tzul at 2020-02-27

For security management, what cannot be seen cannot be managed. Under a severe network security situation, the wave of situational awareness construction has arrived on schedule. Compared with ordinary security work, this is a complex project with large resource investment, wide coverage and high technical difficulty; people often see flashy dashboards, numerous models and massive data, yet have no idea where to start the actual construction. As Drucker said, doing the right thing is more important than doing things right. To prevent large investments of manpower and material from "floating on the surface", to ensure situational awareness actually works, and to improve the efficiency of security operations, we must abandon the traditional mode of simply stacking tools and build from the ground up.

Situational awareness construction is not simply the accumulation and display of data; it grows out of the real needs of security governance and offensive-defensive confrontation. The security team summarizes and organizes daily security operations and attack-defense work, abstracts them into security scenarios and high-level threats, and turns these into concrete objects of perception. Systematic analysis methods are then used to build the corresponding analysis models, and big-data technology turns raw logs into trends. Finally, situational awareness needs automation to improve efficiency. This article discusses these problems in order to form a working approach to situational awareness construction.

1、 Situational awareness comes from the actual needs of security operations

The situational awareness platform is the technical support for security operations. Security operations, viewed from a macro perspective, are subordinate to the enterprise's security strategy and are an important part of security governance; they go hand in hand with secure development and secure delivery, covering security management across the information system life cycle. Concretely, the immediate purpose of security work is to fight attacks and threats. As attacks become more and more sophisticated, a situational awareness platform is needed to provide a systematic solution. To think about situational awareness, therefore, we should start from attacks and threats. That is why security teams need situational awareness.

(1) Holistic attack

Lockheed Martin proposed the Cyber Kill Chain model in 2011, which divides a cyberspace attack into seven steps: reconnaissance, weaponization, delivery, exploitation, installation, command and control (C2), and actions on objectives.
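The seven stages form an ordered sequence, which is useful later for tagging how deep into the chain an intrusion has progressed. A minimal sketch in Python (the stage names follow the Lockheed Martin model; the alert-to-stage mapping is an illustrative assumption, not a real product taxonomy):

```python
from enum import IntEnum

class KillChainStage(IntEnum):
    """The seven stages of the Lockheed Martin Cyber Kill Chain, in order."""
    RECONNAISSANCE = 1
    WEAPONIZATION = 2
    DELIVERY = 3
    EXPLOITATION = 4
    INSTALLATION = 5
    COMMAND_AND_CONTROL = 6
    ACTIONS_ON_OBJECTIVES = 7

# Illustrative mapping from alert types to stages (assumed, not exhaustive).
ALERT_STAGE = {
    "port_scan": KillChainStage.RECONNAISSANCE,
    "phishing_mail": KillChainStage.DELIVERY,
    "rce_exploit": KillChainStage.EXPLOITATION,
    "trojan_dropped": KillChainStage.INSTALLATION,
    "c2_beacon": KillChainStage.COMMAND_AND_CONTROL,
}

def deepest_stage(alerts):
    """Return the furthest kill-chain stage reached by a set of alerts."""
    stages = [ALERT_STAGE[a] for a in alerts if a in ALERT_STAGE]
    return max(stages) if stages else None

print(deepest_stage(["port_scan", "rce_exploit", "c2_beacon"]).name)
# prints COMMAND_AND_CONTROL
```

Because the stages are ordered, "how far along the chain" reduces to a simple `max`, which is the kind of reduction the situational awareness platform performs at scale.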

(2) Single point penetration

For specific application systems, a relatively mature penetration attack pattern has formed, mainly divided into the following stages: information gathering, vulnerability attack and exploitation (exploit execution, brute-force cracking, arbitrary file download, database dumping, etc.), privilege maintenance (webshell, reverse shell, account creation, etc.), privilege escalation, attack execution, and trace removal.

Within the kill-chain framework, attackers repeatedly run this penetration process and move laterally, approaching the core network and systems from the outside in, until the whole intranet is controlled and the targets suffer denial of service, system takeover and sensitive-information leakage.

In a real environment the whole attack process is long and complex. To make it easier to observe and understand, the attack process needs to be abstracted as the target of situational awareness.

2、 The security situation is the concrete representation of security scenarios

Combining the attack process above, we summarize the goals of situational awareness into two processes, event reduction and information increase, presented through multiple security perspectives. This is what the security team needs to perceive.

(1) Large-scale elimination of false positives and event classification

Under massive attacks, monitoring produces a huge number of events and alarms, far beyond what humans can handle. So the first step toward situational awareness is to reduce the number of events; generally there should be no more than about 10 events awaiting human handling at any time. The most effective measures are reducing false positives and unifying the alarm classification standard. For the highest-level alarm, it must be confirmed that denial of service, system takeover, data leakage or another substantive impact has actually occurred. This is clearly distinguished from the large number of "pseudo high-risk" events such as aimless scanning (a high-risk exploit script fired at a host not affected by the vulnerability) and low-risk outbound connections (such as adware promotion).
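The downgrading logic for "pseudo high-risk" events can be sketched as a small triage rule: an exploit attempt only stays high-severity if the target is actually vulnerable, and only confirmed substantive impact is critical. All field names, categories and the asset-vulnerability table below are assumptions for illustration:

```python
def triage(event, asset_vulns):
    """Classify an alarm, suppressing 'pseudo high risk' events.

    asset_vulns maps hostname -> set of CVE IDs the host is vulnerable to
    (an assumed data source, e.g. from vulnerability scanning)."""
    impact = {"denial_of_service", "system_controlled", "data_leak"}
    if event.get("impact") in impact:
        return "critical"                         # substantive impact confirmed
    if event["type"] == "exploit_attempt":
        vulnerable = event["cve"] in asset_vulns.get(event["target"], set())
        return "high" if vulnerable else "info"   # aimless scanning -> info
    if event["type"] == "outbound_adware":        # low-risk outbound connection
        return "low"
    return "medium"

asset_vulns = {"web01": {"CVE-2021-44228"}}
events = [
    {"type": "exploit_attempt", "target": "web01", "cve": "CVE-2021-44228"},
    {"type": "exploit_attempt", "target": "db01", "cve": "CVE-2017-0144"},
    {"type": "outbound_adware", "target": "pc17"},
]
print([triage(e, asset_vulns) for e in events])  # ['high', 'info', 'low']
```

The design point is that classification consults context (the asset's real exposure), not just the attack signature, which is exactly how a high-risk exploit script against an unaffected host drops out of the queue.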

In security operations, monitoring is only the first step; it is followed by disposal, tracing, repair, optimization and other processes that inevitably require substantial human intervention. To keep humans from becoming the bottleneck, pending events must be prioritized into hour-level, day-level, week-level and month-level processing echelons, so that important events are resolved promptly while the overall picture is still covered.

Hour level: real-time compromise events and deteriorating security events.

Day level: closed-loop disposal of existing high-risk events.

Week level: periodic tasks such as asset management, vulnerability scanning, baseline checks and security updates; analysis of medium, low and potential risks; tracking of important external security incidents.

Month level: overall attack-defense drills, security operation drills and effectiveness tests, event reviews.

Clearly, handling hour-level events is the best way to demonstrate the value of situational awareness and to effectively reduce MTTD and MTTR. As the sign of substantive impact during an intrusion, compromise events are easy to identify precisely, so hour-level events are mainly compromise events. With accumulated experience, a security scenario knowledge base can be formed.
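The four processing echelons above can be expressed as a simple routing table; a minimal sketch (the event category names are illustrative assumptions, the tier assignments mirror the text):

```python
# Map event categories to processing tiers, mirroring the echelon above.
TIER = {
    "compromise": "hour",        # real-time compromise, deteriorating events
    "stock_high_risk": "day",    # closed-loop disposal of existing highs
    "periodic_task": "week",     # scanning, baselines, medium/low risks
    "drill_review": "month",     # drills, effectiveness tests, reviews
}

def build_queues(events):
    """Group pending events (name, category) into their processing tier."""
    queues = {"hour": [], "day": [], "week": [], "month": []}
    for name, category in events:
        queues[TIER[category]].append(name)
    return queues

q = build_queues([("miner-on-web01", "compromise"),
                  ("old-weblogic-cve", "stock_high_risk")])
print(q["hour"])  # ['miner-on-web01']
```

A routing table like this is trivially auditable, which matters when the echelon itself is part of the operational standard.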

(2) Building high-level threat awareness

High-level threat discovery is a process of increasing information. This does not contradict event reduction: event reduction mainly removes masses of false positives and alarms without substantive impact, and integrates and refines events, so the amount of information does not grow. High-level threat detection, by contrast, further processes the monitoring data, establishing causal and correlation relationships to mine new events.

1. Locate the intrusion stage and restore the attack process

Locate each individual event's stage within the penetration process and the kill chain, and combine this with traceability analysis to form a complete evidence chain and restore the attack process. Restoring the attack process makes it possible to assess losses and security risks comprehensively and to develop a complete solution.

Locating the intrusion stage:

When restoring an attack, pay special attention to strict causal relationships between successive pieces of evidence; otherwise, because monitoring information is often scattered and redundant, it is easy to form wrong inference logic. For example, we found a server under botnet control exhibiting cryptomining behavior, and tracing further we associated a running process with a Trojan binary. Can we conclude that the Trojan is a remote-control Trojan? Not necessarily. In fact, after the process and the Trojan binary were removed, the mining behavior persisted and the Trojan binary revived. The Trojan binary therefore explains only the mining behavior, not the remote-control phenomenon. In other words, the Trojan binary is a necessary condition for the mining behavior but not a sufficient one; other evidence is needed to explain the remote control.

In general, sufficient conditions for remote control may include command execution after a vulnerability attack, command execution after password brute-forcing, or the presence of another remote-control Trojan and its control address. Tracing further, we found a 0-day vulnerability attack in the traffic logs, along with the command that downloaded the mining Trojan. At this point, the exploit's command execution plus the mining Trojan binary fully explain the whole mining incident; together they form a necessary and sufficient condition. In essence, evidence that is necessary but not sufficient means the argument is incomplete; common cases are false positives, or a vulnerability being scanned but never exploited.

Conversely, evidence that is sufficient but not necessary is essentially redundant and a source of interference. For example, a user's mailbox was found to be hijacked, and tracing revealed both password brute-forcing against the account and a Trojan infection on the user's terminal. In theory either could lead to the mailbox theft, but in practice the attack cost of the former is clearly much lower, and attackers generally take the shortest path, so it is better to focus first on the brute-forcing. The follow-up investigation of this incident supported exactly that.

To sum up, when connecting evidence chains we should aim for a necessary-and-sufficient relationship between successive pieces of evidence; otherwise we should consider adding the missing evidence or removing the redundant evidence.

2. Mining abnormal behavior in depth

For high-level threats that have already occurred elsewhere but are still unknown locally, detection and protection capability comes from introducing multi-source external threat intelligence and combining it organically with the organization's own security technology system. Current applications are based on domain names, IP addresses, sample signatures and so on. Note that the accuracy of threat intelligence is not the same as the accuracy of a compromise verdict. For example, a terminal may resolve a malicious domain but never establish a connection, perhaps because of access-control policies or other unknown reasons; even if a connection is established, remote-control instructions are not necessarily transmitted, which depends on traffic analysis; and even if the traffic layer indicates compromise, the sample signatures may still not match. As noted above, monitoring information is scattered and redundant, and threat-intelligence verdicts rest on monitoring, so introducing threat intelligence means more than importing some high-precision detection rules; it requires a complete analysis and traceability system.
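The paragraph above describes an evidence ladder: DNS resolution alone is weaker than an established connection, which is weaker than observed C2 traffic, which is weaker than a sample-signature hit. A minimal sketch of that ladder (the rung names are illustrative, not a real intelligence API):

```python
# Evidence ladder for a threat-intelligence hit, weakest to strongest.
LADDER = ["dns_resolution", "connection_established", "c2_traffic", "sample_hit"]

def compromise_confidence(observations):
    """Return the strongest rung of the evidence ladder that is supported.

    Each rung must come from its own monitoring source (DNS logs, flow
    records, traffic analysis, sample scanning); names here are assumed."""
    level = -1
    for i, rung in enumerate(LADDER):
        if rung in observations:
            level = i
    return LADDER[level] if level >= 0 else "no_evidence"

print(compromise_confidence({"dns_resolution"}))
# prints dns_resolution  (resolution alone; connection may be blocked by ACLs)
print(compromise_confidence({"dns_resolution", "connection_established",
                             "c2_traffic"}))
# prints c2_traffic      (traffic-level compromise, sample not yet matched)
```

Treating the intelligence hit as the bottom rung rather than a verdict is what keeps the analysis and traceability system in the loop.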

For unknown threats that have not yet been disclosed, potential-threat detection capability comes from modeling and analysis, detecting abnormal behavior in massive logs with big-data technology. Unknown threats are generally based on 0-day vulnerabilities or custom Trojans. Two lines of thinking help against such high-level threats. The first is "dimensionality-reduction defense": assume breach from the start. These attacks have a technical advantage and high concealment and cannot be detected by known rules, so we shift the perspective from attack signatures to behavioral characteristics. The second is deception: at the operational level, pre-planted traps and lures help expose anomalies.

3. Forecast the risk situation

Through modeling and analysis based on big-data technology, the future network security situation is predicted. On one hand, forecasting threat trends and evaluating response capability and losses helps allocate security operation resources and control potential risks. On the other hand, forecasting how security incidents will develop helps formulate accurate solutions and better control both the evolution of the situation and security investment costs.

(3) Security perspective based on operation business division

A security perspective is a classification of security operations. Security operations serve information systems and business operations, which can be divided into subject, pipeline and object: the subject is users and terminals, the pipeline is the network environment and access control, and the object is the application system. Application-system security is a huge area involving a wide range of technology, generally divided by technology stack into system, application and data.

Similar security work shares the same technical characteristics: terminal security, for example, focuses on patching and antivirus, while application security focuses on vulnerability attacks and Trojan detection. A scientific division of security perspectives helps develop unified log formats and structured data processing, and supports better analysis and display.

3、 Data processing and analysis is the core capability of situational awareness

In "Enterprise security operation architecture based on general technology" we proposed a security operation technology architecture that collects data from the security protection system and builds data analysis capability on a basic technology platform. But to get from data sources and data-processing capability to the target effects described above, data processing is still required. This is the action path by which the security team establishes situational awareness.

(1) Establish a log-processing life cycle

The collected logs are of many kinds, large in volume and gathered in a distributed way, so preprocessing is generally needed before further analysis. A log-processing life cycle covering collection, preprocessing, transmission, storage, cleaning and use should be established to form a high-quality, stable data source.

1. A wide variety

Logs generally fall into intelligence, traceability and forensics (DPI, EDR), alarm, and monitoring categories. Each category is collected by various technical means or devices and needs preprocessing. By designing basic processing methods such as filtering, splitting, merging and replacing, and combining and selecting among them, structured data is produced after validation.
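The filter / split / replace steps can be sketched on a single syslog-style line; the line format and field names below are illustrative assumptions, not a real device's format:

```python
import json
import re

def preprocess(raw_line):
    """Turn a raw '<ts> <sensor> k=v ...' line into a structured record.

    Returns None for lines filtered out before entering the pipeline."""
    # Filter: drop heartbeat noise early.
    if "HEARTBEAT" in raw_line:
        return None
    # Split: timestamp, sensor name, and the remaining key=value payload.
    ts, sensor, rest = raw_line.split(" ", 2)
    record = {"ts": ts, "sensor": sensor}
    # Replace/merge: normalise k=v pairs into fields, keeping only the
    # core fields with analysis value (incremental control).
    for k, v in re.findall(r"(\w+)=(\S+)", rest):
        if k in {"src", "dst", "action"}:
            record[k] = v
    return record

line = "2020-02-27T10:00:00Z fw01 src=10.0.0.5 dst=1.2.3.4 action=deny junk=xxx"
print(json.dumps(preprocess(line)))
```

Dropping the `junk` field at the edge is the "incremental control" described below: redundant information never consumes link bandwidth or storage.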

2. Large amount of data

1) Incremental control

Preprocessing should remove redundant information and keep only the core fields with analysis value, effectively reducing data volume and relieving link-bandwidth pressure during transmission.

2) Stock control

Regularly clean up stored data to avoid the storage pressure caused by the continuous inflow of incremental data.

3. Distributed collection

For large enterprises with multi-level subsidiaries and dispatched offices, local organizations generally lack big-data processing capability, so logs must be uploaded to the data center for unified modeling and analysis. Even within a large data center, physical environment and network conditions can force distributed collection. The front-end module of log processing therefore needs to be deployed in the collection area to perform collection, preprocessing and transmission, so that structured data can be uploaded promptly and efficiently despite limited local computing and bandwidth resources.

(2) Building analysis models based on machine learning

1. Overview

The core of machine learning is "using algorithms to parse data, learn from it, and then make decisions or predictions about something in the world". It can be divided into supervised learning, unsupervised learning and reinforcement learning. Supervised learning uses labeled samples and mainly includes classification and regression; unsupervised learning uses unlabeled data and mainly includes clustering and dimensionality reduction. Each type has a variety of algorithms.

In addition, deep learning and convolutional neural networks are gradually being applied to the modeling and analysis of security situational awareness.

2. Establish analysis models based on security scenarios

Combining the target effects above, the modeling method for each specific requirement can be planned preliminarily:

1) Event classification

Through classification, events are labeled with different risk levels.

2) Intrusion stage

Through classification, events are labeled with their intrusion stage; through clustering and correlation, they are reduced into a complete intrusion process.

3) Abnormal behavior

Through clustering on attributes and behavior objects, abnormal behaviors are identified.

4) Situation prediction

Through regression, previously labeled data is used to predict the future situation.

The goal of the analysis is security-scenario analysis and high-level threat perception; model selection and the modeling process are only means. In principle, as long as the goal is achieved, the choice of means need not be limited to machine learning.
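As an example of the "situation prediction" item, a linear trend fitted to weekly alert counts can be extrapolated one step ahead. This is plain least squares in stdlib Python, a deliberately simple sketch; a real platform would use a proper time-series or ML library, and the sample data is assumed:

```python
def linear_forecast(series):
    """Fit y = slope*x + intercept by least squares, predict the next point."""
    n = len(series)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope * n + intercept        # value predicted for the next period

weekly_alerts = [120, 130, 140, 150]    # assumed sample data
print(linear_forecast(weekly_alerts))   # prints 160.0
```

This illustrates the point in the text: regression on previously observed data yields the trend, and whether it is done with ten lines of arithmetic or a deep model is a question of means, not of goal.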

(3) Effective control based on control objects

Unlike security perspectives, control objects are the control elements and "levers" of security operations. By managing and operating the control objects, closed-loop event handling is completed.

1. Assets

Asset management is an important foundation of security governance and is vital for event localization and unified protection. Asset information can generally be collected in the following three ways:

1) Basic ledger

Based on the asset-management life cycle, from procurement to go-live, maintenance and decommissioning; based on the network environment, from network topology to address allocation, address translation and access control; based on the information system's technical architecture, from the basic computing environment to operating system, middleware and application technology choices.

2) Active acquisition

Collect asset information through full address-space scanning and host agents.

3) Passive acquisition

Passively discover live assets through traffic analysis and access control.

2. Events

Events are the main content of security operations. Data is analyzed into events, which are aggregated to a unified processing platform for response, disposal and knowledge-base building.

3. Policies

Policies are the main control means for event handling and security protection. Manage policies in a unified way across establishment, review, distribution, modification and abolition, and verify their effectiveness.

4. Intelligence

Collect external multi-source intelligence, build a threat intelligence base, and distribute it to monitoring and protection devices.

4、 Automated security operation is the advanced goal of situational awareness

To speed up the analysis above, quickly distribute protection and response policies, save labor costs, and effectively improve the efficiency and scale of security operations, operations need to be automated. The main current models are SOAR (Security Orchestration, Automation and Response) and the OODA loop (Observe, Orient, Decide, Act). In terms of concrete implementations, the former is represented mainly by Splunk's Phantom platform, the latter by the IACD framework (Integrated Adaptive Cyber Defense).

When implementing IACD, playbooks should be made first. Based on the scenario analysis above and the event-handling knowledge accumulated in practice, an event analysis and handling process can be formed.

Furthermore, the attack and response processes are analyzed and designed into workflows, and local instances are formed in combination with the actual local environment for piloting and deployment.
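A playbook in the SOAR/IACD spirit is just an ordered workflow of steps, each enriching or acting on the event. The step names, the stub intelligence lookup and the actions below are assumptions for illustration, not the IACD specification or Phantom's API:

```python
def enrich_with_intel(event):
    """Stub threat-intelligence lookup (real systems call a TI platform)."""
    event["intel_hit"] = event.get("dst") in {"1.2.3.4"}
    return event

def decide(event):
    """Decision step: choose a response based on enrichment results."""
    event["action"] = "isolate_host" if event["intel_hit"] else "monitor"
    return event

def execute(event):
    """Execution step: in production this would call firewall/EDR APIs."""
    event["status"] = "done"
    return event

# The playbook is an ordered list of steps, run over each incoming event.
PLAYBOOK = [enrich_with_intel, decide, execute]

def run_playbook(event):
    for step in PLAYBOOK:
        event = step(event)
    return event

result = run_playbook({"src": "10.0.0.5", "dst": "1.2.3.4"})
print(result["action"])   # prints isolate_host
```

Keeping each step a plain function makes the workflow easy to reorder and pilot against the local environment, which is the point of forming "local instances".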

5、 Expanding the enterprise network security technology architecture

Based on the above requirements, the enterprise network security technology architecture is expanded, shifting the focus of security construction from accumulating tool sets to building the operation platform and infrastructure, so as to standardize and pool security resources.

(1) Tool layer

This layer is really the consolidation and integration of existing security means. All security systems, whether commercially procured "boxes" or software built on open source or developed in-house, have clear tool attributes: first, they are deployed on the "front line" to deliver the most direct protective effect; second, they are independent, with clear functional characteristics; third, they are replaceable, and similar products can in principle be "plug and play"; fourth, they should be under unified management.

These security means are integrated into a basic defense-in-depth system, which we describe as the "family bucket" or "set menu" plan: security measures are embedded into every stage of information system construction as standardized protection. In other words, as soon as a new endpoint comes online, a full suite of agent-based endpoint protection is installed by default; wherever there is a boundary, business flow or data flow, traffic is pulled to the traffic-cleaning resource pool for unified protection and monitoring. In addition, online businesses, services, systems and devices are connected to various gateways for unified access control; active probing provides vulnerability detection and asset management; and threat intelligence is introduced.

On this basis, security becomes a standardized service, which effectively avoids uneven protection and the difficulty of updating security products, and facilitates horizontal expansion and large-scale export of security capabilities.

(2) Operation platform

The operation platform mainly realizes the management and scheduling of security resources, plus the analysis and presentation that connect the operational work. Through data analysis technology, it provides security monitoring, response and early-warning capability, resists external threats, and ensures the safe, stable operation of the business.

1. Interface-driven

It provides a unified interface to external security resources, mainly covering the allocation and release of resources such as endpoint agent distribution, traffic protection and monitoring access, gateway access, and active detection and scanning, so that business systems can automatically consume pooled security resources.

2. Cluster management

Manage all security resources in a unified way: distribute and update policies, collect logs and process them in real time.

3. Service Bus

For the massive collected logs, message queues and caches are established, and real-time analysis, processing, storage and retrieval are carried out with big-data technology.

4. Unified presentation

Accumulate security analysis cases to form security scenarios and analysis rules, and display data-processing results in a unified way. Following the security operation structure, establish response and disposal processes to control security risks.

(3) Underlying architecture

1. Security technology

Although a large-scale security technology architecture improves protection capability, attack and defense technique remains the essential basis and starting point of analysis and protection, and risk control is the main thread and goal running through all the work.

2. High availability

The security technology architecture must meet high-availability requirements. On one hand, performance requirements dictate that the management platform be deployed as a cluster on a stable, reliable computing infrastructure. On the other hand, the goal of security services dictates that security resources be physically close to the business systems they serve; besides the multi-center deployment a data center adopts for its own availability, security resources need to be deployed at multiple physical locations even within the same data center. The architecture therefore needs distributed deployment technology.

3. High concurrency

For massive traffic and logs, a single security tool cannot meet the performance requirements, so cluster deployment is needed; load balancing distributes the business pressure across the pooled security resources.

4. Big data

Massive logs must be processed in real time and support fast retrieval and storage.

(4) Deployment example and business process

1. LVS/Nginx provides load balancing for the cluster deployments.

2. Kubernetes provides the basic computing environment for the security platform.

3. ZooKeeper provides unified management of security tool policies.

4. Filebeat collects and ships logs.

5. Kafka serves as the message queue receiving logs.

6. Flink processes the logs in real time.

7. Hive serves as big-data storage.

8. Logstash receives the real-time processing results of the logs.

9. Elasticsearch stores the real-time processing results and provides full-text retrieval.

10. Kibana visualizes the data in Elasticsearch.

11. Redis serves as the cache database and MySQL as the main store.

12. JIRA handles work-order management and follow-up on log-analysis results, interfacing with ZooKeeper for policy adjustment and distribution.

13. Since this article mainly discusses the transformation of the enterprise security technology architecture, specific tool selection is not elaborated further.

6、 Data processing flow

Security event handling should form a closed loop following the WPDRRC process (warning, protection, detection, response, recovery, counterattack). The data processing flow implements this security-event life cycle.

(1) Basic data classification

The data collected at the tool level can be classified into the following categories:

Intelligence: mainly external threat intelligence collected through various channels.

Traceability and forensics: the large volume of tracing and forensics logs generated by link-traffic DPI (deep packet inspection) tools and EDR (endpoint detection and response) tools, plus infrastructure operation logs.

Alarms: threat alarms detected by the various security tools.

Monitoring: monitoring and scanning of the various security tools and business systems.

(2) Data processing module

The processing level mainly includes the following modules:

Threat intelligence base: selects intelligence data with high reliability and applicability to build the threat intelligence base, forming early-warning information for comparative analysis.

Correlation analysis engine: based on alarm and monitoring data, judges the accuracy, severity and urgency of threat alarms for response. Highly accurate threat alarms become internal intelligence and are fed back into the threat intelligence base.

Traceability and forensics module: compares threat-intelligence information with traceability and forensics data, judges whether compromise has occurred, and determines the scope and severity of impact for response.

Response and disposal platform: for situations requiring action, generates work orders by priority for adjusting tool-level policies.

(3) Data flow process

1. Intelligence data is fed into the threat intelligence base.

2. Traceability and forensics data is fed into the log receiving module and on into the full-text search engine.

3. New intelligence data is compared with stored traceability and forensics data, and the detection results are fed into the correlation analysis engine.

4. New traceability and forensics data is compared with the existing threat intelligence base, and the detection results are fed into the correlation analysis engine.

5. Alarm and monitoring data is fed into the correlation analysis engine to identify substantive high-risk events.

6. The correlation analysis engine comprehensively evaluates each event's accuracy, severity and urgency to form a priority, feeds the event into the response and disposal platform to execute contingency plans that keep the situation from deteriorating, and feeds it into the traceability and forensics module.

7. Via the full-text search engine, the traceability and forensics module analyzes the event in detail, judging the scope of impact and how the situation is developing; it develops and refines a solution fed into the response and disposal platform to eliminate the risk, and produces internal threat intelligence fed back into the threat intelligence platform.

8. The response and disposal platform generates alarm logs and security-tool policies, which are distributed and executed through the work-order system.
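Step 6 above turns accuracy, severity and urgency into a work-order priority. A minimal scoring sketch (the weights and thresholds are illustrative assumptions, not a published standard):

```python
def priority(accuracy, severity, urgency):
    """Combine accuracy, severity and urgency (each in [0, 1]) into a
    work-order priority, P1 (highest) through P4.

    The weights favour accuracy: an inaccurate alarm should not trigger
    a top-priority work order no matter how severe it claims to be."""
    score = 0.5 * accuracy + 0.3 * severity + 0.2 * urgency
    if score >= 0.8:
        return "P1"
    if score >= 0.6:
        return "P2"
    if score >= 0.4:
        return "P3"
    return "P4"

print(priority(accuracy=0.9, severity=0.8, urgency=0.7))  # prints P1
print(priority(accuracy=0.3, severity=0.2, urgency=0.1))  # prints P4
```

The exact weights would be tuned from the event reviews in the month-level echelon; the structural point is that priority is a function of all three dimensions, not severity alone.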

To sum up, this article has considered the essence and internal logic of situational awareness construction and formed a relatively complete rationale and path, as a reference for the project design stage. As IBM Security's slogan at RSA 2019 put it: "We don't need more tools. We need new rules."

References:

Enterprise security operation architecture based on general technology

On the tactical confrontation in network attack and defense

IACD integrated adaptive network defense framework

Key points and code analysis of ten deep learning algorithms

Realization of structured conversion of log based on visual configuration

It's probably the easiest way to get started with machine learning


About the author: Dong Yicheng, senior network security engineer, works in the information security department of the Financial Information Center of the People's Bank of China; CISP, winner of a Banking Science and Technology Development Award. Responsible for Internet security protection system construction and security operations, focusing on penetration testing, web security, and PKI/CA.
