For security management, what cannot be seen cannot be managed. Amid a severe network security situation, the wave of situational awareness construction has arrived on schedule. Compared with general security work, it is a complex project with large resource investment, wide coverage and high technical difficulty, so people often see flashy dashboards, numerous models and massive data, yet have no idea where to start the actual construction. Drucker said that doing the right thing is more important than doing things right. To avoid large investments of manpower and materials coming to nothing, to ensure situational awareness plays an effective role, and to improve the efficiency of security work, we need to abandon the traditional mode of simply stacking tools and build from the ground up.
Situational awareness construction is not simply the accumulation and display of data; it grows out of the actual needs of security governance and offensive-defensive confrontation. The security team summarizes its daily security operation and attack-defense work, abstracts it into security scenarios and high-level threats, and forms concrete perception targets. Systematic analysis methods are then used to establish corresponding analysis models, and big data technology turns raw logs into situational trends. Finally, situational awareness needs automation to improve efficiency. This article discusses these problems in order to form a working approach to situational awareness construction.
1、 Situational awareness arises from the actual needs of security operation
The situational awareness platform is the technical support for security operation. Security operation, viewed from a macro perspective, is subordinate to the enterprise's security strategy and is an important part of security governance. It goes hand in hand with security development and security delivery, covering security management across the information system life cycle. Concretely, the immediate purpose of security work is to counter attacks and threats. As attacks become increasingly sophisticated, a situational awareness platform is needed to provide systematic solutions. Therefore, thinking about situational awareness should start from attacks and threats; that is why security teams need situational awareness.
(1) Holistic attack
Lockheed Martin proposed the Cyber Kill Chain model in 2011, which divides a cyberspace attack into seven steps: reconnaissance, weaponization, delivery, exploitation, installation, command and control (C2), and actions on objectives.
(2) Single point penetration
At present, a relatively mature penetration pattern has formed against specific application systems, mainly divided into the following stages: information gathering, vulnerability attack and exploitation (exploit execution, brute-force cracking, arbitrary file download, database dumping, etc.), persistence (webshell, reverse shell, account creation, etc.), privilege escalation, attack execution and trace elimination.
Under the kill chain framework, attackers repeat the penetration process and move laterally, approaching the core network and systems from outside to inside, until the whole intranet is controlled and the targets suffer denial of service, system takeover and sensitive information leakage.
In a real environment, the whole attack process is long and complex. To facilitate observation and understanding, the attack process must be abstracted as the target of situational awareness.
2、 The security situation is the concrete representation of security scenarios
Combining the attack process, we summarize the goals of situational awareness as two processes, event reduction and information increase, presented through multiple security perspectives. This is what the security team needs to perceive.
(1) Large-scale elimination of false positives and event classification
During a mass attack, the monitoring perspective produces a huge number of events and alarms, far beyond what humans can handle. The first task of situational awareness is therefore to reduce the number of events, ideally to no more than about ten awaiting processing at any time. The most effective measures are reducing false positives and unifying the alarm classification standard. A highest-level alarm must be confirmed to show denial of service, system control, data leakage or another substantive attack causing compromise, and must be clearly distinguished from the mass of "pseudo high-risk" events such as aimless scanning (a high-risk exploit script is used, but the target is not affected by the vulnerability) or low-risk outreach (e.g. rogue promotion traffic).
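As a minimal sketch of this reduction step, the rules above can be expressed as a classifier that demotes "pseudo high risk" alerts and buckets the remainder under a unified severity standard. The field names (`signature`, `target_vulnerable`, `outcome`) are illustrative assumptions, not a real product schema.

```python
def classify_alert(alert):
    """Map a raw alert to a unified severity class, demoting false positives."""
    # Aimless scanning: a high-risk exploit signature fired, but the target
    # is not actually affected by the vulnerability.
    if alert.get("signature") == "exploit" and not alert.get("target_vulnerable", False):
        return "informational"
    # Low-risk outreach such as rogue promotion (adware) traffic.
    if alert.get("signature") == "outreach" and alert.get("category") == "adware":
        return "low"
    # Highest level: confirmed substantive impact.
    if alert.get("outcome") in {"denial_of_service", "system_controlled", "data_leaked"}:
        return "critical"
    return "medium"

def reduce_alerts(alerts):
    """Drop informational noise and bucket the remainder by severity."""
    buckets = {}
    for a in alerts:
        level = classify_alert(a)
        if level == "informational":
            continue
        buckets.setdefault(level, []).append(a)
    return buckets
```

In practice these rules would be driven by vulnerability and asset data rather than hard-coded conditions, but the shape of the reduction is the same.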
In security operation, monitoring is only the first step; it is followed by disposal, tracing, repair, optimization and other processes that inevitably require substantial human intervention. To prevent humans from becoming the bottleneck of the whole process, events to be handled must be prioritized into hour-level, day-level, week-level and month-level processing echelons, so that important events are resolved promptly while the overall situation is still covered.
Hour level: real-time compromise events and deteriorating security events.
Day level: closed-loop disposal of the stock of high-risk events.
Week level: periodic tasks such as asset management, vulnerability scanning, baseline verification and security updates; analysis of medium- and low-risk and potential risks; tracking of important external security events.
Month level: overall attack-and-defense drills, security operation drills and effectiveness tests, and event reviews.
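The echelons above can be sketched as a simple triage function; the event attributes (`compromised`, `deteriorating`, `risk`) are assumed names for illustration.

```python
TIER_ORDER = ["hour", "day", "week", "month"]

def triage(event):
    """Assign a processing echelon so compromise events are handled first."""
    if event.get("compromised") or event.get("deteriorating"):
        return "hour"        # real-time response to compromise
    if event.get("risk") == "high":
        return "day"         # closed-loop disposal of stock high-risk events
    if event.get("risk") in {"medium", "low"}:
        return "week"        # periodic analysis of medium/low risks
    return "month"           # drills, effectiveness tests, reviews

def schedule(events):
    """Order events by echelon so humans never become the bottleneck."""
    return sorted(events, key=lambda e: TIER_ORDER.index(triage(e)))
```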
Clearly, handling hour-level events is the best way to demonstrate the value of situational awareness and effectively reduce MTTD and MTTR. As the sign of substantive impact during an intrusion, compromise events are relatively easy to identify accurately, so hour-level events are mainly compromise events. As experience accumulates, a security-scenario knowledge base can be formed.
(2) Building high-level threat awareness
High-level threat discovery is a process of increasing information. This does not contradict event reduction: event reduction mainly removes the mass of false positives and alarms without substantive impact and consolidates events, so the amount of information does not grow. High-level threat detection, by contrast, further processes the monitoring data and applies causal and correlation analysis to mine new events.
1. Locate the intrusion phase and restore the attack process
Locate the stage of each individual event within the penetration process and the kill chain, and combine it with traceability analysis to form a complete evidence chain and restore the attack process. Restoring the attack process helps comprehensively assess losses and security risks and develop a thorough solution.
When restoring an attack, pay special attention to strict causal relationships between successive pieces of evidence; otherwise, because monitoring information is often scattered and redundant, it is easy to form faulty inference logic. For example, we find that a server is under botnet control and shows "mining" behavior, and by further tracing we associate a process with a Trojan binary. Can we conclude that this Trojan is a remote-control Trojan? Not necessarily. In fact, after the process and the Trojan binary were removed, the mining behavior persisted and the Trojan binary revived. The Trojan binary therefore only explains the mining behavior, not the remote-control phenomenon. In other words, the Trojan binary is a necessary condition for the mining activity but not a sufficient one, and additional evidence is needed to explain the remote control.
In general, sufficient conditions for remote control may include command execution following a vulnerability attack, command execution following password cracking, or the presence of another remote-control Trojan binary together with its control address. After further tracing, we found in the traffic logs a 0-day vulnerability attack and the command that downloaded the mining Trojan. At this point, the exploit-driven command execution plus the mining Trojan binary fully explains the whole mining incident; evidence and event are now necessary and sufficient for each other. In essence, evidence that is necessary but not sufficient means the argument is under-supported, as with common false positives or vulnerability scanning without actual exploitation.
Correspondingly, evidence that is sufficient but not necessary is essentially redundant and a source of interference. For example, a user's mailbox is found to have been hijacked, and tracing reveals both brute-force cracking of the account password and a Trojan infection on the user's terminal. In theory either could lead to the mailbox theft, but in practice the attack cost of the former is obviously much lower, and attackers generally take the shortest path. It is therefore easier to focus first on the password brute-forcing; the follow-up investigation of this incident supported exactly that.
In summary, when connecting evidence chains we should aim for a sufficient-and-necessary relationship between successive pieces of evidence; otherwise we should consider supplementing missing evidence or removing redundant evidence.
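One way to make this sufficiency check operational is to encode each phenomenon's known sufficient conditions as sets of evidence types and test whether the collected evidence covers any of them. The condition sets below are illustrative assumptions drawn from the remote-control example above.

```python
# Each phenomenon maps to a list of alternative sufficient conditions,
# each condition being a set of evidence types that together explain it.
SUFFICIENT_CONDITIONS = {
    "remote_control": [
        {"vuln_exploit", "command_execution"},
        {"password_bruteforce", "command_execution"},
        {"rat_binary", "c2_address"},
    ],
}

def explains(phenomenon, evidence):
    """Return True if the collected evidence is sufficient for the phenomenon."""
    ev = set(evidence)
    return any(cond <= ev for cond in SUFFICIENT_CONDITIONS.get(phenomenon, []))

def missing_evidence(phenomenon, evidence):
    """List what each candidate condition still lacks, to guide further tracing."""
    ev = set(evidence)
    return [sorted(cond - ev) for cond in SUFFICIENT_CONDITIONS.get(phenomenon, [])]
```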
2. Deep mining of abnormal behavior
For high-level threats that have occurred elsewhere but are still unknown locally, detection and protection capability is gained by introducing multi-source external threat intelligence and combining it organically with the organization's own security technology system. Current applications are based on domain names, IP addresses, sample characteristics and so on. Note that the accuracy of a piece of threat intelligence is not the same as the accuracy of a compromise verdict. For example, a terminal may resolve a malicious domain name yet never establish a connection, perhaps because of access control policies or other unknown reasons; even if a connection is established, it does not necessarily carry remote-control instructions, which depends on traffic analysis; and even if the traffic layer indicates compromise, the sample characteristics may not match. As noted above, monitoring information is often scattered and redundant, and threat intelligence verdicts rest on monitoring, so introducing threat intelligence is not merely importing high-precision detection rules; it requires a complete analysis and tracing system.
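The layered judgement above can be sketched as a confidence ladder, where each corroborating observation raises the compromise confidence and DNS resolution alone remains weak evidence. The stage names and weights are illustrative assumptions.

```python
# Ordered observation stages with assumed evidence weights summing to 1.0.
IOC_STAGES = [
    ("dns_resolution", 0.2),   # resolved a malicious domain
    ("connection", 0.3),       # actually established a connection
    ("c2_traffic", 0.3),       # traffic analysis shows control instructions
    ("sample_match", 0.2),     # sample characteristics hit
]

def compromise_confidence(observations):
    """Accumulate confidence only for stages actually observed."""
    obs = set(observations)
    return round(sum(w for stage, w in IOC_STAGES if stage in obs), 2)
```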
For unknown threats that have not yet been disclosed, modeling and analysis based on big data technology detects abnormal behaviors in massive logs, providing potential threat-detection capability. Unknown threats generally rely on 0-day vulnerabilities or custom Trojans. Dealing with such high-level threats involves two ideas. The first is "dimensionality-reduction defense": be prepared for compromise, since such attacks have technical advantages and high concealment and cannot be detected by known rules, and shift the perspective from attack signatures to behavior characteristics. The second is deception thinking: at the operational level, preset traps and lures help expose anomalies.
3. Forecast risk situation
Through modeling and analysis based on big data technology, the future network security situation is predicted. On one hand, forecasting threat trends and evaluating response capability and potential losses helps allocate security operation resources and better control potential risks. On the other hand, forecasting how security incidents will develop helps formulate accurate solutions and better control situational changes and security investment costs.
(3) Security perspective based on operation business division
The security perspective is the classification scheme of security operation. Security operation serves information systems and business operations, and is divided into subject, object and pipeline: the subject is users and terminals, the pipeline is the network environment and access control, and the object is the application system. Application system security is a large field involving a wide scope, generally divided by technology stack into system, application and data.
Similar security work shares common technical characteristics; for example, terminal security focuses on patching and antivirus, while application security focuses on vulnerability attacks and Trojan detection. A scientific division of security perspectives facilitates unified log formats and structured data processing, and supports better analysis and presentation.
3、 Data processing and analysis is the core capability of situation awareness
In "Enterprise security operation architecture based on general technology" we proposed a security operation technology architecture that collects data from the security protection system and builds data analysis capability on a basic technology platform. But getting from data sources and processing capability to the target effects described above still requires data processing. This is the action path by which the security team establishes situational awareness.
(1) Establish log processing life cycle
The collected logs come in many varieties, in large volumes, and from distributed collection points. They generally need preprocessing before further analysis, so a log processing life cycle covering collection, preprocessing, transmission, storage, cleaning and use should be established to form a high-quality, stable data source.
1. Wide variety of logs
Logs are generally divided into intelligence, traceability and forensics (DPI, EDR), alarm, and monitoring categories. Each category is collected by multiple technical means or devices and needs preprocessing: by designing basic operations such as filtering, splitting, merging and replacing, and combining and selecting among them, structured data is produced after verification.
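A minimal sketch of such a preprocessing chain, assuming a simple "timestamp|source|message" raw format purely for illustration:

```python
def preprocess(raw_line):
    """Filter, split, replace and verify one raw line into a structured dict."""
    line = raw_line.strip()
    if not line or line.startswith("#"):   # filter: drop blanks and comments
        return None
    parts = line.split("|")                # split into fields
    if len(parts) != 3:                    # verify the expected structure
        return None
    ts, source, message = parts
    return {
        "timestamp": ts,
        "source": source.lower(),          # replace: normalise case
        "message": message,
    }

def preprocess_batch(lines):
    """Apply the chain to a batch, keeping only valid structured records."""
    return [r for r in (preprocess(l) for l in lines) if r is not None]
```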
2. Large amount of data
1) Incremental control
Through preprocessing, redundant information is removed and only core fields with analysis value are kept, effectively reducing data volume and the bandwidth pressure on links during transmission.
2) Stock control
Regularly clean up the stock data to avoid the pressure on storage caused by continuous input of incremental data.
3. Distributed acquisition
Large enterprises with multi-level subsidiaries and dispatched offices generally lack big data processing capability at local sites, so logs must be uploaded to a data center for unified modeling and analysis. Even within a large data center, physical environment and network constraints can force distributed collection. A log-processing front module therefore needs to be deployed in each collection area to perform collection, preprocessing and transmission, so that structured data can be uploaded promptly and efficiently despite limited local computing and transmission bandwidth resources.
(2) Building analysis model based on machine learning
1. Overview
The core of machine learning is "using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world". It can be divided into supervised learning, unsupervised learning and reinforcement learning. Supervised learning requires labeled samples and mainly covers classification and regression; unsupervised learning works on unlabeled data and mainly covers clustering and dimensionality reduction. Each type includes a variety of algorithms.
In addition, deep learning and convolutional neural networks are gradually being applied to the modeling and analysis of security situational awareness.
2. Establish analysis model based on security scenario
Combining the target effects above, the modeling method for each specific requirement can be preliminarily planned:
1) Event classification
Through classification, events are labeled with different risk levels.
2) Intrusion stage
Through classification, events are labeled with their intrusion stage; through clustering, correlated events are reduced to a complete intrusion process.
3) Abnormal behavior
Through clustering over attributes and behavior targets, abnormal behaviors are identified.
4) Situation prediction
Through regression on previously labeled data, the future situation is predicted.
The goal of the analysis is security scenario analysis and high-level threat perception; model selection and the modeling process are only means. In principle, as long as the goal is achieved, the specific means need not be confined to machine learning.
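As a toy illustration of the regression idea in 4), one can fit a least-squares trend line to past daily event counts and extrapolate one step ahead. Real forecasting would use richer features and models; this only shows the mechanism.

```python
def fit_trend(counts):
    """Ordinary least squares for y = a*x + b over x = 0..n-1."""
    n = len(counts)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def predict_next(counts):
    """Extrapolate the fitted trend one step ahead."""
    a, b = fit_trend(counts)
    return a * len(counts) + b
```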
(3) Effective control based on control objects
Different from the security perspective, control objects are the control elements and levers of security operation. By managing and operating the control objects, closed-loop event handling is completed.
1. Assets
Asset management is an important foundation of security governance and is significant for event localization and unified protection. Asset information can generally be collected in the following three ways:
1) Basic ledger
Based on the asset management life cycle, from procurement to go-live, maintenance and scrapping; based on network environment construction, from network topology to address allocation, address translation and access control; based on the information system's technical architecture, from the basic computing environment to operating system, middleware and application technology choices.
2) Active acquisition
Collect asset information through full address scanning and host agent.
3) Passive acquisition
Passive perception of surviving assets through traffic analysis and access control.
2. Events
Events are the main content of security operation. Data is analyzed to form events, which are aggregated to a unified processing platform for response, disposal and knowledge base building.
3. Policies
Policies are the main control means for event handling and security protection. Manage policies uniformly across establishment, audit, distribution, modification and abolition, and verify their effectiveness.
4. Intelligence
Collect external multi-source intelligence, establish a threat intelligence base, and distribute it to monitoring and protection devices.
4、 Automatic security operation is the advanced goal of situation awareness
To speed up the analysis above, quickly distribute protection and response policies, save labor costs, and effectively improve the efficiency and scale of security operation, operations must be automated. The main models at present are SOAR (Security Orchestration, Automation and Response) and the OODA loop (Observe, Orient, Decide, Act). In concrete implementations, the former is mainly represented by Splunk's Phantom platform, and the latter by the IACD framework (Integrated Adaptive Cyber Defense).
When implementing IACD, playbooks should be produced first. Based on the scenario analysis above and the event-handling knowledge accumulated in practice, an event analysis and handling process can be formed.
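As a hedged sketch of what such a playbook might look like in code: an ordered list of steps executed until one fails, at which point a human analyst takes over. The step and action names are assumptions for illustration, not IACD-mandated vocabulary.

```python
def run_playbook(playbook, context, actions):
    """Execute playbook steps in order; stop and report on the first failure."""
    executed = []
    for step in playbook:
        action = actions[step["action"]]
        ok = action(context)
        executed.append((step["action"], ok))
        if not ok:
            break  # escalate to a human analyst
    return executed

# Example actions for a compromise event (all stubs for illustration).
ACTIONS = {
    "isolate_host": lambda ctx: ctx.get("host") is not None,
    "block_ioc": lambda ctx: bool(ctx.get("iocs")),
    "open_ticket": lambda ctx: True,
}

PLAYBOOK = [
    {"action": "isolate_host"},
    {"action": "block_ioc"},
    {"action": "open_ticket"},
]
```

A real SOAR platform would add approvals, timeouts and rollback, but the orchestration skeleton is the same.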
Furthermore, attack and response processes are analyzed and designed into workflows, and local instances are formed in combination with the actual local environment for piloting and deployment.
5、 Expand enterprise network security technology architecture
Based on the above requirements, the enterprise network security technology architecture is expanded, shifting the focus of security construction from accumulating tool sets to building the operation platform and infrastructure, so as to standardize and pool security resources.
(1) Tool level
This layer is essentially the sorting and integration of existing security means. All security systems, whether commercially procured "boxes" or software built on open source or developed in-house, have clear tool attributes: first, they are deployed on the "front line" and deliver the most direct protective effect; second, they are independent, with clear functional characteristics; third, they are replaceable, and similar products can in principle be swapped "plug and play"; fourth, they should be under unified management.
These security measures are integrated to form a basic defense-in-depth system, which we describe as a "family bucket" or "set menu" plan: security measures are embedded in every stage of information construction to form standardized protection. In other words, as soon as a new endpoint comes online, a full set of agent-based endpoint protection is installed by default; wherever there is a boundary, business flow or data flow, traffic is pulled to the traffic-cleaning resource pool for unified protection and monitoring. In addition, online services, systems and devices are connected to various gateways for unified access control; active probing provides vulnerability detection and asset management; and threat intelligence is introduced.
On this basis, standardized security services are formed, which effectively avoids uneven protection and the difficulty of updating security products, and facilitates horizontal expansion and large-scale export of security capabilities.
(2) Operation platform
The operation platform mainly implements the management and scheduling of security resources, plus the analysis and presentation that connect the operational work. Through data analysis technology, it delivers security monitoring, response and early-warning capability, resists external threats, and ensures safe, stable business operation.
1. Interface drive
A unified interface is provided to external security resources, mainly covering the call and release of resources such as endpoint agent distribution, traffic protection and monitoring access, gateway access, and active probing and scanning, so that the business can automatically consume pooled security resources.
2. Cluster management
Manage all security resources in a unified way, distribute and update policies, collect logs and conduct real-time processing.
3. Service Bus
For the collected massive logs, message queue and cache are established, and real-time analysis, processing, storage and retrieval are carried out based on big data technology.
4. Unified presentation
Accumulate safety analysis cases, form safety scenarios and analysis rules, and display data processing results in a unified way. According to the security operation structure, establish response and disposal process to control security risks.
(3) Underlying architecture
1. Security technology
Although a large-scale security technology architecture has been built to improve protection capability, attack and defense techniques remain the essential basis and starting point of analysis and protection, and risk control is the main line and goal running through all the work.
2. High availability
The security technology architecture should meet high-availability requirements. On one hand, performance requirements mean the management platform must be deployed in clusters and needs stable, reliable computing infrastructure. On the other hand, the goal of security services means security resources must be deployed physically close to business systems. Besides the multi-center deployment that the data center itself adopts for high availability, security resources need to be deployed in multiple physical locations even within one data center. The security technology architecture therefore needs distributed deployment technology.
3. High concurrency
For massive traffic and logs, a single security tool cannot meet the performance requirements, so cluster deployment is needed. Load-balancing technology schedules business pressure across the security resources.
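The scheduling principle can be illustrated with a minimal round-robin dispatcher; production deployments would rely on LVS or Nginx rather than hand-rolled code.

```python
from itertools import cycle

class RoundRobinPool:
    """Spread jobs evenly across a cluster of identical security tools."""

    def __init__(self, nodes):
        self._cycle = cycle(nodes)   # endless rotation over the node list

    def dispatch(self, job):
        """Send each job to the next node in rotation."""
        node = next(self._cycle)
        return node, job
```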
4. Big data
For massive logs, it has the ability of real-time processing, fast retrieval and storage.
(4) Deployment example and business process
1. LVS/Nginx are used for load balancing across cluster deployments.
2. Kubernetes provides the basic computing environment for the security platform.
3. ZooKeeper provides unified management of security tool policies.
4. Filebeat collects and transmits logs.
5. Kafka serves as the message queue receiving logs.
6. Flink processes the logs in real time.
7. Hive serves as big data storage.
8. Logstash receives the real-time log-processing results.
9. Elasticsearch stores the real-time processing results and provides full-text retrieval.
10. Kibana visualizes the data in Elasticsearch.
11. Redis serves as the cache database and MySQL as the main store.
12. JIRA handles work-order management and follow-up of log analysis results, and interfaces with ZooKeeper for policy adjustment and distribution.
13. This article mainly discusses the transformation of the enterprise security technology architecture, so tool-level product selection is not elaborated here.
6、 Data processing flow
Security event handling should form a closed loop following the WPDRRC process (warning, protection, detection, response, recovery, counterattack). The data processing flow implements the security event life cycle.
(1) Basic data classification
The data collected at the tool level can be classified into the following categories:
Intelligence: mainly external threat intelligence collected through various channels.
Traceability and forensics: the large volume of logs for tracing and forensics generated by link-traffic DPI (deep packet inspection) tools and EDR (endpoint detection and response) tools, as well as infrastructure operation logs.
Alarm: threat alarms detected by various security tools.
Monitoring: monitoring and scanning of the various security tools and business systems.
(2) Data processing module
The processing level mainly includes the following modules:
Threat intelligence base: select intelligence data with high reliability and applicability to establish the threat intelligence base and form early-warning information for comparative analysis.
Correlation analysis engine: based on alarm and monitoring data, judge the accuracy, severity and urgency of threat alarms for response. Threat alarms with high accuracy are fed back into the threat intelligence base as internal intelligence.
Traceability and forensics module: compare threat intelligence base information with traceability and forensics data, judge whether compromise occurred, and determine the scope and severity of impact for response.
Response and disposal platform: for situations requiring disposal, form work orders by priority to adjust tool-level policies.
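The priority computation implied by the correlation analysis engine can be sketched as a weighted score over accuracy, severity and urgency; the weights are illustrative assumptions.

```python
# Assumed weights for combining the three judgements (each scored 0..1).
WEIGHTS = {"accuracy": 0.4, "severity": 0.35, "urgency": 0.25}

def priority(event):
    """Weighted score in [0, 1]; higher means handle sooner."""
    return round(sum(WEIGHTS[k] * event.get(k, 0.0) for k in WEIGHTS), 3)

def work_order_queue(events):
    """Sort events into a response queue, highest priority first."""
    return sorted(events, key=priority, reverse=True)
```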
(3) Data flow process
1. Intelligence data is input into the threat intelligence base.
2. Traceability and forensics data is input into the log receiving module and then into the full-text search engine.
3. New intelligence data is compared against stock traceability and forensics data, and the detection results are input into the correlation analysis engine.
4. New traceability and forensics data is compared against the stock threat intelligence base, and the detection results are input into the correlation analysis engine.
5. Alarm and monitoring data is input into the correlation analysis engine to identify substantive high-risk events.
6. The correlation analysis engine comprehensively evaluates the accuracy, severity and urgency of each event, forms a priority, feeds the response and disposal platform, executes the plan to prevent the situation from deteriorating, and feeds the traceability and forensics module.
7. Through the full-text search engine, the traceability and forensics module analyzes the event in detail, judges the scope of impact and how the situation is developing, develops and refines the solution and feeds it to the response and disposal platform to eliminate the risk, and produces internal threat intelligence for the threat intelligence platform.
8. The response and disposal platform forms alarm logs and security tool policies, which are distributed and executed through the work-order system.
To sum up, we have considered the essence and internal logic of situational awareness construction and formed a relatively complete rationale and path as a reference for the project design stage. As IBM Security's slogan at RSA 2019 put it: "We don't need more tools. We need new rules."
Reference material:
- Enterprise security operation architecture based on general technology
- On the tactical confrontation in network attack and defense
- IACD Integrated Adaptive Cyber Defense framework
- Key points and code analysis of ten deep learning algorithms: https://www.cnblogs.com/sthu/p/8690723.html
- Structured log transformation based on visual configuration: http://dbaplus.cn/news-134-1860-1.html
- Probably the easiest way to get started with machine learning
About the author: Dong Yicheng, senior network security engineer, works in the information security department of the financial information center of the People's Bank of China; CISP, winner of the Bank Science and Technology Development Award. Responsible for Internet security protection system construction and security operation, focusing on penetration testing, web security and PKI/CA.