This series of stories is pure fiction; any resemblance to real events is coincidental.
Little B is a security engineer at Q Company. Q Company has not had a peaceful time lately: it has suffered several security attacks. After each incident, though, Little B could not find the real cause of the attack, so the boss gave an order: detect and trace the attacks, or go home.
Hearing this, Little B broke into a sweat, dug out the records of the recent attacks, carefully analyzed why the truth had never been found, and summarized the reasons as follows:
- No logging: some systems produce no log information at all.
- No log backups: logs were deleted by the attacker or purged because storage ran out, and no backup exists.
- Not enough detail collected: logs use the default format, which is not detailed enough to extract much useful information.
- Inaccurate data collected: user-identifying fields such as the IP address are inaccurate, so the attacker cannot be located.
- Inconsistent records: the same type of field is recorded differently in different logs, so the logs cannot be correlated.
Having found the reasons, Little B knew that solving these problems required a unified log analysis platform. With such a platform, he would not be left in the dark at the next attack.
Little B explains the purpose
To build a unified log analysis platform, Little B first needs to report to the boss and win support. The first step is to explain the purpose of building such a platform.
Log analysis is a basic, core technology in an enterprise. It is used not only by the security team but also by the IT R&D team and the business team, each with a different purpose:
- Security perspective: the security team analyzes logs mainly to discover unknown security events and to trace the source of known ones. Another important purpose is meeting regulatory compliance requirements at the national level.
- IT R&D perspective: non-security technical teams analyze logs mainly to locate problems and to analyze known problems, focusing on system monitoring and APM (application performance monitoring; here APM covers every monitoring item the R&D team cares about).
- Business perspective: the business team's log analysis needs focus more on risk control, operations and promotion, user profiling, website profiling, and so on.
Logs lying on a hard disk have no value. Log analysis is what realizes the value of log data. The more value a company extracts from its logs, the better this reflects its technical strength.
The evolution of log analysis
Log analysis is not a new technology, although it has been dressed up in many new clothes. A company with only a small security team should not chase large-scale "map cannon" displays (flashy attack maps). What it needs is an idea or tool that solves the problem, even a command-line script of only 20 lines. Little B divides the history of log analysis into four eras:
Stone Age: in this era, log analysis relied mostly on Excel and terminal commands (awk, grep, sort, uniq, wc, etc.). Recommended reading: "Web log security analysis skills" and "Web log security analysis" (https://xz.aliyun.com/t/1121), which walk through cases of using commands for security analysis.
Iron Age: in this era, log analysis relied mostly on scripts (self-written tools) and simple interactive tools (logwatch, logparser, etc.).
Industrial Age: in this era, log analysis has advanced far beyond the previous two. A wide range of open-source, free, and paid software is available, such as the Elastic stack, Splunk, ArcSight, and so on.
Future Age: I don't know what tools log analysis will use in this era, but for now machine learning and artificial intelligence look likely to be among its core technologies.
Little B thinks that no matter what era we are in, the ideas and tools of the past are irreplaceable, because they are the excellent products each era left behind after survival of the fittest. In essence, the products of the new era also evolved over a long history. (Inner monologue: a few tools or systems are garbage, so use them with care!)
How to implement a unified log analysis platform
>>>>Unified log analysis architecture
The implementation of a unified log analysis platform differs from enterprise to enterprise, depending mainly on how well the platform fits the enterprise, the enterprise's own technical capability, and the technical team's product choices. The core architecture, however, is basically as shown in the following figure:
>>>>Difficulties in unified log analysis
- Difficulty 1: technology
  - How do you collect information in a complex network environment?
  - How do you define unified parsing rules for complex log types?
  - How do you query logs at the TB or even PB scale quickly?
  - What is the best way to present a "map cannon" (attack map) display?
  - How do you save yourself when alerts leave you dizzy every day?
- Difficulty 2: people, mainly reflected in a lack of good communication
  - Not enough promotion: the boss pays no attention, or only pretends to;
  - Too much resistance when pushing forward: the others are bosses, and you are just an engineer.
As a security engineer, Little B naturally thinks security matters most (in fact, this is wrong), but Little B will also flexibly team up with the IT R&D team and the business team to push the project together: more people, more power. Although everyone's analysis goals differ, a unified log analysis platform architecture is what everyone needs.
>>>>Analysis approaches in security scenarios
If security scenarios must be classified, Little B divides them into known scenarios and unknown scenarios. In known scenarios, common analysis methods include regular-expression-based analysis, statistics- and aggregation-based analysis, and association-based analysis. In unknown scenarios, we mainly use data mining to dig unknown things out of the data.
Regular expression analysis
This method applies mainly to common attack scenarios with recognizable signatures, such as:
- Specific payload scenarios: SQL injection, XSS, WAF bypass, etc., all of which can be summarized as rule-based analysis;
- Specific keyword scenarios: crawlers (a specific UA, cookie, etc.).
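A minimal sketch of what such rule-based matching could look like. The three patterns, the `RULES` table, and the `match_rules` helper are hypothetical illustrations, not a production rule set; real signature sets are far larger and need tuning against false positives.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical signatures for illustration only.
RULES = {
    "sql_injection": re.compile(r"(union\s+select|or\s+1\s*=\s*1|sleep\s*\()", re.I),
    "xss": re.compile(r"(<script\b|javascript:|onerror\s*=)", re.I),
}
# Hypothetical crawler keywords looked up in the User-Agent header.
CRAWLER_UA = re.compile(r"(python-requests|scrapy|curl)", re.I)


def match_rules(url, user_agent):
    """Return the names of every rule that the request triggers."""
    # Decode %xx escapes and '+' so URL-encoded payloads become visible.
    decoded = unquote_plus(url)
    hits = [name for name, pattern in RULES.items() if pattern.search(decoded)]
    if CRAWLER_UA.search(user_agent):
        hits.append("crawler_ua")
    return hits
```

Decoding the URL before matching matters here: many payloads arrive URL-encoded, and a pattern applied to the raw string would miss them.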
Statistics- and aggregation-based analysis
This method works as follows: compute statistics and aggregations along as many dimensions as possible, then mine valuable information from the results. The most common analysis scenario is the operations one client performs against one server per unit of time.
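As a rough sketch of the "client operations per unit of time" scenario, the snippet below counts requests per client IP per minute. The input shape (pairs of ISO timestamp and client IP) and both function names are assumptions made for illustration; a real pipeline would parse these fields out of the access log first.

```python
from collections import Counter
from datetime import datetime


def requests_per_minute(events):
    """Count requests per (client IP, minute) bucket.

    `events` is an iterable of (iso_timestamp, client_ip) pairs, a
    hypothetical pre-parsed form of an access log.
    """
    counts = Counter()
    for timestamp, client_ip in events:
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
        counts[(client_ip, minute.isoformat())] += 1
    return counts


def noisy_clients(counts, threshold):
    """Return the sorted IPs that exceed `threshold` requests in any single minute."""
    return sorted({ip for (ip, _minute), n in counts.items() if n > threshold})
```

The same pattern extends to other dimensions (URL, status code, user ID) by changing the bucket key.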
Association-based analysis
This method requires a certain base of data, from which you can extrapolate to other cases, for example:
- Association with external intelligence: an enterprise can purchase threat intelligence (for example from ThreatBook or Threat Hunter) and correlate it with internal log data to find risks. For example, after purchasing malicious-domain intelligence, analyze the office network's egress traffic and logs for access records; if any exist, someone inside may be infected with a trojan or virus.
- Association with internal intelligence: building on the results of the preceding analysis techniques, run association analysis to find other risks. For example, after finding a malicious user IP, look up that IP's other behavior in the log analysis system, which may yield unexpected findings. When pivoting on IP, also pay attention to the attributes of the IP itself.
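Both kinds of association can be sketched in a few lines. The record layout (dictionaries with `src_ip` and `domain` keys) and the function names are assumptions for illustration; the malicious-domain list stands in for whatever feed the enterprise has purchased.

```python
def find_compromised_hosts(egress_records, malicious_domains):
    """Map each internal IP to the sorted malicious domains it contacted."""
    # Normalize case and a trailing dot so "Evil.example." matches "evil.example".
    bad = {d.lower().rstrip(".") for d in malicious_domains}
    hits = {}
    for record in egress_records:
        domain = record["domain"].lower().rstrip(".")
        if domain in bad:
            hits.setdefault(record["src_ip"], set()).add(domain)
    return {ip: sorted(domains) for ip, domains in hits.items()}


def pivot_on_ip(all_records, ip):
    """Return every record whose source is `ip`, for follow-up investigation."""
    return [r for r in all_records if r.get("src_ip") == ip]
```

The first function is the external-intelligence join; the second is the internal pivot that follows once a suspicious IP is known.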
Data mining analysis
- Anomaly scenario analysis: analyze point anomalies, contextual anomalies, and collective anomalies using clustering, classification, and other data mining methods.
- Unknown scenario analysis: identify 0-day attacks and bypass techniques using machine learning algorithms.
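As a deliberately simple stand-in for the point-anomaly case, the sketch below uses a robust statistical test (the modified z-score based on the median absolute deviation) rather than clustering or classification, which would need a dedicated library and real training data. The function name and the sample numbers are hypothetical.

```python
from statistics import median


def point_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds `threshold`.

    Uses the median absolute deviation (MAD), which is less distorted by
    the outliers themselves than the mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]
```

Applied to, say, requests per minute for one client, a single burst far above the usual level is flagged while ordinary fluctuation is not. Contextual and collective anomalies need more context than a single series of numbers and are not covered by this sketch.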
The mindset of log analysis: think through the various scenarios and make sensible use of the information you have and the information you can obtain, so that it produces value.
>>>>Optimize log platform
A unified log analysis platform is not a simple system that is finished once you put up a big "biubiu" screen. Building a log analysis platform is a long-term project that combines technology, communication, and operations. 50% of projects die at the start, 30% die midway, 15% die at the first taste of success, and only 5% complete the platform.
There are too many pitfalls in log analysis. If the leaders provide people and budget, consider buying a product and assigning someone to maintain it; that is the best situation! If the leaders provide neither, forget it! Making do with what the operations team already has is fine too.
According to Little B, the work on a unified log analysis platform can be roughly divided into two stages:
- Platform implementation: log normalization > log collection > log storage > log analysis > log display > alerting. Of these, Little B considers log normalization a very important step, because it involves the most cooperation with other teams. Try to get all the required work done in one go (although this is impossible).
- Platform optimization: this stage optimizes each step of the platform implementation.
Platform optimization can be roughly divided into the following (this is as much as Little B can think of):
- Normalization optimization
  - Log type optimization: system, service, application, business, and other logs all need to be collected;
  - Log field optimization: collect as much useful information as possible;
  - Log format optimization: from plain text to JSON (and from JSON to Protocol Buffers);
- Collection optimization: from rsyslog to Logstash, from Logstash to Flume or Filebeat (mainly to improve performance on the client system);
- Transmission optimization: introducing a message queue where there was none, from unreliable transport (UDP) to reliable transport (TCP), from unencrypted to encrypted (SSL);
- Storage optimization: from text storage to database storage, from database storage to a distributed file system;
- Analysis optimization: from a single scenario to multiple scenarios, from experience to data analysis techniques;
- Display optimization: from "map cannon" attack maps to an intuitive display of security risk, from sparse to rich;
- Alert optimization: from routine alerts to severity-graded alerts, from a single alerting mode to multi-level alerting;
- Architecture optimization: from a single machine to a cluster, from a single cluster to distributed clusters.
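To make the "plain text to JSON" step concrete, here is a minimal sketch that converts one access-log line into a JSON record with fixed field names. The line layout (similar to the common web-server access log format), the regular expression, and the output field names are all assumptions; the real format depends on how each service is configured.

```python
import json
import re
from datetime import datetime

# Assumed layout: ip - - [time] "METHOD path protocol" status bytes
LINE = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def to_json(line):
    """Convert one text log line to a JSON string, or None if it does not parse."""
    m = LINE.match(line)
    if m is None:
        return None
    f = m.groupdict()
    # %b expects English month abbreviations (the default C locale).
    ts = datetime.strptime(f["time"], "%d/%b/%Y:%H:%M:%S %z")
    record = {
        "timestamp": ts.isoformat(),  # one timestamp format for every log source
        "client_ip": f["client_ip"],
        "method": f["method"],
        "path": f["path"],
        "status": int(f["status"]),
        "bytes": 0 if f["bytes"] == "-" else int(f["bytes"]),
    }
    return json.dumps(record, sort_keys=True)
```

Giving every log source the same field names and timestamp format at this step is what later makes it possible to correlate records across logs.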
The key is continuous operation, and building up capability and data over time.
The above are some key points from the business case for a unified log analysis platform that Little B reported to the boss.
Maybe this is just very hard to write! At Little B's current level, the topic of log analysis is difficult to write about, so please bear with it.
☑ Next: Little B is about to build the unified log analysis platform!
-HISTORY-
Log analysis series (side story 1): getting the real client IP in nginx behind a proxy