0x01 Summary
The previous candy lab article introduced the server and network deployment structure of a typical website, as well as how data flows through the big data platform, which gave us a panoramic view of the production environment as a whole. Next, we show how log processing is put into practice. Using a concrete example, we explain how a website's logs are collected into the big data system and how data is exchanged between the components of that system.
We build a target system and deploy an open-source WAF, naxsi, on the system structure introduced earlier. This simple example shows how to get log data from a typical website server application, push it to Kafka, and write it to ClickHouse, and which tools we use to read the data in ClickHouse.
Naxsi is a software WAF. It is actually a C extension for nginx, which sets it apart from other open-source WAFs that are built on nginx with Lua. Naxsi adds new directives to nginx that security staff use to build security policies. The following figure is the structure chart of the general website system we introduced earlier.
0x02 Install OpenResty
Naxsi is an nginx extension module, and compiling it into OpenResty gives the same result; in addition, we can then use the tools of the OpenResty ecosystem. The open-source WAF that candy lab introduced earlier is a system developed independently by the Chunge team. It is not based on pure Lua, but is a new WAF built on a small custom language. Candy lab has published articles about it before; if you are interested, you can read them.
wget https://openresty.org/download/openresty-1.xx.x.x.tar.gz
yum install -y gcc gcc-c++ readline-devel pcre-devel openssl-devel tcl perl
tar -zxvf openresty-1.xx.x.x.tar.gz
cd openresty-1.xx.x.x
./configure
make
make install
0x03 Deploy naxsi (WAF)
Naxsi exists as an nginx module, and in our experiments, compiling naxsi into OpenResty achieves the same effect. The build needs some dependencies; to save trouble, the most convenient approach is to run bootstrap directly instead of installing each dependency by hand.
There is a better-known WAF in the open-source community, ModSecurity, whose community supports its security rules very well. We chose naxsi for this experiment because it works as a single unit with nginx. Moreover, we can modify and customize its C module and add Lua modules on top of it; working with both Lua and C enriches our development tool chain.
wget -O naxsi-0.56rc1.tar.gz https://github.com/nbs-system/naxsi/archive/0.56rc1.tar.gz
tar -zxvf naxsi-0.56rc1.tar.gz
cd openresty-1.xx.x.x
./configure --user=www --group=www --prefix=/usr/local/openresty --with-luajit --with-http_stub_status_module --with-http_ssl_module --with-http_sub_module --with-http_realip_module --add-module=/usr/home/candylab/naxsi-0.56rc1/naxsi_src
gmake
gmake install
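Before writing rules, it can help to confirm that the rebuilt binary really contains naxsi. A minimal sketch, assuming the OpenResty nginx binary is at /usr/local/openresty/nginx/sbin/nginx (the default for the --prefix used above); the function name check_naxsi_build and the NGINX_BIN variable are hypothetical. nginx -V prints its configure arguments on stderr, so stderr is merged into stdout before searching.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that an nginx binary was built with naxsi.
# Assumes the default OpenResty layout; set NGINX_BIN if yours differs.
NGINX_BIN="${NGINX_BIN:-/usr/local/openresty/nginx/sbin/nginx}"

check_naxsi_build() {
    # nginx -V writes the version and configure arguments to stderr
    if "$NGINX_BIN" -V 2>&1 | grep -q 'naxsi'; then
        echo "naxsi: compiled in"
        return 0
    fi
    echo "naxsi: NOT found in configure arguments" >&2
    return 1
}
```

Usage: check_naxsi_build && echo "ready to configure rules".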
0x04 Configure the naxsi WAF
Because naxsi is written in C, it adds its own directives to the nginx configuration file. When a request hits the policy, naxsi rejects it (here through the DeniedUrl location, which returns 412); otherwise the request is passed upstream. Naxsi also supports score accumulation: each matching rule adds to a named score, and CheckRule decides what to do once a score crosses a threshold. Naxsi's core rule set is available at: https://github.com/nbs-system/naxsi/blob/master/naxsi_config/naxsi_core.rules
Besides the WAF itself, naxsi comes with other supporting tools, but since this article focuses on pushing logs, we won't describe them here.
http {
    include mime.types;
    default_type application/octet-stream;
    MainRule id:1001 "str:liwq" "msg: test candylab" "mz:$HEADERS_VAR:Cookie" "s:$SQL:2";
    MainRule id:1002 "str:0x" "msg:0x, possible hex encoding" "mz:BODY|URL|ARGS|$HEADERS_VAR:Cookie" "s:$SQL:2";
    server {
        listen 8080;
        server_name localhost;
        location / {
            SecRulesEnabled;
            DeniedUrl "/RequestDenied";
            CheckRule "$SQL >= 4" BLOCK;
            proxy_pass http://www.candylab.net/;
        }
        location /RequestDenied {
            return 412;
        }
    }
}
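Trigger the policy with a curl request. With the two rules above, each match adds 2 to $SQL, so a request is blocked (answered with 412) only when both rules match and the score reaches the CheckRule threshold of 4: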
curl -v '127.0.0.1:8080/' -H 'Cookie: liwq' -H 'Cookie: 0x'
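To see why the threshold matters, here is a toy model of naxsi's score accumulation for this particular config, written as a shell function. It is a simplification, not naxsi's real matching engine: it only performs a plain substring test on a cookie string for the two rule strings and ignores naxsi's decoding, case handling, and internal rules. The function name naxsi_sim is hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of the two MainRules above (NOT naxsi's real engine):
#   id:1001 "str:liwq" -> $SQL += 2
#   id:1002 "str:0x"   -> $SQL += 2
#   CheckRule "$SQL >= 4" BLOCK -> respond 412 via /RequestDenied
naxsi_sim() {
    local cookie="$1"
    local sql=0
    # Plain substring checks stand in for naxsi's "str:" matching
    [[ "$cookie" == *liwq* ]] && sql=$((sql + 2))
    [[ "$cookie" == *0x* ]] && sql=$((sql + 2))
    if (( sql >= 4 )); then
        echo "412 (SQL score $sql)"
    else
        echo "pass upstream (SQL score $sql)"
    fi
}
```

For example, naxsi_sim 'liwq; 0x' reaches a score of 4 and is blocked, while naxsi_sim 'candylab; 0x' only scores 2 and is passed upstream: a single rule hit is logged but not enough to block.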
0x05 Output JSON logs from nginx
By default, nginx writes its log files as plain text. Newer versions of nginx can also produce JSON, so we have nginx write its logs as JSON, push them directly to Kafka, and then let a Kafka consumer program write the log data into ClickHouse.
First, define the log format:
log_format accessjson escape=json '{"source":"192.168.0.6","ip":"$remote_addr","user":"$remote_user","time_local":"$time_local","statuscode":$status,"bytes_sent":$bytes_sent,"http_referer":"$http_referer","http_user_agent":"$http_user_agent","request_uri":"$request_uri","request_time":$request_time,"gzip_ration":"$gzip_ratio","query_string":"$query_string"}';
Second, reference the format in the access_log directive:
access_log ./logs/access-json.log accessjson;
The complete http block then looks like this:
http {
    include mime.types;
    default_type application/octet-stream;
    log_format accessjson escape=json '{"source":"192.168.0.6","ip":"$remote_addr","user":"$remote_user","time_local":"$time_local","statuscode":$status,"bytes_sent":$bytes_sent,"http_referer":"$http_referer","http_user_agent":"$http_user_agent","request_uri":"$request_uri","request_time":$request_time,"gzip_ration":"$gzip_ratio","query_string":"$query_string"}';
    MainRule id:1001 "str:liwq" "msg: test candylab" "mz:$HEADERS_VAR:Cookie" "s:$SQL:2";
    MainRule id:1002 "str:0x" "msg:0x, possible hex encoding" "mz:BODY|URL|ARGS|$HEADERS_VAR:Cookie" "s:$SQL:2";
    server {
        listen 8080;
        server_name localhost;
        access_log ./logs/access-json.log accessjson;
        location / {
            SecRulesEnabled;
            DeniedUrl "/RequestDenied";
            CheckRule "$SQL >= 4" BLOCK;
            proxy_pass http://www.candylab.net/;
        }
        location /RequestDenied {
            return 412;
        }
    }
}
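After that, OpenResty writes its access log in JSON format. Note that the escape=json parameter must be present. Without it, nginx applies its default escaping, which writes characters such as a double quote as \x22; that is not a valid JSON escape, so any request whose User-Agent or Referer contains a quote would produce a log line that cannot be parsed. With escape=json (supported since nginx 1.11.8), such characters are escaped in valid JSON form.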
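A quick way to confirm that escape=json is doing its job is to parse every line of the log as JSON. A minimal sketch, assuming python3 is available on the host (its standard json module does the parsing); the function name check_json_log is hypothetical. It prints the number of lines that fail to parse, reports each bad line on stderr, and returns non-zero if any were found.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count access-log lines that are not valid JSON.
# Assumes python3 is installed; uses only its standard json module.
check_json_log() {
    local file="$1"
    python3 - "$file" <<'PY'
import json
import sys

bad = 0
with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
    for lineno, line in enumerate(fh, 1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            bad += 1
            print(f"line {lineno}: {exc}", file=sys.stderr)
print(bad)
sys.exit(1 if bad else 0)
PY
}
```

Usage, for example: check_json_log /usr/local/openresty/nginx/logs/access-json.log. A line containing a \x22-style escape, as produced by nginx's default escaping, is counted as invalid.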
Before we used Kafka to receive the data, our earlier scheme had OpenResty push logs directly to a syslog server, where a listener wrote them into Elasticsearch (ES). This kind of log collection is also easy to set up:
access_log syslog:server=192.168.0.6:10001;
This setting pushes the nginx access log to the syslog server.
0x06 Push logs to Kafka
We described this collection flow in an earlier article, as shown in the figure above. OpenResty has its own Kafka components that can push data to Kafka from inside nginx. Kafkacat, by contrast, is a standalone tool: you can pipe a text file into it, and it sends each line to Kafka.
Download kafkacat from GitHub and build it:
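wget -O kafkacat-1.3.1.tar.gz https://github.com/edenhill/kafkacat/archive/1.3.1.tar.gz
tar -zxvf kafkacat-1.3.1.tar.gz
cd kafkacat-1.3.1
./configure
make
sudo make install
If dependencies such as librdkafka are missing, run bootstrap.sh instead; it downloads and builds the dependencies for you:
./bootstrap.sh
sudo make install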
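The key step is pushing the local JSON log file to Kafka. tail -F keeps following the file by name, so it survives log rotation, and kafkacat in producer mode (-P) sends each line as one Kafka message: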
tail -F -q access-json.log | kafkacat -P -b 1.kafka1.candylab.net:9091,2.kafka1.candylab.net:9091,3.kafka1.candylab.net:9091,4.kafka1.candylab.net:9091 -t candylab_topic
Kafkacat is easy to use and efficient.
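The Kafka consumer program that writes into ClickHouse is not shown here. As one possible sketch, kafkacat can also act as the consumer (-C) and pipe the JSON lines into clickhouse-client, which reads INSERT data from stdin in the JSONEachRow format. Everything specific is an assumption: the table name candylab.nginx_access, its column types, the BROKERS/TOPIC/CH_* variables, and the function names are hypothetical. time_local is stored as a String because nginx's $time_local text (for example 10/Oct/2000:13:55:36 -0700) is not ClickHouse's default DateTime format. This one-shot load reads the topic from the beginning and exits at the end (-e) with no offset tracking, so it is a demo rather than a production consumer.

```shell
#!/usr/bin/env bash
# Sketch of a one-shot Kafka -> ClickHouse load with kafkacat and clickhouse-client.
# Hypothetical names: table candylab.nginx_access, env variables, function names.
BROKERS="${BROKERS:-1.kafka1.candylab.net:9091}"
TOPIC="${TOPIC:-candylab_topic}"
CH_HOST="${CH_HOST:-candylab.net}"
CH_PORT="${CH_PORT:-9006}"
CH_USER="${CH_USER:-candylab}"
CH_PASSWORD="${CH_PASSWORD:-}"

# Columns mirror the keys produced by the accessjson log_format.
CREATE_SQL="CREATE TABLE IF NOT EXISTS candylab.nginx_access (
    source String, ip String, user String, time_local String,
    statuscode UInt16, bytes_sent UInt64, http_referer String,
    http_user_agent String, request_uri String, request_time Float64,
    gzip_ration String, query_string String
) ENGINE = MergeTree ORDER BY (source, ip)"

ch() {
    # Thin wrapper so every call uses the same connection settings;
    # --password is only passed when one is configured.
    local args=(--host "$CH_HOST" --port "$CH_PORT" --user "$CH_USER")
    [ -n "$CH_PASSWORD" ] && args+=(--password "$CH_PASSWORD")
    clickhouse-client "${args[@]}" "$@"
}

create_table() {
    ch --query "$CREATE_SQL"
}

load_topic_once() {
    # -C: consume, -o beginning: start at the oldest message,
    # -e: exit at end of partition, -q: suppress informational output.
    kafkacat -C -b "$BROKERS" -t "$TOPIC" -o beginning -e -q \
        | ch --query "INSERT INTO candylab.nginx_access FORMAT JSONEachRow"
}
```

Usage: run create_table once, then load_topic_once. For continuous loading, a consumer that commits offsets (or ClickHouse's own Kafka table engine) would be the more robust design.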
0x07 Install the ClickHouse client
We described earlier how ClickHouse collects the logs, as shown in the figure above. Once the Kafka consumer program has written the logs into ClickHouse, we can query them with SQL through clickhouse-client, just as we would with the MySQL client. In this experiment, clickhouse-client runs in a Docker container.
docker run -it --rm yandex/clickhouse-client -h candylab.net --port 9006 -m -u candylab --password nicaishisha -d candylab
After the command runs, Docker downloads the clickhouse-client image and starts the client. When the new prompt appears, we can query the data with SQL.
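Besides the interactive prompt, clickhouse-client can run a single statement with --query, which is handy in scripts. A minimal sketch reusing the docker command above; it assumes the logs were loaded into a table named nginx_access in the candylab database with the columns of the accessjson format (a hypothetical table name), and the function names are also hypothetical. The second query lists requests that naxsi blocked, that is, those answered with 412 by the /RequestDenied location.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for one-off queries through the dockerized clickhouse-client.
# Assumes a table candylab.nginx_access holding the accessjson log fields.
CH_PASSWORD="${CH_PASSWORD:-nicaishisha}"

ch_query() {
    # --query runs one statement and exits instead of opening a prompt,
    # so -it is not needed here.
    docker run --rm yandex/clickhouse-client \
        -h candylab.net --port 9006 -u candylab --password "$CH_PASSWORD" \
        -d candylab --query "$1"
}

top_ips_sql() {
    echo "SELECT ip, count() AS hits FROM nginx_access GROUP BY ip ORDER BY hits DESC LIMIT 10"
}

blocked_by_waf_sql() {
    # naxsi blocks are answered with 412 by the /RequestDenied location
    echo "SELECT ip, request_uri, count() AS blocked FROM nginx_access WHERE statuscode = 412 GROUP BY ip, request_uri ORDER BY blocked DESC LIMIT 20"
}
```

Usage: ch_query "$(top_ips_sql)" or ch_query "$(blocked_by_waf_sql)".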
0x08 Summary
Earlier candy lab articles presented the overall architecture of the system; this article focuses on how that system runs in practice with concrete programs, giving a specific implementation for the key components shown in the figures of the previous articles. These tools are basically open source and free, so by following this article, the implementation is not difficult. Candy lab will continue to show how to use this low-cost open-source system to solve real log analysis problems.