Building an open source SIEM platform for enterprise security construction (2)

Posted by lipsius at 2020-03-02

Preface

SIEM (Security Information and Event Management), as the name implies, is a system for managing security information and events. For most enterprises it is not a cheap security system. Based on the author's experience, this article describes how to use open-source software to analyze data offline and how to use algorithms to mine unknown attacks.

System architecture review

Taking web server logs as an example: Logstash collects the web server's access logs and backs them up to an HDFS cluster in near real time, and Hadoop scripts then analyze the attack behavior offline.

Custom log format

Configure httpd with a custom log format that records the User-Agent and the Referer:

<IfModule logio_module>

# You need to enable mod_logio.c to use %I and %O

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio

</IfModule>

CustomLog "logs/access_log" combined

Examples of logs

180.76.152.166 - - [26/Feb/2017:13:12:37 +0800] "GET /wordpress/ HTTP/1.1" 200 17443 "http://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"

180.76.152.166 - - [26/Feb/2017:13:12:37 +0800] "GET /wordpress/wp-json/ HTTP/1.1" 200 51789 "-" "print `env`"

180.76.152.166 - - [26/Feb/2017:13:12:38 +0800] "GET /wordpress/wp-admin/load-styles.php?c=0&dir=ltr&load[]=dashicons,buttons,forms,l10n,login&ver=Li4vLi4vLi4vLi4vLi4vLi4vLi4vLi4vLi4vLi4vZXRjL3Bhc3N3ZAAucG5n HTTP/1.1" 200 35841 "http://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"

180.76.152.166 - - [26/Feb/2017:13:12:38 +0800] "GET /wordpress/ HTTP/1.1" 200 17442 "http://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"
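As a rough illustration of what each field in these sample lines means, the sketch below splits one combined-format line into named fields with a regular expression. The names COMBINED_RE and parse_combined are hypothetical and not part of the original scripts, and the pattern assumes that the quoted fields contain no escaped double quotes.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields for one combined-format line, or None."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Split "GET /path HTTP/1.1" into its three parts when it is well-formed
    parts = fields["request"].split(" ")
    if len(parts) == 3:
        fields["method"], fields["url"], fields["protocol"] = parts
    return fields


sample = ('180.76.152.166 - - [26/Feb/2017:13:12:37 +0800] '
          '"GET /wordpress/wp-json/ HTTP/1.1" 200 51789 "-" "print `env`"')
record = parse_combined(sample)
print(record["url"], record["referer"], record["agent"])
```

The referer and user-agent fields are the two that the custom log format adds, and the referer is what the graph analysis later in the article relies on.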

Test environment

Add a test file, 1.php, under the WordPress directory; its content is a call to phpinfo.

The access log entries for 1.php:

[[email protected] logs]# cat access_log | grep 'wp-admin/1.php'

125.33.206.140 - - [26/Feb/2017:13:09:47 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"

125.33.206.140 - - [26/Feb/2017:13:11:19 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"

125.33.206.140 - - [26/Feb/2017:13:13:44 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"

127.0.0.1 - - [26/Feb/2017:13:14:19 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 17 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"

127.0.0.1 - - [26/Feb/2017:13:16:04 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 107519 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"

125.33.206.140 - - [26/Feb/2017:13:16:12 +0800] "GET /wordpress/wp-admin/1.php HTTP/1.1" 200 27499 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"

[[email protected] logs]#

Hadoop offline processing

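Hadoop processing is based on the map and reduce model: the map step turns each input record into an intermediate record, and the reduce step aggregates the sorted intermediate records into the final result.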
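To make the model concrete, here is a minimal local sketch of the map, sort and reduce flow in Python. It is an illustration only and does not use Hadoop itself; the function names and the input records are hypothetical, and sorted() merely stands in for the shuffle/sort step that Hadoop performs between the two phases.

```python
from itertools import groupby


def map_phase(lines):
    """Map: emit one (key, value) pair per input record."""
    for line in lines:
        ref, path = line.split()
        yield ("%s::%s" % (ref, path), 1)


def reduce_phase(pairs):
    """Reduce: pairs arrive sorted by key; collapse each group into one record."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


# Hypothetical, already extracted "referer path" records
records = [
    "- /wordpress/wp-admin/1.php",
    "http://180.76.190.79/wordpress/ /wordpress/wp-login.php",
    "- /wordpress/wp-admin/1.php",
]
shuffled = sorted(map_phase(records))  # stands in for Hadoop's shuffle/sort
edges = dict(reduce_phase(shuffled))
print(edges)
```

Duplicate records collapse into a single key, which is the same de-duplication that the reducer script performs on referer and path pairs.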

Map script

localhost:work maidou$ cat mapper-graph.pl

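#!/usr/bin/perl -w
# Mapper: emit one "referer::path" record for every successful GET request.
# Sample input line:
#180.76.152.166 - - [26/Feb/2017:13:12:37 +0800] "GET /wordpress/ HTTP/1.1" 200 17443 "http://180.76.190.79:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.21 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.21"
use strict;

while (my $line = <>)
{
    # Capture the request path and the referer of 2xx GET requests
    if ($line =~ /"GET (\S+) HTTP\/1\.[01]" 2\d+ \d+ "(\S+)"/)
    {
        my $path = $1;
        my $ref  = $2;
        # Drop the query string from both URLs
        if ($path =~ /(\S+)\?(\S+)/)
        {
            $path = $1;
        }
        if ($ref =~ /(\S+)\?(\S+)/)
        {
            $ref = $1;
        }
        # Assumption: prefix the test site's host so that paths and referers
        # share one URL namespace, as in the sample output further down
        $path = "http://180.76.190.79" . $path;
        # Keep only requests referred by the site itself or with no referer
        if (($ref =~ /^http:\/\/180/) || ("-" eq $ref))
        {
            print($ref . "::" . $path . "\n");
        }
    }
}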

Reducer script

localhost:work maidou$ cat reducer-graph.pl

#!/usr/bin/perl -w
# Reducer: de-duplicate the "referer::path" records and print them as edges.
use strict;

my %result;

while (my $line = <>)
{
    if ($line =~ /(\S+)\:\:(\S+)/)
    {
        # Using the whole record as a hash key removes duplicates
        $result{$line} = 1;
    }
}

foreach my $key (sort keys %result)
{
    if ($key =~ /(\S+)\:\:(\S+)/)
    {
        my $ref  = $1;
        my $path = $2;
        # Example filter: keep only the webshell file suffixes you care about
        # (PHP and JSP are the common ones). A whitelist like this risks
        # missing some types; alternatively, use a blacklist of the file
        # types to ignore.
        if ($path =~ /(\.php)$/)
        {
            print($ref . " -> " . $path . "\n");
        }
    }
}
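The comment in the reducer mentions two ways to filter by file suffix. A small Python sketch of the difference follows; both suffix lists and all function names are assumed examples, not values taken from the original scripts.

```python
import os

# Assumed example lists; tune them for the stack you actually run
SCRIPT_SUFFIXES = {".php", ".jsp"}  # whitelist: keep only these
STATIC_SUFFIXES = {".css", ".js", ".png", ".jpg", ".gif", ".ico"}  # blacklist: drop these


def suffix_of(path):
    """Return the lower-cased file extension of a URL path."""
    return os.path.splitext(path)[1].lower()


def keep_by_whitelist(path):
    return suffix_of(path) in SCRIPT_SUFFIXES


def keep_by_blacklist(path):
    return suffix_of(path) not in STATIC_SUFFIXES


paths = ["/wordpress/wp-admin/1.php", "/static/logo.png", "/upload/shell.phtml"]
print([p for p in paths if keep_by_whitelist(p)])
print([p for p in paths if keep_by_blacklist(p)])
```

The whitelist drops the hypothetical shell.phtml because its suffix is not listed, which is the omission risk the comment warns about; the blacklist keeps it at the cost of passing more unrelated files through.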

An example of the generated result is:

localhost:work maidou$ cat r-graph.txt

- -> http://180.76.190.79/wordpress/wp-admin/1.php

- -> http://180.76.190.79/wordpress/wp-admin/admin-ajax.php

- -> http://180.76.190.79/wordpress/wp-admin/customize.php

- -> http://180.76.190.79/wordpress/wp-admin/load-styles.php

- -> http://180.76.190.79/wordpress/wp-admin/post-new.php

- -> http://180.76.190.79/wordpress/wp-login.php

http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-admin/edit-comments.php

http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-admin/profile.php

http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-login.php

http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/xmlrpc.php

http://180.76.190.79/wordpress/wp-admin/ -> http://180.76.190.79/wordpress/wp-admin/edit.php

http://180.76.190.79/wordpress/wp-admin/ -> http://180.76.190.79/wordpress/wp-login.php

http://180.76.190.79/wordpress/wp-admin/customize.php -> http://180.76.190.79/wordpress/wp-admin/load-scripts.php

http://180.76.190.79/wordpress/wp-admin/customize.php -> http://180.76.190.79/wordpress/wp-admin/load-styles.php

http://180.76.190.79/wordpress/wp-admin/edit-comments.php -> http://180.76.190.79/wordpress/wp-admin/load-scripts.php

http://180.76.190.79/wordpress/wp-admin/edit-comments.php -> http://180.76.190.79/wordpress/wp-admin/post-new.php

http://180.76.190.79/wordpress/wp-admin/edit.php -> http://180.76.190.79/wordpress/wp-admin/index.php

http://180.76.190.79/wordpress/wp-admin/edit.php -> http://180.76.190.79/wordpress/wp-admin/post-new.php

http://180.76.190.79/wordpress/wp-admin/index.php -> http://180.76.190.79/wordpress/wp-admin/customize.php

http://180.76.190.79/wordpress/wp-admin/post-new.php -> http://180.76.190.79/wordpress/wp-admin/load-scripts.php

http://180.76.190.79/wordpress/wp-admin/post-new.php -> http://180.76.190.79/wordpress/wp-admin/post.php

http://180.76.190.79/wordpress/wp-admin/post.php -> http://180.76.190.79/wordpress/wp-admin/admin-ajax.php

http://180.76.190.79/wordpress/wp-admin/post.php -> http://180.76.190.79/wordpress/wp-admin/edit.php

http://180.76.190.79/wordpress/wp-admin/post.php -> http://180.76.190.79/wordpress/wp-admin/load-scripts.php

http://180.76.190.79/wordpress/wp-admin/post.php -> http://180.76.190.79/wordpress/wp-admin/post.php

http://180.76.190.79/wordpress/wp-admin/profile.php -> http://180.76.190.79/wordpress/wp-admin/load-scripts.php

http://180.76.190.79/wordpress/wp-login.php -> http://180.76.190.79/wordpress/wp-admin/load-styles.php

http://180.76.190.79/wordpress/wp-login.php -> http://180.76.190.79/wordpress/wp-login.php

http://180.76.190.79:80/ -> http://180.76.190.79/wordpress/wp-admin/load-styles.php

http://180.76.190.79:80/ -> http://180.76.190.79/wordpress/wp-login.php

http://180.76.190.79:80/ -> http://180.76.190.79/wordpress/xmlrpc.php

Graph algorithm

Import the generated data into the graph database Neo4j. A webshell typically matches one of the following characteristics:

Its in-degree and out-degree are both 0

Its in-degree and out-degree are both 1, and the single edge points to itself
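Before bringing in a graph database, the two characteristics can be checked in plain Python. The sketch below is only an illustration: the function name and the sample edges are hypothetical, and unlike the Neo4j import later in the article it skips "-" (empty referer) records instead of modelling them as a node.

```python
from collections import defaultdict


def suspicious_pages(edges):
    """Return pages that look isolated in the referer graph.

    edges is an iterable of (referer, path) pairs; "-" means no referer.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    self_loop = set()
    pages = set()
    for ref, path in edges:
        pages.add(path)
        if ref == "-":
            continue  # direct visit: not a real edge
        pages.add(ref)
        if ref == path:
            self_loop.add(path)
        out_deg[ref] += 1
        in_deg[path] += 1
    result = []
    for page in sorted(pages):
        isolated = in_deg[page] == 0 and out_deg[page] == 0
        only_self = in_deg[page] == 1 and out_deg[page] == 1 and page in self_loop
        if isolated or only_self:
            result.append(page)
    return result


edges = [
    ("-", "/wp-admin/1.php"),
    ("-", "/wp-login.php"),
    ("/wp-login.php", "/wp-login.php"),
    ("/wp-login.php", "/wp-admin/index.php"),
    ("/shell.php", "/shell.php"),
]
print(suspicious_pages(edges))
```

In this made-up data, /wp-admin/1.php is never linked from any page and /shell.php is only ever referred by itself, so both are reported, while /wp-login.php is excluded because it also links to another page.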

Neo4j

Neo4j is a high-performance NoSQL graph database that stores structured data as a network (a graph) rather than in tables. Because it is embeddable, fast and lightweight, it has attracted more and more attention.

Neo4j installation

Download the installation package from https://neo4j.com/ and install it. The default configuration is OK

Neo4j startup

Taking my Mac as an example, Neo4j can be started through the GUI. The default username and password are neo4j / neo4j, and you are required to change the password the first time you log in.

GUI management interface

Python API library installation

sudo pip install neo4j-driver

Download JPype

https://pypi.python.org/pypi/JPype1

Install JPype

tar -zxvf JPype1-0.6.2.tar.gz

cd JPype1-0.6.2

sudo python setup.py install

The code for importing the data into the graph database is as follows:

B0000000B60544:freebuf liu.yan$ cat load-graph.py

import re

# neo4j-driver 1.x import path, as installed above; newer driver releases
# use "from neo4j import GraphDatabase, basic_auth" instead
from neo4j.v1 import GraphDatabase, basic_auth

nodes = {}  # URL -> node name, for URLs that already have a Page node
index = 1

driver = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("neo4j", "maidou"))
session = driver.session()
file_object = open('r-graph.txt', 'r')

try:
    for line in file_object:
        # Each line of r-graph.txt has the form "referer -> path"
        matchObj = re.match(r'(\S+) -> (\S+)', line)
        if matchObj:
            ref = matchObj.group(1)
            path = matchObj.group(2)
            # Create a Page node the first time a URL is seen
            for url in (ref, path):
                if url not in nodes:
                    nodes[url] = "Page%d" % index
                    session.run("CREATE (:Page {url: $url, id: $id, `in`: 0, `out`: 0})",
                                {"url": url, "id": str(index)})
                    index = index + 1
            # Cypher variables do not survive between statements, so both nodes
            # are looked up by URL before the referer -> path edge is created
            # and the degree counters are updated
            session.run("MATCH (r:Page {url: $ref}), (p:Page {url: $path}) "
                        "CREATE (r)-[:IN]->(p) "
                        "SET r.`out` = r.`out` + 1, p.`in` = p.`in` + 1",
                        {"ref": ref, "path": path})
finally:
    file_object.close()
    session.close()

The generated digraph is as follows

Query the nodes whose in-degree is 1 and out-degree is 0, or whose in-degree and out-degree are both 1 and whose single edge points to themselves. Because an empty referer is itself represented as a "-" node, a page whose real in-degree and out-degree are both 0 shows up in the graph with an in-degree of 1 and an out-degree of 0.
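One way this query might be written is sketched below. It follows the two criteria literally and relies on the in and out properties that the import script stores on each Page node; the names SUSPECT_QUERY and find_suspects are hypothetical, and the session argument is assumed to be an open session from the Neo4j Python driver.

```python
# Cypher sketch: in-degree 1 and out-degree 0, or a page whose only edge is a self-loop.
# "in" is a Cypher keyword, so both property names are quoted with backticks.
SUSPECT_QUERY = (
    "MATCH (n:Page) "
    "WHERE (n.`in` = 1 AND n.`out` = 0) "
    "OR (n.`in` = 1 AND n.`out` = 1 AND (n)-[:IN]->(n)) "
    "RETURN n.url AS url"
)


def find_suspects(session):
    """Run the query on an open driver session and return the matching URLs."""
    return [record["url"] for record in session.run(SUSPECT_QUERY)]
```

Taken literally, the first condition also matches ordinary pages that are linked from exactly one other page and link nowhere themselves, so the results still need a manual review or an extra check that the only incoming edge comes from the "-" node.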

Points for optimization

In actual production use, we ran into the following types of false positives:

The home page and various index pages

Operation and maintenance back ends such as phpMyAdmin and Zabbix

The consoles of Hadoop, ELK and other open source software

API endpoints

These can be resolved effectively in the short term by whitelisting. What is more troublesome is the impact of scanners on the results: that interference has to be removed through scanner fingerprints or through more advanced algorithms for telling humans from machines.
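Both mitigations can start out as simple pattern lists. The sketch below is a minimal illustration; every pattern in it is an assumed example (index pages, phpMyAdmin and Zabbix paths, an /api/ prefix, and the default User-Agent strings of a few well-known scanners), and the function names are hypothetical.

```python
import re

# Assumed examples only: adjust both lists to your own environment
PATH_WHITELIST = [
    re.compile(r"/index\.\w+$"),
    re.compile(r"^/phpmyadmin/", re.I),
    re.compile(r"^/zabbix/", re.I),
    re.compile(r"^/api/"),
]
SCANNER_AGENT_RE = re.compile(r"sqlmap|nikto|nmap", re.I)


def is_whitelisted(path):
    """True if a flagged path belongs to a known source of false positives."""
    return any(pattern.search(path) for pattern in PATH_WHITELIST)


def is_scanner(user_agent):
    """True if the User-Agent carries a known scanner fingerprint."""
    return bool(SCANNER_AGENT_RE.search(user_agent))


print(is_whitelisted("/phpMyAdmin/sql.php"), is_whitelisted("/wordpress/wp-admin/1.php"))
print(is_scanner("sqlmap/1.0-dev (http://sqlmap.org)"), is_scanner("Mozilla/5.0 (Macintosh)"))
```

The scanner check would typically be applied to log records before the graph is built, and the whitelist to the flagged pages afterwards. A User-Agent fingerprint is easy to forge, which is why the text also points to more advanced detection.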

Epilogue

Using algorithms to mine unknown attack behavior is a very popular research direction at present. This article introduces only one algorithm, chosen because it is easy to understand and implement. I was not the first to come up with it, and many security companies have put it into practice to some degree. Space is limited, so I will introduce other algorithms in later articles on enterprise security construction. Algorithms and machine learning essentially uncover statistical regularities in large data sets, so precise alerting is hard to achieve with them alone. At present they need to be backed by various rules and models, but they are still a powerful tool for mining unknown attacks.