cloud security monitoring: cloud security series (1)

Posted by punzalan at 2020-02-28

(1) Introduction

Organizations are adopting cloud computing for its many benefits, such as cost savings, rapid deployment, and on-demand scaling. As organizations begin to use cloud computing, security personnel must update their operations to keep pace with the cloud computing model. The appendix of this article provides the cloud security controls from NIST and the Cloud Security Alliance, the cloud deployment models from ENISA and NIST, and cloud security reference recommendations.

The 2016 edition of the "Cloud Computing Top Threats" report (CSA, 2016) identified 12 key cloud security issues. Effective security monitoring reduces the following risks:

Insufficient identity, credential, and access management

Insecure APIs

Account hijacking

Malicious insiders

Advanced persistent threats (APTs)

Data loss

Abuse and malicious use of cloud services

Protecting cloud services involves detailed risk assessment and architecting security solutions that meet business needs. Security monitoring plays an important role in protecting cloud services. This article focuses on how to implement security monitoring solutions for Amazon Web Services (AWS) environments.

1.1 cloud security monitoring challenges

The main types of cloud computing solutions are infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). AWS positions itself as the leading cloud service provider, with Microsoft Azure and Google Cloud ranking second and third respectively.

AWS innovates at a rapid pace, introducing many new features and services; on average, AWS customers receive about three new features every day. The References section lists the different AWS services available and best practices for protecting the AWS environment, including encryption, privileged access management, resource isolation, and monitoring.

This paper focuses on implementing security monitoring for AWS workloads. The following sections highlight key areas of AWS security monitoring that go beyond traditional data center monitoring.

AWS management console monitoring

AWS instances and resources are managed through the AWS Management Console. From the console, an operator can create new virtual machines, delete existing ones, and administer other AWS services. Monitoring for unauthorized access to the AWS Management Console is critical, because access to the cloud management platform is like holding the keys to the cloud computing kingdom.
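As a rough illustration of what this monitoring can look like in Splunk (the event and field names follow the CloudTrail schema, but the search itself is only a sketch for this article, not a prescribed rule), console sign-in activity can be summarized with a search such as:

sourcetype="aws:cloudtrail" eventName=ConsoleLogin

| stats count by userIdentity.arn, sourceIPAddress, responseElements.ConsoleLogin

Failed sign-ins or sign-ins from unexpected source IP addresses surfaced by such a summary are candidates for closer review.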

Application programming interface (API) access monitoring

As organizations move to cloud solutions, they must adapt to a new DevOps way of working. Simply moving an existing application to the cloud as-is makes it difficult for the team to realize the benefits of the cloud platform; the existing application architecture must be re-architected to suit the cloud deployment model. Ideally, cloud solutions use a DevOps approach for continuous deployment, which lets enterprises shorten development time and deliver solutions faster. For example, some AWS environments use AWS CodePipeline and DevOps practices for continuous application deployment in the AWS environment.

DevOps brings new challenges to security monitoring. The number of API calls keeps growing because of the automation associated with AWS CodePipeline, infrastructure as code, and serverless computing. Monitoring these API calls to detect unauthorized access is important, but the sheer volume of activity makes it difficult to track these events with traditional rule- and threshold-based monitoring. Machine learning techniques are well suited to monitoring this volume of activity by learning different features from the data.
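To get a feel for that volume before applying any model, a simple baseline search (a sketch only; the one-hour span is an arbitrary choice for illustration) can chart API call counts per AWS service over time:

sourcetype="aws:cloudtrail" eventType=AwsApiCall

| timechart span=1h count by eventSource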

AWS serverless computing monitoring

AWS recently introduced "serverless" computing, which relies on AWS Lambda to run application code. In serverless computing there is no server infrastructure to monitor; instead, the focus is on monitoring the execution, invocations, and other parameters of the AWS Lambda functions themselves.

AWS identity and access management (IAM) monitoring

AWS IAM enables organizations to control access to AWS services and specific resources. AWS IAM provides options to configure fine-grained permissions in an AWS environment. It is recommended to grant only the minimum privileges needed to perform job functions when managing AWS resources. As an effective information security control, the security team should use the tools AWS provides, such as Access Advisor. Granting appropriate access prevents unauthorized access, and access to AWS resource management should be monitored effectively. Monitoring the different administrative credentials used in the AWS environment is also a requirement of various compliance regulations. Machine learning is a good fit for monitoring the various AWS credentials because it learns from previous events and understands what normal looks like in order to identify exceptions. Financial regulations such as Sarbanes-Oxley require organizations, as part of security compliance monitoring, to review all privileged access and changes to AWS environments that host financial data.
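As a hedged example of what a privileged-activity review search might look like (the event source follows the CloudTrail schema; which events to review is a judgment call, not a recommendation from the paper), IAM changes can be summarized per identity:

sourcetype="aws:cloudtrail" eventSource="iam.amazonaws.com"

| stats count by userIdentity.arn, eventName, sourceIPAddress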

1.2 overall architecture of the proposed solution

The proposed cloud security monitoring solution loads all AWS cloud infrastructure logs into a big data analytics platform such as Splunk, Apache Spark, or Amazon Elasticsearch. A machine learning model is then applied to score risk and identify suspicious events. Depending on the event, an automated alert (for example, an email sent by a Lambda function) notifies the security team, which performs manual analysis.
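To make that flow concrete, a minimal sketch of the alerting step is shown below. It assumes a model named aws_console (built later in this article), an arbitrary risk threshold of 75, and an example recipient address, and it uses Splunk's sendemail command in place of the Lambda-based mail described above:

sourcetype="aws:cloudtrail" eventType=AwsConsoleSignIn

| apply "aws_console"

| where 'predicted(risk_score)' > 75

| sendemail to="soc@example.com" subject="Suspicious AWS console sign-in"

In practice this search would run on a schedule, and the threshold and recipients would be tuned to the environment.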

Manually baselining and configuring security monitoring rules for AWS infrastructure is a challenge because the AWS environment keeps changing. Machine learning techniques, such as the supervised learning algorithm introduced later in this paper, can automatically learn from the data to understand abnormal and high-risk events, and so address this cloud security monitoring challenge. Machine learning models can be used to build baselines, score risk, and identify suspicious events using authentication information, location information, and activity type.

In this article, Splunk is used to collect all the AWS CloudTrail and CloudWatch logs needed to implement the AWS security monitoring use cases, and machine learning models are applied to identify suspicious activity in the AWS cloud infrastructure. Splunk 6.5 includes a built-in Machine Learning Toolkit that supports a variety of machine learning algorithms, and the models in this article are built with that toolkit. The steps for applying a machine learning algorithm are as follows:

a) visualize the data, combining data cleansing with thoughtful feature engineering;

b) choose the right metric/method to estimate model performance;

c) tune the parameters.

The key concepts proposed are summarized as follows:

1) collect all AWS log data from CloudTrail and CloudWatch into Splunk;

2) apply a machine learning model to build a baseline and score risk, rather than relying on manual rules/thresholds.

Some of the factors that make this possible are:

a) Big data technologies enable information security teams to store all types of data at scale.

b) Many machine learning solutions are becoming available, such as Microsoft Azure ML Studio, Amazon Machine Learning, Databricks Spark, and the Splunk Machine Learning Toolkit.

With a centralized big data analytics solution, the security team can apply machine learning and other statistical techniques to any dataset. The main advantage of this approach is that once a successful method has been identified with machine learning, the same method can be reused for similar challenges. For example, if a technique helps identify suspicious access attempts from AWS identity and access authentication data, the same approach can be applied to identify suspicious access attempts in other applications and in cloud infrastructures such as Microsoft Azure and Google Cloud. The next section introduces machine learning techniques and implements two use cases using Splunk.

1.3 risk scoring method

Risk scoring is not a new concept; the information security community has long used risk scores to prioritize the most important vulnerabilities and issues. In traditional data center monitoring, risk scoring methods rely on knowledge of the enterprise environment to identify suspicious events. A typical example is alerting on unauthorized access to critical server assets based on knowing which administrators are authorized to access them. Detecting malicious events and scoring them against known-bad patterns is also useful to the information security community. The References section provides examples of manual risk scoring using static rules and thresholds in AWS environments.

The challenge with this kind of standards-based risk scoring is keeping up with the rapid pace of new API calls and permissions that AWS launches. Some of the criteria relevant to cloud security monitoring are identity, data access, operations performed, and geographic location. Using these criteria (features) together with historical data, machine learning techniques can learn the environment and identify anomalies, and a machine learning model can produce a risk score based on what it has learned from previous data. In this paper, a linear regression algorithm is used as the example for developing a machine learning model that predicts the risk score. Linear regression predicts a continuous value by using a linear function to model the relationship between the output variable and the features (inputs, explanatory variables); the machine learning section below describes the algorithm in more detail. Because the model learns from the data, it is more effective for the AWS use cases than manually updating rules/thresholds for risk scores.
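For reference, the linear regression model takes the standard form

\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n

where \hat{y} is the predicted risk score, x_1, \dots, x_n are the (encoded) features such as source IP address, user agent, and event time, and \beta_0, \dots, \beta_n are coefficients learned from the labeled training data.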

1.4 machine learning

There are two kinds of machine learning: supervised and unsupervised. In supervised learning, the algorithm learns from the provided data and labels (classes), and the resulting model attempts to predict the label (class) (astroML, 2015). Some commonly used classification algorithms are neural networks, random forests, support vector machines (SVM), decision trees, logistic regression, and naive Bayes. An example of supervised learning is giving a machine learning algorithm a set of dog and cat pictures where each picture is labeled as a cat or a dog. The supervised learning algorithm learns from the dog and cat images and creates a prediction model; when a new image is applied to the model, it predicts whether the image is a dog or a cat, as shown in Figure 1:

In this paper, supervised learning with a linear regression algorithm is used to predict the risk score of AWS cloud infrastructure events.

In unsupervised learning, the model attempts to understand the dataset from unlabeled features and to identify patterns and anomalies in the data. Unsupervised learning includes dimensionality reduction, clustering, and density estimation (astroML, 2015). An example of unsupervised learning is giving a machine learning algorithm dog and cat pictures without labels; it will group the cat and dog pictures, as shown in Figure 2:

Unsupervised learning algorithms help identify the main features of a dataset and can show which features provide the most value. In the dog and cat example, unsupervised learning helps reveal how the data can be most usefully grouped, for example by clustering on facial features. In our AWS cloud infrastructure use cases, clustering events by login location can help determine whether location is an important feature. Some common unsupervised algorithms are K-means clustering, hierarchical clustering, and hidden Markov models. Figure 3 highlights the different algorithms in the Splunk Machine Learning Toolkit.
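As a rough sketch of how such a clustering experiment could look in the Splunk Machine Learning Toolkit (the input file, the choice of latitude/longitude as features, and k=3 are assumptions made for illustration, not taken from the paper):

| inputlookup AwsConsoleSignIn.csv

| iplocation sourceIPAddress

| fit KMeans lat lon k=3

| stats count by cluster

Clusters with very few members would point to unusual login locations worth investigating.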

Machine learning should be applied where it fits the use case and produces results. Machine learning needs a lot of data: models built from large amounts of data produce clearer results, and because the algorithms need so much data to provide a useful model, it takes patience to get results. An AWS environment generates a large volume of logs, and using an appropriate algorithm with a large and varied dataset helps avoid overfitting. Note also that every AWS environment is different; for example, most environments use different AWS Virtual Private Cloud (VPC) configurations to separate AWS resources according to specific business requirements. Creating a machine learning model using data from the same AWS environment it will monitor produces the best results.

Recent technological developments such as Splunk and Apache Spark make it possible to rapidly deploy machine learning algorithms on different data types and across many different datasets.

(2) Lab setup

The steps for the initial lab configuration are explained in Appendix B. In the lab configuration, Splunk is configured to receive logs from AWS CloudTrail, and the Splunk Machine Learning Toolkit is installed and set up.

For further testing, additional logs were generated by adding users, adding instances, and launching a new environment from AWS QuickStart. Once the logs were collected, machine learning was applied to calculate risk scores and detect suspicious events.

(3) Machine learning - process

The following steps highlight how to apply machine learning techniques using the Splunk Machine Learning Toolkit; other machine learning solutions follow the same process and can also be applied to security monitoring use cases. The solution could also be implemented with the Apache Spark MLlib library. One challenge with AWS CloudTrail JSON data files in Apache Spark is parsing and normalization; AWS has open-sourced code for loading CloudTrail logs into Spark DataFrames (GitHub, 2016). Once the data is loaded into a Spark DataFrame, the Apache Spark MLlib library is ready to use.

One aspect to keep in mind in machine learning is data cleansing, which ensures the data is consistent. In many cases, data must be extracted and formatted before being fed to machine learning algorithms. Splunk handles much of this itself by indexing data at ingestion time, extracting the relevant fields, and mapping the JSON format to standard columns, so machine learning algorithms can work directly with those columns. Compared with many open source solutions, such as Apache Spark, Splunk saves a lot of time on data cleansing and formatting. Figure 5 below highlights the steps involved in machine learning:

3.1 collect data to Splunk and understand the data

The security team must collect all AWS logs in a central location. Even if the organization cannot implement any active monitoring, the logs support forensic analysis after an incident.

In the initial setup, Splunk is configured to ingest the CloudTrail and CloudWatch logs. The Splunk App for AWS can be used to explore and understand the log events and to confirm what is available for machine learning.

3.1.1  AWS CloudTrail

AWS CloudTrail records all API access requests, AWS resource access, and AWS console login access information. Understanding the AWS CloudTrail log data is important for designing machine learning features effectively. The AWS CloudTrail User Guide (AWS, 2014) provides references and examples for the different types of log events. A CloudTrail API call log entry consists of two parts: the record body contents and the userIdentity element.
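As a small illustration of how these entries appear once Splunk has mapped the JSON to columns (the field names follow the CloudTrail schema; the search itself is just a sketch), the core fields can be listed with:

sourcetype="aws:cloudtrail"

| table eventTime, eventSource, eventName, sourceIPAddress, userAgent, userIdentity.arn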

The event types analyzed in this paper are AwsConsoleSignIn and AwsApiCall.

3.2 browsing data

In this particular case, the Splunk App for AWS can be used to study the AWS log data. The Splunk App for AWS provides key operational and security insights into Amazon Web Services accounts. Figure 6 below shows the different dashboard options available in the Splunk App for AWS. These dashboards help in understanding the relevant AWS logs and identifying any suspicious activity.

The dashboard in Figure 7 shows the behavior of various users in the AWS environment. Viewing the data through different fields (features) helps the security team understand which fields in the log data are relevant.

The Splunk app for AWS allows security practitioners to understand and explore data to identify fields that can help identify suspicious AWS activities. These fields will be available to the Splunk machine learning toolkit when developing the model.

3.3 case 1 - detect suspicious AWS console login

3.3.1 defining features

In this case study, the AwsConsoleSignIn events were explored to understand which fields would help identify suspicious AWS console logins. Some of the relevant fields identified are: sourceIPAddress, userAgent, userIdentity.arn, eventTime, responseElements.ConsoleLogin.

In the example above, the features were identified by understanding and exploring the logs from a security perspective. The Splunk Machine Learning Toolkit also offers algorithms such as PCA that can be used to explore and define features mathematically. Understanding the data from different vantage points helps identify abnormal activity.

3.3.2 select and apply learning algorithm

This section focuses on the Splunk Machine Learning Toolkit commands required to create, evaluate, and test models.

AwsConsoleSignIn.csv is generated from the AWS logs in the lab environment. The following Splunk search can be used to export the events to CSV format:

* sourcetype="aws:cloudtrail" eventType=AwsConsoleSignIn | table sourceIPAddress, userAgent, userIdentity.arn, eventTime, responseElements.ConsoleLogin

Security personnel should assign a risk score to these events. Risk scores should be assigned based on security knowledge and knowledge of the environment. Ideally, security personnel should review environment events and assign risk scores periodically, for example once a month. The machine learning model needs a large amount of labeled risk score data to produce useful results.
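One illustrative way to attach such labels (the conditions, score values, and account ID below are assumptions made for this sketch, not guidance from the paper) is to add an eval step to the export search before saving the CSV:

| eval risk_score = case('responseElements.ConsoleLogin'=="Failure", 80, 'userIdentity.arn'=="arn:aws:iam::123456789012:root", 60, true(), 10)

In practice the scores would come from the security team's own review of the events.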

An example of AwsConsoleSignIn.csv is shown in Figure 8:

Create new model - AwsConsoleSignIn

The goal of the machine learning model is to predict risk scores and identify suspicious events. The inputs to this supervised learning step are the events with their assigned risk scores and the chosen algorithm, linear regression. The output is a model that learns from the data and predicts risk scores.

In the Machine Learning Toolkit app configuration shown in Figure 9, select Assistants -> Predict Numeric Fields, and provide the input file AwsConsoleSignIn.csv in "Enter a search":

| inputlookup AwsConsoleSignIn.csv

With AwsConsoleSignIn.csv as the input file, the search loads all records in the file for analysis. Select the following options to create the model:

Algorithm: Linear Regression

Field to predict: risk_score

Fields to use for predicting: "sourceIPAddress", "userAgent", "userIdentity.arn", "eventTime", "responseElements.ConsoleLogin"

The resulting set of Splunk commands is shown below:

| inputlookup AwsConsoleSignIn.csv

| fit LinearRegression fit_intercept=true "risk_score" from "sourceIPAddress","userAgent", "userIdentity.arn" , "eventTime", "responseElements.ConsoleLogin" into "aws_console"

In this example, supervised learning is used: the AWS log data with the assigned risk scores is provided to the LinearRegression algorithm. The output model, aws_console, predicts the risk score for a given set of features ("sourceIPAddress", "userAgent", "userIdentity.arn", "eventTime", "responseElements.ConsoleLogin").

Evaluate results and update model

In the configuration in Figure 10, the data is divided into 70% for training and 30% for testing and evaluating the model. Assigning 30% of the data to test and evaluate the model helps to understand the accuracy of the model.

After fitting the model, the Splunk Machine Learning Toolkit performs the necessary calculations to measure model performance.

As shown in Figure 11, plotting the actual and predicted values on a line chart helps the security team understand how well the model performs.

Figure 11 Splunk machine learning Toolkit - actual vs. forecast

The assistant generates commands that apply the model to the dataset in AwsConsoleSignIn.csv, plot the actual and predicted values, and calculate R² and root mean squared error (RMSE). These values help measure the accuracy of the model.
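The exact commands generated by the assistant are not reproduced here; the following is a rough, hand-written equivalent for illustration only. It assumes the Machine Learning Toolkit's default output field name, predicted(risk_score):

| inputlookup AwsConsoleSignIn.csv

| apply "aws_console"

| eval error = risk_score - 'predicted(risk_score)'

| eventstats avg(risk_score) as mean_actual

| eval sq_err = error * error, sq_tot = (risk_score - mean_actual) * (risk_score - mean_actual)

| stats avg(sq_err) as mse, sum(sq_err) as ss_res, sum(sq_tot) as ss_tot

| eval RMSE = sqrt(mse), R_squared = 1 - (ss_res / ss_tot)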

The RMSE and R² values give a sense of the magnitude of the error. R² indicates how well a set of predictions matches the actual values; its value lies between 0 and 1, and a value close to 0 indicates that the model is not well fitted, as shown in Figure 12.
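For reference, the standard definitions of these two metrics are

\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}

where y_i is the actual risk score, \hat{y}_i the predicted score, and \bar{y} the mean of the actual scores.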

If the performance is not satisfactory after analysis, additional features can be extracted. For example, enriching the source IP address with geolocation data from a MaxMind geographic database adds new context and can help improve the effectiveness of the model.
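As an illustrative sketch of that kind of enrichment (this example uses Splunk's built-in iplocation command rather than a MaxMind lookup, purely as an assumption for the sketch):

| inputlookup AwsConsoleSignIn.csv

| iplocation sourceIPAddress

| table sourceIPAddress, City, Country, lat, lon, risk_score

The resulting City, Country, lat, and lon fields can then be added to the list of fields used for prediction.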

Once the performance is satisfactory, the security team should deploy the model using the apply <model> command. After the model is in operation, security analysts should continuously tune it based on feedback.

3.4 case 2 - detect suspicious API calls

3.4.1 defining features

Using AWS cloud security expertise, the AwsApiCall events were explored to understand which fields would help identify suspicious AWS API calls. Some of the relevant fields identified are: sourceIPAddress, eventSource, eventName, userIdentity.arn, eventTime, userAgent, userIdentity.type.

3.4.2 select and apply learning algorithm

AwsAPICall.csv is generated from the AWS logs in the lab environment. The following Splunk search can be used to export the events to CSV:

* sourcetype="aws:cloudtrail" eventType=AwsAPICall | table sourceIPAddress, eventSource, eventName, userIdentity.arn, eventTime, userAgent, userIdentity.type

Security personnel should assign risk scores to these events, based on security knowledge and knowledge of the environment. An example record from AwsAPICall.csv is shown in Figure 13 below.

The goal of the machine learning model is to predict the risk score in order to identify highly suspicious API calls. In the Machine Learning Toolkit app, select Assistants -> Predict Numeric Fields and use the following command in the search box to provide the input file AwsAPICall.csv:

| inputlookup AwsAPICall.csv

With AwsAPICall.csv as the input file, this search loads all records in the file for analysis. Select the following options to create the model:

Algorithm: Linear Regression

Field to predict: riskScore

In the configuration shown in Figure 14, the fields used for prediction are: "sourceIPAddress", "eventSource", "eventName", "userIdentity.arn", "eventTime", "userAgent", "userIdentity.type".

| inputlookup AwsAPICall.csv

| fit LinearRegression fit_intercept=true "riskScore" from "sourceIPAddress", "eventSource" ,"eventName" , "userIdentity.arn", "eventTime", "userAgent", "userIdentity.type" into "aws_apicall"

In this example, supervised learning is again used: the AWS log data with the assigned risk scores is provided to the LinearRegression algorithm. The output model, aws_apicall, predicts the risk score for a given set of features ("sourceIPAddress", "eventSource", "eventName", "userIdentity.arn", "eventTime", "userAgent", "userIdentity.type").

Evaluate results and update model

The data is divided into 70% for training the model and 30% for testing and evaluating it, which helps in measuring the accuracy of the model.

After fitting the model, the Splunk Machine Learning Toolkit performs the necessary calculations to measure model performance. Plotting the actual and predicted values on a line chart, as shown in Figure 15 below, helps show how well the model performs.

The RMSE and R² values give a sense of the error magnitude. R² indicates how well the predictions match the actual values; its value lies between 0 and 1, and a value close to 1 indicates that the model fits well.

If the performance is not satisfactory after analysis, additional features can be extracted. For example, including VPC information for the events can help improve the effectiveness of the model.

When the performance is satisfactory, deploy the model using the apply <model> command. After the model is in operation, security analysts should continuously tune it based on feedback.

This section showed how to use the Splunk Machine Learning Toolkit to create, evaluate, and deploy machine learning models. The two models generated here are examples built from a very small number of records created in the lab to prototype the functionality of the Splunk Machine Learning Toolkit. Security teams should test, adapt, and deploy machine learning models according to their own AWS environment.

This article discusses only two use cases. Another practical use case is detecting abnormal network traffic sessions between AWS VPCs. Threat modeling, using attacker tactics, techniques, and procedures (TTPs) as input, can be used to identify other security monitoring use cases. Once the use cases are defined, the method discussed in this paper can be used to evaluate features and apply machine learning models to them.

Machine learning is data hungry: building the model from a large amount of data produces more useful results. In addition, extracting features from multiple data sources achieves higher fidelity. For example, attaching geolocation data to an IP address helps improve the effectiveness of the model.

(4) Conclusion

This paper focused on applying machine learning techniques to AWS logs, using them to identify suspicious events in an IaaS environment. Identity is the new perimeter, and using machine learning to combine identity data with other data helps security professionals identify suspicious events. As a first step, security team members should understand the monitoring needs, understand the data, and evaluate the appropriate methods. The security team should consider whether machine learning suits the nature of the logs, then explore and visualize the data and select features as input for creating the model. After building and testing the model, the security team should apply it to live traffic (data), and afterwards periodically evaluate the results and adjust the model.

Many machine learning solutions are becoming available, such as Microsoft Azure ML Studio, Amazon Machine Learning, Databricks Spark, and the Splunk Machine Learning Toolkit. These tools make building machine learning models intuitive and easy to use; their user interfaces abstract away the mathematics and coding required in traditional machine learning languages such as R.

Using Amazon Machine Learning for security monitoring was demonstrated at AWS re:Invent 2016 (videos from the re:Invent 2016 security and compliance sessions, 2016). As cloud adoption evolves, security teams should likewise learn the new methods and advantages available for running security operations and security monitoring activities. Automation and machine learning are two key areas in the cloud that give defenders an advantage.

As defenders, the goal is to deploy defense in depth with preventive and detective controls at every layer, so that attackers incur a high cost to achieve their goals. Machine learning can be a useful tool here: after suspicious activity is identified, the security team can use forensics to trace any activity performed by the attacker and take remedial action.

Other use cases that may benefit from this solution include risk management, security automation/business processes, user/network behavior analytics, fraud detection, threat hunting, multi-source threat intelligence integration, and incident response/forensic analysis.

Original address: https://www.sans.org/reading-room/whitepapers/cloud/cloud-security-monitoring-37672