IMCAFS


What is feature engineering?

Posted by lipsius at 2020-03-23

Let's get straight to the point.

Whether it relies on machine learning, deep learning, or statistical methods, every intelligent system depends on data. Raw data is often hard for algorithms to use directly, which is why feature engineering matters so much. This article introduces how to automate risk-control feature engineering, so that risk-control analysts who cannot program can still design and process features themselves.

What is feature engineering

Feature engineering is the process of extracting more information from raw data, guided by the business problem to be solved. By analogy, raw data is like petroleum; feature engineering is like refining basic chemical feedstocks such as ethylene, propylene, butadiene, benzene, toluene, and xylene from it; and model training is like turning those feedstocks into a wide range of organic chemicals and synthetic materials.

Figure - petrochemical process

The goal of feature engineering is to give model training richer information without adding new data.
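To make this concrete, here is a minimal Python sketch, assuming a hypothetical table of raw transactions that holds only a user ID, an amount, and an hour of day. It derives per-user features (transaction count, total and maximum amount, share of night-time transactions) from those same records, so the model receives richer information without any new data being collected. The schema, the `extract_user_features` name, and the "before 06:00 counts as night" cutoff are illustrative assumptions, not part of the original article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    """One raw record (hypothetical schema for illustration)."""
    user_id: str
    amount: float
    hour: int  # hour of day, 0-23


def extract_user_features(transactions):
    """Aggregate raw transactions into one feature dict per user."""
    grouped = defaultdict(list)
    for tx in transactions:
        grouped[tx.user_id].append(tx)

    features = {}
    for user_id, txs in grouped.items():
        amounts = [tx.amount for tx in txs]
        # Assumption: transactions before 06:00 count as "night" activity.
        night_count = sum(1 for tx in txs if tx.hour < 6)
        features[user_id] = {
            "tx_count": len(txs),
            "total_amount": sum(amounts),
            "max_amount": max(amounts),
            "night_tx_ratio": night_count / len(txs),
        }
    return features


raw = [
    Transaction("u1", 120.0, 14),
    Transaction("u1", 980.0, 3),
    Transaction("u2", 15.5, 10),
]
print(extract_user_features(raw))
```

Every output value is computed from fields that already exist in `raw`; no extra data source is consulted.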

Feature engineering usually includes data cleaning, feature design, feature transformation and feature selection.

Figure - Feature Engineering
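The four stages can be sketched as a small Python pipeline. Everything here is an illustrative assumption: the applicant fields, the debt-to-income ratio, the log transform, and the correlation-threshold filter stand in for whatever cleaning, design, transformation, and selection rules a real risk-control team would choose. Only the standard library is used so the example runs as-is.

```python
import math

# Hypothetical raw applicant records; "default" is the label to predict.
RAW_ROWS = [
    {"age": 30, "income": 3000, "debt": 2700, "default": 1},
    {"age": 30, "income": 8000, "debt": 800, "default": 0},
    {"age": 40, "income": 5000, "debt": 4000, "default": 1},
    {"age": 40, "income": 10000, "debt": 500, "default": 0},
    {"age": 28, "income": None, "debt": 300, "default": 0},  # incomplete record
    {"age": 35, "income": 6000, "debt": 1200, "default": 0},
]


def clean(rows):
    """Data cleaning: drop records missing a required field, copying the rest."""
    return [dict(r) for r in rows if r["income"] is not None and r["debt"] is not None]


def design(rows):
    """Feature design: derive a business-driven ratio from existing fields."""
    for r in rows:
        r["debt_to_income"] = r["debt"] / r["income"] if r["income"] > 0 else 0.0
    return rows


def transform(rows):
    """Feature transformation: compress long-tailed income with log(1 + x)."""
    for r in rows:
        r["log_income"] = math.log1p(r["income"])
    return rows


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select(rows, candidates, label="default", threshold=0.3):
    """Feature selection: keep candidates whose |correlation| with the label passes a threshold."""
    ys = [r[label] for r in rows]
    return [f for f in candidates if abs(pearson([r[f] for r in rows], ys)) >= threshold]


rows = transform(design(clean(RAW_ROWS)))
selected = select(rows, ["age", "log_income", "debt_to_income"])
print(len(rows), selected)
```

On this toy data, cleaning drops the record with missing income, `age` turns out to be uncorrelated with the label and is filtered out, and the derived `debt_to_income` and transformed `log_income` are kept.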

Feature engineering is a prerequisite for model training. A common saying among modelers is that data and features set the upper limit of machine learning, and models and algorithms only approach that limit.

The relationship between feature engineering and model training mirrors the petrochemical example above. Petroleum cannot be used directly; its value depends on how well the basic chemical feedstocks are extracted from it. Likewise, the quality of model training depends on how well features are designed and processed.

Traditional implementation of feature engineering

Feature engineering is usually very complex and requires a great deal of code and script development. It is almost impossible for risk-control analysts who cannot program to carry out feature engineering themselves. Yet risk-control analysts understand the business best: they know what the problem is and how features should be logically designed and processed.

Figure - troubles of risk control analysts

The traditional solution brings in algorithm engineers: risk-control analysts describe the business requirements, and the engineers then design and process features according to their own understanding.

Figure – Engineers get involved

Relaying risk-control business requirements through this communication inevitably creates information asymmetry and adds considerable time and labor costs. In many risk-control scenarios, risk-control staff are competing with risky users on both expertise and speed. If risk-control analysts could process features directly, results would improve markedly compared with the traditional mode, and the cost of risk control would drop substantially.

Figure – requirements vs implementation

Visual configuration of feature engineering

We need a platform on which risk-control business analysts can design and process features through drag-and-drop visual configuration.

For example, most of us don't know how to use Photoshop, yet with tools like Meitu XiuXiu we can still achieve professional-looking results that match our own ideas. Meitu turned effects that once required complex operations in professional image-processing software into simple operations that need no specialist knowledge.

A visual configuration platform for feature engineering solves a similar problem: feature engineering that previously had to be implemented in code becomes work that non-programmers can complete through simple operations, with the same or even better results.

At the core of this platform is a configurable rule engine, which translates the user's configuration into corresponding programs and executes them.

Unlike general-purpose rule engines such as Drools, risk-control feature engineering places its own specific requirements on a rule engine. These requirements are inconvenient to meet with a general-purpose rule engine, and some are difficult to meet at all. We therefore needed to develop a dedicated risk-control rule engine.
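To illustrate what "translating configuration into programs" can mean, the following Python sketch compiles a declarative feature definition, the kind of structure a drag-and-drop UI might emit, into an executable function. The configuration schema (`filters`, `aggregate`, `field`), the operator set, and the `compile_feature` name are hypothetical; they are not the platform's actual format, and a production engine would also need validation, time windows, and performance work that this sketch omits.

```python
import operator
from typing import Callable, Dict, List

# Comparison operators a UI drop-down might offer (hypothetical set).
COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}

# Aggregations applied to the records (or field values) that pass the filters.
AGGREGATES = {
    "count": len,
    "sum": sum,
    "max": lambda values: max(values, default=0),
}


def compile_feature(config: Dict) -> Callable[[List[Dict]], float]:
    """Translate one feature configuration into an executable function."""
    agg_name = config["aggregate"]
    if agg_name not in AGGREGATES:
        raise ValueError(f"unknown aggregate: {agg_name}")
    target = config.get("field")
    if agg_name != "count" and target is None:
        raise ValueError(f"aggregate '{agg_name}' needs a 'field'")
    # Resolve operator names once, at compile time rather than per record.
    filters = [(f["field"], COMPARATORS[f["op"]], f["value"]) for f in config.get("filters", [])]
    aggregate = AGGREGATES[agg_name]

    def feature(records: List[Dict]) -> float:
        matched = [r for r in records
                   if all(cmp(r[field], value) for field, cmp, value in filters)]
        values = matched if target is None else [r[target] for r in matched]
        return aggregate(values)

    return feature


# Configurations as a visual editor might produce them.
night_large_cfg = {
    "name": "night_large_tx_count",
    "filters": [
        {"field": "hour", "op": "<", "value": 6},
        {"field": "amount", "op": ">=", "value": 500},
    ],
    "aggregate": "count",
}
night_sum_cfg = {
    "name": "night_amount_sum",
    "filters": [{"field": "hour", "op": "<", "value": 6}],
    "aggregate": "sum",
    "field": "amount",
}

records = [
    {"amount": 980.0, "hour": 3},
    {"amount": 120.0, "hour": 2},
    {"amount": 700.0, "hour": 15},
]
for cfg in (night_large_cfg, night_sum_cfg):
    print(cfg["name"], compile_feature(cfg)(records))
```

The design choice in this sketch is to do all lookups and checks once in `compile_feature`, so the returned function is a plain filter-and-aggregate loop that can be reused across many requests.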

Once risk-control business staff use this platform, algorithm engineers no longer need to develop custom feature engineering for every risk-control request and can devote their time to more valuable work.

Automatic feature engineering

Finally, let's talk about a cutting-edge technique in feature engineering: automatic feature engineering (auto feature).

The platform described above only removes the need for an algorithm engineer. Now that we have entered the era of artificial intelligence, let's imagine boldly: could the platform complete feature engineering automatically, without even needing visual configuration?

The answer is yes, although implementing such a platform is a huge challenge. With this technology, risk-control business analysts need to do only simple work, and the design and processing of features is completed automatically.

This greatly improves efficiency and shortens turnaround time. In today's rapidly changing risk-control environment, we can better embrace change and truly stay one step ahead of risky users.
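The article does not describe how its automatic feature engineering works. As a stand-in for illustration only (a simple generate-and-score search, not the author's method), the Python sketch below enumerates candidate features by combining a field, an aggregation, and a look-back window, scores each candidate by its absolute Pearson correlation with a label, and keeps the top k. The fields, windows, `auto_features` name, and sample users are hypothetical, and a real system would score candidates on held-out data to avoid overfitting.

```python
import itertools
import math

# Building blocks the search may combine (hypothetical choices).
AGGREGATES = {
    "count": len,
    "sum": sum,
    "max": lambda values: max(values, default=0.0),
}
WINDOWS_H = [24, 168]  # look-back windows in hours: 1 day and 7 days


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def evaluate(candidate, txs, now_h):
    """Compute one candidate feature over a user's transactions."""
    field, agg_name, window_h = candidate
    values = [tx[field] for tx in txs if 0 <= now_h - tx["ts_h"] <= window_h]
    return AGGREGATES[agg_name](values)


def auto_features(users, labels, fields, now_h, top_k=3):
    """Generate every (field, aggregate, window) candidate; keep the top_k by |correlation|."""
    user_ids = list(users)
    ys = [labels[u] for u in user_ids]
    scored = []
    for cand in itertools.product(fields, AGGREGATES, WINDOWS_H):
        xs = [evaluate(cand, users[u], now_h) for u in user_ids]
        name = f"{cand[1]}_{cand[0]}_{cand[2]}h"
        scored.append((abs(pearson(xs, ys)), name))
    scored.sort(reverse=True)
    return scored[:top_k]


# Hypothetical per-user transactions (ts_h = timestamp in hours) and fraud labels.
users = {
    "u1": [{"amount": 900.0, "ts_h": 99}, {"amount": 850.0, "ts_h": 98}],
    "u2": [{"amount": 50.0, "ts_h": 10}],
    "u3": [{"amount": 30.0, "ts_h": 95}, {"amount": 40.0, "ts_h": 50}],
    "u4": [{"amount": 700.0, "ts_h": 97}],
}
labels = {"u1": 1, "u2": 0, "u3": 0, "u4": 1}

for score, name in auto_features(users, labels, ["amount"], now_h=100):
    print(f"{name}: {score:.3f}")
```

Note that the number of candidates grows multiplicatively with the number of fields, aggregations, and windows.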

Author: Deng Chongxin, Intelligent Risk Laboratory, Risk Management Department, Jingdong Finance