Understand Feature Engineering for Machine Learning in 5 minutes

Florian Bouron
Artificial Intelligence in Plain English
4 min read · Jul 23, 2021


In this article, we will see what feature engineering is and how you can apply it to your machine learning algorithms.

[Image: Chemist representing feature engineering]

Introduction

Before we move further, we need to define what a feature is in machine learning.

If you are new to machine learning, a feature is simply an input to a machine learning algorithm.

[Image: How machine learning models work]

What is Feature Engineering?

Feature engineering extracts useful features from raw data using math, statistics, and domain knowledge.

For example, if a ratio of two numeric features is important to classifying an instance, then calculating that ratio and including it as a feature may improve the model quality.

This means that if you have two features, the square meters and the price of flats, you might create a new feature, price per square meter, to improve your model.

[Image: Feature Engineering Example]
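As a quick illustration, here is a minimal pandas sketch of that ratio feature; the DataFrame, column names, and values are made up for the example:

```python
import pandas as pd

# Hypothetical flat listings; column names and values are illustrative.
flats = pd.DataFrame({
    "square_meters": [45, 80, 120],
    "price": [180_000, 320_000, 540_000],
})

# Derive the ratio feature described above: price per square meter.
flats["price_per_sqm"] = flats["price"] / flats["square_meters"]
print(flats)
```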

How to do feature engineering?

Let’s look at different feature engineering strategies. This article won’t cover every method, only the most popular ones.

Adding and dropping features:

Let’s assume we have the following features:

[Image: Price of houses]

If we want to predict the price of a flat, the number of plants might be irrelevant. In that case, we should remove this feature from our machine learning model so that it doesn’t add extra noise.

This problem is known as the curse of dimensionality: as the number of features in the data increases, the number of data points required to build a good model grows exponentially.

We need to choose which features are the most relevant to our model.
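Here is a minimal sketch of dropping such a feature with pandas; the data and the number_of_plants column are hypothetical:

```python
import pandas as pd

# Hypothetical house data; "number_of_plants" is the irrelevant column.
houses = pd.DataFrame({
    "square_meters": [45, 80, 120],
    "number_of_rooms": [2, 3, 5],
    "number_of_plants": [4, 0, 7],  # likely just noise for price prediction
    "price": [180_000, 320_000, 540_000],
})

# Drop the feature we believe adds noise rather than signal.
houses = houses.drop(columns=["number_of_plants"])
```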

Combining multiple features into one feature:

[Image: Price of houses]

In the example above, we can see that square meters and square feet are the same data, just in different units. If we feed both to our algorithm, it will have to learn that square meters and square feet are related and actually represent the same feature.

That’s why we need to decide on which measurement to take and keep only one.

We could also have two features, the number of dogs and the number of cats, and combine them into a single number of animals feature.

[Image: Number of animals]
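A minimal pandas sketch of both ideas, with made-up columns and values: dropping a redundant unit and merging two related counts:

```python
import pandas as pd

houses = pd.DataFrame({
    "square_meters": [45, 80],
    "square_feet": [484, 861],  # same information, different unit
    "number_of_dogs": [1, 0],
    "number_of_cats": [0, 2],
})

# Keep a single unit of measurement: drop the redundant column.
houses = houses.drop(columns=["square_feet"])

# Combine two related counts into one more general feature.
houses["number_of_animals"] = houses["number_of_dogs"] + houses["number_of_cats"]
houses = houses.drop(columns=["number_of_dogs", "number_of_cats"])
```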

However, combining features is not always a good idea. In the case of a date feature, for example, the day of the week probably matters on its own.

You need to remember that quality is better than quantity.

Cleaning existing features:

You need to keep only the features that are relevant, so that your model picks up on the right signal in the data.

To do that, you can (see the sketch after this list):

  • Impute missing values.
  • Remove outliers so that the model doesn’t train on data points that aren’t representative.
  • Get rid of mixed scales: for example, if some features are in centimeters and others in meters, convert all of them to centimeters. This is called normalization.
  • Transform skewed data to give the model an easier, more compact distribution to work with (a log transform is a common choice).
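Here is a small pandas/NumPy sketch of these cleaning steps; the data, thresholds, and imputation strategy are invented for illustration, and real choices depend on your domain:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "height_cm": [170.0, np.nan, 182.0, 175.0],
    "distance_m": [1.2, 0.8, 1.5, 30.0],            # 30.0 looks like an outlier
    "income": [30_000, 35_000, 32_000, 1_000_000],  # heavily skewed
})

# Impute missing values with the column median.
data["height_cm"] = data["height_cm"].fillna(data["height_cm"].median())

# Remove an obvious outlier (a simple threshold here; real cutoffs need domain knowledge).
data = data[data["distance_m"] < 10]

# Put everything in the same unit: convert meters to centimeters.
data["distance_cm"] = data["distance_m"] * 100
data = data.drop(columns=["distance_m"])

# Compress a skewed distribution with a log transform.
data["log_income"] = np.log1p(data["income"])
```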

Binning:

Binning is when you take a numerical measurement and convert it into a category.

Here is an example for home sales:

[Image: Home sales]

In that example, we can assume that the sale price depends on whether there is a swimming pool, not on its exact length.

We can then simplify our model by pre-processing the data and replacing the swimming pool length with a boolean feature.

[Image: Swimming Pool boolean]
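A minimal pandas sketch of this binning step, using a hypothetical pool_length_m column:

```python
import pandas as pd

sales = pd.DataFrame({
    "pool_length_m": [0, 8, 0, 12],
    "sale_price": [250_000, 420_000, 260_000, 510_000],
})

# Replace the exact pool length with a boolean feature: does the home have a pool?
sales["has_pool"] = sales["pool_length_m"] > 0
sales = sales.drop(columns=["pool_length_m"])

# For more than two buckets, pd.cut bins a numeric column into labeled ranges.
sales["price_band"] = pd.cut(
    sales["sale_price"],
    bins=[0, 300_000, 450_000, float("inf")],
    labels=["low", "mid", "high"],
)
```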

One-hot encoding:

One-hot encoding is a way to represent categorical data so that a machine learning algorithm can understand it.

Our model understands numbers but not strings, so we need to convert strings to numbers. However, we cannot assign arbitrary numbers to our strings, because the model might give more importance to large numbers than to small ones. That’s why we use one-hot encoding.

Here is an example with home sales:

[Image: One-hot encoding]
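A minimal sketch with pandas’ built-in get_dummies, on a made-up neighborhood column:

```python
import pandas as pd

homes = pd.DataFrame({
    "neighborhood": ["north", "south", "north", "east"],
    "sale_price": [250_000, 310_000, 265_000, 400_000],
})

# pd.get_dummies creates one 0/1 column per category.
homes = pd.get_dummies(homes, columns=["neighborhood"])
print(homes.columns.tolist())
# ['sale_price', 'neighborhood_east', 'neighborhood_north', 'neighborhood_south']
```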

One-hot encoding is useful for replacing categorical data with simple numeric data that the machine learning model will understand.

Summary

Feature engineering will help you to:

  • Solve the right business problems thanks to the right features.
  • Improve the performance of your machine learning algorithm.

I hope you enjoyed the read and if you want to see my next articles, feel free to follow me on Medium.
