“MLshorts” 13: What actually is a “feature” in Machine Learning?

A simple and clear explanation

--

Photo by Алекс Арцибашев on Unsplash

What is it? 🤔

In the previous “MLshorts” articles we talked about Feature Engineering, Feature Selection and Feature Scaling, but what actually is a “feature”?

In “MLshorts” 10 I briefly explained it as “a measurable property of the object you try to analyze or predict. It is the known variable X when you try to predict the unknown Y”.

Now that we have a good understanding of the main routines around features, let me give a more robust explanation of them.

A “feature” is just another name for “variable”. Features are the already known variables of a dataset that help you predict the target variable.

For example, when you want to predict the price of a car, the features might be the age of the car, its manufacturer, its transmission type, its engine type, and so on.

Another example: when you try to predict whether a bank will give a loan to a customer, the features might be the customer’s age, income, work status, marital status, place of residence, and so on.
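To make this concrete, here is a minimal sketch in Python (pandas, with made-up numbers and column names) of how features and the target usually live side by side in the same table:

```python
import pandas as pd

# A tiny, made-up car dataset: every column except "price" is a feature,
# and "price" is the target variable we want to predict.
cars = pd.DataFrame({
    "age_years": [3, 7, 1, 5],
    "manufacturer": ["Toyota", "Ford", "BMW", "Fiat"],
    "transmission": ["manual", "automatic", "automatic", "manual"],
    "engine": ["petrol", "diesel", "petrol", "petrol"],
    "price": [15000, 8000, 32000, 9500],
})

X = cars.drop(columns="price")  # the features (the known variables X)
y = cars["price"]               # the target (the unknown Y we want to predict)

print(X.columns.tolist())  # ['age_years', 'manufacturer', 'transmission', 'engine']
```

The exact column names and values here are invented for illustration; the point is simply that the features and the target are just columns of the same dataset, with the target set aside as the thing to be predicted.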

More formally, features are the individual characteristics or attributes of the data that are used to make predictions or decisions. Just like ingredients in a recipe, features provide the necessary information for the Machine Learning model to understand and analyze the data.

What are the main categories of Features? 📋

A feature can belong to one of these two main categories:

  • Numerical Features: these are represented by numerical values, such as age, height, temperature, income or price. These features can be continuous (e.g., age in years) or discrete (e.g., number of children).
  • Categorical Features: these represent different categories or groups, such as manufacturers, cities, or types of products. These features are typically represented as text or integers, and each category is treated as a separate entity. For example, in a dataset about employees of a global company, the “country of residence” feature could have categories like “Italy”, “Switzerland”, “Bulgaria” etc. (a minimal handling sketch follows this list).
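Here is a minimal sketch (a made-up employee table, pandas only) of how the two groups are usually told apart, and how a categorical feature might be one-hot encoded so a model can use it:

```python
import pandas as pd

employees = pd.DataFrame({
    "age": [29, 41, 35],                                           # numerical (discrete)
    "salary": [52000.0, 87000.0, 64000.0],                         # numerical (continuous)
    "country_of_residence": ["Italy", "Switzerland", "Bulgaria"],  # categorical
})

# Split the columns by type.
numerical_cols = employees.select_dtypes(include="number").columns.tolist()
categorical_cols = employees.select_dtypes(exclude="number").columns.tolist()
print(numerical_cols)    # ['age', 'salary']
print(categorical_cols)  # ['country_of_residence']

# Most models need categorical features as numbers, e.g. via one-hot encoding.
encoded = pd.get_dummies(employees, columns=categorical_cols)
print(encoded.columns.tolist())
# ['age', 'salary', 'country_of_residence_Bulgaria',
#  'country_of_residence_Italy', 'country_of_residence_Switzerland']
```

One-hot encoding is just one common way to handle categorical features; the right encoding depends on the model and the data.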

Apart from these two main groups, a feature can also fall into one of these more specific categories:

  • Ordinal Features: these are similar to categorical features but have an inherent order or ranking. For example, the “education level” feature could have categories like “high school,” “college,” and “graduate school,” with a clear hierarchy.
  • Text Features: these represent textual data, such as product descriptions, customer reviews, or email content. These features require special preprocessing techniques, such as tokenization and vectorization, to convert them into numerical representations (vectors) that can be used by ML models.
  • Datetime Features: these represent dates and times, such as timestamps or event dates, at multiple granularities (daily, monthly, quarterly, yearly, and so on). These features can provide valuable temporal information for Time Series analysis or forecasting.
  • Engineered Features: these are derived from Feature Engineering processes, by transforming or combining existing features to extract additional information. For example, combining the “length” and “width” features to calculate the area of an object.
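As a rough sketch (made-up columns, pandas only; text features would need a separate vectorization step, for example with scikit-learn’s CountVectorizer or TfidfVectorizer), here is how an ordinal, a datetime, and an engineered feature might be prepared:

```python
import pandas as pd

df = pd.DataFrame({
    "education": ["high school", "college", "graduate school"],
    "signup_date": pd.to_datetime(["2024-01-15", "2024-03-02", "2024-07-20"]),
    "length": [2.0, 3.5, 4.0],
    "width": [1.0, 2.0, 2.5],
})

# Ordinal feature: map categories to numbers that preserve their inherent order.
education_rank = {"high school": 0, "college": 1, "graduate school": 2}
df["education_level"] = df["education"].map(education_rank)

# Datetime features: extract useful temporal parts at different granularities.
df["signup_month"] = df["signup_date"].dt.month
df["signup_quarter"] = df["signup_date"].dt.quarter

# Engineered feature: combine existing features into a new, more informative one.
df["area"] = df["length"] * df["width"]

print(df[["education_level", "signup_month", "signup_quarter", "area"]])
```

The specific mappings and extracted parts above are illustrative; which ones are worth creating depends entirely on the problem at hand.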

Why are Features important? 💎

Features, just like ingredients in a recipe, provide the necessary information for the model to understand and analyze the data. When building a Machine Learning model, we perform a detailed Exploratory Data Analysis (EDA) of the dataset’s features to understand what is going on, how the features relate to one another, and what patterns they show with the target variable we want to predict. After building the model, we also evaluate feature importance to see which features are most valuable to the model. In short, by selecting and engineering relevant features, we can improve the model’s performance and make more accurate predictions or decisions.
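As a minimal sketch of that last step, using made-up data and scikit-learn’s built-in impurity-based importances (just one of several ways to measure feature importance), inspecting which features a fitted model relied on might look like this:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Made-up toy data: a few features X and a target y, purely illustrative.
X = pd.DataFrame({
    "age_years": [3, 7, 1, 5, 10, 2],
    "engine_size": [1.6, 2.0, 3.0, 1.2, 1.8, 2.5],
    "mileage_km": [40000, 90000, 10000, 70000, 150000, 25000],
})
y = [15000, 8000, 32000, 9500, 4000, 21000]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Which features did the model rely on most?
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```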

Was this article valuable for you? Follow, subscribe, connect on LinkedIn/Kaggle and see you in my next “MLshorts” article! 👋
