Decisions, Decisions… — part II

The Numerical Decision Tree + Random Forest method

Mina Suntea
Artificial Intelligence in Plain English


Photo by Vladislav Babienko on Unsplash

In the previous part of this series the Categorical Decision Tree and its implementation were discussed, but there are also Numerical Decision Trees, and that is what this article covers today!

The Numerical Decision Tree

The definition of ID3 described in part I of this series is restricted to attributes that take on a discrete set of values. Both the target value that is learned by the decision tree and the attributes that are tested in the decision nodes must be discrete valued. The Categorical Decision Tree can, however, be extended with numerical boundaries, so that continuous-valued decision attributes can also be incorporated in the tree. These boundaries are chosen so that they give the best Information Gain, which makes categorical and numerical splits directly comparable.

The most important difference between Categorical and Numerical Decision Trees is how the data is split. Categorical data is split into its categories. This is impossible for numerical data, since there is an infinite number of possible values. The Numerical Decision Tree therefore uses binary splits, also known as boundary splits, where two new branches are created at each split: one for values smaller than the boundary and one for values larger than or equal to the boundary.
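To make the boundary split concrete, here is a small from-scratch sketch (not part of the article's own code) that picks a boundary for one numerical feature by trying the midpoints between consecutive sorted values and keeping the one with the highest Information Gain:

import numpy as np

def entropy(labels):
    # Shannon entropy of a 1-D array of labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_boundary(values, labels):
    # Try the midpoints between consecutive distinct sorted values as boundaries
    # and keep the one that gives the highest Information Gain.
    values, labels = np.asarray(values), np.asarray(labels)
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    parent_entropy = entropy(labels)
    best_gain, best_threshold = 0.0, None
    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            continue
        threshold = (values[i] + values[i - 1]) / 2
        left, right = labels[:i], labels[i:]
        gain = parent_entropy - (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_threshold, best_gain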

By applying repeated binary splits on a numerical variable, different decision regions can be created. This means that a feature that has already been split on is not ignored in further splits of a Numerical Decision Tree, unlike in a Categorical Decision Tree. In other words, the same feature can be split on again at a different boundary, which can improve accuracy.

This time the tree won’t be constructed from scratch; instead the scikit-learn library will be used. The DecisionTreeClassifier from this library can build a Decision Tree from numerical data. It will be trained on the numerical part of the data split made before (in part I), and the accuracies on both the training and the test set will be visualized:

Using the fit and predict functions
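The original snippet is embedded as an image, so here is a minimal sketch of that step. The variable names X_num_train, X_num_test, y_train and y_test are assumptions standing in for the numerical split from part I:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# X_num_train, X_num_test, y_train, y_test: hypothetical names for the
# numerical feature split and labels prepared in part I
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_num_train, y_train)

train_acc = accuracy_score(y_train, tree.predict(X_num_train))
test_acc = accuracy_score(y_test, tree.predict(X_num_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")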

The scikit-learn Decision Tree implementation only accepts numerical features, so it does not actually support categorical data. It can still be made to work by numbering the categories, but this is generally discouraged, because it imposes an artificial numerical order on the categories. Since the data used here already has a number for each category, this implementation can be used anyway.

From this, the training and test accuracies can be printed, and the same can be done for the complete mixed data set with both the numerical and the categorical features:
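A sketch of that step, assuming the full feature matrix (numerical columns plus the already number-encoded categorical columns) is available under the hypothetical names X_all_train and X_all_test:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# X_all_train / X_all_test: hypothetical names for the mixed feature matrix
tree_all = DecisionTreeClassifier(random_state=0)
tree_all.fit(X_all_train, y_train)

print("train accuracy:", accuracy_score(y_train, tree_all.predict(X_all_train)))
print("test accuracy:", accuracy_score(y_test, tree_all.predict(X_all_test)))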

Trees learn by beginning with the full training set and greedily adding the conditions that maximize the label purity of the resulting child nodes. The number of leaf nodes can grow exponentially as more depth is added to the tree, which makes trees prone to overfitting. This can be countered by pruning the tree or by setting a minimum threshold on the Information Gain a split must achieve.

With the DecisionTreeClassifier from the scikit-learn library, the min_samples_split parameter can be used to set the minimum number of samples that must be in a node before it is split. Below that number the tree stops splitting and the node becomes a leaf that predicts its most common label.
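For example, under the same assumed variable names as above (the value 20 is just an illustrative choice, not the article's):

from sklearn.tree import DecisionTreeClassifier

# Require at least 20 samples in a node before it may be split further;
# smaller nodes become leaves that predict their most common label.
pruned_tree = DecisionTreeClassifier(min_samples_split=20, random_state=0)
pruned_tree.fit(X_num_train, y_train)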

The tree itself can also be plotted. With another argument of the DecisionTreeClassifier, max_depth, the depth of the tree can be limited so the plot stays readable:

It’s more readable in Jupyter Notebook :)
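The plot above is an image in the original article; a sketch of how such a plot could be produced with scikit-learn's plot_tree (again using the assumed X_num_train and y_train):

import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Limit the depth so the plotted tree stays readable
small_tree = DecisionTreeClassifier(max_depth=3, random_state=0)
small_tree.fit(X_num_train, y_train)

plt.figure(figsize=(12, 6))
plot_tree(small_tree, filled=True)
plt.show()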

However, overfitting can also be reduced by applying an ensemble approach. One of the most common ensemble methods for decision trees is the Random Forest. This method leads to better model performance: many decision trees are trained, each on a random subset of the data, they all predict the label of an unseen sample, and the majority vote is used as the final prediction. A single tree is highly sensitive to noise in the training data, unlike the average of many decision trees trained on similar data.

A simple but effective implementation of the Random Forest will be shown, again using the DecisionTreeClassifier. First, a function called create_forest is written that accepts the number of trees that should be in the forest and returns a list containing all the DecisionTreeClassifier objects:

Maximum number of features set to the square root of the number of features
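A sketch of what create_forest could look like, using the max_features='sqrt' setting described in the caption above:

from sklearn.tree import DecisionTreeClassifier

def create_forest(n_trees):
    # One classifier per tree; max_features='sqrt' makes every tree consider only
    # a random subset of sqrt(n_features) features when looking for the best split.
    return [DecisionTreeClassifier(max_features='sqrt') for _ in range(n_trees)]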

Next, the train_forest function is written. It accepts the forest and trains each of the trees on a random fraction (the ratio) of the samples from the data:
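A possible sketch of train_forest, assuming X and y are NumPy arrays; whether the article's version sampled with or without replacement is not shown, so this version samples without replacement:

import numpy as np

def train_forest(forest, X, y, ratio=0.5):
    # Train every tree on its own random sample containing `ratio` of all rows.
    n_samples = int(ratio * len(X))
    for tree in forest:
        idx = np.random.choice(len(X), size=n_samples, replace=False)
        tree.fit(X[idx], y[idx])
    return forest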

To predict the target label for given data, the predict_forest function is written. This function accepts a forest and the data and returns the predicted target labels:
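A sketch of predict_forest that follows the averaging trick explained below:

import numpy as np

def predict_forest(forest, X):
    # Sum the 0/1 predictions of all trees, then round the average vote
    # so that the majority class wins.
    votes = np.sum([tree.predict(X) for tree in forest], axis=0)
    return np.round(votes / len(forest)).astype(int)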

In this case there are only two possible predictions, True or False, which are equal to 1 or 0. The predictions can therefore simply be added together and divided by the number of trees, so the average vote can be rounded to either 1 or 0.

Now a random forest can be created. In this case 1000 trees are created, each of which considers only 3 or 4 features per split, and every tree is trained on a 50% random sample of the complete data set. Using these values, the training and test accuracies can be printed for both the Random Forest and a single tree:
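Putting the pieces together, a usage sketch under the same assumptions as before (the hypothetical X_all_train, X_all_test, y_train and y_test arrays):

from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

forest = create_forest(1000)
train_forest(forest, X_all_train, y_train, ratio=0.5)

print("forest train accuracy:", accuracy_score(y_train, predict_forest(forest, X_all_train)))
print("forest test accuracy:", accuracy_score(y_test, predict_forest(forest, X_all_test)))

# Compare against a single tree trained on the full training set
single_tree = DecisionTreeClassifier(random_state=0).fit(X_all_train, y_train)
print("single tree train accuracy:", accuracy_score(y_train, single_tree.predict(X_all_train)))
print("single tree test accuracy:", accuracy_score(y_test, single_tree.predict(X_all_test)))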

As comparing the training and test accuracies shows, the Random Forest method does indeed perform better. This brings me to the end of the Decision Tree series, in which I discussed both the Categorical Decision Tree and the Numerical Decision Tree. I hope you now have a better understanding of what Decision Trees are and how they work.

--

I am an AI student, who loves to conduct research and learn new things. I also have a fascination for the criminal mind as well as culture.