How to build a decision tree — Part 2

Raghavi_bala
3 min read · Mar 23, 2022

We’re going to consider an example and work through the math behind constructing a decision tree.

I’ll be using certain terms like Entropy, Gini Impurity, KL Divergence and Information Gain here. If you don’t know what these mean, go ahead and check out the first part, where I’ve explained all of these terms with their formulas, along with an introduction to decision trees.

If you’re clear on the prerequisites, let’s dive straight in.

First, let’s calculate the Entropy of the target variable. Here, Play Football is the target variable. There are 14 data points in total, out of which 5 are No and 9 are Yes.
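To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The `entropy` helper is my own illustration, not something from the article:

```python
import math

def entropy(counts):
    """Entropy of a label distribution, e.g. [9, 5] for 9 Yes and 5 No."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Target variable Play Football: 9 Yes and 5 No out of 14 rows
print(f"{entropy([9, 5]):.3f}")  # 0.940
```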

You can refer to Part 1 above for the formulas.

Outlook, Temperature, Wind and Humidity are the available features, and we need to decide which of them to split on.

How do we decide which feature to split on first? That is based on the Information Gain.

We are going to use the ID3 algorithm, which takes a top-down greedy approach. That is, the tree is built from the top down, and it’s called greedy because at each node we choose the feature with the maximum Information Gain, without looking ahead.

ID3 uses Information Gain as its splitting criterion.
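As a reminder from Part 1, for a dataset S and a feature A that takes values v, the formula is:

Gain(S, A) = Entropy(S) − Σᵥ (|Sᵥ| / |S|) × Entropy(Sᵥ)

where Sᵥ is the subset of rows for which A takes the value v.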

Moving on, let’s calculate the Information Gain for each of the features.

Humidity can take 2 values, High and Normal. We need to calculate the Entropy for both High and Normal.

High [3+, 4−] and Normal [6+, 1−], where + counts the Yes rows and − counts the No rows.

If we calculate the Entropy for High and Normal separately, like we did for the target variable, the Entropy of High comes out to 0.985 and the Entropy of Normal to 0.592.
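Written out, the two calculations are:

Entropy(High) = −(3/7)·log₂(3/7) − (4/7)·log₂(4/7) ≈ 0.985

Entropy(Normal) = −(6/7)·log₂(6/7) − (1/7)·log₂(1/7) ≈ 0.592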

Take a pen and paper and calculate it yourself to understand it better.

Now, using the Entropies of High and Normal, we can calculate the Information Gain.

Here, the Entropy of the target Y is 0.940. Among the 14 rows, 7 have Humidity = High and 7 have Humidity = Normal, and we just found that High has an Entropy of 0.985 and Normal has an Entropy of 0.592.

Information Gain(Humidity) = 0.940 − (7/14) × 0.985 − (7/14) × 0.592 = 0.151
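Using the `entropy` helper from the sketch above, the same number falls out in a couple of lines. Note that the exact value is 0.152; the 0.151 above comes from plugging in the rounded entropies:

```python
# Information gain = parent entropy minus the weighted child entropies
parent = entropy([9, 5])  # 0.940
gain_humidity = parent - (7/14) * entropy([3, 4]) - (7/14) * entropy([6, 1])
print(f"{gain_humidity:.3f}")  # 0.152 (0.151 with the rounded entropies)
```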

If we do the same for Wind, we get an Information Gain of 0.048.

I’d recommend you try the calculation yourself and cross-check it.

The Information Gain of Temperature is 0.029 and that of Outlook is 0.247.
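The article’s data table isn’t reproduced here, but every number so far matches the classic Play Tennis dataset, so here is a sketch that reproduces the remaining gains under that assumption, reusing the `entropy` helper from the first snippet. The per-value [Yes, No] counts below come from that classic table, so treat them as assumptions if your data differs:

```python
def info_gain(parent_counts, child_counts):
    """Information gain, given [yes, no] counts for the parent node
    and for each child (one list per feature value)."""
    total = sum(parent_counts)
    weighted = sum(sum(c) / total * entropy(c) for c in child_counts)
    return entropy(parent_counts) - weighted

parent = [9, 5]
print(f"{info_gain(parent, [[6, 2], [3, 3]]):.3f}")          # Wind: Weak, Strong -> 0.048
print(f"{info_gain(parent, [[2, 2], [4, 2], [3, 1]]):.3f}")  # Temperature: Hot, Mild, Cool -> 0.029
print(f"{info_gain(parent, [[2, 3], [4, 0], [3, 2]]):.3f}")  # Outlook: Sunny, Overcast, Rain -> 0.247
```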

Comparing the Information Gain of Temperature (0.029), Outlook (0.247), Humidity (0.151) and Wind (0.048), Outlook has the maximum (0.247). Hence we make the first split on Outlook.

After the first split on Outlook, we are left with Wind, Humidity and Temperature as candidate features. ID3 then repeats the same procedure inside each branch, recomputing the Information Gain using only the rows that reach that branch. The Overcast branch is already pure (all Yes), so it becomes a leaf; on the Sunny branch Humidity gives the maximum Information Gain, so that branch is split on Humidity next, and the Rain branch ends up splitting on Wind.

If we keep repeating this process until every branch arrives at a final Yes / No conclusion, we get our decision tree.
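For completeness, here is a compact, self-contained recursive ID3 sketch. The table is the classic 14-row Play Tennis dataset, which is an assumption on my part (every count in this article matches it), so swap in your own rows if they differ:

```python
import math
from collections import Counter

# Assumed dataset: the classic 14-row Play Tennis table.
# Columns: Outlook, Temperature, Humidity, Wind, Play
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
FEATURES = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(rows):
    """Entropy of the Play labels in a set of rows."""
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def info_gain(rows, f):
    """Information gain from splitting rows on feature index f."""
    weighted = 0.0
    for v in {r[f] for r in rows}:
        subset = [r for r in rows if r[f] == v]
        weighted += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - weighted

def id3(rows, features, depth=0):
    """Greedily split on the highest-gain feature until every branch is pure."""
    if len({r[-1] for r in rows}) == 1 or not features:
        label = Counter(r[-1] for r in rows).most_common(1)[0][0]
        print("  " * depth + "-> " + label)
        return
    best = max(features, key=lambda f: info_gain(rows, f))
    for v in sorted({r[best] for r in rows}):
        print("  " * depth + f"{FEATURES[best]} = {v}")
        id3([r for r in rows if r[best] == v],
            [f for f in features if f != best], depth + 1)

id3(DATA, [0, 1, 2, 3])
```

With this table, running the sketch prints Outlook at the root, an all-Yes leaf under Overcast, a Humidity split under Sunny and a Wind split under Rain, matching the tree described above.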

This is the ID3 decision tree for the given problem.

Hope this made everything clear.
