
How to Create a Decision Tree?

A decision tree is a type of classification method in machine learning for discrete-valued functions. It contains two main entities, nodes and branches, which represent attributes and values respectively.


The order of the nodes (from top to bottom) reflects the importance of the attributes, and the last node (also called a leaf node) represents the decision. The decision tree classifies data by following the branches whose values match the data until it reaches a leaf node, which decides which class the data belongs to.




The picture above is an example of a decision tree that classifies whether the weather is suitable for playing tennis. The attribute Outlook is more important than Humidity and Wind, since it lies higher in the tree than the other attributes. The decision is either Yes or No, so these values appear as the last nodes of the decision tree.
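
To make the traversal concrete, here is a minimal Python sketch of such a tree, assuming the usual play-tennis branch values (Sunny/Overcast/Rain for Outlook, High/Normal for Humidity, Strong/Weak for Wind); inner nodes are nested dicts and leaves are plain strings:

    # The play-tennis tree as nested dicts: each inner key is the attribute
    # tested at that node, each branch value maps to a subtree, and a plain
    # string is a leaf node holding the decision.
    tennis_tree = {
        "Outlook": {
            "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
            "Overcast": "Yes",
            "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
        }
    }

    def classify(tree, example):
        """Follow the branches matching the example's values until a leaf."""
        if isinstance(tree, str):        # leaf node: return the decision
            return tree
        attribute = next(iter(tree))     # the attribute tested at this node
        value = example[attribute]       # the example's value for it
        return classify(tree[attribute][value], example)

    print(classify(tennis_tree, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))
    # -> Yes

Following a branch is just a dictionary lookup, so classification walks from the root to a leaf in at most as many steps as the tree is deep.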


One basic algorithm for decision tree learning is the ID3 algorithm: a top-down decision tree construction algorithm that recursively applies two statistical properties, entropy and information gain. Information gain measures the reduction in entropy, that is, how well an attribute separates the output values; the attribute with the highest information gain is considered the best one. Generally, the ID3 algorithm has 5 steps for constructing a decision tree (a code sketch follows the list):


  1. Calculate the information gain for each attribute and choose the best attribute.

  2. Split the dataset into subsets based on the best attribute.

  3. Create a node for the best attribute.

  4. If every row in a subset belongs to the same output value, turn the node into a leaf node labeled with that output value.

  5. Repeat for the remaining attributes until all attributes are covered, or every branch ends in a leaf node.
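
The sketch below ties these five steps together in Python. It is a minimal, illustrative implementation, not a production one, assuming each row is a dict mapping attribute names to values and labels is a parallel list of output values:

    import math
    from collections import Counter

    def entropy(labels):
        """Entropy H(S) = -sum(p * log2(p)) over each class proportion p."""
        total = len(labels)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(labels).values())

    def information_gain(rows, labels, attribute):
        """Entropy reduction achieved by splitting the rows on `attribute`."""
        total = len(labels)
        remainder = 0.0
        for value in {row[attribute] for row in rows}:
            subset = [lab for row, lab in zip(rows, labels)
                      if row[attribute] == value]
            remainder += (len(subset) / total) * entropy(subset)
        return entropy(labels) - remainder

    def id3(rows, labels, attributes):
        # Step 4: if every row has the same output value, return a leaf node.
        if len(set(labels)) == 1:
            return labels[0]
        # No attributes left: return the majority output value as a leaf.
        if not attributes:
            return Counter(labels).most_common(1)[0][0]
        # Step 1: choose the attribute with the highest information gain.
        best = max(attributes, key=lambda a: information_gain(rows, labels, a))
        # Step 3: create a node for the best attribute.
        node = {best: {}}
        # Step 2: split the dataset into subsets, one per value of `best`.
        for value in {row[best] for row in rows}:
            sub_rows = [r for r in rows if r[best] == value]
            sub_labels = [lab for r, lab in zip(rows, labels)
                          if r[best] == value]
            # Step 5: recurse on each subset with the remaining attributes.
            node[best][value] = id3(sub_rows, sub_labels,
                                    [a for a in attributes if a != best])
        return node

The returned structure is the same nested-dict tree used in the classification sketch above, so the two pieces compose directly.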

Decision trees provide intuitive rules and visualization without much computational cost, and they are applicable to both numerical and categorical data as well as multi-output problems. However, the algorithm is prone to overfitting, which results in very complex decision trees, and it is unstable, since a slight change in the data can produce a completely different tree.
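
As a hedged aside using scikit-learn (an assumption; the post itself does not use it), one common way to curb that overfitting is to cap the tree's depth. The toy feature matrix below is made up purely for illustration, with outlook encoded as an integer:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Toy data, invented for illustration: [outlook code, humidity].
    X = [[0, 85], [0, 90], [1, 78], [2, 96], [2, 80]]
    y = ["No", "No", "Yes", "Yes", "Yes"]

    # criterion="entropy" uses the same impurity measure as ID3;
    # max_depth limits complexity and so reduces overfitting.
    clf = DecisionTreeClassifier(criterion="entropy", max_depth=3)
    clf.fit(X, y)

    # The learned rules stay small and readable thanks to the depth cap.
    print(export_text(clf, feature_names=["outlook", "humidity"]))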
