Decision trees are simple yet powerful. They offer capable machine learning with relatively high interpretability (in contrast to powerful “black box” algorithms such as neural networks). If you’ve ever wondered how computers can learn to answer important questions on their own, stick around for a few minutes.
My particular interest is in how Artificial Intelligence can be applied to healthcare, so I’ll be using examples from that field to show you how decision trees function in the wild.
The name of the game is to classify patients, based on certain health characteristics, into those with respiratory disease and those without.
Decision trees were first considered in a statistical sense by a British statistician named William Belson (You can find the paper here). They’ve grown in popularity in recent years due to their implementation within the machine learning world and specifically their use within random forests. Hence, I’ll focus on their application within this sphere.
Fundamentally, you can think of a decision tree as a series of discriminating questions that progressively split the input data. A tree classifies observations, described by categorical or continuous features, into meaningful groups according to their outcome. Its four main components are the following:
- Root Nodes
- Internal Nodes
- Leaf Nodes
- Branches
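To make these components concrete, here is a minimal sketch using scikit-learn on a tiny, entirely hypothetical dataset (the feature names and values are invented for illustration, not taken from any real patient data). The printed tree shows the root node asking the first question, internal nodes refining it, branches carrying the yes/no answers, and leaf nodes holding the predicted group:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features per patient: [age, smoker (0/1), peak expiratory flow]
X = [
    [65, 1, 310],
    [42, 0, 520],
    [70, 1, 290],
    [35, 0, 560],
    [58, 1, 330],
    [29, 0, 590],
]
y = [1, 0, 1, 0, 1, 0]  # 1 = respiratory disease, 0 = no respiratory disease

# Fit a shallow tree so the structure stays easy to read
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the learned questions: root node at the top, leaves at the ends
print(export_text(tree, feature_names=["age", "smoker", "peak_flow"]))

# Classify a new (hypothetical) patient by walking the branches
print(tree.predict([[60, 1, 300]]))
```

Running this prints the tree as indented text, which is exactly the “set of discriminating questions” view: each split is a question about one feature, and each patient ends up in one leaf, i.e. one predicted group.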