A decision tree is a supervised machine-learning technique for solving
classification and regression problems. It is a tree-like model in which
internal nodes represent attribute tests, branches represent test outcomes, and
leaf nodes represent class labels or continuous values. The algorithm
partitions the data recursively, at each step choosing the attribute that gives
the highest information gain (the largest reduction in entropy) or the lowest
impurity. However, decision trees are prone to overfitting, which can be
reduced by pruning or by requiring a minimum number of instances per leaf node.
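To make the splitting criteria concrete, here is a minimal base-R sketch of entropy, Gini impurity, and information gain. The helper names (entropy, gini, information_gain) and the toy labels are assumptions made for illustration; this is not the code rpart runs internally.

```r
# Entropy (in bits) of a vector of class labels.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                      # drop empty classes to avoid 0 * log2(0)
  -sum(p * log2(p))
}

# Gini impurity of a vector of class labels.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Information gain: parent entropy minus the size-weighted entropy
# of the child groups produced by a candidate split.
information_gain <- function(labels, group) {
  parts   <- split(labels, group)
  weights <- sapply(parts, length) / length(labels)
  entropy(labels) - sum(weights * sapply(parts, entropy))
}

# Toy data (hypothetical): 4 positive and 4 negative cases.
labels        <- c("pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg")
perfect_split <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
noisy_split   <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)

entropy(labels)
gini(labels)
information_gain(labels, perfect_split)
information_gain(labels, noisy_split)
```

With these toy labels the parent node has an entropy of 1 bit and a Gini impurity of 0.5. The perfect split recovers the full 1 bit, while the noisy split gains much less, which is why the algorithm would prefer the first attribute.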
Let us look at an example R program for a Decision Tree classifier that uses the 'PimaIndiansDiabetes' dataset from the "mlbench" package as input data. We'll also include accuracy measures and a decision tree diagram.
Please make sure to install the required packages with the install.packages() function before running the program, if necessary.
Load required libraries
library(mlbench)
library(rpart)
library(rpart.plot)
library(caret)
Load the Pima Indians Diabetes dataset: We then load the "Pima Indians Diabetes" dataset using the data() function.
data(PimaIndiansDiabetes)
Split data into training and testing sets: Next, we split the data into training (70%) and testing (30%) sets using the createDataPartition() function from the "caret" package.
set.seed(123)
trainIndex = createDataPartition(PimaIndiansDiabetes$diabetes, p = 0.7, list = FALSE)
trainData = PimaIndiansDiabetes[trainIndex, ]
testData = PimaIndiansDiabetes[-trainIndex, ]
trainData
testData
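createDataPartition() samples within each class of the outcome, so both subsets keep roughly the same class balance as the full data. The base-R sketch below illustrates that idea only. The helper stratified_index() and the synthetic labels are hypothetical, and this is not caret's actual implementation.

```r
# Hypothetical helper: pick a fraction p of the row indices within each class.
stratified_index <- function(y, p = 0.7) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = floor(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(123)
# Synthetic outcome (assumption): 65 negative and 35 positive cases.
y <- factor(rep(c("neg", "pos"), times = c(65, 35)))

train_idx <- stratified_index(y, p = 0.7)

# Class proportions stay close to each other in the full data,
# the training part, and the held-out part.
prop.table(table(y))
prop.table(table(y[train_idx]))
prop.table(table(y[-train_idx]))
```

A plain random sample of rows could, by chance, leave one subset with noticeably fewer positive cases, which matters most on small or imbalanced datasets.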
We train the Decision Tree classifier using the rpart() function from the "rpart" package. The method parameter is set to "class" for classification. Note: see the R help (for example, ?rpart) to learn about the components of the fitted model.
Train Decision Tree classifier
model = rpart(diabetes ~ ., data = trainData, method = "class")
To learn more about how the model was fitted, such as the class probabilities, loss, and variable importance, use summary(model). The summary() function is not specific to this case; it works for most model objects in R.
Plot the decision tree
rpart.plot(model)
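Pruning and a minimum leaf size, the two remedies for overfitting mentioned at the start, can be sketched with rpart itself. To keep the block runnable on its own, it uses the small kyphosis dataset that ships with rpart rather than the diabetes data. The control values (cp = 0, minsplit = 4, minbucket = 2) are illustrative assumptions, not recommended settings.

```r
library(rpart)

set.seed(123)  # rpart's built-in cross-validation is random

# Grow a deliberately large tree: cp = 0 removes the complexity penalty,
# and minbucket sets the minimum number of observations in any leaf.
full_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(cp = 0, minsplit = 4, minbucket = 2))

# cptable lists the cross-validated error (xerror) for each complexity value.
cp_table <- full_tree$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Cut the tree back to the complexity with the lowest cross-validated error.
pruned_tree <- prune(full_tree, cp = best_cp)

# Compare the number of leaves before and after pruning.
leaves_full   <- sum(full_tree$frame$var == "<leaf>")
leaves_pruned <- sum(pruned_tree$frame$var == "<leaf>")
c(full = leaves_full, pruned = leaves_pruned)
```

The same calls should apply to the diabetes model: pass a control argument to rpart() when fitting, then call prune() on the result. Picking the cp with the lowest xerror is one common choice; other selection rules exist.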
Confusion matrix
predictions = predict(model, testData, type = "class")
confusionMatrix(predictions, testData$diabetes)
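To show where the accuracy figure in a confusion matrix comes from, the sketch below computes it by hand in base R. The predicted and actual vectors are made-up values for illustration, not output from the model above.

```r
# Hypothetical labels (assumption): 10 cases, not results from the tutorial model.
lv        <- c("neg", "pos")
actual    <- factor(c("pos", "pos", "pos", "neg", "neg",
                      "neg", "neg", "pos", "neg", "neg"), levels = lv)
predicted <- factor(c("pos", "neg", "pos", "neg", "neg",
                      "pos", "neg", "pos", "neg", "neg"), levels = lv)

# Rows are predictions, columns are the true classes.
cm <- table(Predicted = predicted, Actual = actual)
cm

# Accuracy: share of cases on the diagonal (correct predictions).
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity and specificity, treating "pos" as the positive class.
sensitivity <- cm["pos", "pos"] / sum(cm[, "pos"])
specificity <- cm["neg", "neg"] / sum(cm[, "neg"])

c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```

For this toy data the accuracy is 0.8 (8 of 10 correct). Note that caret's confusionMatrix() treats the first factor level as the positive class by default for two-class problems, so with the diabetes data you may want to pass positive = "pos" when reading sensitivity and specificity.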
📊 The Decision Tree classifier achieved an accuracy of 76% on the PimaIndiansDiabetes test set! 🎯 The confusion matrix shows how the model's predictions compare with the actual classes.
See the "Accuracy Measures" post for more on interpreting the result.