Decision Tree in R

A decision tree is a supervised machine-learning technique for classification and regression problems. It is a tree-like model in which internal nodes represent tests on attributes, branches represent the outcomes of those tests, and leaf nodes represent class labels or continuous values. The algorithm partitions the data recursively, at each step choosing the split that most reduces impurity, as measured for example by entropy (information gain) or the Gini index. Decision trees are prone to overfitting, which can be reduced by pruning or by requiring a minimum number of instances per leaf node.
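To make the impurity idea concrete, here is a minimal base-R sketch of entropy, Gini impurity, and information gain for a single candidate split. The helper names (entropy, gini, info_gain) and the toy glucose values are made up for illustration; this is not how rpart computes its splits internally, only the textbook definitions.

```r
# Entropy (in bits) of a vector of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                 # drop empty classes so 0 * log2(0) never occurs
  -sum(p * log2(p))
}

# Gini impurity of a vector of class labels
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Information gain from splitting `labels` by a logical condition
info_gain <- function(labels, condition) {
  n <- length(labels)
  left <- labels[condition]
  right <- labels[!condition]
  entropy(labels) -
    (length(left) / n) * entropy(left) -
    (length(right) / n) * entropy(right)
}

# Toy data (hypothetical): glucose readings and diabetes status
glucose <- c(85, 90, 110, 150, 160, 170, 95, 180)
status  <- c("neg", "neg", "neg", "pos", "pos", "pos", "neg", "pos")

entropy(status)                   # 1: the two classes are perfectly balanced
gini(status)                      # 0.5
info_gain(status, glucose < 130)  # 1: this split separates the classes completely
info_gain(status, glucose < 100)  # smaller gain: one "neg" ends up on the wrong side
```

A tree-growing algorithm evaluates many candidate splits like these and keeps the one with the largest impurity reduction, then repeats the process inside each resulting group.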


Let us look at an example R program that builds a Decision Tree classifier using the 'PimaIndiansDiabetes' dataset from the "mlbench" package as the input data. We'll also include accuracy measures and a decision tree diagram.

Before running the program, make sure the required packages are installed; use the install.packages() function for any that are missing.
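One possible way to check for missing packages is sketched below. The variable names (required, is_installed, missing_pkgs) are made up for this example, and the interactive() guard is a design choice of this sketch, so that a script run non-interactively never starts a download on its own.

```r
# Packages used in this post
required <- c("mlbench", "rpart", "rpart.plot", "caret")

# requireNamespace() returns FALSE (rather than stopping) when a package is absent
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

# Install only in an interactive session
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```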

Load required libraries

library(mlbench)
library(rpart)
library(rpart.plot)
library(caret)

Load the Pima Indians Diabetes dataset: The data() function loads the PimaIndiansDiabetes data frame from the "mlbench" package.

data(PimaIndiansDiabetes)

Split data into training and testing sets: Next, we split the data into training (70%) and testing (30%) sets using the createDataPartition() function from the "caret" package.

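set.seed(123)  # make the random split reproducible
# Row indices for the training set: 70% of the rows, sampled within each outcome class
trainIndex = createDataPartition(PimaIndiansDiabetes$diabetes, p = 0.7, list = FALSE)
trainData = PimaIndiansDiabetes[trainIndex, ]
testData = PimaIndiansDiabetes[-trainIndex, ]
# Preview the first rows of each set instead of printing every row
head(trainData)
head(testData)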
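createDataPartition() samples within each class of the outcome, so both parts keep roughly the same class balance. The base-R sketch below illustrates that idea on a made-up outcome vector. The helper stratified_index is hypothetical and is not caret's implementation.

```r
# Hypothetical helper: sample a fraction p of the row indices within each class
stratified_index <- function(y, p) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(123)
# Made-up outcome with a 60/40 class imbalance
y <- factor(rep(c("neg", "pos"), times = c(60, 40)))

train_idx <- stratified_index(y, p = 0.7)

# Class proportions in the full vector, the training part, and the test part
prop.table(table(y))
prop.table(table(y[train_idx]))
prop.table(table(y[-train_idx]))
```

Running the same prop.table(table(...)) check on trainData$diabetes and testData$diabetes is a quick way to confirm the split in the main example is balanced.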

We train the Decision Tree classifier using the rpart() function from the "rpart" package. The method parameter is set to "class" for classification.

Note: Run ?rpart or ?rpart.object in R to read about the components of the fitted model.

Train Decision Tree classifier

model = rpart(diabetes ~ ., data = trainData, method = "class")

To see how the model was fitted, including the class probabilities at each node, the loss, and the variable importance, run summary(model). The summary() function works in the same way for most other model objects in R.
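Individual pieces of that summary can also be pulled out of the fitted object directly. The sketch below uses the kyphosis data that ships with rpart as a stand-in for trainData, so that it runs without any other package; the object name fit is made up for this example.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for trainData here
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Variable importance: a named numeric vector, one entry per predictor used
fit$variable.importance

# Complexity table: one row per candidate subtree, including cross-validated error (xerror)
fit$cptable

# Node-by-node text listing of the splits
print(fit)
```

The same components should be available on the model object from the main example, for instance model$variable.importance.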

Plot decision tree

rpart.plot(model)
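If the plotted tree looks larger than it needs to be, pruning and a minimum leaf size are the usual remedies mentioned in the introduction. The following is a rough sketch of both, again on the kyphosis data bundled with rpart rather than the diabetes data. The control values (cp = 0, minsplit = 5, minbucket = 2) are arbitrary choices made only to grow a large tree, and picking the cp with the lowest cross-validated error is one common rule among several.

```r
library(rpart)

set.seed(123)  # the cross-validation inside rpart() is random
# Grow a deliberately large tree; minbucket sets the minimum number of cases per leaf
big_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class",
                  control = rpart.control(cp = 0, minsplit = 5, minbucket = 2))

# Pick the complexity parameter with the lowest cross-validated error
cp_table <- big_tree$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Cut the tree back to that complexity
pruned_tree <- prune(big_tree, cp = best_cp)

# Number of nodes before and after pruning
nrow(big_tree$frame)
nrow(pruned_tree$frame)
```

The pruned tree can be passed to rpart.plot() and predict() in the same way as the original model.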


Make predictions on the test set

predictions = predict(model, testData, type = "class")
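Besides hard class labels, predict() on an rpart classification tree can return class probabilities with type = "prob". The sketch below shows this on the kyphosis data bundled with rpart. The 0.3 threshold is a made-up value for illustration, and predicting on the training rows is done here only to keep the example short.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# One column per class; each row sums to 1
probs <- predict(fit, kyphosis, type = "prob")
head(probs)

# Hypothetical rule: flag a case as "present" when its probability is at least 0.3
threshold <- 0.3
custom_pred <- factor(ifelse(probs[, "present"] >= threshold, "present", "absent"),
                      levels = levels(kyphosis$Kyphosis))
table(custom_pred)
```

Lowering the threshold below 0.5 trades more false alarms for fewer missed cases, which can matter in a screening setting such as diabetes detection.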

Confusion matrix

confusionMatrix(predictions, testData$diabetes)

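The Decision Tree classifier achieved an accuracy of 76% on the test portion of the PimaIndiansDiabetes dataset. The confusion matrix shows how the predictions break down by class, which gives a fuller picture of the model's performance than accuracy alone.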
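To show where such numbers come from, here is a small base-R sketch that builds a confusion matrix by hand and derives accuracy, sensitivity, and specificity from it. The label vectors are made up for illustration and are not the model's actual output.

```r
# Made-up true labels and predictions, for illustration only
actual    <- factor(c("pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"),
                    levels = c("neg", "pos"))
predicted <- factor(c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "pos"),
                    levels = c("neg", "pos"))

# Rows are predictions, columns are the true classes
cm <- table(Prediction = predicted, Reference = actual)
cm

accuracy    <- sum(diag(cm)) / sum(cm)              # 7 correct out of 10 = 0.7
sensitivity <- cm["pos", "pos"] / sum(cm[, "pos"])  # 3 of 4 true "pos" found = 0.75
specificity <- cm["neg", "neg"] / sum(cm[, "neg"])  # 4 of 6 true "neg" found
```

Note that this sketch treats "pos" as the positive class. confusionMatrix() uses the first factor level as the positive class unless its positive argument is set, so for this dataset passing positive = "pos" may be needed to get sensitivity and specificity in the same orientation.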

See the "Accuracy Measures" post for help interpreting these results.
