Random Forest is a machine-learning technique used for classification and regression tasks. It is an ensemble approach that improves performance by combining multiple decision trees. Each tree is trained on a random subset of the data and of the features, and this randomness reduces the correlation between trees and strengthens the model. The algorithm aggregates the individual trees' predictions, by majority vote for classification and by averaging for regression, to produce the final prediction. Random Forest handles high-dimensional data well, is resistant to overfitting, and captures nonlinear relationships between the features and the outcome.
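As a minimal sketch of these two sources of randomness, the randomForest() function in R exposes both directly: ntree sets how many bootstrap-sampled trees are grown, and mtry sets how many randomly chosen features are considered at each split. The dataset and parameter values below are placeholders for illustration, not part of the example that follows:

# Illustration: 500 trees, each grown on a bootstrap sample of the rows,
# with 2 randomly chosen predictors considered at each split
library(randomForest)
data(iris)
rf = randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)
print(rf)  # shows the out-of-bag error estimate of the ensemble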
Let us see an example of an R program for Random Forest using the 'PimaIndiansDiabetes' dataset from the "mlbench" package as the input data. We'll also include accuracy measures and an ROC curve:
# Load required libraries
library(mlbench)
library(randomForest)
library(caret)
library(dplyr)
library(pROC)
# Load the Pima Indians Diabetes dataset
data(PimaIndiansDiabetes)
# Split the data into training and testing sets
set.seed(123)
trainIndex = createDataPartition(PimaIndiansDiabetes$diabetes, p = 0.7, list = FALSE)
trainData = PimaIndiansDiabetes[trainIndex, ]
testData = PimaIndiansDiabetes[-trainIndex, ]
# Train the Random Forest classifier
model = randomForest(diabetes ~ ., data = trainData)
# Make class predictions on the test set
predictions = predict(model, testData)
# Evaluate the predictions with a confusion matrix
confusionMatrix(predictions, testData$diabetes)
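If you prefer the accuracy measures as numbers you can work with rather than printed output, the same confusionMatrix() call can be stored and indexed. A small caret sketch:

cm = confusionMatrix(predictions, testData$diabetes)
cm$overall["Accuracy"]    # overall accuracy on the test set
cm$byClass["Sensitivity"] # sensitivity for the positive class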
# Variable importance plot
varImpPlot(model, main = "Variable Importance")
The above plot lets us understand which variables influenced the model the most.
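The numeric scores behind the plot are also available through the randomForest importance() function, for example:

importance(model)  # MeanDecreaseGini for each predictor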
# Use predicted probabilities for the positive class, not hard class
# labels, so the ROC curve is traced over all thresholds
probs = predict(model, testData, type = "prob")[, "pos"]
roc_obj = roc(testData$diabetes, probs)
plot(roc_obj, col = "blue", main = "Receiver Operating Characteristic (ROC) Curve",
     xlab = "False Positive Rate", ylab = "True Positive Rate",
     legacy.axes = TRUE, print.auc = TRUE, auc.polygon = TRUE, grid = TRUE)
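The AUC can also be extracted from the roc object on its own, for example:

auc(roc_obj)  # area under the ROC curve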