Random Forest is a machine-learning technique used for classification and regression tasks. It is an ensemble approach that improves performance by combining multiple decision trees. Each tree is trained on a random subset of the data and of the features, and this randomness reduces the correlation between trees and strengthens the model. The algorithm aggregates the individual trees' predictions, by majority vote for classification and by averaging for regression, to produce the final prediction. Random Forest handles high-dimensional data well, is resistant to overfitting, and captures nonlinear relationships between the features and the outcome.
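As a minimal sketch of these two sources of randomness, the randomForest() function in R exposes both directly: ntree sets how many bootstrap-sampled trees are grown, and mtry sets how many randomly chosen features are considered at each split. The dataset and parameter values below are placeholders for illustration, not part of the example that follows:

# Illustration: 500 trees, each grown on a bootstrap sample of the rows,
# with 2 randomly chosen predictors considered at each split
library(randomForest)
data(iris)
rf = randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)
print(rf)  # shows the out-of-bag error estimate of the ensemble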
Let us see an example of an R program for Random Forest using the 'PimaIndiansDiabetes' dataset from the "mlbench" package as the input data. We'll also include accuracy measures and an ROC curve:
# Load required libraries
library(mlbench)
library(randomForest)
library(caret)
library(dplyr)
library(pROC)
# Load the Pima Indians Diabetes dataset
data(PimaIndiansDiabetes)
# Split the data into training and testing sets
set.seed(123)
trainIndex = createDataPartition(PimaIndiansDiabetes$diabetes, p = 0.7, list = FALSE)
trainData = PimaIndiansDiabetes[trainIndex, ]
testData = PimaIndiansDiabetes[-trainIndex, ]
# Train the Random Forest classifier
model = randomForest(diabetes ~ ., data = trainData)
# Make class predictions on the test set
predictions = predict(model, testData)
# Evaluate the predictions with a confusion matrix
confusionMatrix(predictions, testData$diabetes)
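If you prefer the accuracy measures as numbers you can work with rather than printed output, the same confusionMatrix() call can be stored and indexed. A small caret sketch:

cm = confusionMatrix(predictions, testData$diabetes)
cm$overall["Accuracy"]    # overall accuracy on the test set
cm$byClass["Sensitivity"] # sensitivity for the positive class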
# Variable importance plot
varImpPlot(model, main = "Variable Importance")
The above plot lets us understand which variables influenced the model the most.
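The numeric scores behind the plot are also available through the randomForest importance() function, for example:

importance(model)  # MeanDecreaseGini for each predictor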
# Use predicted probabilities for the positive class, not hard class
# labels, so the ROC curve is traced over all thresholds
probs = predict(model, testData, type = "prob")[, "pos"]
roc_obj = roc(testData$diabetes, probs)
plot(roc_obj, col = "blue", main = "Receiver Operating Characteristic (ROC) Curve",
     xlab = "False Positive Rate", ylab = "True Positive Rate",
     legacy.axes = TRUE, print.auc = TRUE, auc.polygon = TRUE, grid = TRUE)
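The AUC can also be extracted from the roc object on its own, for example:

auc(roc_obj)  # area under the ROC curve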