Random Forest is a machine-learning technique for classification and regression. It is an ensemble method that improves predictive performance by combining many decision trees. Each tree is trained on a bootstrap sample of the data and considers a random subset of features at each split; this randomness reduces the correlation between trees and makes the ensemble more robust. The algorithm aggregates the trees' predictions into a final prediction: a majority vote for classification, an average for regression. Random forests handle high-dimensional data well, are relatively resistant to overfitting, and can capture nonlinear relationships between the features and the target.
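To make the ensemble idea concrete, here is a minimal sketch of bagging with a majority vote. It is not how the randomForest package works internally: it assumes the built-in iris data and the rpart package as stand-ins, picks a hypothetical ensemble size of 25, and for simplicity draws one random pair of features per tree rather than a fresh subset at every split.

```r
# Toy bagging sketch (illustrative only; not the randomForest implementation).
library(rpart)  # decision trees; bundled with standard R installations

set.seed(42)
n_trees  <- 25                                  # hypothetical ensemble size
n        <- nrow(iris)
features <- setdiff(names(iris), "Species")

# Fit each tree on a bootstrap sample of rows and a random pair of features.
trees <- lapply(seq_len(n_trees), function(i) {
  rows <- sample(n, n, replace = TRUE)          # bootstrap sample
  cols <- sample(features, 2)                   # simplification: per-tree feature subset
  rpart(reformulate(cols, response = "Species"),
        data = iris[rows, ], method = "class")
})

# Collect one vote per tree for every observation (rows = observations).
votes <- sapply(trees, function(tree) {
  as.character(predict(tree, newdata = iris, type = "class"))
})

# Aggregate by majority vote across trees.
majority <- apply(votes, 1, function(v) names(which.max(table(v))))

# Agreement with the true labels (measured on the training data, so optimistic).
print(mean(majority == iris$Species))
```

Each individual tree sees only part of the rows and features, yet the vote across trees is usually more stable than any single tree.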
Let us look at an example R program that fits a Random Forest to the 'PimaIndiansDiabetes' dataset from the "mlbench" package. We'll also report accuracy measures and plot a ROC curve:
# Load required libraries
library(mlbench)
library(randomForest)
library(caret)
library(dplyr)
library(pROC)

# Load the Pima Indians Diabetes dataset
data(PimaIndiansDiabetes)

# Split data into training and testing sets
set.seed(123)
trainIndex <- createDataPartition(PimaIndiansDiabetes$diabetes, p = 0.7, list = FALSE)
trainData <- PimaIndiansDiabetes[trainIndex, ]
testData <- PimaIndiansDiabetes[-trainIndex, ]

# Train Random Forest classifier
model <- randomForest(diabetes ~ ., data = trainData)

# Make predictions on the test set
predictions <- predict(model, testData)

# Confusion matrix and accuracy measures
confusionMatrix(predictions, testData$diabetes)

# Variable Importance Plot
varImpPlot(model, main = "Variable Importance")
The plot above shows which variables influenced the model the most.
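The same ranking can also be read as numbers rather than from the plot. The sketch below assumes the mlbench and randomForest packages are installed; `fit` and `imp_sorted` are names introduced here for illustration, and the model is refit on the full dataset only so that the snippet runs on its own.

```r
# Numeric variable importance (illustrative sketch).
library(mlbench)
library(randomForest)

data(PimaIndiansDiabetes)
set.seed(123)
fit <- randomForest(diabetes ~ ., data = PimaIndiansDiabetes)

# importance() returns a matrix with one row per predictor; for a default
# classification forest the column is "MeanDecreaseGini".
imp <- importance(fit)

# Sort predictors from most to least important.
imp_sorted <- imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), , drop = FALSE]
print(imp_sorted)
```

Larger values mean that splits on that variable reduced node impurity more, averaged over all trees.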
# ROC curve, built from predicted class probabilities rather than hard labels
probs <- predict(model, testData, type = "prob")[, "pos"]
roc_obj <- roc(testData$diabetes, probs)
plot(roc_obj, col = "blue", main = "Receiver Operating Characteristic (ROC) Curve", legacy.axes = TRUE, xlab = "False Positive Rate", ylab = "True Positive Rate", print.auc = TRUE, auc.polygon = TRUE, grid = TRUE)
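The AUC printed on the ROC plot can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The base-R sketch below computes it from ranks; the labels and scores are made-up values for illustration, and `auc_by_rank` is a hypothetical helper, not a function from pROC.

```r
# Hypothetical labels (1 = positive) and predicted scores, for illustration only.
labels <- c(1, 1, 0, 0, 1, 0)
scores <- c(0.9, 0.8, 0.7, 0.3, 0.6, 0.2)

# AUC via the rank-sum (Mann-Whitney) identity.
auc_by_rank <- function(labels, scores) {
  r     <- rank(scores)            # average ranks handle tied scores
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# 8 of the 9 positive/negative pairs are ordered correctly in this toy data.
auc_value <- auc_by_rank(labels, scores)
print(auc_value)
```

An AUC of 0.5 corresponds to random ranking, and 1 to a perfect separation of the two classes.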