Naive Bayes in R

Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. It assumes that the features are conditionally independent given the class label, which simplifies the calculation of the posterior probability of each class label given the evidence. From the training data, Naive Bayes estimates the prior probability of each class label and the conditional probability of each feature given each class label. These probabilities are then combined to compute the posterior probability of each class label for the features of a test instance, and the class with the highest posterior probability is predicted. Naive Bayes is widely used for text classification tasks such as sentiment analysis and topic classification, as well as for image classification and medical diagnosis. It is computationally efficient and often performs well even with small training datasets.
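To make the prior, likelihood, and posterior steps concrete, here is a minimal from-scratch sketch in base R. It assumes that each numeric feature follows a normal distribution within each class, which is the same Gaussian assumption e1071's naiveBayes() makes for numeric predictors. The helper names (posterior(), class_names, and so on) are hypothetical and are not part of any package. The sketch works in log space so that multiplying many small likelihoods does not underflow.

```r
# Illustrative from-scratch Gaussian Naive Bayes in base R (not the e1071 code).
# Assumption: each numeric feature is normally distributed within each class.
data(iris)
features = iris[, 1:4]
classes = iris$Species
class_names = levels(classes)

# Prior P(class): share of rows belonging to each class
priors = setNames(as.numeric(table(classes)) / nrow(iris), class_names)

# Class-conditional mean and standard deviation of every feature (4 x 3 matrices)
means = sapply(class_names, function(k) colMeans(features[classes == k, ]))
sds = sapply(class_names, function(k) apply(features[classes == k, ], 2, sd))

# Posterior P(class | x) for one instance x (a numeric vector of the 4 features)
posterior = function(x) {
  # log prior + sum of per-feature log-likelihoods (the "naive" independence step)
  log_scores = sapply(class_names, function(k) {
    log(priors[[k]]) + sum(dnorm(x, mean = means[, k], sd = sds[, k], log = TRUE))
  })
  # Normalise with the log-sum-exp trick so the posteriors sum to 1
  scores = exp(log_scores - max(log_scores))
  scores / sum(scores)
}

post = posterior(unlist(iris[1, 1:4]))
print(round(post, 4))
```

The first iris row is a setosa flower with very short petals, so almost all of its posterior probability should fall on setosa.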

For a Python version of this example, see the "Naive Bayes in Python" post on AKSTATS.

Let us look at an example R program that fits a Naive Bayes classifier to the 'iris' dataset:

In this program, we use the "e1071" package, which provides the Naive Bayes classifier. We first load the "iris" dataset with the data() function and then split it into a training set (80% of the rows) and a testing set (the remaining 20%).

# Load the iris dataset
data(iris)

# Split the dataset into training and testing sets
set.seed(42)  # For reproducibility
train_indices = sample(1:nrow(iris), 0.8 * nrow(iris))  # 80% for training
train_data = iris[train_indices, ]
test_data = iris[-train_indices, ]
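Because sample() picks rows without regard to the label, the class proportions in the training and testing sets can drift away from those in the full dataset. One alternative, which is not part of the original program, is a stratified split that samples 80% of the rows within each species. The sketch below builds it from base R's split() and sample(), assuming an exact 80/20 split per class is what you want.

```r
# Hypothetical alternative to the simple random split: sample 80% of rows
# within each species so both sets keep the original class proportions.
data(iris)
set.seed(42)  # For reproducibility

# split() groups row numbers by species; sample() then draws 80% of each group
train_indices = unlist(lapply(split(seq_len(nrow(iris)), iris$Species), function(idx) {
  sample(idx, size = round(0.8 * length(idx)))
}))

train_data = iris[train_indices, ]
test_data = iris[-train_indices, ]

# Each species now contributes 40 training rows and 10 test rows
print(table(train_data$Species))
print(table(test_data$Species))
```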

We train the Naive Bayes classifier using the naiveBayes() function from the "e1071" package. Then, we make predictions on the test set using the trained model.

library(e1071)  # provides naiveBayes()
# Train the classifier: predict Species from all four measurements
model = naiveBayes(Species ~ ., data = train_data)

# Make predictions on the test set
predictions = predict(model, test_data)
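To see what naiveBayes() actually learned, you can inspect the fitted object directly. This sketch repeats the split and training so that it runs on its own. It assumes the documented structure of an e1071 model: the class distribution of the training labels is stored in apriori, one table per predictor is stored in tables, and predict(..., type = "raw") returns class probabilities instead of labels.

```r
library(e1071)

# Recreate the split and the model from the steps above
data(iris)
set.seed(42)
train_indices = sample(seq_len(nrow(iris)), 0.8 * nrow(iris))
train_data = iris[train_indices, ]
test_data = iris[-train_indices, ]
model = naiveBayes(Species ~ ., data = train_data)

# Class distribution of the training labels, turned into prior probabilities
print(prop.table(model$apriori))

# For a numeric predictor, each row gives a class's mean (first column)
# and standard deviation (second column)
print(model$tables$Petal.Length)

# type = "raw" returns the posterior probability of every class instead of a label
probs = predict(model, test_data, type = "raw")
print(head(round(probs, 3)))
```

Looking at the raw probabilities is a quick way to spot test flowers that the model classifies with low confidence.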

After that, we build a confusion matrix with base R's table() function to evaluate the model's performance. Rows show the actual species and columns show the predicted species.

confusion_matrix = table(Actual = test_data$Species, Predicted = predictions)
print(confusion_matrix)
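The table holds counts, so accuracy still has to be derived from it. Accuracy is the share of test cases on the diagonal. Per-class recall and precision follow from the row and column sums. The sketch below repeats the earlier steps so that it runs standalone.

```r
library(e1071)

# Recreate the split, model, and confusion matrix from the steps above
data(iris)
set.seed(42)
train_indices = sample(seq_len(nrow(iris)), 0.8 * nrow(iris))
train_data = iris[train_indices, ]
test_data = iris[-train_indices, ]
model = naiveBayes(Species ~ ., data = train_data)
predictions = predict(model, test_data)
confusion_matrix = table(Actual = test_data$Species, Predicted = predictions)

# Accuracy: correctly classified cases (the diagonal) over all test cases
accuracy = sum(diag(confusion_matrix)) / sum(confusion_matrix)

# Recall per class: diagonal over row sums (rows are the actual classes)
recall = diag(confusion_matrix) / rowSums(confusion_matrix)
# Precision per class: diagonal over column sums (columns are the predicted classes);
# a class that is never predicted would give NaN here
precision = diag(confusion_matrix) / colSums(confusion_matrix)

cat("Accuracy:", round(accuracy, 3), "\n")
print(round(data.frame(precision, recall), 3))
```

If the caret package is installed, caret::confusionMatrix(predictions, test_data$Species) reports accuracy together with further statistics such as Kappa and per-class sensitivity and specificity.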

# Plot the confusion matrix
library(ggplot2)
library(tidyr)  # for pivot_longer()
# Reshape the table into long format: one row per (Actual, Predicted) pair
confusion_df = as.data.frame.matrix(confusion_matrix)
confusion_df$Actual = rownames(confusion_df)
confusion_df = pivot_longer(confusion_df, cols = -Actual, names_to = "Predicted", values_to = "Count")
# Keep the same class order on both axes
confusion_df$Predicted = factor(confusion_df$Predicted, levels = levels(test_data$Species))
confusion_df$Actual = factor(confusion_df$Actual, levels = levels(test_data$Species))

ggplot(confusion_df, aes(x = Predicted, y = Actual, fill = Count)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "white", high = "steelblue") +
  theme_minimal() +
  labs(x = "Predicted", y = "Actual", fill = "Count", title = "Confusion Matrix")

# Plot a bar chart of predicted vs. actual classes
predicted_actual = data.frame(Predicted = predictions, Actual = test_data$Species)
ggplot(predicted_actual, aes(x = Predicted, fill = Actual)) +
  geom_bar() +
  scale_fill_manual(values = c("#E69F00", "#56B4E9", "#009E73")) +
  theme_minimal() +
  labs(x = "Predicted", fill = "Actual", title = "Predicted vs. Actual Classes")

🔍🔢 The confusion matrix shows, class by class, how many test flowers were classified correctly and where the model confused one species with another. See the "Accuracy Measures" post for help interpreting the result.
