Naive Bayes is a probabilistic machine learning algorithm for classification tasks, based on Bayes' theorem. It assumes that features are conditionally independent given the class label, which simplifies the calculation of the posterior probability of each class label given the evidence. Naive Bayes estimates the prior probability of each class label from the training data, along with the conditional probability of each feature given each class label. These probabilities are then combined to compute the posterior probability of each class label given the features of a test instance, and the class with the highest posterior is predicted. Naive Bayes is widely used in text classification tasks such as sentiment analysis and topic classification, as well as in image classification and medical diagnosis. It is computationally efficient and can work well even with small training datasets.
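The posterior computation described above can be written out explicitly. The notation here is introduced only for illustration and is not from the original post: $C_k$ denotes a class label, $x_1, \dots, x_n$ the feature values of a test instance, and $\hat{y}$ the predicted class.

```latex
P(C_k \mid x_1, \dots, x_n)
  = \frac{P(C_k)\, P(x_1, \dots, x_n \mid C_k)}{P(x_1, \dots, x_n)}
  \;\propto\; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k),
\qquad
\hat{y} = \arg\max_{k} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)
```

The product form follows from the conditional independence assumption, and the denominator can be dropped when comparing classes because it does not depend on $C_k$.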
First, we load the iris dataset using the data() function. Next, we split the data into training and testing sets.
data(iris)
# Split the dataset into training and testing sets
set.seed(42) # For reproducibility
train_indices = sample(1:nrow(iris), 0.8 * nrow(iris)) # 80% for training
train_data = iris[train_indices, ]
test_data = iris[-train_indices, ]
We train the Naive Bayes classifier using the naiveBayes() function from the "e1071" package. Then, we make predictions on the test set using the trained model.
library(e1071)
model = naiveBayes(Species ~ ., data = train_data)
# Make predictions on the test set
predictions = predict(model, test_data)
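Finally, we build a confusion matrix to assess the model's performance. The code here builds it with base R's table() function; confusionMatrix() is a function from the caret package and is not called in this example.
confusion_matrix = table(Actual = test_data$Species, Predicted = predictions)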
print(confusion_matrix)
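A confusion matrix is often summarized as an overall accuracy and a per-class recall. The following is a minimal sketch of that arithmetic using base R only. The label vectors `actual` and `predicted` are hypothetical values made up for illustration, not output from the model above.

```r
# Hypothetical labels for illustration only (not results from the iris model)
species_levels = c("setosa", "versicolor", "virginica")
actual = factor(c("setosa", "setosa", "versicolor", "versicolor", "virginica"),
                levels = species_levels)
predicted = factor(c("setosa", "setosa", "versicolor", "virginica", "virginica"),
                   levels = species_levels)

# Rows are actual classes, columns are predicted classes
cm = table(Actual = actual, Predicted = predicted)

# Overall accuracy: correctly classified instances (the diagonal) over all instances
accuracy = sum(diag(cm)) / sum(cm)

# Per-class recall: diagonal divided by the number of actual instances in each class
recall = diag(cm) / rowSums(cm)

print(accuracy)
print(recall)
```

The same two lines should apply to a matrix built with table() as above, provided the rows hold the actual classes and the columns the predicted ones.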
# Plot the confusion matrix
library(ggplot2)
library(ggpubr)
confusion_df = as.data.frame.matrix(confusion_matrix)
confusion_df$Actual = rownames(confusion_df)
confusion_df = tidyr::gather(confusion_df, key = "Predicted", value = "Count", -Actual)
confusion_df$Predicted = factor(confusion_df$Predicted, levels = levels(test_data$Species))
confusion_df$Actual = factor(confusion_df$Actual, levels = levels(test_data$Species))
ggplot(confusion_df, aes(x = Predicted, y = Actual, fill = Count)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "white", high = "steelblue") +
  theme_minimal() +
  labs(x = "Predicted", y = "Actual", fill = "Count", title = "Confusion Matrix")
# Plot a bar chart of predicted vs. actual classes
predicted_actual = data.frame(Predicted = predictions, Actual = test_data$Species)
ggplot(predicted_actual, aes(x = Predicted, fill = Actual)) +
  geom_bar() +
  scale_fill_manual(values = c("#E69F00", "#56B4E9", "#009E73")) +
  theme_minimal() +
  labs(x = "Predicted", fill = "Actual", title = "Predicted vs. Actual Classes")