Naive Bayes is a probabilistic machine learning algorithm for classification tasks, based on Bayes' theorem. It assumes that features are conditionally independent given the class label, which simplifies the calculation of the posterior probability of each class label given the evidence. Naive Bayes estimates the prior probability of each class label from the training data, along with the conditional probability of each feature given each class label. These probabilities are then combined to compute the posterior probability of each class label given the features of a test instance, and the class with the highest posterior is predicted. Naive Bayes is widely used in text classification tasks such as sentiment analysis and topic classification, as well as in image classification and medical diagnosis. It is computationally efficient and often works well even with small training datasets.
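To make the prior, conditional, and posterior steps concrete, here is a minimal sketch that computes the posterior for a single iris flower by hand. It assumes each numeric feature follows a normal distribution within each class (the Gaussian variant of Naive Bayes), and for brevity it estimates everything from the full iris dataset rather than a training split. The variable names are illustrative only.

```r
# Hand-computed Gaussian Naive Bayes posterior for one iris flower.
# Assumption: every feature is normally distributed within each class.
data(iris)

features = names(iris)[1:4]
classes = levels(iris$Species)

# Prior P(class): relative frequency of each class
priors = sapply(classes, function(cl) mean(iris$Species == cl))

# Per-class mean and standard deviation of every feature (4 x 3 matrices)
means = sapply(classes, function(cl) colMeans(iris[iris$Species == cl, features]))
sds = sapply(classes, function(cl) apply(iris[iris$Species == cl, features], 2, sd))

# Instance to classify: the first row of iris (chosen only for illustration)
x = unlist(iris[1, features])

# Unnormalised log posterior: log P(class) + sum over features of log P(x_j | class)
log_post = sapply(classes, function(cl) {
  log(priors[[cl]]) + sum(dnorm(x, mean = means[, cl], sd = sds[, cl], log = TRUE))
})

# Normalise so the posteriors sum to 1 (subtract the max for numerical stability)
posterior = exp(log_post - max(log_post))
posterior = posterior / sum(posterior)

print(round(posterior, 4))
predicted_class = names(which.max(posterior))
print(predicted_class)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together, which matters more as the number of features grows.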
We load the built-in iris dataset with the data() function. Next, we split the data into training and testing sets.

# Load the iris dataset
data(iris)

# Split the dataset into training and testing sets
set.seed(42) # For reproducibility
train_indices = sample(1:nrow(iris), 0.8 * nrow(iris)) # 80% for training
train_data = iris[train_indices, ]
test_data = iris[-train_indices, ]
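Because the split is a simple random sample rather than a stratified one, the three species are not guaranteed to be equally represented in the training set, and Naive Bayes estimates its class priors from those counts. The sketch below repeats the same split and inspects it; the checks are an illustrative addition, not part of the original walkthrough.

```r
# Recreate the 80/20 split and inspect its size and class balance
data(iris)
set.seed(42)
train_indices = sample(1:nrow(iris), 0.8 * nrow(iris))
train_data = iris[train_indices, ]
test_data = iris[-train_indices, ]

# Number of rows in each set
cat("Training rows:", nrow(train_data), "\n")
cat("Test rows:", nrow(test_data), "\n")

# No row should appear in both sets
overlap = intersect(rownames(train_data), rownames(test_data))
cat("Overlapping rows:", length(overlap), "\n")

# Class counts per set; these drive the estimated priors
print(table(train_data$Species))
print(table(test_data$Species))
```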
We train the classifier with the naiveBayes() function from the "e1071" package. Then, we make predictions on the test set using the trained model.

# Train the Naive Bayes model
library(e1071)
model = naiveBayes(Species ~ ., data = train_data)

# Make predictions on the test set
predictions = predict(model, test_data)
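We build a confusion matrix to assess the model's performance. The code below uses base R's table() function; confusionMatrix() from the "caret" package is an alternative that also reports summary statistics such as accuracy.

# Build the confusion matrix
confusion_matrix = table(Actual = test_data$Species, Predicted = predictions)
print(confusion_matrix)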
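A confusion matrix can be condensed into a few summary numbers. The helper below is a hypothetical sketch (the name classification_metrics is not from any package) that derives accuracy, per-class recall, and per-class precision from a table with actual classes in rows and predicted classes in columns. The toy labels are made up purely to show the arithmetic.

```r
# Hypothetical helper: summary metrics from a confusion matrix whose rows
# are actual classes and whose columns are predicted classes.
classification_metrics = function(cm) {
  correct = diag(cm) # correctly classified count per class
  list(
    accuracy = sum(correct) / sum(cm),
    recall = correct / rowSums(cm),    # share of each actual class recovered
    precision = correct / colSums(cm)  # NaN if a class is never predicted
  )
}

# Toy labels for illustration only (not results from the iris model)
lvls = c("a", "b", "c")
actual = factor(c("a", "a", "a", "b", "b", "c", "c", "c"), levels = lvls)
predicted = factor(c("a", "a", "b", "b", "b", "c", "c", "a"), levels = lvls)

toy_matrix = table(Actual = actual, Predicted = predicted)
metrics = classification_metrics(toy_matrix)
print(metrics) # accuracy is 6/8 = 0.75 for these toy labels
```

The same helper can be applied to the tutorial's result with classification_metrics(confusion_matrix).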
# Plot the confusion matrix as a heatmap
library(ggplot2)
confusion_df = as.data.frame.matrix(confusion_matrix)
confusion_df$Actual = rownames(confusion_df)
# Reshape from wide to long format (requires the tidyr package)
confusion_df = tidyr::gather(confusion_df, key = "Predicted", value = "Count", -Actual)
confusion_df$Predicted = factor(confusion_df$Predicted, levels = levels(test_data$Species))
confusion_df$Actual = factor(confusion_df$Actual, levels = levels(test_data$Species))
ggplot(confusion_df, aes(x = Predicted, y = Actual, fill = Count)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "white", high = "steelblue") +
  theme_minimal() +
  labs(x = "Predicted", y = "Actual", fill = "Count", title = "Confusion Matrix")

# Plot a bar chart of predicted vs. actual classes
predicted_actual = data.frame(Predicted = predictions, Actual = test_data$Species)
ggplot(predicted_actual, aes(x = Predicted, fill = Actual)) +
  geom_bar() +
  scale_fill_manual(values = c("#E69F00", "#56B4E9", "#009E73")) +
  theme_minimal() +
  labs(x = "Predicted", fill = "Actual", title = "Predicted vs. Actual Classes")