Support Vector Machine (SVM) in Python

Support Vector Machine (SVM) is a widely used machine learning technique for classification and regression tasks. For binary classification, its key principle is to find the hyperplane that separates the two classes with the largest margin. The margin is the distance between the hyperplane and the closest data points of either class; these closest points are called the support vectors. A linear SVM works well when the classes are linearly separable. When they are not, a kernel function implicitly maps the data points to a higher-dimensional feature space where they may become separable.
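
To make the margin idea concrete, here is a minimal sketch on four made-up points (the data and the variable names are assumptions for illustration, not part of the cancer example). It fits a linear SVM with a very large C, which approximates a hard margin, and then measures the distance from the learned hyperplane to the closest point.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (hypothetical values chosen for illustration)
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y_toy = np.array([0, 0, 1, 1])

# A very large C leaves almost no room for margin violations (near hard margin)
hard_margin_svm = SVC(kernel="linear", C=1e6)
hard_margin_svm.fit(X_toy, y_toy)

w = hard_margin_svm.coef_[0]       # normal vector of the separating hyperplane
b = hard_margin_svm.intercept_[0]  # offset of the hyperplane

# Distance of every point to the hyperplane w.x + b = 0
distances = np.abs(X_toy @ w + b) / np.linalg.norm(w)
margin = distances.min()           # distance to the closest point

print("w:", w, "b:", b)
print("margin:", margin)
print("support vectors:")
print(hard_margin_svm.support_vectors_)
```

For a fitted linear SVM the closest points satisfy |w.x + b| = 1, so the margin is also equal to 1 / ||w||, which is why maximizing the margin amounts to minimizing the norm of w.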

SVM is widely used across industries because it can handle complex datasets and often achieves high accuracy. Common kernel functions include the linear, polynomial, and radial basis function (RBF) kernels, and the choice of kernel affects both the model's accuracy and its training time. In practice, the SVM hyperparameters need to be tuned for good performance. In particular, the regularization parameter C controls the trade-off between a wide margin and misclassified training points, so choosing it carefully helps prevent overfitting.
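
One common way to compare kernels and tune the parameters is a cross-validated grid search. The sketch below is only an illustration under some assumptions: it uses scikit-learn's built-in copy of the Wisconsin breast cancer data (`load_breast_cancer`) as a stand-in for the CSV file used later in this post, and the grid values are arbitrary starting points rather than recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Built-in dataset used as a stand-in (assumption, not the post's CSV file)
X_demo, y_demo = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline so it is re-fitted on each training fold
pipe = make_pipeline(StandardScaler(), SVC())

# Illustrative grid; gamma is ignored by the linear kernel
param_grid = {
    "svc__kernel": ["linear", "poly", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```

Larger grids give a finer search but multiply the number of model fits, so training time grows quickly with each added value.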

Load the required packages

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_curve, auc
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

Load the dataset

# Read the dataset; the file name must be passed as a string
cancer = pd.read_csv("cancar.csv")

cancer.columns = ["id", "diagnosis", "radius_mean", "texture_mean", "perimeter_mean", "area_mean", "smoothness_mean", "compactness_mean", "concavity_mean", "concave_points_mean", "symmetry_mean", "fractal_dimension_mean", "radius_se", "texture_se", "perimeter_se", "area_se", "smoothness_se", "compactness_se", "concavity_se", "concave_points_se", "symmetry_se", "fractal_dimension_se", "radius_worst", "texture_worst", "perimeter_worst", "area_worst", "smoothness_worst", "compactness_worst", "concavity_worst", "concave_points_worst", "symmetry_worst", "fractal_dimension_worst"]

cancer = cancer.drop("id", axis=1)

cancer.head(10)

Preprocess the data

X = cancer.drop("diagnosis", axis=1)
y = cancer["diagnosis"]
imp = SimpleImputer(strategy="mean")
X = imp.fit_transform(X)
scaler = StandardScaler()
X = scaler.fit_transform(X)

Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
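
A caveat on the steps above: the imputer and the scaler were fitted on the full dataset before the split, so statistics from the test rows leak into the training data. A possible alternative, sketched here under assumptions, is to split first and wrap the preprocessing and the model in a `Pipeline` so that everything is fitted on the training rows only. The sketch uses scikit-learn's built-in copy of the breast cancer data (labels encoded as 0/1 instead of "M"/"B"), and the names `X_raw`, `y_raw`, and `leak_free_svm` are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Built-in dataset used as a stand-in for the CSV file (assumption)
X_raw, y_raw = load_breast_cancer(return_X_y=True)

# Split first; stratify keeps the class proportions similar in both sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.3, random_state=123, stratify=y_raw
)

# Imputation and scaling are learned from the training rows only
leak_free_svm = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("svm", SVC(kernel="rbf", gamma="scale")),
])
leak_free_svm.fit(X_tr, y_tr)

# score() applies the fitted preprocessing to the test rows, then predicts
test_accuracy = leak_free_svm.score(X_te, y_te)
print("Test accuracy:", round(test_accuracy, 3))
```

On a dataset of this size the difference in the reported metrics is usually small, but the pipeline version gives an estimate that does not depend on the test rows.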

Fit an SVM model with a radial basis function kernel

svm_rbf = SVC(kernel="rbf", gamma="scale")
svm_rbf.fit(X_train, y_train)

Make predictions on the testing set

pred = svm_rbf.predict(X_test)

Calculate accuracy measures

accuracy = accuracy_score(y_test, pred)
precision = precision_score(y_test, pred, pos_label="M")
recall = recall_score(y_test, pred, pos_label="M")
f1 = f1_score(y_test, pred, pos_label="M")
metrics = pd.DataFrame({"Accuracy": [accuracy], "Precision": [precision], "Recall": [recall], "F1": [f1]})
print(metrics)
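
A confusion matrix is a useful companion to these scores because it shows which kind of error the model makes. The sketch below uses short, made-up label lists so it runs on its own; with the model from this post, `y_test` and `pred` would take their place.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only (stand-ins for y_test and pred)
y_true_demo = ["M", "B", "B", "M", "B", "M", "B", "B"]
y_pred_demo = ["M", "B", "M", "M", "B", "B", "B", "B"]

# Fix the label order so rows and columns are unambiguous
labels = ["B", "M"]
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

# Rows are the actual classes, columns are the predicted classes
cm_table = pd.DataFrame(
    cm,
    index=[f"actual {label}" for label in labels],
    columns=[f"predicted {label}" for label in labels],
)
print(cm_table)
```

With "M" as the positive class, the bottom-left cell counts malignant cases predicted as benign, which is the error that recall penalizes.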

Calculate and plot the ROC curve

fpr, tpr, thresholds = roc_curve(y_test, svm_rbf.decision_function(X_test), pos_label="M")
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, lw=1, label="ROC curve (area = %0.2f)" % roc_auc)
plt.plot([0, 1], [0, 1], "--", color="gray", label="Random guess")
plt.xlim([-0.05, 1.05])
plt.ylim([-0.05, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()

Summary

The post concludes with an evaluation of the SVM model's performance on the cancer data. It reports the accuracy, precision, recall, and F1 score of the model and plots the ROC curve to visualize the results. Overall, the post serves as an introduction to SVM in Python for classification tasks, using cancer data as an example.
