Q&A 4 How do you evaluate a classification model?

4.1 Explanation

Evaluation metrics for classification include:

  • Accuracy: Proportion of all predictions that are correct.
  • Precision: Of all predicted positives, how many were actually positive.
  • Recall (Sensitivity): Of all actual positives, how many were predicted positive.
  • F1-Score: Harmonic mean of precision and recall; balances the two in a single number.
  • Confusion Matrix: Table cross-tabulating predicted versus actual classes; accuracy, precision, and recall can all be read off it.

These metrics give a more complete view than accuracy alone, especially on imbalanced datasets: a classifier that always predicts the majority class of a 95/5 split scores 95% accuracy yet never detects the minority class, a failure that recall exposes immediately.
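
To make the definitions concrete, here is a minimal Python sketch that computes each metric directly from the four cells of a binary confusion matrix (the counts are invented for illustration):

# Illustrative counts for a binary confusion matrix (hypothetical values)
tp, fp, fn, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)                # correct / total
precision = tp / (tp + fp)                                 # of predicted positives, how many are right
recall    = tp / (tp + fn)                                 # of actual positives, how many are found
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"Accuracy:  {accuracy:.3f}")   # 0.850
print(f"Precision: {precision:.3f}")  # 0.800
print(f"Recall:    {recall:.3f}")     # 0.889
print(f"F1-score:  {f1:.3f}")         # 0.842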

4.2 Python Code

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import pandas as pd

# Load and subset iris dataset
df = pd.read_csv("data/iris.csv")

# ✅ Use .copy() to avoid SettingWithCopyWarning when modifying DataFrame
binary_df = df[df["species"].isin(["setosa", "versicolor"])].copy()

# Convert class labels to binary (0 = setosa, 1 = versicolor)
binary_df["species"] = binary_df["species"].map({"setosa": 0, "versicolor": 1})

# Split into training and test sets
X = binary_df.drop("species", axis=1)
y = binary_df["species"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)

# Print confusion matrix and full classification report
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
Confusion Matrix:
 [[17  0]
 [ 0 13]]

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        13

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
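
The perfect scores are expected here: setosa and versicolor are linearly separable in the iris measurements, so this is an easy test case. If you need the individual numbers rather than the printed report, scikit-learn also provides per-metric functions; a short sketch reusing y_test and y_pred from above:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))  # scores the positive class (label 1, versicolor)
print("Recall:   ", recall_score(y_test, y_pred))
print("F1-score: ", f1_score(y_test, y_pred))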

4.3 R Code

library(readr)
library(caret)

# Load and subset iris dataset
df <- read_csv("data/iris.csv")
df_bin <- subset(df, species %in% c("setosa", "versicolor"))
df_bin$species <- factor(df_bin$species, levels = c("setosa", "versicolor"))

# Split into training and test sets
set.seed(42)
index <- createDataPartition(df_bin$species, p = 0.7, list = FALSE)
train <- df_bin[index, ]
test <- df_bin[-index, ]

# Train logistic regression model
model <- glm(species ~ sepal_length + sepal_width + petal_length + petal_width,
             data = train, family = "binomial")

# Predict probabilities and convert to classes
pred_probs <- predict(model, newdata = test, type = "response")
predicted <- factor(ifelse(pred_probs > 0.5, "versicolor", "setosa"),
                    levels = levels(test$species))

# Evaluate using confusion matrix (includes accuracy, sensitivity, specificity, etc.)
confusionMatrix(predicted, test$species)

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor
  setosa         15          0
  versicolor      0         15
                                     
               Accuracy : 1          
                 95% CI : (0.8843, 1)
    No Information Rate : 0.5        
    P-Value [Acc > NIR] : 9.313e-10  
                                     
                  Kappa : 1          
                                     
 Mcnemar's Test P-Value : NA         
                                     
            Sensitivity : 1.0        
            Specificity : 1.0        
         Pos Pred Value : 1.0        
         Neg Pred Value : 1.0        
             Prevalence : 0.5        
         Detection Rate : 0.5        
   Detection Prevalence : 0.5        
      Balanced Accuracy : 1.0        
                                     
       'Positive' Class : setosa
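
Note that caret's confusionMatrix() treats the first factor level (setosa here, as the 'Positive' Class line shows) as the positive class; pass positive = "versicolor" to have sensitivity, specificity, and the predictive values reported for the other class instead.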