Q&A 2: How do you split a dataset into training and test sets?
2.1 Explanation
Splitting the data lets you train a model on one portion and evaluate it on a separate, unseen portion. Because the model never sees the test rows during training, its score on them estimates how well it will generalize to new data.
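Conceptually, a random split shuffles the row indices and cuts them at a fixed point. The sketch below shows that idea in plain Python. The function name `shuffle_split` is hypothetical, and this is not scikit-learn's internal implementation, so the same seed will not reproduce scikit-learn's exact indices. It only mirrors the rule of rounding the test count up for a fractional `test_size`.

```python
import math
import random


def shuffle_split(n_samples, test_size=0.3, seed=42):
    """Hypothetical helper: return (train_indices, test_indices) after a seeded shuffle."""
    # Round the test count up so a fractional split never yields an empty test set
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    rng = random.Random(seed)  # local RNG, so the split is reproducible
    rng.shuffle(indices)
    # The first n_test shuffled indices form the test set; the rest are for training
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = shuffle_split(150, test_size=0.3)
print(len(train_idx), len(test_idx))
```

For 150 rows and `test_size=0.3`, this gives 105 training and 45 test indices. Every index lands in exactly one of the two sets.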
## Python Code
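```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the Iris data; "species" is the label column
df = pd.read_csv("data/iris.csv")

# Features: every column except the label. Target: the label itself.
X = df.drop(columns="species")
y = df["species"]

# Hold out 30% of the rows for testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print("Training set size:", len(X_train))
```

Output:

```
Training set size: 105
```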
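For classification, you can also pass `stratify=y` to `train_test_split` so that each class keeps the same proportion in both splits. The sketch below assumes you load Iris from scikit-learn's bundled `load_iris` instead of the CSV file. With this copy the labels are the integers 0, 1, and 2, with 50 samples each. Iris is balanced, so the counts come out exactly even: 35 per class in training and 15 per class in test. Stratification matters most on imbalanced data, where a plain random split can leave a rare class under-represented in the test set.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled copy of Iris: 150 rows, integer labels 0, 1, 2 (50 rows each)
X, y = load_iris(return_X_y=True)

# stratify=y preserves the class proportions of y in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("Train class counts:", Counter(y_train.tolist()))
print("Test class counts:", Counter(y_test.tolist()))
```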