Preparing the Right Ingredients and Dialing In the Model
In Lesson 1, you learned that classic ML requires manual feature engineering — humans must decide which inputs matter. But how exactly do you do that? And once you have good features, how do you configure the model for the best results?
This lesson walks through both — feature engineering and hyperparameter tuning — with a concrete example: predicting whether a student will pass or fail an exam.
Imagine you have data about students preparing for an exam:
| Student | Study Hours | Sleep Hours | Date | Attendance | Notes |
|---|---|---|---|---|---|
| A | 4.5 | 7 | 2026-02-15 | 18/20 sessions | "Reviewed chapters 1-5, did practice problems" |
| B | 1.0 | 4 | 2026-03-01 | 9/20 sessions | "Skimmed notes" |
Raw data is messy. "18/20 sessions" is text, dates need context, and notes are unstructured. A model can't use this directly — we need to engineer features from it.
Feature engineering is the process of creating better inputs for your model from raw data. Good features make the difference between a mediocre model and a great one.
| Engineered Feature | How | Why It Helps |
|---|---|---|
| attendance_rate | 18/20 = 0.90 | A percentage is more useful than raw counts |
| sleep_quality | Bin into "poor" (<5h), "ok" (5-7h), "good" (7h+) | Captures the non-linear effect of sleep |
| days_before_exam | exam_date - study_date = 14 days | Studying early vs. cramming matters |
| study_intensity | study_hours / days_before_exam | Captures pacing |
| note_length | Word count of notes | Proxy for engagement |
| did_practice_problems | "practice" in notes → 1, else 0 | Active recall is a strong predictor |
After feature engineering, the two students' rows become:

- Student A: `attendance=0.9, sleep=good, days_before=14, intensity=0.32, note_length=6, practice=1`
- Student B: `attendance=0.45, sleep=poor, days_before=1, intensity=1.0, note_length=2, practice=0`
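Here is a minimal sketch of these transformations in pandas. The column names, the exam date, and the exact bin edges are assumptions for illustration; any tabular raw data would work the same way.

```python
import pandas as pd

# Hypothetical raw rows matching the table above (column names are assumptions)
raw = pd.DataFrame({
    "student": ["A", "B"],
    "study_hours": [4.5, 1.0],
    "sleep_hours": [7, 4],
    "study_date": pd.to_datetime(["2026-02-15", "2026-03-01"]),
    "attendance": ["18/20", "9/20"],
    "notes": ["Reviewed chapters 1-5, did practice problems", "Skimmed notes"],
})

exam_date = pd.Timestamp("2026-03-02")  # assumed exam date

features = pd.DataFrame({"student": raw["student"]})

# attendance_rate: parse "18/20" into 18 / 20 = 0.90
parts = raw["attendance"].str.split("/", expand=True).astype(float)
features["attendance_rate"] = parts[0] / parts[1]

# sleep_quality: bin hours into poor (<5h), ok (5-7h), good (7h+)
features["sleep_quality"] = pd.cut(
    raw["sleep_hours"], bins=[0, 5, 7, 24], right=False,
    labels=["poor", "ok", "good"],
)

# days_before_exam and study_intensity (pacing)
features["days_before_exam"] = (exam_date - raw["study_date"]).dt.days
features["study_intensity"] = raw["study_hours"] / features["days_before_exam"]

# note_length and did_practice_problems from the free-text notes
features["note_length"] = raw["notes"].str.split().str.len()
features["did_practice_problems"] = (
    raw["notes"].str.contains("practice", case=False).astype(int)
)

print(features)
```

Each engineered column is a one-liner, but choosing *which* columns to create is the part that requires human judgment.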
Even a human can now guess who passes! That's the sign of well-engineered features — they make the patterns obvious, both for humans and for models.
Once you have good features, you need to configure your model. Hyperparameters are settings you choose before training — they control how the model learns.
Let's say you use a Gradient Boosting model (a popular classic ML algorithm). Here are the key knobs to turn:
| Hyperparameter | Too Low | Too High | Sweet Spot |
|---|---|---|---|
| n_estimators (# of trees) | 10 → underfits | 5000 → slow, overfits | ~200 |
| learning_rate | 0.001 → learns too slowly | 1.0 → overshoots | ~0.1 |
| max_depth | 1 → too simple | 20 → memorizes students | ~4 |
You try different combinations and compare results:
| learning_rate | max_depth | n_estimators | Accuracy |
|---|---|---|---|
| 0.1 | 4 | 200 | 82% |
| 0.05 | 6 | 300 | 86% |
| 0.01 | 10 | 500 | 79% (overfitting) |
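This trial-and-error loop can be automated with a grid search, which trains a model for every combination and keeps the best by cross-validated score. A sketch with scikit-learn's `GridSearchCV` (a deliberately reduced grid for speed; the sweep above would include more values):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the student data (an assumption)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Reduced grid for speed; 2 x 2 x 2 = 8 combinations, each scored with 3-fold CV
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [4, 6],
    "n_estimators": [100, 200],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, f"best CV accuracy: {search.best_score_:.2f}")
```

Grid search is exhaustive and gets expensive quickly; with larger grids, randomized or Bayesian search is the usual next step.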
Underfitting: Model is too simple — it misses patterns in the data (too low accuracy on everything).
Overfitting: Model memorizes the training data — it performs great on training data but poorly on new data.
The sweet spot is in between: a model that learns the real patterns without memorizing noise.
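You can see both failure modes by comparing training and test accuracy as model complexity grows. A sketch on noisy synthetic data (an assumption, chosen so memorization is possible but doesn't generalize):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# flip_y=0.2 adds label noise, so a memorizing model can't truly generalize
X, y = make_classification(n_samples=300, n_features=6, flip_y=0.2,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 4, 10):
    m = GradientBoostingClassifier(max_depth=depth, n_estimators=200,
                                   random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth:2d}  train={m.score(X_tr, y_tr):.2f}  "
          f"test={m.score(X_te, y_te):.2f}")
```

A large train-test gap at high depth is the signature of overfitting; near-identical but low scores at depth 1 signal underfitting.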
In exam terms: feature engineering is studying the right material for the exam (what you prepare), while hyperparameter tuning is figuring out the best study method (how you learn).
Remember how Lesson 1 showed that classic ML requires manual feature engineering while deep learning learns features automatically? You've now seen exactly what that manual process looks like. In the next lesson, you'll learn about the building blocks of neural networks — the technology that makes automatic feature learning possible.
You've seen how classic ML prepares data. Now discover the building blocks of neural networks — the technology that learns features automatically!