KNN classifies a new point by finding the “k” closest points in the feature space. Here we have k = 3 and use Euclidean distance:
d = √((x₂ - x₁)² + (y₂ - y₁)²)
We'll see each neighbor's distance and watch how the majority vote decides the final label. Click Next Step to walk through each neighbor, or Replay to start over.
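The walkthrough above can be sketched in a few lines of Python. The points and labels here are hypothetical illustration data, not from the demo itself:

```python
from math import sqrt
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points (x, y, label) by Euclidean distance to the query.
    by_dist = sorted(
        train,
        key=lambda p: sqrt((p[0] - query[0])**2 + (p[1] - query[1])**2),
    )
    # Majority vote among the k closest labels decides the final label.
    votes = Counter(label for _, _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2D points as (x, y, label).
train = [(1, 1, "red"), (2, 1, "red"), (2, 2, "red"),
         (5, 5, "blue"), (6, 5, "blue")]
print(knn_predict(train, (1.5, 1.5), k=3))  # → red (all 3 nearest are red)
```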
A Decision Tree splits data along feature thresholds to purify classes. Here, imagine a dataset of houses with two features — House Size in sq ft on the x-axis and House Age in years on the y-axis — and two classes (blue vs. red).
We’ll show two splits (vertical or horizontal lines) that separate red from blue in multiple steps. In practice, more splits (and deeper levels) can yield a more accurate tree, but also risk overfitting.
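A two-split tree like the one described above boils down to nested threshold checks. The thresholds below are hypothetical, chosen only to illustrate the idea:

```python
def tree_predict(size_sqft, age_years):
    """Two hand-picked splits (one vertical, one horizontal) — illustration only."""
    # First split: a vertical line on house size.
    if size_sqft < 1500:
        return "red"
    # Second split: a horizontal line on house age.
    if age_years < 20:
        return "blue"
    return "red"

print(tree_predict(1200, 30))  # → red  (left of the vertical split)
print(tree_predict(2000, 10))  # → blue (right of it, below the horizontal split)
```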
A Random Forest trains multiple decision trees (often on random subsets of features/data). Each tree splits the space differently. The forest’s final decision is the majority vote across all trees.
Below, we’ll show three trees with different splits. Each step reveals the lines for a new tree. Finally, we show the test point’s predicted label by majority vote.
Linear Regression fits a line (in 2D) that minimizes the sum of squared errors (SSE). We can visualize data points (x, y) and watch the line change as gradient descent takes steps toward the best slope m and intercept b.
In each step, we'll show the updated line on the canvas, and in the box below we'll report the current SSE or a simpler conceptual summary of the fit.
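One gradient-descent update per step looks like this. The data points and learning rate are hypothetical (points lie roughly on y = 2x + 1):

```python
# Gradient descent on slope m and intercept b, minimizing SSE.
xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical data near y = 2x + 1
ys = [3.1, 5.0, 7.2, 8.9]

m, b, lr = 0.0, 0.0, 0.01
for step in range(2000):
    # Gradients of SSE = sum((m*x + b - y)^2) with respect to m and b.
    grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys))
    m -= lr * grad_m
    b -= lr * grad_b

sse = sum((m * x + b - y)**2 for x, y in zip(xs, ys))
print(f"m={m:.2f}, b={b:.2f}, SSE={sse:.4f}")
```

After enough steps the line settles near the least-squares solution for this data (m close to 2, b close to 1).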
Logistic Regression models the probability that an input belongs to a certain class using a sigmoid function:
p = 1 / (1 + e^(-(w₀ + w₁x₁ + w₂x₂ + ...)))
We'll show points of two classes (red, blue) and a line that emerges from trained weights. Each step reveals more about the boundary or computed probabilities.
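Computing a probability from trained weights is a one-liner around the sigmoid. The weights below are hypothetical stand-ins for what training would produce:

```python
from math import exp

def sigmoid(z):
    return 1 / (1 + exp(-z))

# Hypothetical trained weights w0, w1, w2 (illustration only).
w0, w1, w2 = -6.0, 1.0, 1.0

def prob_blue(x1, x2):
    # p = 1 / (1 + e^(-(w0 + w1*x1 + w2*x2)))
    return sigmoid(w0 + w1 * x1 + w2 * x2)

# The decision boundary is the line where p = 0.5, i.e. w0 + w1*x1 + w2*x2 = 0.
print(prob_blue(2, 2))  # ≈ 0.12 → below 0.5, predict red
print(prob_blue(5, 4))  # ≈ 0.95 → above 0.5, predict blue
```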
Classification metrics often revolve around the confusion matrix. We'll illustrate each cell (TP, TN, FP, FN) step by step, then show formula highlights for accuracy, precision, and recall.
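Counting the four cells and deriving the metrics takes only a few lines. The labels below are hypothetical:

```python
# Hypothetical true labels and predictions (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

# Count each confusion-matrix cell.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
print(tp, tn, fp, fn)               # → 3 3 1 1
print(accuracy, precision, recall)  # → 0.75 0.75 0.75
```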