Lesson 3: Supervised Learning - From Linear Models to Deep Neural Networks
Introduction
Welcome to Lesson 3! Today, we're diving into supervised learning, starting from simple linear models and working our way up to deep neural networks. By the end of this lesson, you'll understand the progression of machine learning models and be able to implement them yourself!
We'll cover linear regression, logistic regression, and multilayer perceptrons (MLPs). Don't worry if these terms sound intimidating - we'll break them down with simple analogies and hands-on examples.
Linear Regression: Finding the Best Fit Line
Imagine you're trying to guess how much a house costs based only on its size. You might draw a straight line through a scatter plot of house sizes and prices. This is essentially what linear regression does - it finds the best straight line to fit your data.
In machine learning terms, we're trying to find the weight (slope) and bias (y-intercept) that minimize the error between our predictions and the actual values - typically measured as the mean squared error.
Let's look at a simple implementation of linear regression:
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Generate sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])
# Create and train the model
model = LinearRegression()
model.fit(X, y)
# Make predictions
X_test = np.array([0, 6]).reshape(-1, 1)
y_pred = model.predict(X_test)
# Plot the results
plt.scatter(X, y, color='blue', label='Data points')
plt.plot(X_test, y_pred, color='red', label='Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
print(f"Weight: {model.coef_[0]:.2f}")
print(f"Bias: {model.intercept_:.2f}")
This code creates a simple linear regression model, trains it on some data, and visualizes the results.
Interactive Visualization: Linear Regression
The interactive version of this lesson includes a live plot: starting from an initial guess of Y = 1.00X + 0.00 at zero epochs trained, each click of the "Train Model" button runs more training steps, and you can watch the line adjust to fit the data points.
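If you'd like to see the same idea in plain code, here is a minimal gradient-descent sketch (pure NumPy; the learning rate and epoch count are arbitrary choices) that starts from Y = 1.00X + 0.00 and nudges the weight and bias a little each epoch, just like the demo:
import numpy as np
# Same toy data as the scikit-learn example above
X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
# Start from the demo's initial guess: y = 1.00x + 0.00
w, b = 1.0, 0.0
lr = 0.05  # learning rate (arbitrary choice)
for epoch in range(1000):
    y_pred = w * X + b
    error = y_pred - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    w -= lr * grad_w
    b -= lr * grad_b
print(f"Learned line: y = {w:.2f}x + {b:.2f}")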
Logistic Regression: Making Binary Decisions
While linear regression predicts continuous values, logistic regression is used for binary classification problems. It's like a yes/no decision maker.
Imagine you're trying to predict whether an email is spam or not based on certain features. Logistic regression would help you draw a decision boundary between the "spam" and "not spam" categories.
Here's a simple implementation of logistic regression:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
# Generate a random dataset
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, random_state=42)
# Create and train the logistic regression model
model = LogisticRegression()
model.fit(X, y)
# Create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))
# Make predictions on the mesh
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Plot the results
plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X[:, 0], X[:, 1], c=y, alpha=0.8)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Logistic Regression Decision Boundary')
plt.show()
This code creates a logistic regression model, trains it on some randomly generated data, and visualizes the decision boundary.
Multilayer Perceptrons (MLPs): Going Deeper
Now, let's take a step towards deep learning with Multilayer Perceptrons (MLPs). An MLP stacks several of the linear transformations we've just seen, with nonlinear activation functions between them, which lets it learn patterns that a single line or boundary can't capture.
Imagine a factory assembly line where each station adds more complexity to the product. Similarly, each layer in an MLP transforms the input data, allowing the network to learn more complex patterns.
Let's implement a simple MLP using PyTorch:
import torch
import torch.nn as nn
import torch.optim as optim
# Define the MLP model
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MLP, self).__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        return x
# Create the model, loss function, and optimizer
input_size = 2
hidden_size = 5
output_size = 1
model = MLP(input_size, hidden_size, output_size)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Generate some sample data
X = torch.randn(100, input_size)
y = (X[:, 0] + X[:, 1]).unsqueeze(1)
# Training loop
for epoch in range(1000):
    # Forward pass
    outputs = model(X)
    loss = criterion(outputs, y)

    # Backward pass and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/1000], Loss: {loss.item():.4f}')
# Test the model
test_input = torch.tensor([[1.0, 2.0]])
prediction = model(test_input)
print(f"Prediction for input {test_input}: {prediction.item():.2f}")
This code defines a simple MLP with one hidden layer, trains it on some sample data, and makes a prediction.
Loss Functions and Optimizers
Two crucial components in training these models are loss functions and optimizers.
A loss function is like a score that tells us how well our model is performing. The lower the score, the better. In our regression examples, we used Mean Squared Error (MSE), which averages the squared differences between the predictions and the true values.
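As a quick sanity check, you can compute MSE by hand and compare it to PyTorch's built-in version (a small sketch with made-up numbers):
import torch
import torch.nn as nn
predictions = torch.tensor([2.5, 0.0, 2.0])
targets = torch.tensor([3.0, -0.5, 2.0])
# MSE is just the average of the squared differences
mse_manual = ((predictions - targets) ** 2).mean()
mse_builtin = nn.MSELoss()(predictions, targets)
print(mse_manual.item(), mse_builtin.item())  # both print the same value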
An optimizer is like a guide that helps our model find the best path to minimize the loss. It decides how to update the model's parameters based on the loss. We used Stochastic Gradient Descent (SGD) in our examples.
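Under the hood, plain SGD simply nudges each parameter against its gradient: parameter = parameter - learning_rate * gradient. Here's a minimal sketch of a single update step on one toy parameter:
import torch
# One parameter and a simple loss: (w - 3)^2, which is minimized at w = 3
w = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)
loss = (w - 3.0) ** 2
loss.backward()   # gradient of the loss at w = 0 is 2 * (0 - 3) = -6
optimizer.step()  # w becomes 0 - 0.1 * (-6) = 0.6, a step toward 3
print(w.item())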
Different problems might require different loss functions and optimizers. For example, binary cross-entropy is commonly used as a loss function for binary classification problems.
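For instance, if we turned the MLP above into a binary classifier, we could swap MSELoss for binary cross-entropy. A small sketch, assuming the model outputs raw scores (logits):
import torch
import torch.nn as nn
criterion = nn.BCEWithLogitsLoss()  # applies a sigmoid internally, so pass raw logits
logits = torch.tensor([1.2, -0.7, 0.3])  # made-up model outputs
labels = torch.tensor([1.0, 0.0, 1.0])   # binary targets
loss = criterion(logits, labels)
print(loss.item())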
Challenge: Extend the MLP
Now it's your turn! Try to extend our simple MLP to solve a more complex problem. Here are some ideas:
- Add more hidden layers to the network
- Use a different activation function (like sigmoid or tanh)
- Try to solve a classification problem instead of regression
- Experiment with different optimizers (like Adam or RMSprop)
- Implement early stopping to prevent overfitting
This challenge will help you get comfortable with building and modifying neural networks, and with seeing how different components affect the model's performance. The sketch below shows one possible starting point.
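Here is one possible sketch combining a few of the ideas above: an extra hidden layer, tanh activations, and the Adam optimizer. Treat it as a starting point rather than a reference solution - the layer sizes, learning rate, and epoch count are arbitrary choices.
import torch
import torch.nn as nn
import torch.optim as optim
# A deeper MLP: two hidden layers with tanh activations
class DeeperMLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x):
        return self.net(x)

model = DeeperMLP(input_size=2, hidden_size=16, output_size=1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam instead of SGD
# Same toy regression target as before: y = x1 + x2
X = torch.randn(100, 2)
y = (X[:, 0] + X[:, 1]).unsqueeze(1)
for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
print(f"Final loss: {loss.item():.4f}")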