Data science: Simple gradient descent in Python

From convergence to kaboom!

I recently started a Coursera online class called “Machine Learning” given by Andrew Ng at Stanford University. In the first two weeks of the class, one of the topics covered is the gradient descent algorithm for doing linear regression. In order to help me understand the concepts, I wrote a  Python script that implements the algorithm for a simple case.

In this example, the data lies on a perfect straight line itself, so the algorithm should have no problem finding a straight line solution. You can find the details about gradient descent here.

The Python script I wrote was run using IDLE and Python 3.7.2.

Here is the entire script:

# Example Python script that uses the gradient descent algorithm to
# create a linear regression model for the case of a
# single variable (feature).
# In this example we are fitting a line to data that is
# actually on a straight line.

# module import
import random

# global definitions and function definitions

# the size of the training data set
m = 20

# the range of the feature values
featureRange = 10.0

# the actual coefficients for the linear equation
A = 1.0
B = 2.0

# define the actual linear relationship we
# should converge to.
def H(x):
    return A + B*x

# define a function to calculate the hypothesis
def h(z):
    return theta0 + theta1*z

# define the cost function J
def J():
    total = 0.0
    for i in range(m):
        total = total + (h(x[i]) - y[i])**2
    return total/(2*m)

# define a function to calculate the partial derivative of
# J with respect to theta0
def dJ0():
    total = 0.0
    for i in range(m):
        total = total + h(x[i]) - y[i]
    return (total/m)

# define a function to calculate the partial derivative of
# J with respect to theta1
def dJ1():
    total = 0.0
    for i in range(m):
        total = total + (h(x[i]) - y[i])*x[i]
    return (total/m)

# define a function to test for the convergence of gradient descent
def doneFN():
    if J()<0.000012:
        return True
        return False
# initialize the feature and result values
x = [None]*m
y = [None]*m

for i in range(m):
    x[i] = random.random()*featureRange
    y[i] = H(x[i])

# perform the gradient descent calculations
nstep = 0

# make an initial guess for theta 0 and 1
theta0 = 0.5
theta1 = 0.5

# set the learning rate
alpha = 0.05

# loop until the values of theta 0 and 1 converge
while not doneFN():
    nstep = nstep+1

    # calculate the new theta values
    theta0new = theta0 - alpha*dJ0()
    theta1new = theta1 - alpha*dJ1()

    # update the theta values
    theta0 = theta0new
    theta1 = theta1new
    # print the current state    
    print("n = %3d, theta0 = %2.4f, theta1 = %2.4f, J = %3.7f" % \

When I used the script to experiment with different values for the learning rate, I discovered that small changes in the learning rate can cause the algorithm to blow up and not converge. In the graph below, which displays how long the algorithm took to converge versus learning rate, you can see that when I changed the rate from 0.056 to 0.058 I went from nice convergence to no convergence (notice how the curve starts to go up at the last point (0.057)!

Gradient descent convergence
Gradient descent convergence