I recently started a Coursera online class called “Machine Learning” given by Andrew Ng at Stanford University. In the first two weeks of the class, one of the topics covered is the gradient descent algorithm for doing linear regression. In order to help me understand the concepts, I wrote a Python script that implements the algorithm for a simple case.

In this example, the data lies on a perfect straight line itself, so the algorithm should have no problem finding a straight line solution. You can find the details about gradient descent here.

The Python script I wrote was run using IDLE and Python 3.7.2.

Here is the entire script:

# # Example Python script that uses the gradient descent algorithm to # create a linear regression model for the case of a # single variable (feature). # # In this example we are fitting a line to data that is # actually on a straight line. # # module import import random # global definitions and function definitions # the size of the training data set m = 20 # the range of the feature values featureRange = 10.0 # the actual coefficients for the linear equation A = 1.0 B = 2.0 # define the actual linear relationship we # should converge to. def H(x): return A + B*x # define a function to calculate the hypothesis def h(z): return theta0 + theta1*z # define the cost function J def J(): total = 0.0 for i in range(m): total = total + (h(x[i]) - y[i])**2 return total/(2*m) # define a function to calculate the partial derivative of # J with respect to theta0 def dJ0(): total = 0.0 for i in range(m): total = total + h(x[i]) - y[i] return (total/m) # define a function to calculate the partial derivative of # J with respect to theta1 def dJ1(): total = 0.0 for i in range(m): total = total + (h(x[i]) - y[i])*x[i] return (total/m) # define a function to test for the convergence of gradient descent def doneFN(): if J()<0.000012: return True else: return False # initialize the feature and result values x = [None]*m y = [None]*m random.seed(1) for i in range(m): x[i] = random.random()*featureRange y[i] = H(x[i]) # perform the gradient descent calculations nstep = 0 # make an initial guess for theta 0 and 1 theta0 = 0.5 theta1 = 0.5 # set the learning rate alpha = 0.05 # loop until the values of theta 0 and 1 converge while not doneFN(): nstep = nstep+1 # calculate the new theta values theta0new = theta0 - alpha*dJ0() theta1new = theta1 - alpha*dJ1() # update the theta values theta0 = theta0new theta1 = theta1new # print the current state print("n = %3d, theta0 = %2.4f, theta1 = %2.4f, J = %3.7f" % \ (nstep,theta0,theta1,J()))

When I used the script to experiment with different values for the learning rate, I discovered that small changes in the learning rate can cause the algorithm to blow up and not converge. In the graph below, which displays how long the algorithm took to converge versus learning rate, you can see that when I changed the rate from 0.056 to 0.058 I went from nice convergence to no convergence (notice how the curve starts to go up at the last point (0.057)!