Optimization via Gradient Descent
In this homework, we will consider a general optimization problem:

$$
x^* = \arg\min_{x \in \mathbb{R}^n} f(x),
$$

where $f: \mathbb{R}^n \to \mathbb{R}$ is a differentiable function for which we know how to compute $\nabla f(x)$. We will solve this problem with the Gradient Descent (GD) method: an iterative algorithm that, given an initial iterate $x_0 \in \mathbb{R}^n$ and a positive parameter $\alpha > 0$ called step size, computes:

$$
x_{k+1} = x_k - \alpha \nabla f(x_k).
$$
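As a reference, a minimal sketch of such a fixed-step GD loop is given below. It is not the required implementation: the exact input-output structure is the one discussed in the exercises linked below, and the stopping rule based on `tolf`/`tolx` is only one possible reading of the parameters quoted later in this homework.

```python
import numpy as np

def GD(f, grad_f, x0, alpha=1e-2, kmax=100, tolf=1e-5, tolx=1e-5):
    """Fixed-step gradient descent (a sketch).

    Returns the final iterate, the history of f(x_k), and the history of
    ||grad f(x_k)||, so that both can be plotted as functions of k.
    """
    x = np.asarray(x0, dtype=float)
    f_vals = [f(x)]
    grad_norms = [np.linalg.norm(grad_f(x))]
    for _ in range(kmax):
        x_new = x - alpha * grad_f(x)  # GD update: x_{k+1} = x_k - alpha * grad f(x_k)
        f_vals.append(f(x_new))
        grad_norms.append(np.linalg.norm(grad_f(x_new)))
        # Stop when both f and x barely change (one possible reading of tolf/tolx).
        if abs(f_vals[-1] - f_vals[-2]) < tolf and np.linalg.norm(x_new - x) < tolx:
            x = x_new
            break
        x = x_new
    return x, np.array(f_vals), np.array(grad_norms)

# Usage on a simple quadratic with minimum in (1, 2):
f = lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2
grad_f = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] - 2)])
x_opt, f_vals, grad_norms = GD(f, grad_f, x0=np.zeros(2))
```

The returned `grad_norms` history can then be plotted, e.g. with `plt.semilogy`, to check that $\| \nabla f(x_k) \|_2$ goes to zero, as requested below.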
You are asked to implement the GD method in Python and to test it on some example functions. In particular:
- Write a script that implements the GD algorithm with a fixed step size (i.e. no backtracking), with the input-output structure discussed in the first Exercise of the Gradient Descent section (https://devangelista2.github.io/statistical-mathematical-methods/Optimization/GD.html).
- Write a script that implements the GD algorithm with backtracking, with the input-output structure discussed in the second Exercise of the Gradient Descent section (https://devangelista2.github.io/statistical-mathematical-methods/Optimization/GD.html). A possible sketch is given after this list.
- Test the algorithms above on the following functions:
  - $f_1: \mathbb{R}^2 \to \mathbb{R}$ such that:

    $$
    f_1(x) = \ldots
    $$

    for which the true solution is $x^* = \ldots$.
  - $f_2: \mathbb{R}^2 \to \mathbb{R}$ such that:

    $$
    f_2(x) = \ldots
    $$

    for which the true solution is $x^* = \ldots$.
  - $f_3: \mathbb{R}^n \to \mathbb{R}$ such that:

    $$
    f_3(x) = \| Ax - b \|_2^2,
    $$

    where $A$ is the Vandermonde matrix associated with the vector $v \in \mathbb{R}^n$ that contains $n$ equispaced values in a given interval, and $b$ is computed by first setting a ground-truth vector $x_{\text{true}}$, and then $b = A x_{\text{true}}$. Try different values of $n$. A problem-setup sketch is given after this list.
  - $f_4: \mathbb{R}^n \to \mathbb{R}$ such that:

    $$
    f_4(x) = \| Ax - b \|_2^2 + \lambda \| x \|_2^2,
    $$

    where $A$ and $b$ are the same as in the exercise above, while $\lambda > 0$ is a fixed value in a given interval. Try different values of $\lambda$ and comment on the results.
  - $f_5: \mathbb{R} \to \mathbb{R}$, a non-convex function such that:

    $$
    f_5(x) = \ldots
    $$

- For each of the functions above, test the GD method with and without backtracking, trying different values of the step size $\alpha$ when backtracking is not employed. Comment on the results.
- Plot the value of $\| \nabla f(x_k) \|_2$ as a function of $k$, check that it goes to zero, and compare the convergence speed (in terms of the number of iterations $k$) for the different values of $\alpha$ and with backtracking.
- For each of the points above, use `x0 = ...` (except for function 5, which is discussed in the following point), `kmax = 100`, and `tolf = tolx = 1e-5`. Also, when the true solution $x^*$ is given, plot the error $\| x_k - x^* \|_2$ as a function of $k$.
- Plot the graph of the non-convex function 5 in the given interval, and test the convergence of GD with different values of `x0` (of your choice) and different step sizes. When is the convergence point the global minimum?
- Hard (optional): for functions 1 and 2, show the contour plot around the true minimum and visualize the path described by the iterations, i.e. represent on the contour plot the position of each iterate $x_k$ computed by the GD algorithm. See the `plt.contour` documentation; a plotting sketch is given after this list.
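For the backtracking exercise, a possible sketch assuming the standard Armijo sufficient-decrease condition (with hypothetical defaults $c = 10^{-4}$, $\tau = 0.5$; the exact rule to use is the one in the linked exercise) is:

```python
import numpy as np

def backtracking(f, grad_f, x, alpha=1.0, c=1e-4, tau=0.5):
    """Armijo backtracking: shrink alpha until the sufficient-decrease
    condition f(x - alpha g) <= f(x) - c * alpha * ||g||^2 holds."""
    g = grad_f(x)
    while f(x - alpha * g) > f(x) - c * alpha * np.linalg.norm(g) ** 2:
        alpha = tau * alpha
        if alpha < 1e-10:  # safeguard against stalling
            break
    return alpha

def GD_backtracking(f, grad_f, x0, kmax=100, tolf=1e-5, tolx=1e-5):
    """Gradient descent where each step size is chosen by backtracking."""
    x = np.asarray(x0, dtype=float)
    f_vals = [f(x)]
    for _ in range(kmax):
        alpha = backtracking(f, grad_f, x)
        x_new = x - alpha * grad_f(x)
        f_vals.append(f(x_new))
        if abs(f_vals[-1] - f_vals[-2]) < tolf and np.linalg.norm(x_new - x) < tolx:
            x = x_new
            break
        x = x_new
    return x, np.array(f_vals)
```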
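For functions 3 and 4, the test problem can be assembled along the following lines; the interval, the value of $n$, the ground truth $x_{\text{true}}$ and the value of $\lambda$ below are placeholders, to be replaced with the values specified in the assignment.

```python
import numpy as np

n = 10
v = np.linspace(0, 1, n)              # n equispaced values (interval assumed here)
A = np.vander(v, n, increasing=True)  # Vandermonde matrix associated with v
x_true = np.ones(n)                   # placeholder ground truth
b = A @ x_true                        # right-hand side: b = A x_true

# Function 3: least-squares objective and its gradient.
f3 = lambda x: np.linalg.norm(A @ x - b) ** 2
grad_f3 = lambda x: 2 * A.T @ (A @ x - b)

# Function 4: add Tikhonov regularization with a fixed lambda.
lam = 0.1                             # placeholder value of lambda
f4 = lambda x: np.linalg.norm(A @ x - b) ** 2 + lam * np.linalg.norm(x) ** 2
grad_f4 = lambda x: 2 * A.T @ (A @ x - b) + 2 * lam * x
```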
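For the optional contour-plot point, one way to overlay the iterates on the contour lines is sketched below. It assumes the GD function has been modified to also return the history of iterates `xs` as an array of shape `(iterations, 2)`.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_gd_path(f, xs, x_true, radius=2.0):
    """Contour plot of f around x_true with the GD path overlaid."""
    t1 = np.linspace(x_true[0] - radius, x_true[0] + radius, 200)
    t2 = np.linspace(x_true[1] - radius, x_true[1] + radius, 200)
    X1, X2 = np.meshgrid(t1, t2)
    # Evaluate f on the grid (f is assumed to take a 2-vector).
    Z = np.array([[f(np.array([a, b])) for a in t1] for b in t2])
    plt.contour(X1, X2, Z, levels=30)
    plt.plot(xs[:, 0], xs[:, 1], "ro-", markersize=3, label="GD iterates")
    plt.plot(x_true[0], x_true[1], "k*", markersize=12, label="true minimum")
    plt.legend()
    plt.show()
```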
Optimization via Stochastic Gradient Descent
Consider a dataset $\mathcal{D} = \{ (x_i, y_i) \}_{i=1}^N$, where:

$$
x_i \in \mathbb{R}^d, \quad y_i \in \mathbb{R},
$$

together with a model $f_w(x)$, with vector of parameters $w$. Training an ML model requires solving:

$$
w^* = \arg\min_{w} \frac{1}{N} \sum_{i=1}^N \ell(f_w(x_i), y_i).
$$

Since the optimization problem above is written as a sum of independent terms, each depending on a single datapoint, it satisfies the hypotheses for the application of the Stochastic Gradient Descent (SGD) algorithm, which works as follows:
- Given an integer `batch_size`, randomly extract from the original dataset a sub-dataset $\mathcal{M} \subset \mathcal{D}$ such that $|\mathcal{M}|$ equals `batch_size`. Note that the random sampling at each iteration has to be done without replacement.
- Compute the gradient of the loss function on the sampled batch as:

  $$
  g_k = \frac{1}{|\mathcal{M}|} \sum_{(x_i, y_i) \in \mathcal{M}} \nabla_w \ell(f_w(x_i), y_i).
  $$
- Compute one single iteration of the GD algorithm along the direction described by $g_k$:

  $$
  w_{k+1} = w_k - \alpha_k g_k.
  $$
- Repeat until the full dataset has been extracted. When this happens, we say that we have completed an epoch of the SGD method. Repeat this procedure for a number of epochs equal to a parameter `n_epochs`, given as input. A minimal sketch of this scheme is given below.
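In the sketch below, the function signature and the gradient interface are assumptions, not the required structure; shuffling the indices once per epoch and slicing them into consecutive batches realizes the sampling without replacement.

```python
import numpy as np

def SGD(grad_l, w0, X, Y, batch_size=8, alpha=1e-2, n_epochs=50):
    """Sketch of SGD. grad_l(w, X_batch, Y_batch) is assumed to return the
    average gradient of the per-sample loss over the given batch."""
    w = np.asarray(w0, dtype=float)
    N = X.shape[0]
    for _ in range(n_epochs):
        idx = np.random.permutation(N)  # shuffle once per epoch
        for start in range(0, N, batch_size):
            batch = idx[start:start + batch_size]  # batches never overlap: no replacement
            w = w - alpha * grad_l(w, X[batch], Y[batch])  # one GD step on the batch
    return w
```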
Consider the dataset `poly_regression_large.csv`, provided on Virtuale, and let $f_w(x)$ be a polynomial regression model, as discussed in https://devangelista2.github.io/statistical-mathematical-methods/regression_classification/regression.html. An end-to-end skeleton covering the steps below is sketched after the following list.
- Split the dataset into training and test set as in Homework 2, with a proportion of 80% training and 20% test.
- Fix a degree $d$ for the polynomial.
- Train the polynomial regression model on the training set via the Stochastic Gradient Descent algorithm.
- Train the polynomial regression model on the training set via the Gradient Descent algorithm.
- Train the polynomial regression model on the `poly_regression_small.csv` dataset. Use the full dataset for this test, without splitting it into training and test set.
- Compare the performance of the three regression models computed above. In particular, if $\mathcal{T}$ is the test set from the `poly_regression_large.csv` dataset, for each of the models compute:

  $$
  \text{Err} = \frac{1}{N_{\text{test}}} \sum_{i=1}^{N_{\text{test}}} \bigl( f_w(x_i^{\text{test}}) - y_i^{\text{test}} \bigr)^2,
  $$

  where $N_{\text{test}}$ is the number of elements in the test set, and $(x_i^{\text{test}}, y_i^{\text{test}})$ are the input and output elements in the test set. Comment on the performance of the three models.
- Repeat the experiment by varying the degree $d$ of the polynomial. Comment on the results.
- Set $d = 4$ (so that the polynomial regression model is a polynomial of degree 4). Compare the parameters learned by the three models with the true parameter vector $w_{\text{true}}$.
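A hypothetical end-to-end skeleton for these experiments, reusing the `SGD` sketch above, could look as follows. The CSV layout (two comma-separated columns, inputs and outputs, no header) is an assumption, so adapt the loading code to the actual files.

```python
import numpy as np

data = np.loadtxt("poly_regression_large.csv", delimiter=",")  # format assumed
x, y = data[:, 0], data[:, 1]

# 80% / 20% train-test split.
N = len(x)
idx = np.random.permutation(N)
n_train = int(0.8 * N)
train, test = idx[:n_train], idx[n_train:]

d = 4                                                 # polynomial degree
Phi = lambda t: np.vander(t, d + 1, increasing=True)  # features [1, t, ..., t^d]

def grad_l(w, xb, yb):
    """Average gradient of the squared loss for f_w(x) = Phi(x) @ w."""
    P = Phi(xb)
    return 2 * P.T @ (P @ w - yb) / len(yb)

w_sgd = SGD(grad_l, np.zeros(d + 1), x[train], y[train],
            batch_size=8, alpha=1e-3, n_epochs=100)

# Average test error (squared-error version of the formula above).
err = np.mean((Phi(x[test]) @ w_sgd - y[test]) ** 2)
print(f"SGD test error: {err:.4f}")
```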