Maximum Likelihood Estimation (MLE) and Maximum a Posteriori (MAP)

Suppose you are given a dataset $\mathcal{D} = \{ (x_i, y_i) \}_{i=1}^N$, where $x_i \in \mathbb{R}$ and $y_i \in \mathbb{R}$, for any $i = 1, \dots, N$. Moreover, suppose that:

$$ y_i = f_\theta(x_i) + \epsilon_i, $$

where $\epsilon_i$ is random Gaussian noise, $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$. It is known that the Maximum Likelihood Estimation (MLE) approach works by defining the conditional probability of $y$ given $x$ and $\theta$, $p(y \mid x; \theta)$, and then optimizing the parameters to maximize this probability distribution over $\theta$. Moreover, it is also known that this approach can be made equivalent to the deterministic approach to such problems (the Least Squares method) by taking the negative log of $p(y \mid x; \theta)$. Indeed, by assuming that the noise $\epsilon_i$ is Gaussian for any $i$, we have:

$$ p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( - \frac{(y_i - f_\theta(x_i))^2}{2 \sigma^2} \right). $$

Thus,

$$ \theta_{MLE} = \arg\max_\theta \prod_{i=1}^N p(y_i \mid x_i; \theta) = \arg\min_\theta \frac{1}{2} \sum_{i=1}^N \left( f_\theta(x_i) - y_i \right)^2. $$

Given that $f_\theta(x) = \sum_{j=0}^K \theta_j x^j$, with $\theta = (\theta_0, \dots, \theta_K)^T \in \mathbb{R}^{K+1}$, we have shown that the problem above becomes

$$ \theta_{MLE} = \arg\min_{\theta \in \mathbb{R}^{K+1}} \frac{1}{2} \| \Phi \theta - y \|_2^2, $$

where $\Phi \in \mathbb{R}^{N \times (K+1)}$ is the Vandermonde matrix associated with $x = (x_1, \dots, x_N)^T$, i.e., the matrix whose $j$-th column is $(x_1^j, \dots, x_N^j)^T$, $j = 0, \dots, K$, and $y = (y_1, \dots, y_N)^T$.

Note that the above equation is equivalent to the function you optimized in Exercise 3 of Lab 3 with GD, with $A = \Phi$ and $b = y$.
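As a concrete illustration of this equivalence (a minimal sketch, assuming the classical monomial basis $\phi_j(x) = x^j$; all function names here are illustrative), the MLE solution can be computed through the Normal Equations $\Phi^T \Phi \theta = \Phi^T y$:

```python
import numpy as np

def vandermonde(x, K):
    # N x (K+1) matrix whose j-th column is x**j, j = 0..K
    return np.vander(x, K + 1, increasing=True)

def mle_solution(x, y, K):
    # Gaussian-noise MLE = least squares, solved via the Normal Equations
    Phi = vandermonde(x, K)
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Noise-free data from y = 1 + 2x: the MLE recovers theta exactly.
x = np.linspace(0, 1, 20)
y = 1 + 2 * x
theta = mle_solution(x, y, K=1)
print(theta)  # approximately [1. 2.]
```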

Maximum A Posteriori (MAP)

When it is unclear how to set the parameter $K$ and it is impossible to use the error plot, it is required to use the Maximum A Posteriori (MAP) approach. To show how it works, suppose that we know that the parameters are normally distributed, $\theta \sim \mathcal{N}(0, \sigma_\theta^2 I)$. Then we can use Bayes' Theorem to express the a posteriori probability of $\theta$ given $x$ and $y$ as

$$ p(\theta \mid x, y) = \frac{p(y \mid x; \theta) \, p(\theta)}{p(y \mid x)} \propto p(y \mid x; \theta) \, p(\theta). $$

The MAP solution searches for a set of parameters $\theta_{MAP}$ that maximizes $p(\theta \mid x, y)$. Following the same reasoning as before (taking the negative log, which turns the product into a sum and drops the terms that do not depend on $\theta$),

$$ \theta_{MAP} = \arg\min_{\theta \in \mathbb{R}^{K+1}} \frac{1}{2} \| \Phi \theta - y \|_2^2 + \frac{\lambda}{2} \| \theta \|_2^2, $$

where $\lambda = \sigma^2 / \sigma_\theta^2 > 0$ is the regularization parameter.

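The regularized problem above is ridge regression and, as a hypothetical sketch (not the assigned implementation), it admits a closed-form solution via the regularized Normal Equations $(\Phi^T \Phi + \lambda I) \theta = \Phi^T y$:

```python
import numpy as np

def map_solution(Phi, y, lam):
    # MAP estimate under a zero-mean Gaussian prior on theta:
    # solve (Phi^T Phi + lam * I) theta = Phi^T y
    n_params = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_params), Phi.T @ y)

# Increasing lam shrinks theta toward zero (the prior mean).
x = np.linspace(0, 1, 30)
Phi = np.vander(x, 6, increasing=True)  # degree-5 model
y = 1 + 2 * x
theta_small_lam = map_solution(Phi, y, lam=1e-6)
theta_large_lam = map_solution(Phi, y, lam=10.0)
print(np.linalg.norm(theta_small_lam) > np.linalg.norm(theta_large_lam))  # True
```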
Implementation Task

Given the two optimization problems above, you are required to implement a program that compares the two solutions, which we will refer to as $\theta_{MLE}$ and $\theta_{MAP}$. To do that:

  1. Define a test problem in the following way:

    • Let the user fix a positive integer $K$, and define a true parameter vector $\theta_{true} \in \mathbb{R}^{K+1}$ (you can also consider different choices of $\theta_{true}$);
    • Define an input dataset $x = (x_1, \dots, x_N)^T$, where the $x_i$ are uniformly distributed data points in the interval $[a, b]$, where $a < b$ are values that the user can select;
    • Given a set of functions $\{ \phi_j \}_{j=0}^K$, define the Generalized Vandermonde matrix $\Phi \in \mathbb{R}^{N \times (K+1)}$, whose element in position $(i, j)$ is $\phi_j(x_i)$. In particular, write a function defining the classical Vandermonde matrix, where $\phi_j(x) = x^j$;
    • Given a variance $\sigma^2$ defined by the user, compute $y = \Phi \theta_{true} + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ is Gaussian distributed noise with variance $\sigma^2$. Try the following experiments for different values of $\sigma^2$. Note that the test problem defined in this way is very similar to what we did to define a test problem in the first Lab.
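The construction above can be sketched as follows (function names and the default values are illustrative assumptions, not prescribed by the assignment):

```python
import numpy as np

rng = np.random.default_rng(42)

def vandermonde(x, K):
    # Classical Vandermonde matrix: element (i, j) is x_i ** j, j = 0..K
    return np.vander(x, K + 1, increasing=True)

def make_test_problem(N, K, a, b, sigma, theta_true):
    # Uniform x in [a, b]; y = Phi @ theta_true + Gaussian noise
    x = rng.uniform(a, b, size=N)
    Phi = vandermonde(x, K)
    y = Phi @ theta_true + rng.normal(0.0, sigma, size=N)
    return x, Phi, y

theta_true = np.ones(3)  # illustrative choice for K = 2
x, Phi, y = make_test_problem(N=50, K=2, a=0.0, b=1.0, sigma=0.1,
                              theta_true=theta_true)
print(Phi.shape)  # (50, 3)
```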
  2. We have now built a dataset $(x, y)$ such that $\theta_{true}$ is (up to the noise $\epsilon$) the best solution to the least squares problem $\min_{\theta} \frac{1}{2} \| \Phi \theta - y \|_2^2$.

  3. Pretend not to know the correct value of $K$. The first task is to try to guess it and use it to approximate the true solution $\theta_{true}$ by MLE and MAP. To do that:

    • Write a function that takes as input the training data $x$ and $y$ and returns the MLE solution $\theta_{MLE}$ (with Gaussian assumption) for that problem. Note that the loss function can be optimized by GD, SGD, or Normal Equations.
    • Write a function that takes as input a $(K+1)$-dimensional parameter vector $\theta$ and a test set $(x^{test}, y^{test})$ and returns the average absolute error of the polynomial regressor $f_\theta$ over the test set, computed as:

$$ Err(\theta) = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} \left| f_\theta(x_i^{test}) - y_i^{test} \right|. $$
    • For different values of $K$, plot the training data points and the test data points with different colors, and visualize (as a continuous line) the learned regression model $f_{\theta_{MLE}}$. Comment on the results.
    • For increasing values of $K$, use the functions defined above to compute the training and test error, where the test set $(x^{test}, y^{test})$ is generated by sampling new points on the same interval as the training set and generating the corresponding $y^{test}$ with the same procedure as for the training set. Plot the two errors with respect to $K$. Comment on the results.
    • Write a function that takes as input the training data $x$, $y$, and the regularization parameter $\lambda$ and returns the MAP solution $\theta_{MAP}$ (with Gaussian assumption) for that problem. Note that the loss function can be optimized by GD, SGD, or Normal Equations.
    • For $K$ lower than, equal to, and greater than the correct degree of the test polynomial, plot the training data points and the test data points with different colors, and visualize (as a continuous line) the learned regression model $f_{\theta_{MAP}}$ with different values of $\lambda$. Comment on the results.
    • For $K$ way greater than the correct degree of the polynomial, compute the MLE and MAP solutions. Compare the test error of the two, for different values of $\lambda$ (in the case of MAP).
    • For $K$ greater than the true degree of the polynomial, define $\tilde{\theta}_{true}$, where $\theta_{true}$ has been padded with zeros to match the shape of $\theta_{MLE}$ and $\theta_{MAP}$. Compute $\| \theta_{MLE} - \tilde{\theta}_{true} \|_2$ and $\| \theta_{MAP} - \tilde{\theta}_{true} \|_2$ for increasing values of $K$ and different values of $\lambda$.
    • Compare the results obtained by increasing the number of data points $N$.
    • Compare the results obtained by the three algorithms: GD, SGD, and Normal Equations.
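For instance, the average-absolute-error function of step 3 might look like this (a sketch assuming the monomial basis, where $f_\theta$ is evaluated through the Vandermonde matrix of the test points):

```python
import numpy as np

def avg_abs_error(theta, x_test, y_test):
    # Average absolute error of the polynomial regressor f_theta
    # over the test set: mean_i |f_theta(x_i) - y_i|
    Phi_test = np.vander(x_test, len(theta), increasing=True)
    return np.mean(np.abs(Phi_test @ theta - y_test))

# Sanity check: a regressor that fits the test data exactly has zero error.
theta = np.array([0.0, 1.0])        # f_theta(x) = x
x_test = np.array([0.0, 0.5, 1.0])
err = avg_abs_error(theta, x_test, x_test.copy())
print(err)  # 0.0
```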

Note: when the value of a parameter is not explicitly specified, you can set it as you want. Suggestion: repeat the experiments for different values of such parameters.

