PHYS5001 ADVANCED DATA ANALYSIS: Mock Data Challenge

For the Mock Data Challenge, you should fill in the Mock Data Challenge submission template (available in Moodle) with your code, results and interpretation. The completed template should be uploaded electronically to Moodle by 5pm on Thursday April 14th 2022. There are two main parts to the mock data challenge:

Part 1

Step 1.1: Use ordinary least squares to fit the linear model y = a + bx to the Part 1 mock data (MDC1.txt)

(a) compute LS estimators of a and b,

(b) estimate the variance of the (assumed Gaussian) noise which has been added to the mock y values
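As a rough illustration of Step 1.1, the sketch below fits y = a + bx by ordinary least squares and estimates the noise variance from the residuals. The function name `ols_line_fit` and the synthetic data are assumptions made for this example. The layout of MDC1.txt is not described here, so the real data would have to be loaded separately.

```python
import numpy as np

def ols_line_fit(x, y):
    """Ordinary least squares fit of y = a + b*x.

    Returns (a_hat, b_hat, sigma2_hat), where sigma2_hat is the
    residual-based estimate of the Gaussian noise variance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Design matrix: a column of ones (intercept a) and a column of x (slope b).
    design = np.column_stack([np.ones(n), x])
    coeffs = np.linalg.lstsq(design, y, rcond=None)[0]
    a_hat, b_hat = coeffs
    residuals = y - (a_hat + b_hat * x)
    # Two fitted parameters leave n - 2 degrees of freedom.
    sigma2_hat = np.sum(residuals**2) / (n - 2)
    return a_hat, b_hat, sigma2_hat

# Synthetic stand-in data (NOT the challenge data): a = 1.5, b = 0.8, sigma = 0.5.
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 10.0, 200)
y_demo = 1.5 + 0.8 * x_demo + rng.normal(0.0, 0.5, x_demo.size)

a_hat, b_hat, sigma2_hat = ols_line_fit(x_demo, y_demo)
print("a_hat:", a_hat, "b_hat:", b_hat, "sigma2_hat:", sigma2_hat)
```

Dividing by n - 2 rather than n gives the unbiased variance estimate. For large data sets the difference is small.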

Step 1.2: By casting the data analysis challenge not as a least squares problem, but as a maximum likelihood problem, form an appropriate likelihood function for the mock data, which depends on the parameters (a,b).

Then, by computing the log likelihood on a rectangular grid of values of a and b (you need to think carefully about the range of a and b values you should consider, and the spacing between them), and in turn computing the value of chi-squared for each (a,b) pair on your grid, you should find the minimum value of chi-squared.

You then should turn your grid of values into a rectangular array of Delta chi-squared values. Finally, using the information in the table in Lecture 6, you should compute and plot Bayesian credible regions for the parameters at e.g. 68.3%, 95.4%, 99.73%.
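One possible way to organise the grid calculation in Step 1.2 is sketched below. It assumes the noise standard deviation `sigma` is already known (for example from Step 1.1), uses synthetic data in place of MDC1.txt, and picks the grid ranges by hand. The Delta chi-squared thresholds are computed from the chi-squared distribution with two degrees of freedom as a stand-in for the Lecture 6 table, which is not reproduced here, so they should be checked against that table.

```python
import numpy as np
from scipy.stats import chi2

def chi2_on_grid(x, y, sigma, a_vals, b_vals):
    """Chi-squared of y = a + b*x at every (a, b) pair on a rectangular grid."""
    A, B = np.meshgrid(a_vals, b_vals, indexing="ij")   # each of shape (n_a, n_b)
    model = A[..., None] + B[..., None] * x              # shape (n_a, n_b, n_data)
    return np.sum(((y - model) / sigma) ** 2, axis=-1)

# Synthetic stand-in data (NOT the challenge data).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
sigma = 0.5
y = 1.5 + 0.8 * x + rng.normal(0.0, sigma, x.size)

# Hand-picked grid, wide enough to contain the largest credible region.
a_vals = np.linspace(0.5, 2.5, 201)
b_vals = np.linspace(0.6, 1.0, 201)

chi2_vals = chi2_on_grid(x, y, sigma, a_vals, b_vals)
log_like = -0.5 * chi2_vals            # Gaussian log likelihood up to a constant
delta_chi2 = chi2_vals - chi2_vals.min()

# Grid point with the smallest chi-squared.
i_min, j_min = np.unravel_index(np.argmin(chi2_vals), chi2_vals.shape)
print("grid minimum at a =", a_vals[i_min], ", b =", b_vals[j_min])

# Delta chi-squared thresholds for two jointly estimated parameters.
levels = chi2.ppf([0.683, 0.954, 0.9973], df=2)
print("Delta chi-squared levels:", levels)
```

With this array layout, the credible regions are the contours of `delta_chi2` at the values in `levels`. If the array is passed to a contour-plotting routine that expects rows to follow the second axis, it needs to be transposed first.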

Step 1.3: Finally, using the Metropolis algorithm, and assuming a Gaussian likelihood function for the model parameters a and b, write an MCMC code to generate a sample from the likelihood function – thinking carefully about your choices of proposal density and prior range for a and b. Use this sample to estimate the mean values, errors and covariance of the parameters a and b from their sampled marginal distributions.
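A minimal sketch of a Metropolis sampler for Step 1.3 follows. It assumes a known noise standard deviation, a uniform prior over a rectangular box, and an uncorrelated Gaussian proposal. The function names, step sizes, prior ranges, burn-in length and synthetic data are illustrative choices for this example, not values prescribed by the challenge.

```python
import numpy as np

def make_log_posterior(x, y, sigma, a_range, b_range):
    """Gaussian log likelihood for y = a + b*x with a uniform box prior on (a, b)."""
    def log_posterior(theta):
        a, b = theta
        inside = a_range[0] <= a <= a_range[1] and b_range[0] <= b <= b_range[1]
        if not inside:
            return -np.inf                      # zero prior probability outside the box
        return -0.5 * np.sum(((y - (a + b * x)) / sigma) ** 2)
    return log_posterior

def metropolis(log_post, theta0, step, n_steps, rng):
    """Metropolis sampler with a symmetric Gaussian proposal of width `step`."""
    theta = np.asarray(theta0, dtype=float)
    logp = log_post(theta)
    chain = np.empty((n_steps, theta.size))
    n_accept = 0
    for i in range(n_steps):
        proposal = theta + step * rng.normal(size=theta.size)
        logp_new = log_post(proposal)
        # Accept with probability min(1, p_new / p_old).
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
            n_accept += 1
        chain[i] = theta                        # a rejected move repeats the old state
    return chain, n_accept / n_steps

# Synthetic stand-in data (NOT the challenge data).
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 50)
sigma = 0.5
y = 1.5 + 0.8 * x + rng.normal(0.0, sigma, x.size)

log_post = make_log_posterior(x, y, sigma, a_range=(0.0, 3.0), b_range=(0.5, 1.1))
chain, acceptance = metropolis(log_post, theta0=[1.0, 0.9],
                               step=np.array([0.1, 0.02]), n_steps=20000, rng=rng)

samples = chain[2000:]                          # discard an assumed burn-in
mean = samples.mean(axis=0)
cov = np.cov(samples, rowvar=False)
errors = np.sqrt(np.diag(cov))
print("acceptance rate:", acceptance)
print("mean (a, b):", mean, "errors:", errors, "cov(a, b):", cov[0, 1])
```

The step sizes control the trade-off between acceptance rate and how quickly the chain explores the parameter space, so they usually need tuning for the real data.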

Devise a method for estimating Bayesian credible regions for the parameters, using your MCMC sample.
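One candidate method, sketched here under stated assumptions, is to bin the sample in a 2D histogram and find the bin-count threshold above which the bins enclose the desired fraction of the sample. This yields approximate highest-density regions. The function name `hpd_thresholds`, the bin count and the synthetic correlated Gaussian sample (standing in for a real MCMC chain) are all assumptions for illustration.

```python
import numpy as np

def hpd_thresholds(samples_a, samples_b, levels=(0.683, 0.954, 0.9973), bins=40):
    """Bin-count thresholds whose enclosed bins hold at least `levels` of the sample."""
    counts, a_edges, b_edges = np.histogram2d(samples_a, samples_b, bins=bins)
    # Sort bins from most to least populated and accumulate their sample fraction.
    sorted_counts = np.sort(counts.ravel())[::-1]
    cumulative = np.cumsum(sorted_counts) / sorted_counts.sum()
    thresholds = []
    for level in levels:
        # First position at which the accumulated fraction reaches the level.
        idx = min(np.searchsorted(cumulative, level), sorted_counts.size - 1)
        thresholds.append(sorted_counts[idx])
    return counts, a_edges, b_edges, np.array(thresholds)

# Synthetic stand-in for an MCMC sample of (a, b): a correlated Gaussian cloud.
rng = np.random.default_rng(3)
stand_in = rng.multivariate_normal(
    mean=[1.5, 0.8],
    cov=[[0.02, -0.003], [-0.003, 0.0006]],
    size=50000,
)

counts, a_edges, b_edges, thresholds = hpd_thresholds(stand_in[:, 0], stand_in[:, 1])
for level, thr in zip((0.683, 0.954, 0.9973), thresholds):
    enclosed = counts[counts >= thr].sum() / counts.sum()
    print(f"level {level}: count threshold {thr:.0f}, enclosed fraction {enclosed:.4f}")
```

Contours of `counts` drawn at these thresholds outline the regions. The result depends on the bin width, so it is worth checking that the regions are stable when `bins` is varied.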

Part 2

Step 2.1: Similar to Step 1.3, fit a quadratic model of the form y = a + bx + cx^2 to the Part 2 data (MDC2.txt), using e.g. the Metropolis algorithm to sample the posterior distributions of the parameters a, b and c and generating plots of the marginal posterior for each pair of parameters, thinking carefully about how to estimate the variance of the (assumed Gaussian) noise that has been added to the mock y values. Note: you can use the information on parameters a and b obtained from Part 1.
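One simple option for the noise variance in Step 2.1, shown here only as a sketch, is to take it from the residuals of a least squares quadratic fit and then hold it fixed in the likelihood that a sampler evaluates. Treating the variance as an extra sampled parameter is an alternative that is not shown. The function names and the synthetic data are assumptions for this example.

```python
import numpy as np

def quadratic_fit_noise_variance(x, y):
    """Least squares fit of y = a + b*x + c*x^2 and residual noise-variance estimate."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit returns coefficients from the highest power down: (c, b, a).
    c_hat, b_hat, a_hat = np.polyfit(x, y, deg=2)
    residuals = y - (a_hat + b_hat * x + c_hat * x**2)
    # Three fitted parameters leave n - 3 degrees of freedom.
    sigma2_hat = np.sum(residuals**2) / (x.size - 3)
    return (a_hat, b_hat, c_hat), sigma2_hat

def log_likelihood_quadratic(theta, x, y, sigma2):
    """Gaussian log likelihood (up to a constant) for fixed noise variance sigma2."""
    a, b, c = theta
    return -0.5 * np.sum((y - (a + b * x + c * x**2)) ** 2) / sigma2

# Synthetic stand-in data (NOT the challenge data): a = 1, b = 0.5, c = 0.3, sigma = 0.5.
rng = np.random.default_rng(5)
x = np.linspace(0.0, 5.0, 200)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(0.0, 0.5, x.size)

theta_hat, sigma2_hat = quadratic_fit_noise_variance(x, y)
print("least squares (a, b, c):", theta_hat)
print("noise variance estimate:", sigma2_hat)
print("log likelihood at the fit:", log_likelihood_quadratic(theta_hat, x, y, sigma2_hat))
```

The log likelihood function takes a three-component parameter vector, so it can be handed to a Metropolis sampler in the same way as a two-parameter linear likelihood.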

Step 2.2: Compute the marginal likelihoods (evidences) for the linear and quadratic models. Use the computed evidences to construct the Bayes factor and interpret your result.

Hints: There are many ways to compute the marginal likelihoods. The most robust approach is to compute the log likelihood values on a grid of parameter values for both the linear and quadratic models. For the linear model, you can use the values from Step 1.2. Then compute the marginal likelihood for each model by evaluating the integral numerically across the chosen prior range. To avoid numerical precision errors during integration, you will want to use some version of the logsumexp function (e.g. from the scipy package). Also, to avoid long computation times, you will typically only want about 100 samples of each parameter on the grid.
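The grid approach in the hints could look roughly like the sketch below. It assumes uniform priors over the grid ranges, a known noise standard deviation, and a plain Riemann sum in which every grid point carries the same cell volume, which is reasonable only when the likelihood is negligible at the grid edges. The function names, prior ranges and synthetic data are assumptions for illustration.

```python
import numpy as np
from scipy.special import logsumexp

def log_like_grid(x, y, sigma, grids):
    """Gaussian log likelihood of a polynomial model on a rectangular coefficient grid.

    grids[k] holds the grid values of the coefficient multiplying x**k.
    """
    mesh = np.meshgrid(*grids, indexing="ij")
    chi2 = np.zeros(mesh[0].shape)
    # Accumulate one data point at a time to keep memory use modest.
    for xi, yi in zip(x, y):
        model = sum(coef * xi**k for k, coef in enumerate(mesh))
        chi2 += ((yi - model) / sigma) ** 2
    return -0.5 * chi2 - 0.5 * len(x) * np.log(2.0 * np.pi * sigma**2)

def log_evidence_grid(log_like, grids):
    """Log marginal likelihood for uniform priors spanning the grid ranges."""
    log_cell = sum(np.log(g[1] - g[0]) for g in grids)       # log volume of one grid cell
    log_prior = -sum(np.log(g[-1] - g[0]) for g in grids)    # log of the uniform prior density
    # logsumexp avoids underflow when exponentiating very negative log likelihoods.
    return logsumexp(log_like) + log_cell + log_prior

# Synthetic stand-in data (NOT the challenge data), generated with real curvature.
rng = np.random.default_rng(4)
x = np.linspace(0.0, 5.0, 30)
sigma = 0.5
y = 1.0 + 0.5 * x + 0.5 * x**2 + rng.normal(0.0, sigma, x.size)

# About 100 grid values per parameter; the ranges double as assumed prior ranges.
a_grid = np.linspace(-4.0, 4.0, 100)
b_grid = np.linspace(-2.0, 6.0, 100)
c_grid = np.linspace(-1.0, 2.0, 100)

log_z_lin = log_evidence_grid(log_like_grid(x, y, sigma, [a_grid, b_grid]),
                              [a_grid, b_grid])
log_z_quad = log_evidence_grid(log_like_grid(x, y, sigma, [a_grid, b_grid, c_grid]),
                               [a_grid, b_grid, c_grid])
log_bayes_factor = log_z_quad - log_z_lin
print("ln Z (linear):", log_z_lin)
print("ln Z (quadratic):", log_z_quad)
print("ln Bayes factor (quadratic / linear):", log_bayes_factor)
```

Working with the log of the Bayes factor keeps the comparison numerically stable. The evidences depend on the prior ranges, so the choice of ranges should be stated alongside the result.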

Each step described above contributes 20% to your grade for the Mock Data Challenge.