

Assignment 8

For the homework assignment, only do the heart failure linear model prediction part, plus finish (!!!!) the kNN variations.

For the heart failure assignment, pick X and Y depending on the last digit of your ID (e.g. 1); you pick up the group

In this assignment, we will apply the F-test to detect whether there is a (statistically) significant change in the pricing behavior of your stock within some time interval. F-tests are often used to test the equivalence of models that have been fitted to data using least squares (such as linear regression). Some examples include:

testing whether a regression fits the data well
testing for equality of means of normally distributed populations
testing whether two regression lines fit the data better than one

We will focus on the last item. For each time period T (e.g. month), we want to check if there is a (statistically) significant change in pricing pattern for your stock.

We proceed as follows. Assume that the time period contains $n$ days and let $P_1, \ldots, P_n$ denote the (adjusted closing) prices for days $i = 1, \ldots, n$. We construct a simple linear regression model for the price, $P_i = \alpha \cdot i + \beta$. This model has two unknown parameters, slope $\alpha$ and intercept $\beta$; therefore, for this model, the number of degrees of freedom is $d = 2$. In general, if we have a linear regression on $m$ variables, we need to compute $m$ slope coefficients and an intercept, so $d = m + 1$. Let $\mathrm{SSE}(T)$ denote the sum of the squared residuals (the "loss" function) for the regression line that "fits" prices $P_1, \ldots, P_n$.
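A minimal sketch of fitting the single line and computing SSE(T) for one period, assuming NumPy is available and the prices are already in a list or array. The function name sse_linear is hypothetical; np.polyfit with deg=1 returns the least-squares slope and intercept (highest power first).

```python
import numpy as np

def sse_linear(prices):
    """Fit P_i = alpha * i + beta for i = 1..n and return (alpha, beta, SSE)."""
    prices = np.asarray(prices, dtype=float)
    days = np.arange(1, len(prices) + 1)           # i = 1, ..., n
    alpha, beta = np.polyfit(days, prices, deg=1)  # least-squares slope, intercept
    residuals = prices - (alpha * days + beta)
    return alpha, beta, float(np.sum(residuals ** 2))

# Usage with made-up prices (not real stock data)
alpha, beta, sse = sse_linear([10.0, 10.5, 11.1, 11.4, 12.0])
```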

Next, we look for a day $1 < k < n$ where we suspect there is a change in the linear trend. To find such a day, we divide our period $T$ into two time periods: $T_1$ containing days $1, \ldots, k$ and $T_2$ containing days $k + 1, \ldots, n$. Within each period, we fit a separate regression and compute the corresponding loss functions $\mathrm{SSE}(T_1)$ and $\mathrm{SSE}(T_2)$. We look for the $k$ that minimizes the total loss from using two regressions, $\mathrm{SSE}(T_1) + \mathrm{SSE}(T_2)$. Note that for each regression, the number of degrees of freedom is $d_1 = 2$ and $d_2 = 2$.
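The search for the candidate break day can be sketched as a brute-force loop over k. This assumes NumPy, and it restricts k to 2, ..., n-2 (an assumption beyond the text's 1 < k < n) so that each segment has at least two days to fit a line. The names sse_line and find_break_day are hypothetical.

```python
import numpy as np

def sse_line(days, prices):
    """Sum of squared residuals of the least-squares line through (days, prices)."""
    slope, intercept = np.polyfit(days, prices, deg=1)
    return float(np.sum((prices - (slope * days + intercept)) ** 2))

def find_break_day(prices):
    """Return (k, SSE(T1) + SSE(T2)) for the k that minimizes the total loss.

    T1 = days 1..k and T2 = days k+1..n. Assumption: each segment needs at
    least two days, so k ranges over 2..n-2 (requires n >= 4).
    """
    prices = np.asarray(prices, dtype=float)
    n = len(prices)
    days = np.arange(1, n + 1)
    best_k, best_loss = None, np.inf
    for k in range(2, n - 1):                       # k = 2, ..., n-2
        loss = (sse_line(days[:k], prices[:k])      # T1: days 1..k
                + sse_line(days[k:], prices[k:]))   # T2: days k+1..n
        if loss < best_loss:
            best_k, best_loss = k, loss
    return best_k, best_loss
```

For a period of about 21 trading days this fits roughly 2 x 18 small regressions, so the brute-force search is cheap.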

Once we have computed our "break" day candidate $k$, we construct the following F statistic. To simplify the notation, let us define $L = \mathrm{SSE}(T)$, $L_1 = \mathrm{SSE}(T_1)$ and $L_2 = \mathrm{SSE}(T_2)$. A single line has two parameters to estimate, namely slope and intercept, so the single model needs $d = 2$ parameters, while the 2-segment model needs $d_1 + d_2 = 4$ parameters, where $d_1 = d(L_1) = 2$ and $d_2 = d(L_2) = 2$. If there are $n$ data points, we compute the following F statistic:

$$F = \frac{\left(L - (L_1 + L_2)\right) / 2}{(L_1 + L_2) / (n - 4)}$$

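Under the null hypothesis that two regression lines do not provide a significantly better fit than one regression line, $F$ has a (Fisher) F-distribution with $(2, n - 4)$ degrees of freedom. The null hypothesis is rejected if the p-value, the probability that an $F(2, n - 4)$ variable is at least as large as the observed $F$, is below the chosen significance level (e.g. 0.05); equivalently, if $F$ exceeds the corresponding critical value of the $F(2, n - 4)$ distribution.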

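In Python, you can compute the p-value from the F-distribution as follows (here f_statistic is the value of F above and n is the number of days in the period):

from scipy.stats import f as fisher_f

# cdf gives P(F <= f_statistic); the p-value is the upper tail, 1 - cdf
p_value = 1 - fisher_f.cdf(f_statistic, 2, n - 4)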
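Putting the pieces together, here is a hedged sketch of the full test for one period. It assumes NumPy and SciPy, uses F = ((L - (L1 + L2)) / 2) / ((L1 + L2) / (n - 4)), the usual nested-model F statistic consistent with the (2, n - 4) degrees of freedom above, restricts the candidate break day to 2 <= k <= n-2, and needs n >= 5 so that n - 4 > 0. It uses fisher_f.sf (the survival function, equal to 1 - cdf) for the p-value. The name f_test_break is hypothetical.

```python
import numpy as np
from scipy.stats import f as fisher_f

def sse_line(days, prices):
    """Sum of squared residuals of the least-squares line through (days, prices)."""
    slope, intercept = np.polyfit(days, prices, deg=1)
    return float(np.sum((prices - (slope * days + intercept)) ** 2))

def f_test_break(prices, alpha=0.05):
    """One line vs two lines F-test for one period.

    Returns (k, F, p_value, significant). Assumes 2 <= k <= n-2 and n >= 5.
    """
    prices = np.asarray(prices, dtype=float)
    n = len(prices)
    if n < 5:
        raise ValueError("need at least 5 days so that n - 4 > 0")
    days = np.arange(1, n + 1)

    L = sse_line(days, prices)                  # single line, d = 2
    best_k, L12 = None, np.inf                  # L12 = L1 + L2 at the best k
    for k in range(2, n - 1):
        loss = sse_line(days[:k], prices[:k]) + sse_line(days[k:], prices[k:])
        if loss < L12:
            best_k, L12 = k, loss

    # F = ((L - (L1 + L2)) / 2) / ((L1 + L2) / (n - 4)); guard a perfect two-line fit
    F = ((L - L12) / 2) / (L12 / (n - 4)) if L12 > 0 else np.inf
    p_value = fisher_f.sf(F, 2, n - 4)          # upper tail, equals 1 - cdf
    return best_k, F, p_value, p_value < alpha
```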

Questions:

1. Take years 1 and 2. For each month, compute the "candidate" days and decide whether there is a significant change of pricing trend in each month. Use 0.1 as the critical value (significance level).
2. How many months exhibit significant price changes for your stock ticker?
3. Are there more "changes" in year 1 or in year 2?
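One possible way to run the test month by month and count significant months per year is sketched below. It assumes the daily data sit in a pandas DataFrame with hypothetical columns 'Date' (datetime) and 'Adj Close', that years 1 and 2 are calendar years, and that every month has at least 5 trading days. The helper names are hypothetical, and the candidate break day is again restricted to 2 <= k <= n-2.

```python
import numpy as np
import pandas as pd
from scipy.stats import f as fisher_f

def _sse(days, prices):
    slope, intercept = np.polyfit(days, prices, deg=1)
    return float(np.sum((prices - (slope * days + intercept)) ** 2))

def break_p_value(prices):
    """p-value of the one-line vs two-line F-test for one month of prices."""
    prices = np.asarray(prices, dtype=float)
    n = len(prices)
    days = np.arange(1, n + 1)
    L = _sse(days, prices)
    L12 = min(_sse(days[:k], prices[:k]) + _sse(days[k:], prices[k:])
              for k in range(2, n - 1))           # best candidate break day
    F = ((L - L12) / 2) / (L12 / (n - 4)) if L12 > 0 else np.inf
    return fisher_f.sf(F, 2, n - 4)

def count_significant_months(df, alpha=0.1):
    """Number of months per year whose p-value is below alpha.

    Assumes hypothetical columns 'Date' (datetime64) and 'Adj Close'.
    """
    df = df.sort_values("Date")
    year = df["Date"].dt.year.rename("Year")
    month = df["Date"].dt.month.rename("Month")
    p_values = df.groupby([year, month])["Adj Close"].apply(break_p_value)
    return (p_values < alpha).groupby(level="Year").sum()
```

The returned Series is indexed by year, so comparing its two entries answers the question of which year shows more "changes".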

Questions:

1. Take weekly data for year 1. For each W = 5, 6, ..., 12 and for each d = 1, 2, 3, construct the corresponding polynomials. Use these polynomials to predict weekly labels. Plot the accuracy: on the x-axis you have W, and you plot three curves for accuracy (a separate curve for each d).
2. For each d, take the best W, i.e. the one that gives the highest accuracy. Use this W to predict labels for year 2. What is your accuracy?
3. Compute confusion matrices (for each d) for year 2.
4. Implement three trading strategies for year 2 (one for each d, using the "best" values of W from year 1 that you have computed).
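The questions do not spell out how a polynomial turns into a label, so the sketch below rests on an explicit assumption: for each week t, fit a degree-d polynomial to the previous W weekly prices, extrapolate it one week ahead, and predict "green" if the extrapolated price exceeds last week's price, else "red". The labels "green"/"red" and all function names are hypothetical, and the true labels are assumed to come from an earlier assignment. Only NumPy is used, including for the 2 x 2 confusion matrix.

```python
import numpy as np

def predict_labels(prices, W, d):
    """Predicted label for each week t = W, ..., len(prices) - 1.

    Hypothetical scheme: fit a degree-d polynomial to the prices of weeks
    t-W, ..., t-1 (at x = 1..W), extrapolate it to x = W + 1, and label week t
    "green" if the extrapolated price exceeds the price of week t-1, else "red".
    """
    prices = np.asarray(prices, dtype=float)
    x = np.arange(1, W + 1)
    preds = []
    for t in range(W, len(prices)):
        coeffs = np.polyfit(x, prices[t - W:t], deg=d)   # highest power first
        forecast = np.polyval(coeffs, W + 1)
        preds.append("green" if forecast > prices[t - 1] else "red")
    return np.array(preds)

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def accuracy_grid(prices, labels, Ws=range(5, 13), ds=(1, 2, 3)):
    """Accuracy for every (W, d); labels[t] is the true label of week t."""
    labels = np.asarray(labels)
    return {(W, d): accuracy(labels[W:], predict_labels(prices, W, d))
            for W in Ws for d in ds}

def confusion_2x2(y_true, y_pred, classes=("green", "red")):
    """Rows are true labels, columns are predicted labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.array([[int(np.sum((y_true == a) & (y_pred == b))) for b in classes]
                     for a in classes])
```

Under this scheme, the best W for each d is the W with the highest grid[(W, d)] on year 1; the first W weeks of a series get no prediction, which is why the true labels are sliced with labels[W:].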