Part I: Multiple Choice Questions (20 Questions worth 40 points in total, 2 points each)

1. What is the R command for setting the working directory?
a. setwd  b. getwd  c. assign  d. install

2. What is the R package for downloading data from Yahoo Finance?
a. quantmod  b. fBasics  c. forecast  d. download

3. What is the R command for listing the first 6 rows of the dataset?
a. tail  b. head  c. list  d. dim

4. Which implies heavy tails in distribution?
a. high mean  b. high variance  c. high skewness  d. high kurtosis

5. Which tests for autocorrelations to be zero?
a. Jarque-Bera test  b. Ljung-Box test  c. Augmented Dickey-Fuller (ADF) test  d. Dickey-Fuller test

6. Which of the following cannot be used for identifying AR order?
a. partial ACF  b. ACF  c. AIC  d. BIC

7. Which is the R command to estimate the model $r_t = \phi_0 + \phi_1 r_{t-1} + \phi_2 r_{t-2} + \phi_3 r_{t-3} + a_t - \theta_1 a_{t-1} + \theta_2 a_{t-2}$? Suppose x is the time series $r_t$ in the R command.
a. arima(x,order=c(2,3,0))  b. arima(x,order=c(3,2,0))  c. arima(x,order=c(3,0,2))  d. arima(x,order=c(0,3,2))

8. When will the MA(1) model ($r_t = c_0 + a_t - \theta_1 a_{t-1}$) be weakly stationary?
a. always weakly stationary  b. when $|c_0| < 1$  c. when $|\theta_1| < 1$  d. when $|c_0| < 1$ and $|\theta_1| < 1$

Please read the following R scripts for the analysis of a time series (da) and answer questions from 9 to 14.

> setwd("C:/Users/dingluo/teaching/ef4822/spring2020")
> library(fBasics)
> data=read.table("m-ew6299.txt")
> da=data[,1]
> ts.plot(da)
> pacf(da)
> ar(da,method="mle")

Call:
ar(x = da, method = "mle")

Coefficients:
     1
0.2266

Order selected 1  sigma^2 estimated as  29.68
> m1=arima(da,order=c(1,0,0))
> m1

Call:
arima(x = da, order = c(1, 0, 0))

Coefficients:
         ar1  intercept
      0.2267     1.0626
s.e.  0.0456     0.3297

sigma^2 estimated as 29.68:  log likelihood = -1420.11,  aic = 2846.22
> tsdiag(m1)
> tsdiag(m1,gof=24)
> acf(da)
> m2=arima(da,order=c(0,0,1))
> m2

Call:
arima(x = da, order = c(0, 0, 1))

Coefficients:
         ma1  intercept
      0.2385     1.0605
s.e.  0.0449     0.3153

sigma^2 estimated as 29.59:  log likelihood = -1419.37,  aic = 2844.73
> tsdiag(m2)
> tsdiag(m2,gof=24)

9. Which is the mean of the time series (da) in the estimated AR(1) model (m1)?
a. 1.0626  b. 1.0626/(1-0.2267)  c. 0.3297  d. 0.3297/(1-0.0456)

10. Which is the variance of the time series (da) in the estimated AR(1) model (m1)?
a. 29.68  b. $29.68/(1 - 0.2267^2)$  c. 29.59  d. $29.59/(1 + 0.2385^2)$

11. Which is the mean of the time series (da) in the estimated MA(1) model (m2)?
a. 1.0605  b. 1.0605/(1-0.2385)  c. 0.3153  d. 0.3153/(1-0.0449)

12. Which is the variance of the time series (da) in the estimated MA(1) model (m2)?
a. 29.59  b. $29.59 \times (1 + 0.2385^2)$  c. $29.59/(1 - 0.2385^2)$  d. $29.59/(1 + 0.2385^2)$

13. Which is/are used to check whether the estimated model MA(1) (m2) is adequate?
a. tsdiag(m2)  b. tsdiag(m2,gof=24)  c. acf(da)  d. both a and b

14. Is the time series (da) weakly stationary?
a. yes  b. no  c. not sure  d. yes for AR(1), no for MA(1)

Please read the following R scripts for using log dividend-to-price ratio to predict stock market excess return and answer questions from 15 to 20.

> setwd("~/Dropbox/Teaching/EF4822_Spring2020")
> da=read.csv("PredictorData2018part.csv")
> CRSP_SPvw=da[,20]
> Rfree=da[,12]
> exret=CRSP_SPvw-Rfree     # stock market excess return
> D12=da[,3]
> Index=da[,2]
> dp=log(D12/Index)         # log dividend-to-price ratio
> T=length(exret)
> lmdp=lm(exret[2:T]~dp[1:T-1])
> View(lmdp)
> summary(lmdp)

Call:
lm(formula = exret[2:T] ~ dp[1:T - 1])

Residuals:
     Min       1Q   Median       3Q      Max
-0.60678 -0.13020  0.02396  0.14358  0.39421

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.33301    0.15107   2.204   0.0301 *
dp[1:T - 1]  0.07474    0.04429   1.688   0.0950 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1973 on 89 degrees of freedom
Multiple R-squared:  0.03101,  Adjusted R-squared:  0.02012
F-statistic: 2.848 on 1 and 89 DF,  p-value: 0.09498

> anova(lmdp)
Analysis of Variance Table

Response: exret[2:T]
            Df Sum Sq  Mean Sq F value  Pr(>F)
dp[1:T - 1]  1 0.1109 0.110873  2.8481 0.09498 .
Residuals   89 3.4646 0.038928
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> plot(x=dp[1:T-1],y=exret[2:T],main="exret~dp")
> abline(lm(exret[2:T]~dp[1:T-1]))

15. Does log dividend-to-price ratio predict stock market excess return at 5% significance level?
a. yes  b. no  c. not sure

16. For the estimated predictability model (lmdp in the R script), we could write the model as $r_{t+1} = \alpha + \beta \times \log(D_t/P_t) + \epsilon_{t+1}$. Which of the following is false?
a. $\alpha = 0.33301$  b. $\beta = 0.07474$  c. $\alpha$ is significantly different from zero at 1% level  d. $\beta$ has a standard error 0.1973

17. Should you invest more or less in the stock market when log dividend-to-price is high?
a. more  b. less  c. not sure

18. What fraction of variance in the stock market excess return is explained by the return predictability model?
a. 33.3%  b. 7.47%  c. 3.1%  d. 9.5%

19. Which is the total sum of squares in the stock market excess return?
a. 0.1109  b. 3.4646  c. 3.5755  d. 3.3537

20. Which is the command to plot the estimated model?
a. lmdp=lm(exret[2:T]~dp[1:T-1])  b. plot(x=dp[1:T-1],y=exret[2:T],main="exret~dp")  c. abline(lm(exret[2:T]~dp[1:T-1]))  d. View(lmdp)

Part II: Long Questions (60 points)

Notes: Please show all relevant steps in deriving the final answers.

1. (30 points) Suppose that the monthly log return of a security $r_t$ follows the AR(1) model $r_t = a_t + 0.2 r_{t-1}$, where $\{a_t\}$ is a Gaussian white noise series with mean zero and variance 0.01.
(a) Compute the mean and variance of the return series. (6 points)
(b) Compute the lag-1 and lag-2 autocorrelations of the return series. (10 points)
(c) Assume that $r_{100} = 0.02$.
Compute the 1-step- and 2-step-ahead forecasts of the return at the forecast origin $t = 100$. (8 points)
(d) What are the standard deviations of the associated forecast errors? (6 points)

2. (30 points) Suppose that the monthly log return of a security $r_t$ follows the model $r_t = 0.02 + 0.1 r_{t-1} + a_t + 0.2 a_{t-1}$, where $\{a_t\}$ is a Gaussian white noise series with mean zero and variance 0.01.
(a) Compute the mean and variance of the return series. (6 points)
(b) Compute the autocorrelations of the return series for all lags. (14 points)
(c) Compute the 1-step- and all the multistep-ahead forecasts of the return at the forecast origin $t = h$. (10 points)

Formula Sheet

Note: $a$, $b$ and $c$ are constants and $x_t$ is a time series in the following formulas.

1. Mean: $E(a x_t) = a E(x_t)$
2. Mean: $E(a + x_t) = a + E(x_t)$
3. Variance: $Var(x_t) = E[(x_t - E(x_t))^2]$
4. Variance: $Var(a x_t) = a^2 Var(x_t)$
5. Variance: $Var(a + x_t) = Var(x_t)$
6. Variance: $Var(a x_t + b x_{t-j}) = a^2 Var(x_t) + b^2 Var(x_{t-j}) + 2ab\,Cov(x_t, x_{t-j})$
7. Variance: $Var(x_t + x_{t-j}) = Var(x_t) + Var(x_{t-j}) + 2 Cov(x_t, x_{t-j})$ (i.e., $a = b = 1$ in 6)
8. Variance: $Var(a x_t + b x_{t-j} + c x_{t+k}) = a^2 Var(x_t) + b^2 Var(x_{t-j}) + c^2 Var(x_{t+k}) + 2ab\,Cov(x_t, x_{t-j}) + 2ac\,Cov(x_t, x_{t+k}) + 2bc\,Cov(x_{t-j}, x_{t+k})$
9. Covariance: $Cov(x_t, x_{t-j}) = E[(x_t - E(x_t))(x_{t-j} - E(x_{t-j}))]$
10. Covariance: $Cov(x_t, x_{t-j}) = E(x_t x_{t-j}) - E(x_t) E(x_{t-j})$
11. Covariance: $Cov(a + x_t, x_{t-j}) = Cov(x_t, x_{t-j})$
12. Covariance: $Cov(a x_t, x_{t-j}) = a\,Cov(x_t, x_{t-j})$
13. Covariance: $Cov(a x_t + b x_{t+k}, c x_{t-j}) = ac\,Cov(x_t, x_{t-j}) + bc\,Cov(x_{t+k}, x_{t-j})$
14. Covariance: $Cov(x_t + x_{t+k}, x_{t-j}) = Cov(x_t, x_{t-j}) + Cov(x_{t+k}, x_{t-j})$ (i.e., $a = b = c = 1$ in 13)
15. Lag-$j$ autocorrelation: $\rho_j = Cov(x_t, x_{t-j})/Var(x_t)$
16. Conditional Expectation: For any $x_t$ which is known at time $t$, $E(x_t|F_t) = x_t$. For example,
$E(x_t^2|F_t) = x_t^2$
$E(x_t|F_t) = x_t$
$E(a x_t|F_t) = a x_t$
$E(a + b x_t|F_t) = a + b x_t$
17. Conditional Expectation (law of iterated expectations): $E(x_t) = E[E(x_t|F_{t-j})]$, $j = 1, 2, 3, \ldots$
$F_{t-j}$ is the information set at time $t - j$, which includes all information up to $t - j$. When $j = 1$, $E(x_t) = E(E(x_t|F_{t-1}))$.

EF4822 Financial Econometrics
Week 2: Introduction to R Program

R website: https://www.r-project.org/
R manuals: https://cran.r-project.org/manuals.html
YouTube video: Intro to R https://www.youtube.com/playlist?list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP
RStudio Desktop (free): https://rstudio.com/products/rstudio/download/

In this note, we briefly introduce the R program to be used extensively in the course. Specific packages and their commands for performing statistical analyses discussed in the lectures will be given when needed. Our goal is to make the empirical analysis as easy as possible so that students can reproduce the results shown in the lecture notes and textbook.

R is free software available from http://www.r-project.org. It runs on many operating systems, including Linux, MacOS X, and Windows. One can click CRAN on the above web page to select a nearby CRAN Mirror to download and install the software and selected packages. The simplest way to install the program is to follow the online instructions and to use the default options. Because R is open-source software, it contains thousands of packages developed by researchers around the world for various statistical analyses. For financial time series analysis, the Rmetrics of Dr. Diethelm Wuertz and his associates has many useful packages, including fBasics and fGarch. We use many functions of these packages in the lectures. We also use some other packages that are powerful and easy to use in R, e.g., the evir package for extreme value analysis and the rugarch package for additional volatility models.

R commands are case sensitive and must be followed exactly.

0.1 Installation of R packages

Using default options in R installation creates an icon on the desktop of a computer. One can start the R program simply by double clicking the R icon.
For Windows, an RGui window will appear with a command menu and the R Console. To install packages, one can click on the command Packages to select Install packages. A pop-up window appears asking users to select an R mirror (similar to the R installation mentioned before). With a selected mirror, another pop-up window appears that contains all available packages. One can click on the desired packages for installation. You only need to install a package once on each machine. With packages installed, one can load them into R by clicking on the command Packages followed by clicking Load packages. A pop-up window appears that contains all installed packages for users to choose. An alternative approach to load a package is to use the command library or require. See the demonstration below.

0.2 The quantmod Package

To begin with, we consider a useful R package for downloading financial data directly from some open sources, including Yahoo Finance and the Federal Reserve Economic Data (FRED) of the Federal Reserve Bank of St. Louis. The package is quantmod by Jeffrey A. Ryan. It is highly recommended that one installs it.

Once installed, the quantmod package allows users with an Internet connection to use ticker symbols to access daily stock data from Yahoo Finance and to use series names to access thousands of economic and financial time series from FRED. The command is getSymbols. The package also has some nice functions, e.g., obtaining time series plots of closing price and trading volume. The command is chartSeries. The default options of these two commands are sufficient for basic analysis of financial time series. One can use subcommands to further enhance the capabilities of the package, such as specifying the time span of interest in getSymbols. Interested readers may consult the documentation of the package for a description of the commands available. Here we provide a simple demonstration.
Figure 1 shows the time plots of daily closing price and trading volume of Apple stock from January 3, 2008 to January 28, 2015. The plot also shows the price and volume of the last observation. The subcommand theme="white" of chartSeries is used to set the background of the time plot. The default is black. Figure 2 shows the time plot of monthly U.S. unemployment rates from January 1948 to November 2011. Figure 3 shows the time plot of Hong Kong Hang Seng index from January 2, 2007 to Jan 17, 2020. These are obtained from Yahoo Finance. Since there is no volume, the subcommand TA=NULL is used to omit the time plot of volume in chartSeries. The commands head and tail show, respectively, the first and the last six rows of the data.

R Demonstration with quantmod package

Output edited. > denotes R prompt and explanation starts with %.

> install.packages('quantmod')
> library(quantmod) % You may use the command "require(quantmod)" too.
> getSymbols("AAPL",from="2008-01-03",to="2015-01-29") % Apple stock price, use default: from Yahoo
[1] "AAPL"
> dim(AAPL) % See the size of the loaded data set
[1] 1780 6
> head(AAPL) % The first 6 rows of data
           AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
2008-01-03  6.978929  7.049643 6.881786   6.961786   842066400      6.005591
2008-01-04  6.837500  6.892857 6.388928   6.430357  1455832000      5.547154
2008-01-07  6.473214  6.557143 6.079643   6.344285  2072193200      5.472904
2008-01-08  6.433571  6.516428 6.100000   6.116071  1523816000      5.276035
2008-01-09  6.117857  6.410714 6.010714   6.407143  1813882000      5.527128
2008-01-10  6.342143  6.464286 6.264643   6.357857  1482975200      5.484612
> tail(AAPL) % The last 6 rows of data
           AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
2015-01-21   27.2375   27.7650  27.0675    27.3875   194303600      24.94865
2015-01-22   27.5650   28.1175  27.4300    28.1000   215185600      25.59771
2015-01-23   28.0750   28.4375  27.8825    28.2450   185859200      25.72979
2015-01-26   28.4350   28.5900  28.2000    28.2750   222460000      25.75712
2015-01-27   28.1050   28.1200  27.2575    27.2850   382274800      24.85528
2015-01-28   29.4075   29.5300  28.8275    28.8275   585908400      26.26042
> chartSeries(AAPL,theme="white") % Obtain time plot of closing price and trading volume
> getSymbols("UNRATE",src="FRED") % Load monthly unemployment rate from FRED (Federal Reserve Bank)
[1] "UNRATE"
> dim(UNRATE)
[1] 864 1
> chartSeries(UNRATE,theme="white")
> head(UNRATE)
           UNRATE
1948-01-01    3.4
1948-02-01    3.8
1948-03-01    4.0
1948-04-01    3.9
1948-05-01    3.5
1948-06-01    3.6
> tail(UNRATE)
           UNRATE
2020-07-01   10.2
2020-08-01    8.4
2020-09-01    7.8
2020-10-01    6.9
2020-11-01    6.7
2020-12-01    6.7
> getSymbols("INTC",to="2020-01-18") %% Intel stock price, from Yahoo finance
[1] "INTC"
> dim(INTC) %% Default time span: Jan. 03, 2007 to last day available.
[1] 3284 6
> head(INTC)
           INTC.Open INTC.High INTC.Low INTC.Close INTC.Volume INTC.Adjusted
2007-01-03     20.45     20.88    20.14      20.35    69001200      13.45666
2007-01-04     20.63     21.33    20.56      21.17    88902300      13.99889
2007-01-05     21.09     21.15    20.76      21.10    64550800      13.95261
2007-01-08     21.25     21.34    20.95      21.01    52839100      13.89310
2007-01-09     21.18     21.21    20.86      21.03    54381000      13.90632
2007-01-10     21.09     21.62    21.03      21.52    76346900      14.23033
> tail(INTC)
           INTC.Open INTC.High INTC.Low INTC.Close INTC.Volume INTC.Adjusted
2020-01-10     59.57     60.08    58.87      58.94    15200600      57.51000
2020-01-13     59.17     59.78    59.08      59.59    16453300      58.14423
2020-01-14     59.49     59.74    59.19      59.43    17051200      57.98811
2020-01-15     59.30     59.65    58.75      58.94    18498800      57.51000
2020-01-16     59.26     59.84    59.07      59.66    21365500      58.21253
2020-01-17     59.98     60.00    59.24      59.60    21803400      58.15398

Figure 1: Time plots of daily closing price and trading volume of Apple stock from January 3, 2008 to January 28, 2015.
> getSymbols("^HSI",to="2020-01-18") % Hang Seng Index from Yahoo Finance
[1] "HSI"
> head(HSI)
           HSI.Open HSI.High  HSI.Low HSI.Close HSI.Volume HSI.Adjusted
2007-01-02 20004.84 20323.59 19990.28  20310.18 1264596800     20310.18
2007-01-03 20353.42 20554.58 20249.61  20413.39 1673968900     20413.39
2007-01-04 20415.67 20463.18 19948.02  20025.58 2130510200     20025.58
2007-01-05 19890.15 20213.64 19757.24  20211.28 1959114400     20211.28
2007-01-08 19915.00 20085.58 19844.31  20029.66 1438589300     20029.66
2007-01-09 20162.54 20173.06 19794.29  19898.08 1524412800     19898.08
> tail(HSI)
           HSI.Open HSI.High  HSI.Low HSI.Close HSI.Volume HSI.Adjusted
2020-01-10 28665.14 28665.14 28504.27  28638.20 1448401000     28638.20
2020-01-13 28772.37 28971.40 28671.84  28954.94 1765055700     28954.94
2020-01-14 29149.53 29149.53 28790.49  28885.14 1643504700     28885.14
2020-01-15 28891.07 28972.68 28619.10  28773.59 1240120700     28773.59
2020-01-16 28806.12 28987.73 28709.57  28883.04 1620926200     28883.04
2020-01-17 28988.16 29101.15 28813.13  29056.42 1545082800     29056.42
> chartSeries(HSI,theme="white",TA=NULL) % Obtain time plot without trading volume

Remark: The quantmod package updates financial data daily. The default option of getSymbols is to download data up to the most recent observation available.

Figure 2: Time plot of U.S. monthly unemployment rates from January 1948 to December 2019.

Figure 3: Time plot of Hong Kong Hang Seng Index from January 2, 2007 to January 17, 2020.

0.3 Some Basic R commands

After starting R, the first thing to do is to set the working directory. By working directory, we mean the computer directory where data sets reside and output will be stored. This can be done in two ways. The first method is to click on the command File. A pop-up window appears that allows one to select the desired directory.
The second method is to type the desired directory in the R Console using the command setwd, which stands for set working directory. See the demonstration below.

R is an object-oriented language and handles many types of objects. For the purposes of the course, we do not need to study the details of objects in R. Explanations will be given when needed. It suffices for now to say that R allows one to assign values to variables and refer to them by name. The assignment operator is <-.

> getwd()
[1] "C:/Users/dingluo/Dropbox/Teaching/EF4822_Spring2021"
> x <- 10
> x % See the value of x.
[1] 10 % Here [1] signifies the first element.
> 1+2 % Basic operation: addition
[1] 3
> 10/2 % Basic operation: division
[1] 5
% Use * and ^ for multiplication and power, respectively.
% Use log for the natural logarithm.
> da=read.table('d-ibm-0110.txt',header=T) % Load text data with names.
> head(da) % See the first 6 rows
      date    return
1 20010102 -0.002206
2 20010103  0.115696
....
6 20010109 -0.010688
> dim(da) % Dimension of the data object "da".
[1] 2515 2
> da <- ....
> head(da) % See the first 6 rows
      Date VIX.Open VIX.High VIX.Low VIX.Close
1 1/2/2004    17.96    18.68   17.54     18.22
2 1/5/2004    18.45    18.49   17.44     17.49
....
6 1/9/2004    16.15    16.88   15.57     16.75

Figure 4: Time plot of daily log returns of Apple stock from January 4, 2007 to January 27, 2015.

1 Examples of Financial Data

In this section, we examine some of the return series in finance. Figure 4 shows the time plot of daily log returns of Apple stock from January 4, 2007 to January 27, 2015. As defined before, daily log returns are simply the change series of log prices. In R, a change series can easily be obtained by taking the difference of the log prices. Specifically, $r_t = \ln(P_t) - \ln(P_{t-1})$, where $P_t$ is the stock price at time $t$.
Note that in the demonstration, I used the adjusted daily price to compute log returns because the adjusted price takes into consideration the stock splits, if any, during the sample period. From the plot, we see that (a) there exist some large outlying observations and (b) the returns were volatile in certain periods, but stable in others. The latter characteristic is referred to as volatility clustering in asset returns. The former, on the other hand, indicates that the returns have heavy tails.

Figure 5 shows the time plot of daily changes in yield-to-maturity (YTM) of the 10-year Treasury notes, also from January 4, 2007 to January 27, 2015. The changes in YTM exhibit characteristics similar to those of the daily returns of Apple stock. Figure 6 provides the time plot of daily log returns of the Dollar-Euro exchange rate. Again, the log returns of exchange rates have the same features as those of the daily log returns of stock. The daily Dollar-Euro exchange rate is given in Figure 7. The exchange rates are downloaded from the database FRED.

Figure 5: Time plot of daily changes in the yield to maturity for the U.S. 10-year Treasury notes from January 4, 2007 to January 27, 2015.

Figure 6: Time plot of daily log returns of the Dollar-Euro exchange rates from January 5, 1999 to January 08, 2021. The rate is dollars per Euro.

Figure 7: Time plot of daily Dollar-Euro exchange rates from January 4, 1999 to January 08, 2021. The rate is dollars per Euro.
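
The log-return definition used in this section can be checked on a few numbers before working with downloaded data. The sketch below uses a short hypothetical price vector (the values and the variable names `price`, `log_rtn`, `simple_rtn` are made up for illustration) and only base R functions.

```r
# Hypothetical adjusted closing prices (illustrative numbers, not real data)
price <- c(100, 102, 101, 105, 104)

# Log return: r_t = ln(P_t) - ln(P_{t-1})
log_rtn <- diff(log(price))

# Simple return: R_t = P_t / P_{t-1} - 1
simple_rtn <- price[-1] / price[-length(price)] - 1

# The two definitions are linked by r_t = ln(1 + R_t)
all.equal(log_rtn, log(1 + simple_rtn))

# Log returns add up over time: their sum is the log return over the whole sample
cum_log_rtn <- sum(log_rtn)
all.equal(cum_log_rtn, log(price[length(price)] / price[1]))

# Average log return per period
avg_log_rtn <- mean(log_rtn)
```

Because log returns are additive, the cumulated log return depends only on the first and last prices, which gives a quick sanity check on any return series computed with diff(log(...)).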
R Demonstration

> require(quantmod)
> getSymbols("AAPL",from="2007-01-03",to="2015-01-28") % Specify period
[1] "AAPL"
> AAPL.rtn=diff(log(AAPL$AAPL.Adjusted)) % Compute log returns
> chartSeries(AAPL.rtn,theme="white")
> getSymbols("^TNX",from="2007-01-03",to="2015-01-28")
[1] "TNX"
> TNX.rtn=diff(TNX$TNX.Adjusted) % Compute changes
> chartSeries(TNX.rtn,theme="white")
> getSymbols("DEXUSEU",src="FRED") % Obtain exchange rates from FRED
[1] "DEXUSEU"
> head(DEXUSEU)
           DEXUSEU
1999-01-04  1.1812
1999-01-05  1.1760
....
1999-01-11  1.1534
> tail(DEXUSEU)
           DEXUSEU
2021-01-01      NA
....
2021-01-08  1.2252
> USEU.rtn=diff(log(DEXUSEU$DEXUSEU))
> chartSeries(DEXUSEU,theme="white")
> chartSeries(USEU.rtn,theme="white")

In-Class Exercises

1. Download US S&P 500 index data, calculate log return, and plot both the index level and index return. Calculate cumulated and average log return.
2. Download Hong Kong Hang Seng index data, calculate log return, and plot both the index level and index return. Calculate cumulated and average log return.
3. Download US GDP data, calculate log growth, and plot both the GDP level and growth.
4. Download Hong Kong GDP data, calculate log growth, and plot both the GDP level and growth.

Week 10: R Program

This week’s R program illustrates how we can estimate CAPM for Apple stock and Microsoft stock.
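
Before reading the program, it may help to see what the CAPM regression recovers when the true values are known. The sketch below is not part of the course program: it simulates excess returns with hypothetical parameters (`true_alpha`, `true_beta`, and the standard deviations are assumptions chosen for illustration) and then regresses the stock's excess return on the market's excess return with lm.

```r
# Simulated monthly excess returns (hypothetical parameters, not estimates from data)
set.seed(123)
n <- 5000
true_alpha <- 0.01
true_beta <- 1.2
exret_mkt <- rnorm(n, mean = 0.005, sd = 0.04)
exret_stock <- true_alpha + true_beta * exret_mkt + rnorm(n, sd = 0.02)

# CAPM regression: stock excess return on market excess return
fit <- lm(exret_stock ~ exret_mkt)
alpha_hat <- unname(coef(fit)[1])   # intercept (alpha)
beta_hat <- unname(coef(fit)[2])    # slope (beta)

# The OLS slope equals Cov(stock, market) / Var(market)
beta_by_hand <- cov(exret_stock, exret_mkt) / var(exret_mkt)
all.equal(beta_hat, beta_by_hand)
```

The slope is the market beta and the intercept is the alpha; with real data the same two numbers appear in the Estimate column of summary(fit).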
> setwd("~/Dropbox/Teaching/EF4822_Spring2021")
> sp500=read.csv("sp500.csv",header=T)
> head(sp500)
      Date   Open   High    Low  Close Adj.Close     Volume
1 1/1/1990 353.40 360.59 319.83 329.08    329.08 3793250000
2 1/2/1990 329.08 336.09 322.10 331.89    331.89 2961970000
3 1/3/1990 331.89 344.49 331.08 339.94    339.94 3283280000
4 1/4/1990 339.94 347.30 327.76 330.80    330.80 2801220000
5 1/5/1990 330.80 362.26 330.80 361.23    361.23 3596680000
6 1/6/1990 361.26 368.78 351.23 358.02    358.02 3226280000
> tail(sp500)
         Date    Open    High     Low   Close Adj.Close      Volume
344  1/8/2018 2821.17 2916.50 2796.34 2901.52   2901.52 69238220000
345  1/9/2018 2896.96 2940.91 2864.12 2913.98   2913.98 62492080000
346 1/10/2018 2926.29 2939.86 2603.54 2711.74   2711.74 91327930000
347 1/11/2018 2717.58 2815.15 2631.09 2760.17   2760.17 80080110000
348 1/12/2018 2790.50 2800.18 2346.58 2506.85   2506.85 83519570000
349  1/1/2019 2476.96 2675.47 2443.96 2638.70   2638.70 57251830000
> ret_sp500=diff(log(sp500$Adj.Close))   # S&P 500 index log return
> ts.plot(ret_sp500)
> tb3m=read.csv("tb3m.csv",header=T)
> rf=tb3m[,2]                            # 3-month T-bill rate as risk-free rate
> exret_sp500=ret_sp500-rf               # S&P 500 index excess return

# Estimating CAPM for Apple stock
> apple=read.csv("appl.csv",header=T)
> ret_apple=diff(log(apple$Adj.Close))
> exret_apple=ret_apple-rf               # Apple stock excess return
> lmapple=lm(exret_apple~exret_sp500)
> View(lmapple)
> summary(lmapple)

Call:
lm(formula = exret_apple ~ exret_sp500)

Residuals:
     Min       1Q   Median       3Q      Max
-0.80346 -0.05755  0.00549  0.05914  0.31825

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.013770   0.006361   2.165   0.0311 *
exret_sp500 1.278327   0.153669   8.319 2.07e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1182 on 346 degrees of freedom
Multiple R-squared:  0.1667,  Adjusted R-squared:  0.1643
F-statistic: 69.2 on 1 and 346 DF,  p-value: 2.073e-15

> anova(lmapple)
Analysis of Variance Table

Response: exret_apple
             Df Sum Sq Mean Sq F value    Pr(>F)
exret_sp500   1 0.9667 0.96673  69.201 2.073e-15 ***
Residuals   346 4.8336 0.01397
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> plot(x=exret_sp500,y=exret_apple,main="exret_apple~exret_sp500")
> abline(lmapple)

# Estimating CAPM for Microsoft stock
> msft=read.csv("msft.csv",header=T)
> ret_msft=diff(log(msft$Adj.Close))
> exret_msft=ret_msft-rf                 # Microsoft stock excess return
> lmmsft=lm(exret_msft~exret_sp500)
> View(lmmsft)
> summary(lmmsft)

Call:
lm(formula = exret_msft ~ exret_sp500)

Residuals:
     Min       1Q   Median       3Q      Max
-0.38961 -0.04186  0.00067  0.03728  0.29219

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.008750   0.003862   2.266   0.0241 *
exret_sp500 1.241483   0.093293  13.307   <2e-16 ***
....

> anova(lmmsft)
Analysis of Variance Table

Response: exret_msft
             Df  Sum Sq Mean Sq F value    Pr(>F)
exret_sp500   1 0.91181 0.91181  177.09 < 2.2e-16 ***
Residuals   346 1.78154 0.00515
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> plot(x=exret_sp500,y=exret_msft,main="exret_msft~exret_sp500")
> abline(lmmsft)

I. Definitions of Returns

Returns

Return: $R_{t \to t+1} = R_{t+1} = \frac{P_{t+1} + D_{t+1}}{P_t} - 1$

Gross return: $1 + R_{t+1} \ge 0$

Compound return over $k$ periods: $R_{t \to t+k} = (1 + R_{t+1})(1 + R_{t+2}) \cdots (1 + R_{t+k}) - 1 = \prod_{j=1}^{k} (1 + R_{t+j}) - 1$

Log Returns

Log return, also called continuously compounded return: $r_{t+1} = \log(1 + R_{t+1}) = \log(P_{t+1} + D_{t+1}) - \log(P_t)$

Log compound return over $k$ periods: $r_{t \to t+k} = \log(1 + R_{t \to t+k}) = \log\left(\prod_{j=1}^{k} (1 + R_{t+j})\right) = \sum_{j=1}^{k} \log(1 + R_{t+j}) = \sum_{j=1}^{k} r_{t+j}$

Portfolio Returns

Portfolio return ($i$ indexes assets): $R_{pt+1} = \sum_{i=1}^{N} w_{it} R_{it+1}$, where $\sum_{i=1}^{N} w_{it} = 1$

Equally-weighted portfolio: $w_{it} = 1/N$

Value-weighted portfolio: $w_{it} = \frac{MV_{it}}{\sum_{i=1}^{N} MV_{it}}$

Log portfolio returns: $r_{pt+1} = \log(1 + R_{pt+1}) = \log\left(\sum_{i=1}^{N} w_{it} (1 + R_{it+1})\right) \ne \sum_{i=1}^{N} w_{it} r_{it+1}$

Excess Returns

Excess return is over a benchmark return (e.g., Treasury bill): $R^e_{it} = R_{it} - R_{0t}$

This corresponds to the payoff on a zero-cost portfolio that goes long in asset $i$ and short in the benchmark asset. You short one dollar in the benchmark and receive $1, and you use this $1 to buy asset $i$.

Log excess return: $r^e_{it} = r_{it} - r_{0t}$

Compound excess return over $k$ periods: $R^e_{it \to t+k} = (1 + R_{it \to t+k}) - (1 + R_{0t \to t+k})$

Log excess return over $k$ periods: $r^e_{it \to t+k} = \sum_{j=1}^{k} (r_{it+j} - r_{0t+j}) = \sum_{j=1}^{k} r^e_{it+j}$

Real Returns

Nominal (gross) return: $1 + R_{t+1} = \frac{\$\,\text{received}_{t+1}}{\$\,\text{paid}_t}$

Real rates of return are defined in terms of real $'s or goods: $1 + R^{real}_{t+1} = \frac{\text{goods received}_{t+1}}{\text{goods paid}_t}$

Inflation: $1 + \Pi_{t+1} = \frac{CPI_{t+1}}{CPI_t}$, where $CPI_t = \frac{\$_t}{\text{goods}_t}$

Therefore, $1 + R^{real}_{t+1} = \frac{1 + R_{t+1}}{1 + \Pi_{t+1}}$ or, in logs, $r^{real}_{t+1} = r_{t+1} - \pi_{t+1}$

II. Review of Statistics

Overview

Rather than get caught up in the math (probability and measure theory), we'll look at random variables, distributions, and statistics from a computational viewpoint.

More specifically, rather than talk about events directly, we will refer to random variables and (implicitly) the events determined by them.

Random Variables

Definition: The sample space is the set of all possible outcomes. A random variable (RV), $X$, is a real-valued function on the sample space for which a probability can be assigned to any interval of the form $(-\infty, c]$.
Example: Log Stock Return
Sample space is $\mathbb{R} = (-\infty, \infty)$
Random variable here is $X = \ln(P_t + D_t) - \ln(P_{t-1})$

Example: Firm Bankruptcy
Sample space is {Operating, Default}
Define $X$ by $X(O) = 0$ and $X(D) = 1$, then $X$ is a random variable

Discrete Random Variables

A random variable is discrete if it takes on a countable number of values.
If $X$ is discrete, it takes values $x_1, x_2, \ldots$ with the associated probabilities $f(x_1), f(x_2), \ldots$
The points $x_1, x_2, \ldots$ are called the points of support.

Example: Firm Bankruptcy
If outcomes $X(O) = 0$ and $X(D) = 1$ are equally likely, then $f(0) = f(1) = 1/2$
The points of support are $x_1 = 0$ and $x_2 = 1$

Distribution and Density Functions

Let $F(c)$ give the probability that $X \le c$ and call $F(\cdot)$ the (cumulative) distribution function (cdf)
1. A cdf is increasing and right-continuous with left limits
2. $F(-\infty) = 0$ and $F(\infty) = 1$

If a RV is discrete, then its cdf is a step function: $F(c) = \sum_{x_i \le c} f(x_i)$

....

... $> Z_{\alpha/2}$ or p-value is less than $\alpha$.

3. A joint test (Jarque-Bera test): $JB = (K^*)^2 + (S^*)^2 \sim \chi^2_2$ if normality holds, where $\chi^2_2$ denotes a chi-squared distribution with 2 degrees of freedom.
Decision rule: Reject $H_o$ of normality if $JB > \chi^2_2(\alpha)$ or p-value is less than $\alpha$.

Empirical properties of returns

Data sources: Use packages, e.g. quantmod
• Yahoo Finance: https://finance.yahoo.com/
• CRSP: Center for Research in Security Prices (Wharton WRDS) https://wrds-web.wharton.upenn.edu/wrds/
• Various web sites, e.g. Federal Reserve Bank at St. Louis https://research.stlouisfed.org/fred2/
• Data sets of textbooks: http://faculty.chicagobooth.edu/ruey.tsay/teaching/fts3/

The empirical distribution of asset returns tends to be skewed to the left with heavy tails and has a higher peak than the normal distribution.
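
The Jarque-Bera statistic can be computed by hand in base R, which makes its two ingredients visible. The sketch below assumes the standard standardizations $S^* = \hat{S}/\sqrt{6/T}$ and $K^* = (\hat{K} - 3)/\sqrt{24/T}$, and it uses simulated Student-t data in place of real returns; the variable names and the simulated sample are illustrative only. Moments here divide by $T$, so the values may differ slightly from those of package functions that normalize differently.

```r
# Simulated heavy-tailed "returns" (Student-t with 5 df); illustrative only
set.seed(1)
x <- rt(1000, df = 5) * 0.01
n <- length(x)

# Sample moments (moment estimators, dividing by n)
m <- mean(x)
s <- sqrt(mean((x - m)^2))
skew <- mean((x - m)^3) / s^3
exkurt <- mean((x - m)^4) / s^4 - 3   # excess kurtosis, K - 3

# Standardized statistics, each approximately N(0,1) under normality
s_star <- skew / sqrt(6 / n)
k_star <- exkurt / sqrt(24 / n)

# Jarque-Bera statistic and its chi-squared(2) p-value
jb <- s_star^2 + k_star^2
p_value <- pchisq(jb, df = 2, lower.tail = FALSE)
```

A small p_value leads to rejecting normality; looking at s_star and k_star separately shows whether the rejection comes from asymmetry or from heavy tails.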
Demonstration of Data Analysis

Figure 1: Comparison of empirical IBM return densities (solid) with Normal densities (dashed). Panels: simple return and log return; vertical axis: density.

R demonstration: Use monthly IBM stock returns from 1968 to 2015.

**** Task:
(a) Set the working directory
(b) Load the library "fBasics".
(c) Compute summary (or descriptive) statistics
(d) Perform test for mean return being zero.
(e) Perform normality test using the Jarque-Bera method.
(f) Perform skewness and kurtosis tests.

> setwd("C:/Users/rst/teaching/bs41202/sp2017")
> library(fBasics)
> da=read.table("m-ibm-6815.txt",header=T)
> head(da)
  PERMNO     date    PRC ASKHI  BIDLO       RET    vwretd    ewretd    sprtrn
1  12490 19680131 594.50 623.0 588.75 -0.051834 -0.036330  0.023902 -0.043848
2  12490 19680229 580.00 599.5 571.00 -0.022204 -0.033624 -0.056118 -0.031223
3  12490 19680329 612.50 612.5 562.00  0.056034  0.005116 -0.011218  0.009400
4  12490 19680430 677.50 677.5 630.00  0.106122  0.094148  0.143031  0.081929
5  12490 19680531 357.00 696.0 329.50  0.055793  0.027041  0.091309  0.011169
6  12490 19680628 353.75 375.0 346.50 -0.009104  0.011527  0.016225  0.009120
> dim(da)
[1] 576 9
> ibm=da$RET % Simple IBM return
> lnIBM <- log(ibm+1) % Log return
> ts.plot(ibm,main="Monthly IBM simple returns: 1968-2015") % Time plot
> mean(ibm)
[1] 0.008255663
> var(ibm)
[1] 0.004909968
> skewness(ibm)
[1] 0.2687105
attr(,"method")
[1] "moment"
> kurtosis(ibm)
[1] 2.058484
attr(,"method")
[1] "excess"
> basicStats(ibm)
                   ibm
nobs        576.000000
NAs           0.000000
Minimum      -0.261905
Maximum       0.353799
1. Quartile  -0.034392
3. Quartile   0.048252
Mean          0.008256
Median        0.005600
Sum           4.755262
SE Mean       0.002920
LCL Mean      0.002521
UCL Mean      0.013990
Variance      0.004910
Stdev         0.070071
Skewness      0.268710
Kurtosis      2.058484
> basicStats(lnIBM) % log return
                 lnIBM
nobs        576.000000
NAs           0.000000
Minimum      -0.303683
Maximum       0.302915
1. Quartile  -0.034997
3. Quartile   0.047124
Mean          0.005813
Median        0.005585
Sum           3.348008
SE Mean       0.002898
LCL Mean      0.000120
UCL Mean      0.011505
Variance      0.004839
Stdev         0.069560
Skewness     -0.137286
Kurtosis      1.910438
> t.test(lnIBM) %% Test mean=0 vs mean .not. zero

        One Sample t-test

data:  lnIBM
t = 2.0055, df = 575, p-value = 0.04538
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.0001199015 0.0115051252
sample estimates:
  mean of x
0.005812513

> normalTest(lnIBM,method='jb')

Title:
 Jarque - Bera Normalality Test

Test Results:
  STATISTIC:
    X-squared: 90.988
  P VALUE:
    Asymptotic p Value: < 2.2e-16

> s3=skewness(lnIBM); T <- length(lnIBM)
> tst <- s3/sqrt(6/T) % Skewness test
> tst
[1] -1.345125
> pv <- 2*(1-pnorm(abs(tst)))
> pv
[1] 0.1785849
> k4 <- kurtosis(lnIBM)
> tst <- k4/sqrt(24/T) % Kurtosis test
> tst
[1] 9.359197
> q() % quit R.

Chapter 2: Linear Time Series (TS) Models

Financial TS: collection of a financial measurement over time
Example: log return $r_t$
Data: $\{r_1, r_2, \cdots, r_T\}$ ($T$ data points)
Purpose: What is the information contained in $\{r_t\}$?

Basic concepts

• Stationarity:
– Strict: distributions are time-invariant
– Weak: first 2 moments are time-invariant
What does weak stationarity mean in practice?
Past: time plot of $\{r_t\}$ varies around a fixed level within a finite range!
Future: the first 2 moments of future $r_t$ are the same as those of the data so that meaningful inferences can be made.

• Mean (or expectation) of returns: $\mu = E(r_t)$

• Variance (variability) of returns: $Var(r_t) = E[(r_t - \mu)^2]$

• Sample mean and sample variance are used to estimate the mean and variance of returns:
$\bar{r} = \frac{1}{T} \sum_{t=1}^{T} r_t$ and $\widehat{Var}(r_t) = \frac{1}{T-1} \sum_{t=1}^{T} (r_t - \bar{r})^2$

• Test $H_o: \mu = 0$ vs $H_a: \mu \ne 0$. Compute
$t = \frac{\bar{r}}{\text{std}(\bar{r})} = \frac{\bar{r}}{\sqrt{\widehat{Var}(r_t)/T}}$
Compare the t ratio with the $N(0, 1)$ dist.
Decision rule: Reject $H_o$ of zero mean if $|t| > Z_{\alpha/2}$ or p-value is less than $\alpha$.

• Lag-$k$ autocovariance: $\gamma_k = Cov(r_t, r_{t-k}) = E[(r_t - \mu)(r_{t-k} - \mu)]$.

• Serial (or auto-) correlations: $\rho_\ell = \frac{Cov(r_t, r_{t-\ell})}{Var(r_t)}$
Note: $\rho_0 = 1$ and $\rho_k = \rho_{-k}$ for $k \ne 0$. Why?
  Existence of serial correlations implies that the return is predictable, indicating market inefficiency.
• Sample autocorrelation function (ACF):
  $$\hat{\rho}_\ell = \frac{\sum_{t=1}^{T-\ell} (r_t - \bar{r})(r_{t+\ell} - \bar{r})}{\sum_{t=1}^{T} (r_t - \bar{r})^2},$$
  where $\bar{r}$ is the sample mean and $T$ is the sample size.
• Test zero serial correlations (market efficiency)
  – Individual test: for example, $H_o: \rho_1 = 0$ vs $H_a: \rho_1 \neq 0$
    $$t = \frac{\hat{\rho}_1}{\sqrt{1/T}} = \sqrt{T}\,\hat{\rho}_1 \sim \text{Asym. } N(0,1).$$
    Decision rule: Reject $H_o$ if $|t| > Z_{\alpha/2}$ or the p-value is less than $\alpha$.
  – Joint test (Ljung-Box statistics): $H_o: \rho_1 = \cdots = \rho_m = 0$ vs $H_a: \rho_i \neq 0$ for some $i$
    $$Q(m) = T(T+2) \sum_{\ell=1}^{m} \frac{\hat{\rho}_\ell^2}{T - \ell}$$
    Asym. chi-squared distribution with $m$ degrees of freedom.
    Decision rule: Reject $H_o$ if $Q(m) > \chi^2_m(\alpha)$ or the p-value is less than $\alpha$.
• Sources of serial correlations in financial TS
  – Nonsynchronous trading
  – Bid-ask bounce
  – Risk premium, etc.
  Thus, significant sample ACF does not necessarily imply market inefficiency.

Example: Monthly returns of IBM stock from 1926 to 1997.
• $R_t$: Q(5) = 5.4 (0.37) and Q(10) = 14.1 (0.17)
• $r_t$: Q(5) = 5.8 (0.33) and Q(10) = 13.7 (0.19)
Remark: What is a p-value? How to use it?
Implication: Monthly IBM stock returns do not have significant serial correlations.

Example: Monthly returns of CRSP value-weighted index from 1926 to 1997.
• $R_t$: Q(5) = 27.8 and Q(10) = 36.0
• $r_t$: Q(5) = 26.9 and Q(10) = 32.7
All highly significant. Implication: there exist significant serial correlations in the value-weighted index returns. (Nonsynchronous trading might explain the existence of the serial correlations, among other reasons.) A similar result is also found in equal-weighted index returns.

R demonstration: IBM monthly simple returns from 1968 to 2015
> da=read.table("m-ibm-6815.txt",header=T)
> ibm=da$RET
> acf(ibm) %% Plot not shown
> m1 <- acf(ibm)
> names(m1)
[1] "acf" "type" "n.used" "lag" "series" "snames"
> m1$acf
               [,1]
 [1,]  1.0000000000 % lag 0
 [2,] -0.0068713539 % lag 1
 [3,] -0.0002212888
 ....
[28,]  0.0159729906

The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags.

> m2 <- pacf(ibm)
> names(m2)
[1] "acf" "type" "n.used" "lag" "series" "snames"
> m2$acf
               [,1]
 [1,] -0.0068713539
 [2,] -0.0002685169
 [3,]  0.0310623477
 ....
[27,]  0.0127614307
> Box.test(ibm,lag=10) % Box-Pierce Q(m) test
        Box-Pierce test
data: ibm
X-squared = 7.1714, df = 10, p-value = 0.7092
> Box.test(ibm,lag=10,type='Ljung') % Ljung-Box Q(m) test
        Box-Ljung test
data: ibm
X-squared = 7.2759, df = 10, p-value = 0.6992

Back-shift (lag) operator
A useful notation in TS analysis.
• Definition: $B r_t = r_{t-1}$ or $L r_t = r_{t-1}$
• $B^2 r_t = B(B r_t) = B r_{t-1} = r_{t-2}$.
B (or L) means time shift! $B r_t$ is the value of the series at time $t-1$.

Suppose that the daily log returns are

Day     1      2       3       4
r_t   0.017  −0.005  −0.014  0.021

Answer the following questions:
• $r_2$ =
• $B r_3$ =
• $B^2 r_5$ =
Question: What is B2?

What are the important statistics in practice? Conditional quantities, not unconditional.

A proper perspective: at a time point $t$
• Available data: $\{r_1, r_2, \ldots, r_{t-1}\} \equiv F_{t-1}$
• The return is decomposed into two parts as
  $r_t$ = predictable part + not predictable part
        = function of elements of $F_{t-1}$ + $a_t$

In other words, given information $F_{t-1}$,
$$r_t = \mu_t + a_t = E(r_t \mid F_{t-1}) + \sigma_t \epsilon_t$$
– $\mu_t$: conditional mean of $r_t$
– $a_t$: shock or innovation at time $t$
– $\epsilon_t$: an iid sequence with mean zero and variance 1
– $\sigma_t$: conditional standard deviation (commonly called volatility in finance)

Traditional TS modeling is concerned with $\mu_t$. Model for $\mu_t$: mean equation.
Volatility modeling concerns $\sigma_t$. Model for $\sigma_t^2$: volatility equation.

Univariate TS analysis serves two purposes
• a model for $\mu_t$
• understanding models for $\sigma_t^2$: properties, forecasting, etc.

Linear time series: $r_t$ is linear if
• the predictable part is a linear function of $F_{t-1}$
• $\{a_t\}$ are independent and have the same distribution (iid)

Mathematically, it means $r_t$ can be written as
$$r_t = \mu + \sum_{i=0}^{\infty} \psi_i a_{t-i},$$
where $\mu$ is a constant, $\psi_0 = 1$ and $\{a_t\}$ is an iid sequence with mean zero and well-defined distribution.

In the economic literature, $a_t$ is the shock (or innovation) at time $t$ and $\{\psi_i\}$ are the impulse responses of $r_t$.

White noise: iid sequence (with finite variance), which is the building block of linear TS models. White noise is not predictable, but has zero mean and finite variance.

Univariate linear time series models
1. autoregressive (AR) models
2. moving-average (MA) models
3. mixed ARMA models

Example: Quarterly growth rate of U.S. real gross national product (GNP), seasonally adjusted, from the second quarter of 1947 to the first quarter of 1991. An AR(3) model for the data is
$$r_t = 0.005 + 0.35 r_{t-1} + 0.18 r_{t-2} - 0.14 r_{t-3} + a_t, \quad \hat{\sigma}_a = 0.01,$$
where $\{a_t\}$ denotes a white noise with variance $\sigma_a^2$. Given $r_n$, $r_{n-1}$ and $r_{n-2}$, we can predict $r_{n+1}$ as
$$\hat{r}_{n+1} = 0.005 + 0.35 r_n + 0.18 r_{n-1} - 0.14 r_{n-2}.$$
Other implications of the model?

In this course, we use statistical methods to find models that fit the data well for making inference, e.g. prediction. On the other hand, there exists economic theory that leads to time-series models for economic variables. For instance, consider the real business-cycle theory in macroeconomics. Under some simplifying assumptions, one can show that $\ln(Y_t)$, where $Y_t$ is the output (GDP), follows an AR(2) model. See Advanced Macroeconomics by David Romer (2006, 3rd ed., p. 190).

Example: Monthly simple return of Center for Research in Security Prices (CRSP) equal-weighted index
$$R_t = 0.013 + a_t + 0.178 a_{t-1} - 0.13 a_{t-3} + 0.135 a_{t-9}, \quad \hat{\sigma}_a = 0.073$$
Checking: Q(10) = 11.4 (0.122) for the residual series $a_t$.
Implications of the model? Statistical significance vs economic significance.

In this course, we shall discuss some reasons for the observed serial dependence in index returns. See, for example, Chapter 5 on nonsynchronous trading.
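As a rough illustration of what the AR(3) GNP model quoted in this section implies, the sketch below computes a 1-step ahead forecast, the first few impulse responses $\psi_i$, and the implied mean. It is only a sketch: the coefficients are the rounded values from the notes, the three "recent" growth rates are hypothetical placeholders rather than data, and `ARMAtoMA` from base R's stats package is used here as one convenient way to obtain the $\psi$-weights.

```r
# Rounded AR(3) coefficients quoted in the notes for the GNP growth rate
phi0 <- 0.005
phi  <- c(0.35, 0.18, -0.14)

# Hypothetical values for r_n, r_{n-1}, r_{n-2} (placeholders, not real data)
r_recent <- c(0.010, 0.005, 0.002)

# 1-step ahead forecast: phi0 + phi1*r_n + phi2*r_{n-1} + phi3*r_{n-2}
r_hat <- phi0 + sum(phi * r_recent)
print(r_hat)

# Impulse responses psi_1, ..., psi_8 of the AR(3) model (psi_0 = 1 is implicit)
psi <- ARMAtoMA(ar = phi, lag.max = 8)
print(round(psi, 4))

# Implied unconditional mean: phi0 / (1 - phi1 - phi2 - phi3)
mu <- phi0 / (1 - sum(phi))
print(mu)
```

The decaying $\psi$-weights show how quickly the effect of a shock dies out, which is one of the "implications of the model" the notes ask about.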
Important properties of a model
• Stationarity condition
• Basic properties: mean, variance, serial dependence
• Empirical model building: specification, estimation, & checking
• Forecasting

Simple AR models: (Regression with lagged variables.)

Motivating example: The growth rate of U.S. quarterly real GNP from 1947 to 1991. Recall that the model discussed before is
$$r_t = 0.005 + 0.35 r_{t-1} + 0.18 r_{t-2} - 0.14 r_{t-3} + a_t, \quad \hat{\sigma}_a = 0.01.$$
This is called an AR(3) model because the growth rate $r_t$ depends on the growth rates of the past three quarters. How do we specify this model from the data? Is it adequate for the data? What are the implications of the model? These are the questions we shall address in this lecture.

Another example: U.S. monthly unemployment rate.

[Figure 1: U.S. quarterly growth rate of real GNP: 1947-1991. Time plot of gnp, 1947.II to 1991.I.]
[Figure 2: Various plots of U.S. quarterly growth rate of real GNP: 1947-1991. Panels: time plot of x; x[2:176] vs x[1:175]; x[3:176] vs x[1:174]; ACF of series x.]
[Figure 3: U.S. monthly unemployment rate (total civilian, 16 and older) from January 1948 to February, 2017.]

AR(1) model:
1. Form: $r_t = \phi_0 + \phi_1 r_{t-1} + a_t$, where $\phi_0$ and $\phi_1$ are real numbers, which are referred to as "parameters" (to be estimated from the data in an application). For example, $r_t = 0.05 + 0.4 r_{t-1} + a_t$.
2. Stationarity: necessary and sufficient condition $|\phi_1| < 1$. Why?
3. Mean: $E(r_t) = \dfrac{\phi_0}{1 - \phi_1}$
4. Alternative representation: Let $E(r_t) = \mu$ be the mean of $r_t$ so that $\mu = \phi_0/(1 - \phi_1)$. Equivalently, $\phi_0 = \mu(1 - \phi_1)$. Plugging in the model, we have
   $$(r_t - \mu) = \phi_1 (r_{t-1} - \mu) + a_t. \qquad (1)$$
   This model also has two parameters ($\mu$ and $\phi_1$). It explicitly uses the mean of the series.
   It is less commonly used in the literature, but is the model representation used in R.
5. Variance: $\mathrm{Var}(r_t) = \dfrac{\sigma_a^2}{1 - \phi_1^2}$.
6. Autocorrelations: $\rho_1 = \phi_1$, $\rho_2 = \phi_1^2$, etc. In general, $\rho_k = \phi_1^k$ and the ACF $\rho_k$ decays exponentially as $k$ increases.
7. Forecast (minimum squared error): Suppose the forecast origin is $n$. For simplicity, we shall use the model representation in (1) and write $x_t = r_t - \mu$. The model then becomes $x_t = \phi_1 x_{t-1} + a_t$. Note that the forecast of $r_t$ is simply the forecast of $x_t$ plus $\mu$.
   (a) 1-step ahead forecast at time $n$: $\hat{x}_n(1) = \phi_1 x_n$
   (b) 1-step ahead forecast error: $e_n(1) = x_{n+1} - \hat{x}_n(1) = a_{n+1}$. Thus, $a_{n+1}$ is the un-predictable part of $x_{n+1}$. It is the shock at time $n+1$!
   (c) Variance of 1-step ahead forecast error: $\mathrm{Var}[e_n(1)] = \mathrm{Var}(a_{n+1}) = \sigma_a^2$.
   (d) 2-step ahead forecast: $\hat{x}_n(2) = \phi_1 \hat{x}_n(1) = \phi_1^2 x_n$.
   (e) 2-step ahead forecast error: $e_n(2) = x_{n+2} - \hat{x}_n(2) = a_{n+2} + \phi_1 a_{n+1}$
   (f) Variance of 2-step ahead forecast error: $\mathrm{Var}[e_n(2)] = (1 + \phi_1^2)\sigma_a^2$, which is greater than or equal to $\mathrm{Var}[e_n(1)]$, implying that uncertainty in forecasts increases as the number of steps increases.
   (g) Behavior of multi-step ahead forecasts. In general, for the $\ell$-step ahead forecast at $n$, we have
       $$\hat{x}_n(\ell) = \phi_1^\ell x_n,$$
       the forecast error
       $$e_n(\ell) = a_{n+\ell} + \phi_1 a_{n+\ell-1} + \cdots + \phi_1^{\ell-1} a_{n+1},$$
       and the variance of forecast error
       $$\mathrm{Var}[e_n(\ell)] = (1 + \phi_1^2 + \cdots + \phi_1^{2(\ell-1)})\sigma_a^2.$$
       In particular, as $\ell \to \infty$, $\hat{x}_n(\ell) \to 0$, i.e., $\hat{r}_n(\ell) \to \mu$. This is called the mean-reversion of the AR(1) process. The variance of forecast error approaches
       $$\mathrm{Var}[e_n(\ell)] = \frac{1}{1 - \phi_1^2}\sigma_a^2 = \mathrm{Var}(r_t).$$
       In practice, it means that for the long-term forecasts serial dependence is not important. The forecast is just the sample mean and the uncertainty is simply the uncertainty about the series.
8. A compact form: $(1 - \phi_1 B) r_t = \phi_0 + a_t$.

Half-life: A common way to quantify the speed of mean reversion is the half-life, which is defined as the number of periods needed so that the magnitude of the forecast becomes half of that of the forecast origin.
For an AR(1) model, this means
$$\hat{x}_n(k) = \tfrac{1}{2} x_n.$$
Thus, $\phi_1^k x_n = \tfrac{1}{2} x_n$. Consequently, the half-life of the AR(1) model is
$$k = \frac{\ln(0.5)}{\ln(|\phi_1|)}.$$
For example, if $\phi_1 = 0.5$, then $k = 1$. If $\phi_1 = 0.9$, then $k \approx 6.58$.

AR(2) model:
1. Form: $r_t = \phi_0 + \phi_1 r_{t-1} + \phi_2 r_{t-2} + a_t$, or $(1 - \phi_1 B - \phi_2 B^2) r_t = \phi_0 + a_t$.
2. Stationarity condition: (factor of polynomial)
3. Characteristic equation: $1 - \phi_1 x - \phi_2 x^2 = 0$
4. Mean: $E(r_t) = \dfrac{\phi_0}{1 - \phi_1 - \phi_2}$
5. Mean-adjusted format: Using $\phi_0 = \mu - \phi_1\mu - \phi_2\mu$, we can write the AR(2) model as
   $$(r_t - \mu) = \phi_1 (r_{t-1} - \mu) + \phi_2 (r_{t-2} - \mu) + a_t.$$
   This form is often used in the finance literature to highlight the mean-reverting property of a stationary AR(2) model.
6. ACF: $\rho_0 = 1$, $\rho_1 = \dfrac{\phi_1}{1 - \phi_2}$, $\rho_\ell = \phi_1 \rho_{\ell-1} + \phi_2 \rho_{\ell-2}$, $\ell \geq 2$.
7. Stochastic business cycle: if $\phi_1^2 + 4\phi_2 < 0$, then $r_t$ shows characteristics of business cycles with average length
   $$k = \frac{2\pi}{\cos^{-1}[\phi_1/(2\sqrt{-\phi_2})]},$$
   where the cosine inverse is stated in radians. If we denote the solutions of the polynomial as $a \pm bi$, where $i = \sqrt{-1}$, then we have $\phi_1 = 2a$ and $\phi_2 = -(a^2 + b^2)$ so that
   $$k = \frac{2\pi}{\cos^{-1}(a/\sqrt{a^2 + b^2})}.$$
   In R or S-Plus, one can obtain $\sqrt{a^2 + b^2}$ using the command Mod.
8. Forecasts: Similar to AR(1) models

Simulation in R: Use the command arima.sim
1. y1=arima.sim(model=list(ar=c(1.3,-.4)),1000)
2. y2=arima.sim(model=list(ar=c(.8,-.7)),1000)
Check the ACF and PACF of the above two simulated series.

Discussion: (Reference only) An AR(2) model can be written as an AR(1) model if one expands the dimension. Specifically, we have
$$r_t - \mu = \phi_1 (r_{t-1} - \mu) + \phi_2 (r_{t-2} - \mu) + a_t$$
$$r_{t-1} - \mu = r_{t-1} - \mu, \quad \text{(an identity.)}$$
Now, putting the two equations together, we have
$$\begin{bmatrix} r_t - \mu \\ r_{t-1} - \mu \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} r_{t-1} - \mu \\ r_{t-2} - \mu \end{bmatrix} + \begin{bmatrix} a_t \\ 0 \end{bmatrix}.$$
This is a 2-dimensional AR(1) model. Several properties of the AR(2) model can be obtained from the expanded AR(1) model.

Building an AR model
• Order specification
  1. Partial ACF: (naive, but effective)
     – Use consecutive fittings
     – See Text (p. 40) for details
     – Key feature: PACF cuts off at lag $p$ for an AR($p$) model.
     – Illustration: See the PACF of the U.S. quarterly growth rate of GNP.
  2. Akaike information criterion
     $$\mathrm{AIC}(\ell) = \ln(\hat{\sigma}_\ell^2) + \frac{2\ell}{T},$$
     for an AR($\ell$) model, where $\hat{\sigma}_\ell^2$ is the MLE of residual variance. Find the AR order with minimum AIC for $\ell \in [0, \ldots, P]$.
  3. BIC criterion:
     $$\mathrm{BIC}(\ell) = \ln(\hat{\sigma}_\ell^2) + \frac{\ell \ln(T)}{T}.$$
  [Figure: Partial ACF of series dgnp, lags 0 to 20.]
  R command: ar(rt, method="mle", order.max=12)
• Needs a constant term? Check the sample mean.
• Estimation: least squares method or maximum likelihood method
• Model checking:
  1. Residual: obs minus the fit, i.e. 1-step ahead forecast errors at each time point.
  2. Residual should be close to white noise if the model is adequate. Use Ljung-Box statistics of residuals, but the degrees of freedom is $m - g$, where $g$ is the number of AR coefficients used in the model.

Example: Analysis of U.S. GNP growth rate series.
R demonstration:
> setwd("your working directory")
> library(fBasics)
> da=read.table("dgnp82.dat")
> x=da[,1]
> par(mfcol=c(2,2)) % put 4 plots on a page
### See Figure 2 of the lecture note 2.
> plot(x,type='l') % first plot
> plot(x[1:175],x[2:176]) % 2nd plot
> plot(x[1:174],x[3:176]) % 3rd plot
> acf(x,lag=12) % 4th plot
> pacf(x,lag.max=12) % Compute PACF (not shown in this handout)
> Box.test(x,lag=10,type='Ljung') % Compute Q(10) statistics
        Box-Ljung test
data: x
X-squared = 43.2345, df = 10, p-value = 4.515e-06
> m1=ar(x,method='mle') % Automatic AR fitting using AIC criterion.
> m1
Call:
ar(x = x, method = "mle")
Coefficients:
      1        2        3     % An AR(3) is specified.
 0.3480   0.1793  -0.1423
Order selected 3  sigma^2 estimated as  9.427e-05
> names(m1)
 [1] "order"      "ar"         "var.pred"   "x.mean"     "aic"
 [6] "n.used"     "order.max"  "partialacf" "resid"      "method"
[11] "series"     "frequency"  "call"       "asy.var.coef"
> plot(m1$resid,type='l') % Plot residuals of the fitted model (not shown)
> Box.test(m1$resid,lag=10,type='Ljung') % Model checking
        Box-Ljung test
data: m1$resid
X-squared = 7.0808, df = 10, p-value = 0.7178
> m2=arima(x,order=c(3,0,0)) % Another approach with order given.
> m2
Call:
arima(x = x, order = c(3, 0, 0))
Coefficients:
         ar1     ar2      ar3  intercept   % Fitted model is
      0.3480  0.1793  -0.1423     0.0077   % y(t)=0.348y(t-1)+0.179y(t-2)
s.e.  0.0745  0.0778   0.0745     0.0012   %      -0.142y(t-3)+a(t),
                                           % where y(t) = x(t)-0.0077
sigma^2 estimated as 9.427e-05: log likelihood = 565.84, aic = -1121.68
> names(m2)
 [1] "coef"   "sigma2"    "var.coef" "mask"   "loglik" "aic"
 [7] "arma"   "residuals" "call"     "series" "code"   "n.cond"
[13] "model"
> Box.test(m2$residuals,lag=10,type='Ljung')
        Box-Ljung test
data: m2$residuals
X-squared = 7.0169, df = 10, p-value = 0.7239
> ts.plot(m2$residuals) % Residual plot
> tsdiag(m2) % obtain 3 plots of model checking (not shown in handout).
> p1=c(1,-m2$coef[1:3]) % Further analysis of the fitted model.
> roots=polyroot(p1)
> roots
[1]  1.590253+1.063882e+00i -1.920152-3.530887e-17i  1.590253-1.063882e+00i
> Mod(roots)
[1] 1.913308 1.920152 1.913308
> k=2*pi/acos(1.590253/1.913308)
> k
[1] 10.65638
> predict(m2,8) % Prediction 1-step to 8-step ahead.
$pred
Time Series:
Start = 177
End = 184
Frequency = 1
[1] 0.001236254 0.004555519 0.007454906 0.007958518
[5] 0.008181442 0.007936845 0.007820046 0.007703826
$se
Time Series:
Start = 177
End = 184
Frequency = 1
[1] 0.009709322 0.010280510 0.010686305 0.010688994
[5] 0.010689733 0.010694771 0.010695511 0.010696190

Another example: Monthly U.S. unemployment rate from January 1948 to February, 2017.
I use this example to emphasize two messages: (1) modeling and prediction using AR models, including model simplification; (2) handling outliers.

Demonstration:
> require(quantmod)
> getSymbols("UNRATE",src="FRED")
> chartSeries(UNRATE)
> unrate <- ...
> tdx <- ...
> plot(tdx,unrate,type='l',xlab='year',ylab='rate')
> title(main="Monthly U.S. unemployment rate")
> ar(unrate,method="mle")
Call:
ar(x = unrate, method = "mle")
Coefficients:
      1        2        3        4        5        6        7        8
 0.9946   0.2152  -0.0713  -0.0533   0.0494  -0.1275  -0.0610   0.0513
      9       10       11
-0.0077  -0.1048   0.1003
Order selected 11  sigma^2 estimated as  0.03719
> m1 <- arima(unrate,order=c(11,0,0))
> m1
Call:
arima(x = unrate, order = c(11, 0, 0))
Coefficients:
         ar1     ar2      ar3      ar4     ar5      ar6      ar7     ar8
      0.9945  0.2152  -0.0712  -0.0532  0.0493  -0.1275  -0.0610  0.0513
s.e.  0.0346  0.0488   0.0495   0.0495  0.0497   0.0495   0.0496  0.0496
          ar9     ar10    ar11  intercept
      -0.0077  -0.1047  0.1004     5.6715
s.e.   0.0496   0.0490  0.0348     0.4417
sigma^2 estimated as 0.03718: log likelihood = 186.03, aic = -346.07
> names(m1)
 [1] "coef"   "sigma2"    "var.coef" "mask"   "loglik" "aic"
 [7] "arma"   "residuals" "call"     "series" "code"   "n.cond"
[13] "nobs"   "model"
> tsdiag(m1,gof=24)
> c1 <- ...
> m2 <- arima(unrate,order=c(11,0,0),fixed=c1)
> m2
Call:
arima(x = unrate, order = c(11, 0, 0), fixed = c1)
Coefficients:
         ar1     ar2      ar3  ar4  ar5      ar6  ar7  ar8  ar9     ar10
      0.9967  0.2045  -0.0800    0    0  -0.1369    0    0    0  -0.0989
s.e.  0.0343  0.0481   0.0427    0    0   0.0291    0    0    0   0.0407
        ar11  intercept
      0.0998     5.6702
s.e.  0.0342     0.4416
sigma^2 estimated as 0.03733: log likelihood = 184.37, aic = -352.74
> tsdiag(m2)
> tsdiag(m2,gof=24)
> pm2 <- predict(m2,4)
> names(pm2)
[1] "pred" "se"
> low <- pm2$pred-1.96*pm2$se
> upp <- pm2$pred+1.96*pm2$se
> names(pm2)
[1] "pred" "se"
> pm2$pred
Time Series:
Start = 831
End = 834
Frequency = 1
[1] 4.737312 4.710012 4.745765 4.759146
> pm2$se
Time Series:
Start = 831
End = 834
Frequency = 1
[1] 0.1932128 0.2727943 0.3577585 0.4391267
> low
Time Series:
Start = 831
End = 834
Frequency = 1
[1] 4.358614 4.175335 4.044559 3.898457
> upp
Time Series:
Start = 831
End = 834
Frequency = 1
[1] 5.116009 5.244688 5.446972 5.619834
######################## Handling outliers
> which.min(m2$residuals) ### locate the minimum of residuals
[1] 23
> I23 <- ...
> I23[23] <- ...
> c1 <- ...
> m3 <- arima(unrate,order=c(11,0,0),xreg=I23,fixed=c1)
> m3
Call:
arima(x = unrate, order = c(11, 0, 0), xreg = I23, fixed = c1)
Coefficients:
         ar1     ar2      ar3  ar4  ar5      ar6  ar7  ar8  ar9     ar10
      1.0449  0.1219  -0.0472    0    0  -0.1345    0    0    0  -0.1021
s.e.  0.0349  0.0515   0.0434    0    0   0.0287    0    0    0   0.0413
        ar11  intercept      I23
      0.1025     5.6709  -0.7749
s.e.  0.0345     0.4428   0.1338
sigma^2 estimated as 0.03591: log likelihood = 200.43, aic = -382.87
> tsdiag(m3,gof=24)
> which.max(m3$residuals) ### locate the maximum of the residuals
[1] 22
> I22 <- ...
> I22[22] <- ...
> c1 <- ...
> X <- ...
> m4 <- arima(unrate,order=c(11,0,0),xreg=X,fixed=c1)
> m4
Call:
arima(x = unrate, order = c(11, 0, 0), xreg = X, fixed = c1)
Coefficients:
         ar1     ar2      ar3  ar4  ar5      ar6  ar7  ar8  ar9     ar10
      1.0764  0.1170  -0.0955    0    0  -0.1069    0    0    0  -0.0951
s.e.  0.0346  0.0507   0.0431    0    0   0.0283    0    0    0   0.0416
        ar11  intercept      I23     I22
      0.0901     5.6690  -0.2580  1.1729
s.e.  0.0348     0.4388   0.1367  0.1375
sigma^2 estimated as 0.03305: log likelihood = 234.86, aic = -449.73
> tsdiag(m4,gof=24)

[Figure 4: Model checking for AR(11) model fitted to UNRATE series. Panels: standardized residuals; ACF of residuals; p values for Ljung-Box statistic.]

Moving-average (MA) model
Model with finite memory!
Some daily stock returns have minor serial correlations and can be modeled as MA or AR models.

MA(1) model
• Form: $r_t = \mu + a_t - \theta a_{t-1}$
• Stationarity: always stationary.
• Mean (or expectation): $E(r_t) = \mu$
• Variance: $\mathrm{Var}(r_t) = (1 + \theta^2)\sigma_a^2$.
• Autocovariance:
  1. Lag 1: $\mathrm{Cov}(r_t, r_{t-1}) = -\theta\sigma_a^2$
  2. Lag $\ell$: $\mathrm{Cov}(r_t, r_{t-\ell}) = 0$ for $\ell > 1$.
  Thus, $r_t$ is not related to $r_{t-2}, r_{t-3}, \ldots$.
• ACF: $\rho_1 = \dfrac{-\theta}{1 + \theta^2}$, $\rho_\ell = 0$ for $\ell > 1$.
  Finite memory! MA(1) models do not remember what happened two time periods ago.
• Forecast (at origin $t = n$):
  1. 1-step ahead: $\hat{r}_n(1) = \mu - \theta a_n$. Why? Because at time $n$, $a_n$ is known, but $a_{n+1}$ is not.
  2. 1-step ahead forecast error: $e_n(1) = a_{n+1}$ with variance $\sigma_a^2$.
  3. Multi-step ahead: $\hat{r}_n(\ell) = \mu$ for $\ell > 1$. Thus, for an MA(1) model, the multi-step ahead forecasts are just the mean of the series. Why? Because the model has memory of 1 time period.
  4. Multi-step ahead forecast error: $e_n(\ell) = a_{n+\ell} - \theta a_{n+\ell-1}$
  5. Variance of multi-step ahead forecast error: $(1 + \theta^2)\sigma_a^2$ = variance of $r_t$.
• Invertibility:
  – Concept: $r_t$ is a proper linear combination of $a_t$ and the past observations $\{r_{t-1}, r_{t-2}, \ldots\}$.
  – Why is it important? It provides a simple way to obtain the shock $a_t$. For an invertible model, the dependence of $r_t$ on $r_{t-\ell}$ converges to zero as $\ell$ increases.
  – Condition: $|\theta| < 1$.
  – Invertibility of MA models is the dual property of stationarity for AR models.

MA(2) model
• Form: $r_t = \mu + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}$, or $r_t = \mu + (1 - \theta_1 B - \theta_2 B^2) a_t$.
• Stationary with $E(r_t) = \mu$.
• Variance: $\mathrm{Var}(r_t) = (1 + \theta_1^2 + \theta_2^2)\sigma_a^2$.
• ACF: $\rho_2 \neq 0$, but $\rho_\ell = 0$ for $\ell > 2$.
• Forecasts go to the mean after 2 periods.

Building an MA model
• Specification: Use sample ACF. Sample ACFs are all small after lag $q$ for an MA($q$) series. (See test of ACF.)
• Constant term? Check the sample mean.
• Estimation: use maximum likelihood method
  – Conditional: Assume $a_t = 0$ for $t \leq 0$
  – Exact: Treat $a_t$ with $t \leq 0$ as parameters, estimate them to obtain the likelihood function.
  Exact method is preferred, but it is more computing intensive.
• Model checking: examine residuals (to be white noise)
• Forecast: use the residuals as $\{a_t\}$ (which can be obtained from the data and fitted parameters) to perform forecasts.

Model form in R: R parameterizes the MA($q$) model as
$$r_t = \mu + a_t + \theta_1 a_{t-1} + \cdots + \theta_q a_{t-q},$$
instead of the usual minus sign in $\theta$. Consequently, care needs to be exercised in writing down a fitted MA parameter in R. For instance, an estimate $\hat{\theta}_1 = -0.5$ of an MA(1) in R indicates the model is $r_t = a_t - 0.5 a_{t-1}$.

Example: Daily log return of the value-weighted index
R demonstration
> setwd("your working directory")
> library(fBasics)
> da=read.table("d-ibmvwew6202.txt")
> dim(da)
[1] 10194 4
> vw=log(1+da[,3])*100 % Compute percentage log returns of the vw index.
> acf(vw,lag.max=10) % ACF plot is not shown in this handout.
> m1=arima(vw,order=c(0,0,1)) % fits an MA(1) model
> m1
Call:
arima(x = vw, order = c(0, 0, 1))
Coefficients:
         ma1  intercept
      0.1465     0.0396   % The model is vw(t) = 0.0396+a(t)+0.1465a(t-1).
s.e.  0.0099     0.0100
sigma^2 estimated as 0.7785: log likelihood = -13188.48, aic = 26382.96
> tsdiag(m1)
> predict(m1,5)
$pred
Time Series:
Start = 10195
End = 10199
Frequency = 1
[1] 0.05036298 0.03960887 0.03960887 0.03960887 0.03960887
$se
Time Series:
Start = 10195
End = 10199
Frequency = 1
[1] 0.8823290 0.8917523 0.8917523 0.8917523 0.8917523

Mixed ARMA model: A compact form for flexible models.
Focus on the ARMA(1,1) model for
1. simplicity
2. usefulness for understanding GARCH models in Ch. 3 for volatility modeling.

ARMA(1,1) model
• Form: $(1 - \phi_1 B) r_t = \phi_0 + (1 - \theta_1 B) a_t$, or
  $$r_t = \phi_1 r_{t-1} + \phi_0 + a_t - \theta_1 a_{t-1}.$$
  A combination of an AR(1) on the LHS and an MA(1) on the RHS.
• Stationarity: same as AR(1)
• Invertibility: same as MA(1)
• Mean: as AR(1), i.e. $E(r_t) = \dfrac{\phi_0}{1 - \phi_1}$
• Variance: given in the text
• ACF: Satisfies $\rho_k = \phi_1 \rho_{k-1}$ for $k > 1$, but $\rho_1 = \phi_1 - [\theta_1 \sigma_a^2 / \mathrm{Var}(r_t)] \neq \phi_1$.
  This is the difference between AR(1) and ARMA(1,1) models.
• PACF: does not cut off at finite lags.

Building an ARMA(1,1) model
• Specification: use EACF or AIC
• Use the command auto.arima of the package forecast.
• Estimation: cond. or exact likelihood method
• Model checking: as before
• Forecast: MA(1) affects the 1-step ahead forecast. Others are similar to those of AR(1) models.

Three model representations:
• ARMA form: compact, useful in estimation and forecasting
• AR representation: (by long division)
  $$r_t = \phi_0 + a_t + \pi_1 r_{t-1} + \pi_2 r_{t-2} + \cdots$$
  It tells how $r_t$ depends on its past values.
• MA representation: (by long division)
  $$r_t = \mu + a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots$$
  It tells how $r_t$ depends on the past shocks.

For a stationary series, $\psi_i$ converges to zero as $i \to \infty$. Thus, the effect of any shock is transitory.
The MA representation is particularly useful in computing variances of forecast errors. For an $\ell$-step ahead forecast, the forecast error is
$$e_n(\ell) = a_{n+\ell} + \psi_1 a_{n+\ell-1} + \cdots + \psi_{\ell-1} a_{n+1}.$$
The variance of forecast error is
$$\mathrm{Var}[e_n(\ell)] = (1 + \psi_1^2 + \cdots + \psi_{\ell-1}^2)\sigma_a^2.$$

Unit-root Nonstationarity

Random walk
• Form: $p_t = p_{t-1} + a_t$
• Unit root? It is an AR(1) model with coefficient $\phi_1 = 1$.
• Nonstationary: Why? Because the variance of $p_t$ diverges to infinity as $t$ increases.
• Strong memory: sample ACF approaches 1 for any finite lag.
• Repeated substitution shows
  $$p_t = \sum_{i=0}^{\infty} a_{t-i} = \sum_{i=0}^{\infty} \psi_i a_{t-i},$$
  where $\psi_i = 1$ for all $i$. Thus, $\psi_i$ does not converge to zero. The effect of any shock is permanent.

Random walk with drift
• Form: $p_t = \mu + p_{t-1} + a_t$, $\mu \neq 0$.
• Has a unit root
• Nonstationary
• Strong memory
• Has a time trend with slope $\mu$. Why?

Differencing
• 1st difference: $r_t = p_t - p_{t-1}$
  If $p_t$ is the log price, then the 1st difference is simply the log return. Typically, 1st difference means the "change" or "increment" of the original series.
• Seasonal difference: $y_t = p_t - p_{t-s}$, where $s$ is the periodicity, e.g. $s = 4$ for quarterly series and $s = 12$ for monthly series.
  If $p_t$ denotes quarterly earnings, then $y_t$ is the change in earnings from the same quarter one year before.

Meaning of the constant term in a model
• MA model: mean
• AR model: related to mean
• 1st differenced: time slope, etc.

Practical implication in financial time series
Example: Monthly log returns of General Electric (GE) from 1926 to 1999 (74 years)
Sample mean: 1.04%, std($\hat{\mu}$) = 0.26. Very significant! This is about 12.45% a year.
A \$1 investment in the beginning of 1926 is worth
• annual compounded payment: \$5907
• quarterly compounded payment: \$8720
• monthly compounded payment: \$9570
• Continuously compounded?

Unit-root test
Let $p_t$ be the log price of an asset. To test that $p_t$ is not predictable (i.e. has a unit root), two models are commonly employed:
$$p_t = \phi_1 p_{t-1} + e_t$$
$$p_t = \phi_0 + \phi_1 p_{t-1} + e_t.$$
The hypothesis of interest is $H_o: \phi_1 = 1$ vs $H_a: \phi_1 < 1$.
The Dickey-Fuller test is the usual t-ratio of the OLS estimate of $\phi_1$ being 1. This is the DF unit-root test. The t-ratio, however, has a nonstandard limiting distribution.
Let $\Delta p_t = p_t - p_{t-1}$. Then, the augmented DF unit-root test for an AR($p$) model is based on
$$\Delta p_t = c_t + \beta p_{t-1} + \sum_{i=1}^{p-1} \phi_i \Delta p_{t-i} + e_t.$$
The t-ratio of the OLS estimate of $\beta$ is the ADF unit-root test statistic. Again, the statistic has a non-standard limiting distribution.

Example: Consider the log series of U.S. quarterly real GDP from 1947.I to 2009.IV (data from Federal Reserve Bank of St. Louis). See q-gdpc96.txt on the course web.
R demonstration
> library(fUnitRoots)
> help(UnitrootTests) % See the tests available
> da=read.table("q-gdpc96.txt",header=T)
> gdp=log(da[,4])
> adfTest(gdp,lag=4,type=c("c")) #Assume an AR(4) model for the series.
Title: Augmented Dickey-Fuller Test
Test Results:
  PARAMETER: Lag Order: 4
  STATISTIC: Dickey-Fuller: -1.7433
  P VALUE: 0.4076
# cannot reject the null hypothesis of a unit root.
*** A more careful analysis
> x=diff(gdp)
> ord=ar(x) # identify an AR model for the differenced series.
> ord
Call:
ar(x = x)
Coefficients:
     1       2        3
0.3429  0.1238  -0.1226
Order selected 3  sigma^2 estimated as  8.522e-05
> # An AR(3) for the differenced data is confirmed.
# Our previous analysis is justified.

Discussion: The command arima in R.
1. Dealing with the constant term. If there is any differencing, no constant is used. The subcommand include.mean=T in the arima command.
2. Fixing some parameters. Use subcommand fixed in arima. See the unemployment rate series used in AR modeling.

Week 12: R Program
> setwd("~/Dropbox/Teaching/EF4822_Spring2021")
> da=read.csv("PredictorData2018part.csv")
> head(da)
  yyyy Index  D12  E12       b.m    tbl    AAA    BAA    lty cay
1 1927 17.66 0.77 1.11 0.3746886 0.0317 0.0446 0.0532 0.0316 NaN
2 1928 24.35 0.85 1.38 0.2596667 0.0426 0.0461 0.0560 0.0340 NaN
3 1929 21.45 0.97 1.61 0.3384578 0.0303 0.0467 0.0595 0.0340 NaN
4 1930 15.34 0.98 0.97 0.5547454 0.0148 0.0452 0.0671 0.0330 NaN
5 1931  8.12 0.82 0.61 1.1707317 0.0241 0.0532 0.1042 0.0407 NaN
6 1932  6.89 0.50 0.41 1.4420843 0.0004 0.0459 0.0842 0.0315 NaN
          ntis  Rfree         infl       eqis          ltr       corpr
1  0.076474752 0.0317 -0.022598870 0.26551235  0.089448628  0.07443637
2  0.063068738 0.0426 -0.011560694 0.49742929  0.000827246  0.02841156
3  0.163522172 0.0303  0.005847953 0.72059294  0.034099467  0.03273004
4  0.113885891 0.0148 -0.063953488 0.30784749  0.046429195  0.07975053
5 -0.012944196 0.0241 -0.093167702 0.14466470 -0.053157349 -0.01850982
6 -0.005031571 0.0004 -0.102739726 0.03726708  0.168452113  0.10820224
         svar csp  ik   CRSP_SPvw CRSP_SPvwx
1 0.009419065 NaN NaN  0.35879164  0.2945602
2 0.019799325 NaN NaN  0.38844041  0.3331307
3 0.124614012 NaN NaN -0.08834698 -0.1213454
4 0.066648919 NaN NaN -0.26302852 -0.2958606
5 0.159402740 NaN NaN -0.45525321 -0.4892035
6 0.307451657 NaN NaN -0.08890738 -0.1483694
> CRSP_SPvw=da[,20]
> Rfree=da[,12]
> exret=CRSP_SPvw-Rfree # stock market excess return
> D12=da[,3]
> Index=da[,2]
> dp=log(D12/Index) # log dividend-to-price ratio
> bm=da[,5] # book-to-market ratio
> T=length(exret)
# use log dividend-to-price ratio to predict market excess return
> lmdp=lm(exret[2:T]~dp[1:T-1])
> View(lmdp)
> summary(lmdp)
Call:
lm(formula = exret[2:T] ~ dp[1:T - 1])
Residuals:
     Min       1Q   Median       3Q      Max
-0.60678 -0.13020  0.02396  0.14358  0.39421
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.33301    0.15107   2.204   0.0301 *
dp[1:T - 1]  0.07474    0.04429   1.688   0.0950 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1973 on 89 degrees of freedom
Multiple R-squared: 0.03101, Adjusted R-squared: 0.02012
F-statistic: 2.848 on 1 and 89 DF, p-value: 0.09498
> anova(lmdp)
Analysis of Variance Table
Response: exret[2:T]
            Df Sum Sq  Mean Sq F value  Pr(>F)
dp[1:T - 1]  1 0.1109 0.110873  2.8481 0.09498 .
Residuals   89 3.4646 0.038928
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> plot(x=dp[1:T-1],y=exret[2:T],main="exret~dp")
> abline(lm(exret[2:T]~dp[1:T-1]))
> time = 1928:2018
> plot(x=time,y=exret[2:T],type="l")
> lines(time,lmdp$fitted.values,col="blue")
# use book-to-market ratio to predict market excess return
> lmbm=lm(exret[2:T]~bm[1:T-1])
> summary(lmbm)
Call:
lm(formula = exret[2:T] ~ bm[1:T - 1])
Residuals:
    Min      1Q  Median      3Q     Max
-0.5587 -0.1417  0.0096  0.1400  0.4011
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.01411    0.04853  -0.291   0.7719
bm[1:T - 1]  0.16851    0.07839   2.150   0.0343 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1954 on 89 degrees of freedom
Multiple R-squared: 0.04936, Adjusted R-squared: 0.03868
F-statistic: 4.621 on 1 and 89 DF, p-value: 0.03429
> anova(lmbm)
Analysis of Variance Table
Response: exret[2:T]
            Df Sum Sq  Mean Sq F value  Pr(>F)
bm[1:T - 1]  1 0.1765 0.176487  4.6212 0.03429 *
Residuals   89 3.3990 0.038191
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> plot(x=bm[1:T-1],y=exret[2:T],main="exret~bm")
> abline(lm(exret[2:T]~bm[1:T-1]))
> time = 1928:2018
> plot(x=time,y=exret[2:T],type="l")
> lines(time,lmbm$fitted.values,col="blue")

Part I: Multiple Choice Questions (20 Questions worth 40 points in total, 2 points each)

1. Which of the following is false for a white noise process?
   a. zero mean  b. finite variance  c. independent and identically distributed (iid)  d. volatility clustering
2. Which is/are time-invariant for a weakly stationary time series?
   a. mean  b. variance  c. autocovariance  d. both a and b  e. a, b and c
3. When will the ARMA(1,1) model ($r_t = \phi_0 + \phi_1 r_{t-1} + a_t - \theta_1 a_{t-1}$) be weakly stationary?
   a. always weakly stationary  b. when $|\phi_1| < 1$  c. when $|\theta_1| < 1$  d. when $|\phi_1| < 1$ and $|\theta_1| < 1$  e. none of the above is correct
4. Which of the following properties is not true for a random walk process?
   a. has a unit root  b. nonstationary  c. strong memory  d. predictable
5. Which tests whether a time series has a unit root?
   a. Jarque-Bera test  b. Ljung-Box test  c. Augmented Dickey-Fuller (ADF) test  d. Lagrange Multiplier (LM) test

Please read the following R scripts for the analysis of a time series (x) and answer questions from 6 to 14.
> setwd("C:/Users/dingluo/teaching/ef4822/spring2021")
> library(fBasics)
> da=read.table("dgnp82.txt")
> x=da[,1]
> s3=skewness(x)
> T=length(x)
> tst=s3/sqrt(6/T)
> tst
[1] -0.8245784
> pv=2*pnorm(tst)
> pv
[1] 0.409611
> k4=kurtosis(x)
> tst=k4/sqrt(24/T)
> tst
[1] 1.118305
> pv=2*pnorm(-tst)
> pv
[1] 0.2634367
> Box.test(x,lag=10)

Box-Pierce test

data: x
X-squared = 42.265, df = 10, p-value = 6.727e-06

> mm=acf(x)
> mm$acf
             [,1]
 [1,]  1.00000000
 [2,]  0.37687036
 [3,]  0.25391195
 [4,]  0.01252511
……
[23,] -0.02155372
> acf(x,lag=12)
> pacf(x,lag.max=12)
> m1=ar(x,method='mle')
> m1

Call:
ar(x = x, method = "mle")

Coefficients:
      1        2        3
 0.3480   0.1793  -0.1423

Order selected 3  sigma^2 estimated as 9.427e-05
> m2=arima(x,order=c(3,0,0))
> m2

Call:
arima(x = x, order = c(3, 0, 0))

Coefficients:
         ar1     ar2      ar3  intercept
      0.3480  0.1793  -0.1423     0.0077
s.e.  0.0745  0.0778   0.0745     0.0012

sigma^2 estimated as 9.427e-05: log likelihood = 565.84, aic = -1121.68
> Box.test(m2$residuals,lag=10,type='Ljung')

Box-Ljung test

data: m2$residuals
X-squared = 7.0169, df = 10, p-value = 0.7239

> ts.plot(m2$residuals)
> tsdiag(m2)
> predict(m2,8)
$pred
Time Series:
Start = 177
End = 184
Frequency = 1
[1] 0.001236254 0.004555519 0.007454906 0.007958518
[5] 0.008181442 0.007936845 0.007820046 0.007703826

$se
Time Series:
Start = 177
End = 184
Frequency = 1
[1] 0.009709322 0.010280510 0.010686305 0.010688994
[5] 0.010689733 0.010694771 0.010695511 0.010696190

6. Which of the following is equivalent to the line of command for skewness test: pv=2*pnorm(tst)?
a. pv=2*pnorm(-tst)
b. pv=2*(1-pnorm(-tst))
c. pv=2*(1-pnorm(tst))
d. none of the above is correct

7. Does the distribution of the time series (x) have heavy tails at 5% significance level?
a. yes
b. no
c. not sure

8. Does the time series (x) have significant autocorrelations for the first 10 lags at 5% significance level?
a. yes
b. no
c. not sure

9. How many data observations does the time series (x) have?
a. 175
b. 176
c. 177
d. 178

10. Which is the lag-3 autocorrelation of the time series (x)?
a. 1.00000000
b. 0.37687036
c. 0.25391195
d. 0.01252511

11. Which is the mean of the time series (x)?
a. 0.0077
b. 0.01180982
c. 0.02330508
d. 0.01252033

12. Which is the variance of the time series (x)?
a. 0.00009427
b. 0.0001072596
c. 0.0001140595
d. 0.0001087317

13. Is the estimated model adequate?
a. yes
b. no
c. not sure

14. Which is the 3-step ahead forecast for the time series (x)?
a. 0.001236254
b. 0.007454906
c. 0.009709322
d. 0.010686305

Please read the following R scripts for estimating CAPM for IBM stock and answer questions from 15 to 20.

> setwd("~/Dropbox/Teaching/EF4822_Spring2021")
> sp500=read.csv("sp500.csv",header=T)
> head(sp500)
      Date   Open   High    Low  Close Adj.Close     Volume
1 1/1/1990 353.40 360.59 319.83 329.08    329.08 3793250000
2 1/2/1990 329.08 336.09 322.10 331.89    331.89 2961970000
3 1/3/1990 331.89 344.49 331.08 339.94    339.94 3283280000
4 1/4/1990 339.94 347.30 327.76 330.80    330.80 2801220000
5 1/5/1990 330.80 362.26 330.80 361.23    361.23 3596680000
6 1/6/1990 361.26 368.78 351.23 358.02    358.02 3226280000
> tail(sp500)
         Date    Open    High     Low   Close Adj.Close      Volume
344  1/8/2018 2821.17 2916.50 2796.34 2901.52   2901.52 69238220000
345  1/9/2018 2896.96 2940.91 2864.12 2913.98   2913.98 62492080000
346 1/10/2018 2926.29 2939.86 2603.54 2711.74   2711.74 91327930000
347 1/11/2018 2717.58 2815.15 2631.09 2760.17   2760.17 80080110000
348 1/12/2018 2790.50 2800.18 2346.58 2506.85   2506.85 83519570000
349  1/1/2019 2476.96 2675.47 2443.96 2638.70   2638.70 57251830000
> ret_sp500=diff(log(sp500$Adj.Close)) # S&P 500 index log return
> ts.plot(ret_sp500)
> tb3m=read.csv("tb3m.csv",header=T)
> rf=tb3m[,2] # 3-month T-bill rate as Risk-free rate
> exret_sp500=ret_sp500-rf # S&P 500 index excess return
# Estimating CAPM for IBM stock
> ibm =read.csv("ibm.csv",header=T)
> ret_ibm=diff(log(ibm$Adj.Close))
> exret_ibm=ret_ibm-rf # IBM stock excess return
> lmibm=lm(exret_ibm~exret_sp500)
> View(lmibm)
> summary(lmibm)

Call:
lm(formula = exret_ibm ~ exret_sp500)

Residuals:
     Min       1Q   Median       3Q      Max
-0.80346 -0.05755  0.00549  0.05914  0.31825

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.013770   0.006361   2.165   0.0311 *
exret_sp500 1.278327   0.153669   8.319 2.07e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1182 on 346 degrees of freedom
Multiple R-squared: 0.1667, Adjusted R-squared: 0.1643
F-statistic: 69.2 on 1 and 346 DF, p-value: 2.073e-15

> anova(lmibm)
Analysis of Variance Table

Response: exret_ibm
             Df Sum Sq Mean Sq F value    Pr(>F)
exret_sp500   1 0.9667 0.96673  69.201 2.073e-15 ***
Residuals   346 4.8336 0.01397
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> plot(x=exret_sp500,y=exret_ibm,main="exret_ibm~extret_sp500")
> abline(lmibm)

15. How many data observations does the variable ret_sp500 have?
a. 346
b. 347
c. 348
d. 349
e. not sure

16. Is IBM stock excess return positively or negatively correlated with S&P 500 index excess return?
a. positively
b. negatively
c. zero correlation
d. not sure

17. Is the CAPM beta for IBM stock significant at 5% level?
a. yes
b. no
c. not sure

18. For the estimated CAPM model for IBM stock, we could write the model as $r_{IBM,t} = \hat{\alpha} + \hat{\beta} \times r_{MKT,t} + \hat{\varepsilon}_t$. Which of the following is/are true?
a. $\hat{\alpha} = 0.013770$
b. the mean of $\hat{\varepsilon}_t$ is zero
c. the total sum of squares is 5.8003
d. both a and b
e. a, b, and c are all true

19. What fraction of variance in the excess return of IBM stock is due to systematic risk?
a. 83.33%
b. 16.67%
c. 16.43%
d. 83.57%

20. Which is the command to estimate the CAPM model for IBM stock?
a. abline(lmibm)
b. lmibm=lm(exret_ibm~exret_sp500)
c. summary(lmibm)
d. anova(lmibm)

Part II: Long Questions (60 points)
Notes: Please show all relevant steps in deriving the final answers.

1. (30 points) Suppose Company 1's stock return $X$ is a random variable and takes three possible values: {-0.1, 0.1, 0.2}.
And Company 2's stock return $Y$ is a random variable and takes two possible values: {-0.3, 0.4}. The joint probability distribution $p(x, y)$ is given as follows: $p(-0.1, -0.3) = 0.1$, $p(0.1, -0.3) = 0.2$, $p(0.2, -0.3) = 0.2$, $p(-0.1, 0.4) = 0.2$, $p(0.1, 0.4) = 0.2$, $p(0.2, 0.4) = 0.1$. Please calculate the following:
(a) Marginal distributions: $p_X(x)$ and $p_Y(y)$. (4 points)
(b) Mean: $E(X)$ and $E(Y)$. (4 points)
(c) Variance: $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$. (4 points)
(d) Covariance: $\mathrm{Cov}(X, Y)$. (2 points)
(e) Conditional expectations: $E(X \mid Y = -0.3)$ and $E(X \mid Y = 0.4)$. (8 points)
(f) Conditional variances: $\mathrm{Var}(X \mid Y = -0.3)$ and $\mathrm{Var}(X \mid Y = 0.4)$. (8 points)

2. (30 points) Suppose that the monthly log return of a security $r_t$ follows the model $r_t = 0.03 + 0.1 r_{t-3} + a_t + 0.2 a_{t-3}$, where $\{a_t\}$ is a Gaussian white noise series with mean zero and variance 0.01.
(a) Compute the mean and variance of the return series. (6 points)
(b) Compute the autocorrelations of the return series for all lags. (14 points)
(c) Compute the 1-step-ahead and all the multistep-ahead forecasts of the return at the forecast origin $t = h$. (10 points)

Formula Sheet
Note: $a$, $b$ and $c$ are constants and $x_t$ is a time series in the following formulas.

1. Mean: $E(a x_t) = a E(x_t)$
2. Mean: $E(a + x_t) = a + E(x_t)$
3. Variance: $\mathrm{Var}(x_t) = E\left[(x_t - E(x_t))^2\right]$
4. Variance: $\mathrm{Var}(a x_t) = a^2 \mathrm{Var}(x_t)$
5. Variance: $\mathrm{Var}(a + x_t) = \mathrm{Var}(x_t)$
6. Variance: $\mathrm{Var}(a x_t + b x_{t-j}) = a^2 \mathrm{Var}(x_t) + b^2 \mathrm{Var}(x_{t-j}) + 2ab\,\mathrm{Cov}(x_t, x_{t-j})$
7. Variance: $\mathrm{Var}(x_t + x_{t-j}) = \mathrm{Var}(x_t) + \mathrm{Var}(x_{t-j}) + 2\,\mathrm{Cov}(x_t, x_{t-j})$ (i.e., $a = b = 1$ in 6)
8. Variance: $\mathrm{Var}(a x_t + b x_{t-j} + c x_{t+k}) = a^2 \mathrm{Var}(x_t) + b^2 \mathrm{Var}(x_{t-j}) + c^2 \mathrm{Var}(x_{t+k}) + 2ab\,\mathrm{Cov}(x_t, x_{t-j}) + 2ac\,\mathrm{Cov}(x_t, x_{t+k}) + 2bc\,\mathrm{Cov}(x_{t-j}, x_{t+k})$
9. Covariance: $\mathrm{Cov}(x_t, x_{t-j}) = E\left[(x_t - E(x_t))(x_{t-j} - E(x_{t-j}))\right]$
10. Covariance: $\mathrm{Cov}(x_t, x_{t-j}) = E(x_t x_{t-j}) - E(x_t) E(x_{t-j})$
11. Covariance: $\mathrm{Cov}(a + x_t, x_{t-j}) = \mathrm{Cov}(x_t, x_{t-j})$
12. Covariance: $\mathrm{Cov}(a x_t, x_{t-j}) = a\,\mathrm{Cov}(x_t, x_{t-j})$
13. Covariance: $\mathrm{Cov}(a x_t + b x_{t+k}, c x_{t-j}) = ac\,\mathrm{Cov}(x_t, x_{t-j}) + bc\,\mathrm{Cov}(x_{t+k}, x_{t-j})$
14. Covariance: $\mathrm{Cov}(x_t + x_{t+k}, x_{t-j}) = \mathrm{Cov}(x_t, x_{t-j}) + \mathrm{Cov}(x_{t+k}, x_{t-j})$ (i.e., $a = b = c = 1$ in 13)
15. Lag-$j$ autocorrelation: $\rho_j = \mathrm{Cov}(x_t, x_{t-j}) / \mathrm{Var}(x_t)$
16. Conditional Expectation: For any $y_t$ which is known at time $t$, $E(y_t \mid F_t) = y_t$. $F_t$ denotes the information set available at time $t$. For example,
$E(x_t \mid F_t) = x_t$
$E(x_t^2 \mid F_t) = x_t^2$
$E(a x_t \mid F_t) = a x_t$
$E(a + b x_t \mid F_t) = a + b x_t$
17. Conditional Expectation (law of iterated expectations): $E(x_t) = E\left[E(x_t \mid F_{t-j})\right]$, $j = 1, 2, 3, \ldots$ $F_{t-j}$ is the information set at time $t - j$, which includes all information up to $t - j$. When $j = 1$, $E(x_t) = E\left[E(x_t \mid F_{t-1})\right]$.
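As an illustration of how the variance rule for a sum (rule 6), the covariance rule for linear combinations (rule 13) and the autocorrelation definition (rule 15) combine, consider a hypothetical series $x_t = \varepsilon_t + \theta \varepsilon_{t-1}$. The model, the coefficient $\theta$, the white noise $\varepsilon_t$ and its variance $\sigma_\varepsilon^2$ are assumptions chosen only for this sketch; they do not come from the questions above.

```latex
\begin{align*}
\mathrm{Var}(x_t)
  &= \mathrm{Var}(\varepsilon_t) + \theta^2\,\mathrm{Var}(\varepsilon_{t-1})
     + 2\theta\,\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-1})
   = (1 + \theta^2)\,\sigma_\varepsilon^2, \\
\mathrm{Cov}(x_t, x_{t-1})
  &= \mathrm{Cov}(\varepsilon_t + \theta\varepsilon_{t-1},\;
                  \varepsilon_{t-1} + \theta\varepsilon_{t-2})
   = \theta\,\mathrm{Var}(\varepsilon_{t-1})
   = \theta\,\sigma_\varepsilon^2, \\
\rho_1 &= \frac{\theta}{1 + \theta^2},
  \qquad \rho_j = 0 \quad \text{for } j \ge 2.
\end{align*}
```

The cross terms drop out because white noise is uncorrelated across time, so only the shared term $\varepsilon_{t-1}$ contributes to the lag-1 covariance, and no term is shared at lag 2 or beyond.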
