Question

Problem 1 – TRUE or FALSE (Answer these on this midterm sheet) – 1 pt each

a) A t-test assumes that the population standard deviation, σ, is known. ( ) True ( ) False
b) One can reject the null hypothesis when the p-value is less than the level of significance at which the hypothesis test is performed. ( ) True ( ) False
c) When performing multiple comparisons, the Family Wise Error Rate is the probability of making at least one Type I error. ( ) True ( ) False
d) When constructing a t-interval for a population mean, μ, increasing the sample size decreases the precision. ( ) True ( ) False
e) Increasing the level of significance when performing a z-test entails an increase in the statistical power. ( ) True ( ) False
f) An F-distribution is a continuous distribution on the non-negative real numbers. ( ) True ( ) False
g) ANOVA is a method used to compare variances between two or more groups. ( ) True ( ) False
h) A Binomial random variable is a discrete random variable. ( ) True ( ) False
i) Statistical Power is 1 minus the probability of a Type I Error. ( ) True ( ) False
j) A χ² distribution is a right-skewed continuous distribution. ( ) True ( ) False

Problem 2 – Four brands of high-end earbuds are compared for sound quality. The four brands are Ear Light, Loud n' Clear, Sound Aid and Crystal Clear. Without going into unnecessary details, the quality of sound can be determined objectively by measuring audio signals received by a robot head wearing the earbuds and then comparing them with the known signal wave that was sent. The unit of measure to quantify sound quality is referred to as a "Quali", where a lower Quali value coincides with better sound quality. Below is a table displaying summary statistics for an experiment assessing the sound quality of the four earbud brands.
| Earbud Type | Ear Light | Loud n' Clear | Sound Aid | Crystal Clear |
| Sample Size | 5 | 4 | 7 | 6 |
| Sample Average | 12 | 17 | 16 | 15 |
| Sample Standard Deviation | 3.082207001 | 3.16227766 | 1.414213562 | 1.673320053 |

a) Determine, at a level of significance α₀ = 0.05, whether there is any statistically significant difference in the sound quality of the earbud brands. (The upper 5% cutoff value for an F-distribution on 3 numerator degrees of freedom and 18 denominator degrees of freedom is 3.16.) – 4 pts

b) If one wanted to determine which pairs of earbuds had a statistically significant difference in sound quality, at what level of significance should the individual pairwise hypothesis tests be performed to control the Family Wise Error Rate at 5% using the Bonferroni Method? (YOU DO NOT NEED TO PERFORM ANY OF THE PAIRWISE COMPARISONS.) – 1 pt

Problem 3: Multiple choice problems (There will be only one answer) – 2 pts each

I) Which of the following statements is true regarding a dotplot? ____
A. Dotplots depict the distribution of numerical data.
B. Scatterplot is another name for a dotplot.
C. Dotplots can only be constructed for data that is discrete.
D. Dotplots can be used to confirm that a data sample arises from a normal distribution.

II) When constructing a QQ-plot for determining whether it is plausible to assume a dataset arises from a normal distribution, which of the following statements is true? ____
A. The empirical quantiles of the data must be on the vertical axis and the theoretical quantiles of a standard normal random variable must be on the horizontal axis.
B. The empirical quantiles of the data must be on the horizontal axis and the theoretical quantiles of a standard normal random variable must be on the vertical axis.
C. The empirical quantiles of the data can be on either axis and the other axis can display the theoretical quantiles of any normal random variable.
D. The number of quantiles depicted on the QQ-plot (i.e.
number of points) must be equal to the number of observations in your dataset.

III) Given an iid (independent and identically distributed) sample from a normal distribution, the sampling distribution of the sample standard deviation follows a … ____
A. t-distribution
B. F-distribution
C. χ²-distribution
D. None of the above

IV) Which of the following is the correct frequentist interpretation of a 100(1 – α₀)% confidence interval for some parameter of interest? ____
A. The probability the parameter of interest falls within the confidence interval is α₀
B. The probability the parameter of interest falls within the confidence interval is (1 – α₀)
C. The probability the confidence interval covers the true value of the parameter is (1 – α₀)
D. The probability the confidence interval covers the true value of the parameter is α₀

V) Which of the following is NOT an assumption of the ANOVA model? ____
A. The observations within each group are an iid sample.
B. Observations in different groups are independent.
C. The number of observations in each group is the same.
D. The population standard deviation within each group is the same.

Problem 4 – State the Central Limit Theorem and why it is important. Be sure to list all the assumptions that are needed for the result of the Central Limit Theorem. – 4 pts

Problem 5 – The stacked barchart below reports the results of a random survey conducted in the cities of Byzantium and Constantinople. Individuals were randomly selected and asked whether they had been vaccinated against Bubonic Plague or not.

(a) How many individuals were sampled in Byzantium and Constantinople respectively? – 1 pt

(b) Provide an estimate of the percentage of individuals within each town who are vaccinated against Bubonic Plague. – 1 pt

(c) An epidemiologist wishes to prove that there is a difference in the percentage of individuals vaccinated in the two cities.
Frame his research question as a hypothesis testing problem, explicitly describing the parameter(s) involved and the respective hypotheses. – 1 pt

(d) Carry out the hypothesis test you specified in part (c) at level of significance α₀ = 0.2 using the data provided in the barchart. Would you accept/reject the null hypothesis in part (c)? Show and clearly explain all your work leading to your conclusion. (The upper 10% cutoff value for a standard normal distribution is 1.28.) – 3 pts

Problem 6 – Recall that methods such as z-tests, t-tests, ANOVA, etc. all focus on analyzing the population mean, μ, of some variable of interest in a population. Explain why many statistical methods are framed in terms of the population mean, μ. Discuss not only the practical reasons but also any theoretical reasons. – 4 pts

Problem 7 – For each of the scenarios below, indicate which statistical method would be the best approach to answer the research question posed by putting an X to the right of the most appropriate method. – 2 pts each

(a) An IT company wishes to improve its customer service by increasing the number of customer service representatives at its call center to handle calls. In order to inform the number of customer service representatives they need to hire, they need to get a sense of how long customers are waiting on hold when calling their IT help line. They want to be 90% certain that they hire enough customer service representatives so that no customer is ever put on hold. Consequently, they conducted a survey to ascertain how long customers were put on hold. They randomly selected 75 incoming calls to the help line that were put on hold, and recorded the duration they were on hold. Let μ denote the mean waiting time on hold. What statistical method should they employ? (Pick one)
Compute a 90% Upper Bound for μ _______
Perform a Right-Tailed Test for μ _______
Perform a Left-Tailed Test for μ _______
Compute a 90% Lower Bound for μ
_______

(b) An aspirin manufacturer claims its bottles contain 500 grains of aspirin. Let μ represent the true mean weight of a tablet of aspirin. Since each bottle contains 100 tablets, if the manufacturer's claim is true then the true mean weight of the tablets should be 5 grains. Each of 100 tablets taken from a very large lot is weighed, resulting in a sample average weight of 4.87 grains and a sample standard deviation of 0.35 grain. An investigator wishes to know whether these data provide strong enough evidence to conclude that the company is short-changing the consumer. How should the investigator proceed? (Pick one)
Perform a Left-Tailed z-Test for μ _______
Perform a Left-Tailed t-Test for μ _______
Perform a Right-Tailed t-Test for μ _______
Compute a 90% Lower Bound for μ _______

(c) In married couples, does one spouse (the husband or the wife) tend to live longer? By accessing public death registries, a Sociologist was able to obtain a simple random sample of 100 married couples who were born within 1 month of each other. Her goal is to answer the previous question. She recorded the age at which each partner passed away as well as which partner outlived their spouse. What is the best way to analyze this data to answer the question? (Pick one)
Perform a two sample, two-sided hypothesis test of equality of mean death ages _______
Perform a two-sided, paired hypothesis test where the null is that the common mean difference of death ages is 0 _______
Perform a right-tailed test on the population proportion of couples where wives outlive their husbands _______
Perform a left-tailed test on the population proportion of couples where husbands outlive their wives _______

(d) Hedgehog Hedge Fund uses complex stock models to inform their trading strategies. One particular model, used for the car manufacturer Rocket Motors, utilizes various economic variables as model inputs. They are debating whether to take a position involving Rocket Motors stock.
Consequently, they wish to get a range of values for Rocket Motors' future stock price which will cover the actual stock price with 95% certainty. As there is a lot of money at stake, the firm hires a Quantitative Analyst to tackle the previous problem. Recognizing that the model input that most affects the value of the estimated stock price is the 'typical' price of a gallon of unleaded regular gas, he requests the company's market research group to provide him with the prices of a gallon of unleaded regular gas for a simple random sample of 1,000 gas stations across the US. How should he use this data to proceed with his task? (Pick one)
Calculate the sample average of the gas prices _______
Calculate the sample median of the gas prices _______
Compute a two-sided 95% confidence interval for the mean gas price _______
Calculate the sample standard deviation of the gas prices _______

(e) An hourglass is a time-keeping device from antiquity. The device consists of two glass bulbs arranged in a figure-eight pattern connected by a thin neck. Inside the bulbs is sand. When all the sand is collected in one bulb, the hourglass is turned upside down so that the bulb containing all the sand is on top. The sand will then spill through the neck to the lower bulb by the force of gravity. As the name implies, it should take an hour for all the sand to pour through the glass. However, as the device is crude, the actual time it takes for all the sand to empty from the top bulb could vary slightly from run to run due to slight irregularities in how the sand is distributed in the bulb (i.e. if it is piled highest towards the sides of the glass versus in the middle). There are currently two main manufacturers of hourglasses – Time Stands Still and Sands of Time. An enthusiast of time pieces wishes to determine if one company's hourglass provides more precise estimates of time than its competitor.
He purchases an hourglass from Time Stands Still, and one from Sands of Time. In his experiment, he activates a given hourglass and uses a stopwatch to determine the exact time it takes for the sand to empty from the top bulb to the bottom. He does this 8 times for the Time Stands Still hourglass, and 6 times for the Sands of Time version. How can he determine whether one hourglass is a more precise measure of an hour than the other? (Pick one)
Perform a two sample, two-sided hypothesis test of equality of population means _______
Perform an ANOVA (Analysis of Variance) _______
Perform a two sample, two-sided hypothesis test of equality of population standard deviations where the null is that the difference of population standard deviations is 0 _______
Perform a two sample, two-sided hypothesis test of equality of population standard deviations where the null is that the ratio of population standard deviations is 1 _______

Problem 8 – Below are QQ Plots for two different data sets. Which of the datasets appears to be normally distributed, Dataset I or Dataset II? Explain your answer. – 3 pts

Problem 9 – Famed Climatologist, Dr. Lumvoir, wishes to publish the results of his recent study on climate change. The experiment entailed measuring the temperature at a fixed location at exactly the same time of day for one month, a task which he delegated to his Graduate Assistant (GA). The GA compiled the data and calculated the following summary statistics, which he reported to Dr. Lumvoir:

Average = 30.6°F
Median = 28°F
Mode = 29°F
Range = 15°F
IQR = 10°F
Standard Deviation = 13.5°F

Unfortunately, Dr. Lumvoir cannot use the summary statistics as is because they are in degrees Fahrenheit, whereas scientific journals require data measurements to be reported using the metric system (recall temperature is measured in degrees Celsius in the metric system, with the conversion formula from Celsius to Fahrenheit given by °F = °C × 1.8 + 32).

1) Dr.
Lumvoir does not have access to the raw data, so he asks his GA to recalculate the above summary statistics. Unfortunately, the GA is currently away on spring break. The submission deadline for the journal report is only a day away. Can Dr. Lumvoir somehow do the Fahrenheit to Celsius conversion for the above summary statistics without having the raw data? If so, what are the corresponding summary statistics when converted to degrees Celsius? – 3 pts

2) To support his hypothesis, Dr. Lumvoir performed the hypothesis test below, where μ represents the mean temperature.

H0: μ ≥ 32°F vs. HA: μ < 32°F at level of significance α₀ = 0.05

Using the data measured in degrees Fahrenheit, he concluded that the null hypothesis should be rejected. However, as he must now convert the data to degrees Celsius, would his result change? Explain why or why not. (Hint: Think about how the forms of the above hypotheses would change and then look at the form of the standardized test statistic.) – 2 pts

Problem 10 – The scatter plot below is for a bivariate sample where the horizontal axis is years of education and the vertical axis is yearly salary 10 years post schooling. The line in red is the least squares regression line estimated using the data. Comment on whether the assumptions necessary to perform inference on the regression model are satisfied. – 3 pts
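
The arithmetic behind Problem 2 can be checked from the summary statistics alone. The sketch below is not an official solution. It assumes the usual one-way ANOVA decomposition into between-group and within-group sums of squares, takes the cutoff 3.16 from the problem statement, and uses a hypothetical helper name, anova_from_summary, chosen only for illustration.

```python
from math import comb


def anova_from_summary(sizes, means, sds):
    """One-way ANOVA F statistic from group sizes, means and standard deviations.

    Hypothetical helper for illustration; assumes the standard decomposition
    into between-group and within-group sums of squares.
    """
    n_total = sum(sizes)
    k = len(sizes)
    # Grand mean is the size-weighted average of the group means.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Within-group sum of squares: recovered from each group's sample variance.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Summary statistics from the Problem 2 table
# (Ear Light, Loud n' Clear, Sound Aid, Crystal Clear).
sizes = [5, 4, 7, 6]
means = [12, 17, 16, 15]
sds = [3.082207001, 3.16227766, 1.414213562, 1.673320053]

f_stat, df1, df2 = anova_from_summary(sizes, means, sds)

F_CUTOFF = 3.16  # upper 5% cutoff for F(3, 18), as given in the problem
reject_null = f_stat > F_CUTOFF

# Bonferroni: split the family-wise level evenly across all pairwise tests.
n_pairs = comb(len(sizes), 2)
bonferroni_alpha = 0.05 / n_pairs

print(f"F = {f_stat:.3f} on ({df1}, {df2}) degrees of freedom")
print(f"Reject H0 at the 5% level: {reject_null}")
print(f"{n_pairs} pairwise tests, each at level {bonferroni_alpha:.4f}")
```

With these inputs the degrees of freedom come out as (3, 18), matching the cutoff quoted in part (a), and the F statistic is about 4.34. The Bonferroni step simply divides 0.05 by the number of brand pairs.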
