1. Use the following data to answer the questions below.
17 13
18 16
20 24
15 19
19 12
10 16
26 27
13 23
17 15
24 20
14 21
26 22
a. Using Excel, compute the mean, median, range, and standard deviation of this sample.
b. Create a table organized into bins starting with 10 and going up by 4 (10-13, 14-17, 18-21, etc.).
c. Create a histogram that shows the distribution from part b. Describe the distribution shown on your output.
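As a cross-check on the Excel output for question 1, the summary statistics and bin counts can be sketched in Python. This is only an illustration: it assumes the 24 values form a single sample read row by row, and that the sample (n - 1) standard deviation is wanted, as with Excel's STDEV.S. The variable names are made up for this sketch.

```python
import statistics

# Sample from question 1, read row by row (assumed to be one sample of 24 values).
sample = [17, 13, 18, 16, 20, 24, 15, 19, 19, 12, 10, 16,
          26, 27, 13, 23, 17, 15, 24, 20, 14, 21, 26, 22]

mean = statistics.mean(sample)
median = statistics.median(sample)
value_range = max(sample) - min(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Bins of width 4 starting at 10: 10-13, 14-17, 18-21, ...
bin_start, bin_width = 10, 4
bins = {}
for lower in range(bin_start, max(sample) + 1, bin_width):
    upper = lower + bin_width - 1
    bins[f"{lower}-{upper}"] = sum(lower <= x <= upper for x in sample)

print(f"mean={mean} median={median} range={value_range} stdev={stdev:.3f}")

# Text histogram: one '*' per observation in the bin.
for label, count in bins.items():
    print(f"{label:>6} | {'*' * count} ({count})")
```

The text histogram is only a quick look at the shape; the assignment still asks for an Excel chart.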
2. Use the following data on income and savings to answer the questions below.
Income ($ thousands) | Savings ($ thousands) |
50 | 10 |
51 | 11 |
52 | 13 |
55 | 14 |
56 | 15 |
58 | 15 |
60 | 16 |
62 | 16 |
64 | 17 |
67 | 17 |
a. Using Excel, calculate the correlation coefficient for these data. Interpret this correlation coefficient and describe the relationship between income and savings.
b. Show a scatter diagram of the relationship between income and savings.
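For question 2, the Pearson correlation coefficient that Excel's CORREL function returns can be reproduced from its definition. The sketch below is illustrative only; the helper name `pearson_r` is hypothetical and not part of the assignment.

```python
import math

# Income and savings in $ thousands, from the table in question 2.
income = [50, 51, 52, 55, 56, 58, 60, 62, 64, 67]
savings = [10, 11, 13, 14, 15, 15, 16, 16, 17, 17]

def pearson_r(x, y):
    """Pearson correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(income, savings)
print(f"r = {r:.4f}")
```

A value of r close to +1 indicates a strong positive linear relationship; the sign and size still need to be interpreted in words for part a.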
3. Why are ethics so important in business analytics? Give an example of how algorithms may be biased and why this becomes an ethical problem for businesses. What can be done to correct this?
4. The regional manager of a company wishes to determine the time spent at each division in the car production process. A month-long study resulted in the following data related to the percentage of time spent at three divisions (car body construction, paint shop, and assembly) at four locations of production plants.
Production Plants | Car Body Construction (%) | Paint Shop (%) | Assembly (%) |
Michigan | 35 | 45 | 20 |
Kentucky | 37 | 41 | 22 |
Illinois | 33 | 39 | 28 |
Ohio | 36 | 40 | 24 |
a. Create a stacked-bar chart with production plants along the vertical axis. Reformat the bar chart to best display these data by adding required labels and chart title.
b. Create a clustered-bar chart with production plants along the vertical axis and clusters of divisions. Reformat the bar chart to best display these data by adding required labels and chart title.
c. Create multiple bar charts where each production plant becomes a single bar chart showing the percentage of time spent at the divisions. Reformat the bar charts to best display these data by adding required labels and chart title.
d. Which form of bar chart (stacked, clustered, or multiple) is preferable for these data? Why?
5. A research center is interested in investigating the height and age of children who are between 5 and 9 years old.
In order to do this, a sample of 15 children is selected and the data are given below.
Age (in years) | Height (inches) |
7 | 47.3 |
8 | 48.8 |
5 | 41.3 |
8 | 50.4 |
8 | 51 |
7 | 47.1 |
7 | 46.9 |
7 | 48 |
9 | 51.2 |
8 | 51.2 |
5 | 40.3 |
8 | 48.9 |
6 | 45.2 |
5 | 41.9 |
8 | 49.6 |
a. Develop a scatter chart with age as the independent variable. What does the scatter chart indicate about the relationship between the height and age of children?
b. Use the data to develop an estimated regression equation that could be used to estimate height based on age. What is the estimated regression model? (Note: You may use Excel for the calculation.)
c. Calculate SSR and SST (show your work, either by hand or with Excel formulas).
d. How much of the variation in the sample values of height does the model estimated in part (b) explain?
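Parts b through d of question 5 follow the usual least-squares formulas: slope b1 = S_xy / S_xx, intercept b0 = y_bar - b1 * x_bar, SST as the total sum of squares, SSR as the regression sum of squares, and R^2 = SSR / SST. The following is a minimal sketch of that arithmetic, meant as a way to check hand or Excel work rather than as the required method. All variable names are illustrative.

```python
# Age (years) and height (inches) for the 15 children in question 5.
age = [7, 8, 5, 8, 8, 7, 7, 7, 9, 8, 5, 8, 6, 5, 8]
height = [47.3, 48.8, 41.3, 50.4, 51, 47.1, 46.9, 48,
          51.2, 51.2, 40.3, 48.9, 45.2, 41.9, 49.6]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(height) / n

# Least-squares estimates for height = b0 + b1 * age.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, height))
s_xx = sum((x - x_bar) ** 2 for x in age)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in age]

sst = sum((y - y_bar) ** 2 for y in height)               # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)               # explained variation
sse = sum((y - f) ** 2 for y, f in zip(height, fitted))   # unexplained variation
r_squared = ssr / sst

print(f"height_hat = {b0:.3f} + {b1:.3f} * age")
print(f"SSR = {ssr:.3f}, SST = {sst:.3f}, R^2 = {r_squared:.3f}")
```

Because the model includes an intercept, SST = SSR + SSE, which gives a quick consistency check on the three sums.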
6. A research team conducted a survey to investigate how education level, tenure in current employment, and age are related to annual income. A sample of 20 employees was selected, and the data are given below.
Education (No. of years) | Length of tenure in current employment (No. of years) | Age (No. of years) | Annual income ($) |
17 | 8 | 40 | 124,000 |
12 | 12 | 41 | 30,000 |
20 | 9 | 44 | 193,000 |
14 | 4 | 42 | 88,000 |
12 | 1 | 19 | 27,000 |
14 | 9 | 28 | 43,000 |
12 | 8 | 43 | 96,000 |
18 | 10 | 37 | 110,000 |
16 | 12 | 36 | 88,000 |
11 | 7 | 39 | 36,000 |
16 | 14 | 36 | 81,000 |
12 | 4 | 22 | 38,000 |
16 | 17 | 45 | 140,000 |
13 | 7 | 42 | 11,000 |
11 | 6 | 18 | 21,000 |
20 | 4 | 40 | 151,000 |
19 | 7 | 35 | 124,000 |
16 | 12 | 38 | 48,000 |
12 | 2 | 19 | 26,000 |
10 | 6 | 44 | 124,000 |
a. Use the F test to determine whether an overall regression relationship exists. If it does, use the t test to determine the significance of each independent variable. What is the conclusion for each test at the 0.05 level of significance?
b. Remove all independent variables that are not significant at the 0.05 level of significance from the estimated regression equation. What is your estimated regression equation in this case? Provide an interpretation of the coefficients of the remaining independent variables.
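The F and t statistics in question 6 would normally come from Excel's Regression tool. As a rough cross-check, the sketch below fits the full model by ordinary least squares with plain Python and compares the statistics against table critical values. The critical values (3.24 for F with 3 and 16 degrees of freedom, 2.120 for a two-tailed t with 16 degrees of freedom, both at the 0.05 level) are standard table lookups entered by hand, and all helper names are hypothetical.

```python
import math

# (education, tenure, age, annual income in $) for the 20 employees.
rows = [
    (17, 8, 40, 124000), (12, 12, 41, 30000), (20, 9, 44, 193000),
    (14, 4, 42, 88000), (12, 1, 19, 27000), (14, 9, 28, 43000),
    (12, 8, 43, 96000), (18, 10, 37, 110000), (16, 12, 36, 88000),
    (11, 7, 39, 36000), (16, 14, 36, 81000), (12, 4, 22, 38000),
    (16, 17, 45, 140000), (13, 7, 42, 11000), (11, 6, 18, 21000),
    (20, 4, 40, 151000), (19, 7, 35, 124000), (16, 12, 38, 48000),
    (12, 2, 19, 26000), (10, 6, 44, 124000),
]
names = ["intercept", "education", "tenure", "age"]
n, k = len(rows), 3                        # observations, independent variables
X = [[1.0, e, t, a] for e, t, a, _ in rows]
y = [income for *_, income in rows]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    size = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(size)]
           for i, row in enumerate(m)]
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(size):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[size:] for row in aug]

# Normal equations: beta = (X'X)^-1 X'y
Xt = transpose(X)
xtx_inv = inverse(matmul(Xt, X))
beta = [r[0] for r in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]

fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
y_bar = sum(y) / n
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sst = sum((yi - y_bar) ** 2 for yi in y)

mse = sse / (n - k - 1)
f_stat = (ssr / k) / mse
t_stats = [b / math.sqrt(mse * xtx_inv[j][j]) for j, b in enumerate(beta)]

F_CRIT, T_CRIT = 3.24, 2.120               # table values at alpha = 0.05 (assumed)
print(f"F = {f_stat:.2f} (reject H0 if F > {F_CRIT})")
for name, b, t in zip(names, beta, t_stats):
    print(f"{name:>10}: b = {b:12.2f}, t = {t:6.2f}, significant: {abs(t) > T_CRIT}")
```

For part b, the same code can be rerun with the non-significant columns dropped from `X` (and `k` and the critical values adjusted to the new degrees of freedom).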