1) The decision-making concepts covered in the Data Analysis & Decision Making book include which of the following?

 a. Optimization techniques
 b. Decision analysis with uncertainty
 c. Structured sensitivity analysis
 d. All of these options
 Which of the following statements is not true?
 Dealing with uncertainty includes measuring uncertainty
 Dealing with uncertainty includes modeling uncertainty explicitly into the analysis.
 Dealing with uncertainty includes eliminating uncertainty by using the normal probability distribution
 Uncertainty is a key aspect of most business problems, and dealing with uncertainty requires a basic understanding of probability
 Which of the following is not one of the important themes of your Data Analysis & Decision Making book?
 a. Data analysis
 b. Dealing with uncertainty
 c. Decision making
 d. Data mining
 Data analysis includes
 a. data description
 b. data inference
 c. the search for relationships in data
 d. All of these options
 Which of the following is not one of the steps in the modeling process?
 Select scale for model
 Collect and summarize data
 Verify the model
 Present the results
 Implement the model and update it through time
 Which of the following would not be included under data analysis?
 a. Measuring uncertainty
 b. Data description
 c. Data inference
 d. Search for relationships
 The decision making process includes
 optimization techniques for problems with no uncertainty
 decision analysis for problems with uncertainty
 sensitivity analysis
 All of the above
 Which of the following is not one of the types of models described in Data Analysis & Decision Making book?
 a. Algebraic model
 b. Spreadsheet model
 c. Scale model
 d. Graphical model
 The modeling process discussed in the Data Analysis & Decision Making book is a
 seven-step process
 six-step process
 five-step process
 four-step process
 three-step process
 Which of the following is an Excel add-in for performing what-if analyses?
 PrecisionTree
 TopRank
 Solver
 @Risk
 StatTools
 Which of the following statements are false?
 The modeling process discussed in the Data Analysis & Decision Making book is a five-step process
 Dealing with uncertainty requires a basic understanding of probability
 Uncertainty is a key aspect of most business problems
 Data description and data inference are included under data analysis
 Which of the following statements are false?
 a. Decision-making includes optimization techniques for problems with certainty, decision analysis for problems with certainty, and structured sensitivity analysis.
 b. Graphical models can be very helpful for simple problems. For complex problems, however, graphical models usually fail to show the important elements of a problem and how they are related.
 c. Dealing with uncertainty includes measuring uncertainty and modeling uncertainty explicitly into the analysis.
 d. All of these options
 Which of the following statements are true:
 A fairly recent alternative to algebraic modeling is spreadsheet modeling. Instead of relating various quantities with algebraic equations and inequalities, we relate them in a spreadsheet with cell formulas.
 Data are usually meaningless until they are analyzed for trends, patterns, relationships, and other useful information
 Algebraic models, by means of algebraic equations and inequalities, specify a set of relationships in a very precise way. Their main drawback is that they require an ability to work with abstract mathematical symbols.
 When we make inferences from data and search for relationships in data, or when we use decision trees to help make decisions, we must deal with uncertainty.
 All of these options
ANS: E
 Which of the following statements are true?
 Three important themes run through the book. Two of them are in the title: data analysis and decision making. The third is dealing with uncertainty.
 Data analysis includes data description, data inference, and the search for relationships in data
 Decision making includes optimization techniques for problems with no uncertainty, decision analysis for problems with uncertainty, and structured sensitivity analysis.
 Dealing with uncertainty includes measuring uncertainty and modeling uncertainty explicitly into the analysis.
 All of these options
ANS: E
 Which of the following is an Excel add-in for simulation?
 PrecisionTree
 TopRank
 Solver
 @Risk
 StatTools
TRUE/FALSE
 Data analysis includes data description, data inference, and the search for relationships in data.
 Decision-making includes optimization techniques for problems with certainty, decision analysis for problems with certainty, and structured sensitivity analysis.
 Dealing with uncertainty includes measuring uncertainty and modeling uncertainty explicitly into the analysis.
 The authors of the Data Analysis & Decision Making book described three types of models: graphical models, algebraic models, and spreadsheet models.
 Graphical models are the least intuitive type of model. Their purpose is simply to provide enough quantitative details to enable us to solve the problem of interest.
 Three important themes run through this book: data analysis, decision-making, and dealing with uncertainty.
 Graphical models can be very helpful for simple problems. For complex problems, however, graphical models usually fail to show the important elements of a problem and how they are related.
 The overall modeling process typically done in the business world always requires seven steps: define the problem, collect and summarize data, formulate a model, verify the model, select one or more suitable decisions, present the results to the organization, and finally implement the model and update it through time.
 Algebraic models, by means of algebraic equations and inequalities, specify a set of relationships in a very precise way. Their main drawback is that they require an ability to work with abstract mathematical symbols.
 Data are usually meaningless until they are analyzed for trends, patterns, relationships, and other useful information.
 A fairly recent alternative to algebraic modeling is spreadsheet modeling. Instead of relating various quantities with algebraic equations and inequalities, we relate them in a spreadsheet with cell formulas.
 When we use simulation models to help make decisions, we do not deal with uncertainty at all, since we often must make inferences from the simulated data.
 When we make inferences from data and search for relationships in data, or when we use decision trees to help make decisions, we must deal with uncertainty.
 @Risk is an Excel add-in that can be used to run replications of a simulation, keep track of outputs, create useful charts, and perform sensitivity analyses.
 Graphical models are probably the least intuitive and most quantitative type of model.
CHAPTER 2: Describing the Distribution of a Single Variable
MULTIPLE CHOICE
 A sample of a population taken at one particular point in time is categorized as:
 a. categorical
 b. discrete
 c. cross-sectional
 d. time-series
 If data is stored in a database package, which of the following terms are typically used?
 a. Fields and records
 b. Cases and columns
 c. Variables and samples
 d. Variables and observations
 Researchers may gain insight into the characteristics of a population by examining a
 mathematical model describing the population
 sample of the population
 description of the population
 replica
 Numerical variables can be subdivided into which two types?
 a. Diverse and categorical
 b. Discrete and continuous
 c. Nominal and progressive
 d. Cross-sectional and discrete
 Gender and State are examples of which type of data?
 a. Discrete data
 b. Continuous data
 c. Categorical data
 d. Ordinal data
 Which of the following indicates how many observations fall into various categories?
 a. The Likert scale
 b. The frequency table
 c. The sample table
 d. The tabulation scale
 Data that arise from counts are called:
 a. continuous data
 b. nominal data
 c. counted data
 d. discrete data
 A histogram that is positively skewed is also called
 a. skewed to the right
 b. skewed to the left
 c. balanced
 d. symmetric
 A histogram that has exactly two peaks is called a
 a. unimodal distribution
 b. bimodal distribution
 c. skewed distribution
 d. scatterplot
 A histogram that has a single peak and looks approximately the same to the left and right of the peak is called:
 a. bimodal
 b. symmetric
 c. balanced
 d. proportional
 A variable is classified as ordinal if:
 there is a natural ordering of categories
 there is no natural ordering of categories
 the data arise from continuous measurements
 we track the variable through a period of time
 In order for the characteristics of a sample to be generalized to the entire population, it should be:
 a. symbolic of the population
 b. typical of the population
 c. representative of the population
 d. illustrative of the population
 When we look at a time series plot, we usually look for which two things?
 “Is there an observable trend?” and “Is there a seasonal pattern?”
 “Is there an observable trend” and “Can we make predictions?”
 “Is the sample representative?” and “Is there a seasonal pattern?”
 “Is there an observable trend?” and “Is the trend symmetric?”
 Which of the following are possible categorizations of data type?
 Numerical versus categorical (with subcategories nominal, ordinal)
 Discrete versus continuous
 Cross-sectional versus time series
 All of these options
 Two of these options
 Which of the following are the two most commonly used measures of variability?
 Variance and median
 Variance and standard deviation
 Mean and variance
 Mean and range
 First quartile and third quartile
 The median can also be described as:
 the middle observation when the data values are arranged in ascending order
 the second quartile
 the 50^{th} percentile
 All of these options
 The difference between the first and third quartile is called the
 interquartile range
 interdependent range
 unimodal range
 bimodal range
 mid range
 If a value represents the 95^{th} percentile, this means that
 95% of all values are below this value
 95% of all values are above this value
 95% of the time you will observe this value
 there is a 5% chance that this value is incorrect
 there is a 95% chance that this value is correct
 For a boxplot, the point inside the box indicates the location of the
 a. mean
 b. median
 c. minimum value
 d. maximum value
 For a boxplot, the vertical line inside the box indicates the location of the
 mean
 median
 mode
 minimum value
 maximum value
 Which of the following are the three most common measures of central location?
 Mean, median, and mode
 Mean, variance, and standard deviation
 Mean, median, and variance
 Mean, median, and standard deviation
 First quartile, second quartile, and third quartile
 The length of the box in the boxplot portrays the
 mean
 median
 range
 interquartile range
 third quartile
 Suppose that a histogram of a data set is approximately symmetric and "bell shaped". Approximately what percent of the observations are within two standard deviations of the mean?
 50%
 68%
 95%
 99.7%
 100%
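As a rough check on the empirical rule behind this question, the share of a normal distribution that lies within k standard deviations of the mean can be computed from the error function. This is only a sketch: it assumes an exactly normal ("bell shaped") distribution, and the function name `share_within` is illustrative rather than taken from the book.

```python
import math

def share_within(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

# Print the share for one, two, and three standard deviations.
for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
```

Real data that are only approximately bell shaped will match these shares only approximately.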
 The mode is best described as the
 middle observation
 same as the average
 50^{th} percentile
 most frequently occurring value
 third quartile
 For a boxplot, the box itself represents what percent of the observations?
 lower 25%
 middle 50%
 upper 75%
 upper 90%
 100%
 Which of the following statements is true for the following data values: 7, 5, 6, 4, 7, 8, and 12?
 The mean, median and mode are all equal
 Only the mean and median are equal
 Only the mean and mode are equal
 Only the median and mode are equal
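The three measures of central location for the seven values in this question can be checked with Python's standard `statistics` module. This is a quick sketch for verification, not part of the original test bank.

```python
import statistics

values = [7, 5, 6, 4, 7, 8, 12]  # data values from the question above

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle value once the data are sorted
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)
```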
 In a histogram, the percentage of the total area which must be to the left of the median is:
 exactly 50%
 less than 50% if the distribution is skewed to the left
 more than 50% if the distribution is skewed to the right
 between 25% and 50% if the distribution is symmetric and unimodal
 The average score for a class of 30 students was 75. The 20 male students in the class averaged 70. The 10 female students in the class averaged:
 75
 85
 60
 70
 80
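The class-average question rests on the fact that a mean times its count gives a total, so the missing group's total is the overall total minus the known group's total. A minimal sketch of that arithmetic follows; the helper name `missing_group_mean` is hypothetical.

```python
def missing_group_mean(n_total, mean_total, n_known, mean_known):
    """Mean of the remaining group, given the overall mean and one group's mean."""
    total_sum = n_total * mean_total   # sum of all scores
    known_sum = n_known * mean_known   # sum of the known group's scores
    return (total_sum - known_sum) / (n_total - n_known)

# 30 students averaging 75 overall; 20 male students averaging 70.
female_mean = missing_group_mean(n_total=30, mean_total=75, n_known=20, mean_known=70)
print(female_mean)
```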
 Which of the following statements is true?
 The sum of the deviations from the mean is always zero
 The sum of the squared deviations from the mean is always zero
 The range is always smaller than the variance
 The standard deviation is always smaller than the variance
 Expressed in percentiles, the interquartile range is the difference between the
 10^{th} and 60^{th} percentiles
 15^{th} and 65^{th} percentiles
 20^{th} and 70^{th} percentiles
 25^{th} and 75^{th} percentiles
 35^{th} and 85^{th} percentiles
 A sample of 20 observations has a standard deviation of 4. The sum of the squared deviations from the sample mean is:
 400
 320
 304
 288
 180
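The sum of squared deviations can be recovered from a sample standard deviation by reversing the variance formula. The sketch below assumes the usual sample definition with an n - 1 divisor, so the sum equals (n - 1) times the squared standard deviation; the function name is illustrative.

```python
def sum_squared_deviations(n: int, sample_sd: float) -> float:
    """Sum of squared deviations implied by a sample standard deviation (n - 1 divisor)."""
    return (n - 1) * sample_sd ** 2

# Sample of 20 observations with a standard deviation of 4.
print(sum_squared_deviations(20, 4))
```

If the standard deviation had instead been computed with an n divisor (the population formula), the factor would be n rather than n - 1.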
TRUE/FALSE
 Age, height, and weight are examples of numerical data.
 Data can be categorized as cross-sectional or time series.
 All nominal data may be treated as ordinal data.
 Four different shapes of histograms are commonly observed: symmetric, positively skewed, negatively skewed, and bimodal.
 Categorical variables can be classified as either discrete or continuous.
 A skewed histogram is one with a long tail extending either to the right or left. The former is called negatively skewed, and the latter is called positively skewed.
CHAPTER 3: Finding Relationships Among Variables
MULTIPLE CHOICE
 To examine relationships between two categorical variables, we can use
 Counts and corresponding charts of the counts
 Scatterplots
 Histograms
 None of these options
 Tables used to display counts of a categorical variable are called
 a. Crosstabs
 b. Contingency tables
 c. Both of these options
 d. Neither of these options
 The Excel function that allows you to count using more than one criterion is
 COUNTIF
 COUNTIFS
 SUMPRODUCT
 VLOOKUP
 HLOOKUP
 Examples of comparison problems include
 Salary broken down by male and female subpopulations
 Cost of living broken down by region of a country
 Recovery rate for a disease broken down by patients who have taken a drug and patients who have taken a placebo
 Starting salary of recent graduates broken down by academic major
 All of these options
 The most common data format is
 a. Long
 b. Short
 c. Stacked
 d. Unstacked
 A useful way of comparing the distribution of a numerical variable across categories of some categorical variable is
 a. Side-by-side boxplots
 b. Side-by-side histograms
 c. Both of these options
 d. Neither of these options
 We study relationships among numerical variables using
 Correlation
 Covariance
 Scatterplots
 All of these options
 None of these options
 Scatterplots are also referred to as
 Crosstabs
 Contingency charts
 XY charts
 All of these options
 None of these options
 Correlation and covariance measure
 The strength of a linear relationship between two numerical variables
 The direction of a linear relationship between two numerical variables
 The strength and direction of a linear relationship between two numerical variables
 The strength and direction of a linear relationship between two categorical variables
 None of these options
 We can infer that there is a strong relationship between two numerical variables when
 The points on a scatterplot cluster tightly around an upward sloping straight line
 The points on a scatterplot cluster tightly around a downward sloping straight line
 Either of these options
 Neither of these options
 The limitation of covariance as a descriptive measure of association is that it
 Only captures positive relationships
 Does not capture the units of the variables
 Is very sensitive to the units of the variables
 Is invalid if one of the variables is categorical
 None of these options
 If the correlation is close to 0, then we expect to see
 An upward sloping cluster of points on the scatterplot
 A downward sloping cluster of points
 A cluster of points around a trendline
 A cluster of points with no apparent relationship
 We cannot say what the scatterplot should look like based on the correlation
 We are usually on the lookout for large correlations near
 a. +1
 b. –1
 c. Either of these options
 d. Neither of these options
 The correlation is best interpreted
 By itself
 Along with the covariance
 Along with the corresponding scatterplot
 Along with the corresponding contingency chart
 Along with the mean and standard deviation
 Which of the following are considered measures of association?
 Mean and variance
 Variance and correlation
 Correlation and covariance
 Covariance and variance
 First quartile and third quartile
 Generally speaking, if two variables are unrelated (as one increases, the other shows no pattern), the covariance will be
 a large positive number
 a large negative number
 a positive or negative number close to zero
 a positive or negative number close to +1 or –1
 A perfect straight line sloping downward would produce a correlation coefficient equal to
 +1
 –1
 0
 +2
 –2
 If Cov(X,Y) =  16.0, variance of X = 25, variance of Y = 16 then the sample coefficient of correlation r is
 + 1.60
 – 1.60
 – 0.80
 + 0.80
 Cannot be determined from the given information
 A scatterplot allows one to see:
 whether there is any relationship between two variables
 what type of relationship there is between two variables
 Both options are correct
 Neither option is correct
 The tool that provides useful information about a data set by breaking it down into subpopulations is the:
 a. histogram
 b. scatterplot
 c. pivot table
 d. spreadsheet
 The tables that result from pivot tables are called:
 a. samples
 b. subtables
 c. specimens
 d. crosstabs
 Which of the following statements are false?
 Contingency tables are traditional statistical terms for pivot tables that list counts.
 A time series plot is a chart showing the behavior over time of a time series variable.
 A pivot table is a table in Excel that summarizes data broken down by one or more numerical variables.
 None of these options
 Which of the following are true statements of pivot tables?
 They allow us to “slice and dice” data in a variety of ways.
 Statisticians often refer to them as contingency tables or crosstabs.
 Pivot tables can list counts, averages, sums, and other summary measures, whereas contingency tables list only counts.
 All of these options
TRUE/FALSE
 Counts for a categorical variable are often expressed as percentages of the total.
 An example of a joint category of two variables is the count of all nondrinkers who are also nonsmokers.
 Joint categories for categorical variables cannot be used to make inferences about the relationship between the individual categorical variables.
 Problems in data analysis where we want to compare a numerical variable across two or more subpopulations are called comparison problems.
 Side-by-side boxplots allow you to quickly see how two or more categories of a numerical variable compare.
 We must specify appropriate bins for side-by-side histograms in order to make fair comparisons of distributions by category.
 Correlation and covariance can be used to examine relationships between numerical variables and categorical variables that have been coded numerically.
 A trend line on a scatterplot is a line or a curve that fits the scatter as well as possible
 To form a scatterplot of X versus Y, X and Y must be paired
 Correlation has the advantage of being in the same original units as the X and Y variables
 Correlation is a single-number summary of a scatterplot.
 We do not even try to interpret correlations numerically except possibly to check whether they are positive or negative
 The cutoff for defining a large correlation is > 0.7 or < –0.7.
 Generally speaking, if two variables are unrelated, the covariance will be a positive or negative number close to zero
 The correlation between two variables is unitless and is always between –1 and +1.
 If the standard deviations of X and Y are 15.5 and 10.8, respectively, and the covariance of X and Y is 128.8, then the coefficient of correlation r is approximately 0.77.
 It is possible that the data points are close to a curve and have a correlation close to 0, because correlation is relevant only for measuring linear relationships.
 If the coefficient of correlation r = 0.80 and the standard deviations of X and Y are 20 and 25, respectively, then Cov(X, Y) must be 400.
 The advantage that the coefficient of correlation has over the covariance is that the former has a set lower and upper limit.
 If the standard deviation of X is 15, the covariance of X and Y is 94.5, and the coefficient of correlation r = 0.90, then the variance of Y is 7.0.
 The scatterplot is a graphical technique used to describe the relationship between two numerical variables.
 Statisticians often refer to the pivot tables as contingency tables or crosstabs.
 If we draw a straight line through the points in a scatterplot and most of the points fall close to the line, there is a strong positive linear relationship between the two variables.
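Several of the numerical statements above rely on the identity r = Cov(X, Y) / (sd_X × sd_Y). The sketch below reworks three of them using the figures quoted in the statements; the function name `correlation` is illustrative and the block is a check on the arithmetic, not an answer key.

```python
def correlation(cov_xy: float, sd_x: float, sd_y: float) -> float:
    """Correlation from a covariance and the two standard deviations."""
    return cov_xy / (sd_x * sd_y)

# Standard deviations 15.5 and 10.8, covariance 128.8.
r = correlation(128.8, 15.5, 10.8)

# r = 0.80 with standard deviations 20 and 25: solve for the covariance.
cov = 0.80 * 20 * 25

# sd_X = 15, covariance 94.5, r = 0.90: solve for sd_Y, then square it for the variance.
sd_y = 94.5 / (0.90 * 15)
var_y = sd_y ** 2

print(round(r, 2), round(cov, 1), round(sd_y, 1), round(var_y, 1))
```

Note that the last case yields a standard deviation of 7.0 for Y, so its variance is 49, which matters when judging a statement that quotes 7.0 as the variance.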
SHORT ANSWER
Table of Correlations

            Gender   Age      Prior Exp   Gamma Exp   Education   Salary
Gender      1.000
Age         0.111    1.000
Prior Exp   0.054    0.800    1.000
Gamma Exp   0.203    0.916    0.587       1.000
Education   0.039    0.518    0.434       0.342       1.000
Salary      0.154    0.923    0.723       0.870       0.617       1.000

Table of Covariances (variances on the diagonal)

            Gender    Age         Prior Exp   Gamma Exp   Education   Salary
Gender      0.259
Age         0.633     134.051
Prior Exp   0.117     39.060      19.045
Gamma Exp   0.700     72.047      17.413      49.421
Education   0.033     9.951       3.140       3.987       2.947
Salary      1825.97   249702.35   73699.75    143033.29   24747.68    584640062
 Which two variables have the strongest linear relationship with annual salary?
 For which of the two variables, number of years of prior work experience or number of years of postsecondary education, is the relationship with salary stronger? Justify your answer.
 How would you characterize the relationship between gender and annual salary?
 The percentage of the US population without health insurance coverage for samples from the 50 states and District of Columbia for both 2003 and 2004 produced the following table of correlations.
Table of Correlations:
Percent 2003
Percent 2003 Percent 2004 Percent 2004
What does the table for the two given sets of percentages tell you in this case?
The correlations for each pairing of variables are shown in the table below:
Table of correlations
 Which of the variables have a positive linear relationship with the household’s average monthly expenditure on utilities?
 Which of the variables have a negative linear relationship with the household’s average monthly expenditure on utilities?
 Which of the variables have essentially no linear relationship with the household’s average monthly expenditure on utilities?
 Three samples, regarding the ages of teachers, are selected randomly as shown below:
Sample A: 17 22 20 18 23
Sample B: 30 28 35 40 25
Sample C: 44 39 54 21 52
How is the value of the correlation coefficient r affected in each of the following cases?
 Each X value is multiplied by 4.
 Each X value is switched with the corresponding Y value.
 Each X value is increased by 2.
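The three cases above can be explored numerically. The sketch below computes the Pearson correlation by hand and compares it before and after each transformation. Treating Sample A as X and Sample B as Y is an assumption made only for illustration, since the question does not say which samples are paired; `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences of paired values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [17, 22, 20, 18, 23]  # Sample A, used as X (assumption for illustration)
y = [30, 28, 35, 40, 25]  # Sample B, used as Y (assumption for illustration)

base = pearson_r(x, y)
scaled = pearson_r([4 * v for v in x], y)   # each X value multiplied by 4
swapped = pearson_r(y, x)                   # X and Y switched
shifted = pearson_r([v + 2 for v in x], y)  # each X value increased by 2

print(base, scaled, swapped, shifted)
```

All four values agree up to floating-point rounding: correlation is unchanged by multiplying a variable by a positive constant, by adding a constant, and by swapping the roles of X and Y.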
 The students at a small community college in Iowa apply to study either English or Business. Some administrators at the college are concerned that women are being discriminated against in being allowed admittance, particularly in the business program. Below, you will find two pivot tables that show the percentage of students admitted by gender to the English program and the Business school. The data has also been presented graphically. What do the data and graphs indicate?
 A sample of 30 schools produced the pivot table shown below for the average percentage of students graduating from high school. Use this table to determine how the type of school (public or Catholic) that students attend affects their chance of graduating from high school.
 A data set from a sample of 399 Michigan families was collected. The characteristics of the data include family size (large or small), number of cars owned by family (1, 2, 3, or 4), and whether family owns a foreign car. Excel produced the pivot table shown below.
Use this pivot table to determine how family size and number of cars owned influence the likelihood that a family owns a foreign car.
 Of those in the sample who went partying the weekend before the final exam, what percentage of them did well in the exam?
 Of those in the sample who did well on the final exam, what percentage of them went partying the weekend before the exam?
 What percentage of the students in the sample went partying the weekend before the final exam and did well in the exam?
 What percentage of the students in the sample spent the weekend studying and did well in the final exam?
 What percentage of the students in the sample went partying the weekend before the final exam and did poorly on the exam?
 If the sample is a good representation of the population, what percentage of the students in the population should we expect to spend the weekend studying and do poorly on the final exam?
 If the sample is a good representation of the population, what percentage of those who spent the weekend studying should we expect to do poorly on the final exam?
 If the sample is a good representation of the population, what percentage of those who did poorly on the final exam should we expect to have spent the weekend studying?
 Of those in the sample who went partying the weekend before the final exam, what percentage of them did poorly in the exam?
 Of those in the sample who did well in the final exam, what percentage of them spent the weekend before the exam studying?
 A health magazine reported that a man’s weight at birth has a significant impact on the chance that the man will suffer a heart attack during his life. A statistician analyzed a data set for a sample of 798 men, and produced the pivot table and histogram shown below. Determine how birth weight influences the chances that a man will have a heart attack.
 The table shown below contains information technology (IT) investment as a percentage of total investment for eight countries during the 1990s. It also contains the average annual percentage change in employment during the 1990s. Explain how these data shed light on the question of whether IT investment creates or costs jobs. (Hint: Use the data to construct a scatterplot)
Country       % IT     % Change
Netherlands   2.5%     1.6%
Italy         4.1%     2.2%
Germany       4.5%     2.0%
France        5.5%     1.8%
Canada        8.3%     2.7%
Japan         8.3%     2.7%
Britain       8.3%     3.3%
U.S.          12.4%    3.7%

 There are two scatterplots shown below. The first chart shows the relationship between the size of the home and the selling price. The second chart examines the relationship between the number of bedrooms in the home and its selling price. Which of these two variables (the size of the home or the number of bedrooms) seems to have the stronger relationship with the home’s selling price? Justify your answer.
 The following scatterplot compares the selling price and the appraised value.
Is there a linear relationship between these two variables? If so, how would you characterize the relationship?
 Approximate the percentage of these Internet users who are men under the age of 30.
 Approximate the percentage of these Internet users who are single with no formal education beyond high school.
 Approximate the percentage of these Internet users who are currently employed.
 What is the average annual salary of the employed Internet users in this sample?
 Approximate the percentage of these Internet users who are married with formal education beyond high school.
 What percentage of these Internet users are married?
 Approximate the percentage of these Internet users who are in the 58-71 age group.
 Approximate the percentage of these Internet users who are women.
 What percentage of these Internet users has formal education beyond high school?
 Approximate the percentage of these Internet users who are women in the 30-43 age group.
 Explain why the ratio of the average wage of the top 10% of all wage earners to the median measures income inequality.
 Do these data help to confirm or contradict the hypothesis that increased wage inequality leads to lower unemployment levels? [Hint: construct a scatterplot]
 What other data would you need to be more confident that increased income inequality leads to lower unemployment?
 A car dealer collected the following information about a sample of 448 Grand Rapids residents:
· Exact salaries of these Grand Rapids residents
· Education level (completed high school only or completed college)
· Income level (low or high)
· Car finance (whether or not the last purchased car was financed)
Using the education level, income level, and car finance data, he created the three pivot tables shown below. Based on these tables, determine how education and income influence the likelihood that a family finances a car.
 Some histograms have two or more peaks. This is often an indication that the data come from two or more distinct populations.
 A population includes all elements or objects of interest in a study, whereas a sample is a subset of the population used to gain insights into the characteristics of the population.
 A frequency table indicates how many observations fall within each category, and a histogram is its graphical analog.
 In the term “frequency table,” frequency refers to the number of data values falling within each category.
 Time series data are often graphically depicted on a line chart, which is a plot of the variable of interest over time.
 The number of car insurance policy holders is an example of a discrete random variable
 A variable (or field) is an attribute, or measurement, on members of a population, whereas an observation (or case or record) is a list of all variable values for a single member of a population.
 Phone numbers, Social Security numbers, and zip codes are examples of numerical variables.
 Cross-sectional data are data on a population at a distinct point in time, whereas time series data are data collected across time.
 Distribution is a general term used to describe the way data are distributed, as indicated by a frequency table or histogram.
 Both ordinal and nominal variables are categorical.
 A histogram is said to be symmetric if it has a single peak and looks approximately the same to the left and right of the peak.
 Suppose that a sample of 10 observations has a standard deviation of 3, then the sum of the squared deviations from the sample mean is 30.
 If a histogram has a single peak and looks approximately the same to the left and right of the peak, we should expect no difference in the values of the mean, median, and mode.
 The mean is a measure of central location.
 The length of the box in the boxplot portrays the interquartile range.
 In a positively skewed distribution, the mean is smaller than the median and the median is smaller than the mode.
 The value of the standard deviation always exceeds that of the variance.
 The difference between the first and third quartiles is called the interquartile range.
 The standard deviation is measured in original units, such as dollars and pounds.
 The median is one of the most frequently used measures of variability.
 Assume that the histogram of a data set is symmetric and bell shaped, with a mean of 75 and standard deviation of 10. Then, approximately 95% of the data values were between 55 and 95.
 Abby has been keeping track of what she spends to rent movies. The last seven weeks' expenditures, in dollars, were 6, 4, 8, 9, 6, 12, and 4. The mean amount Abby spends on renting movies is $7.
 Expressed in percentiles, the interquartile range is the difference between the 25^{th} and 75^{th} percentiles.
 The value of the mean times the number of observations equals the sum of all of the data values.
 The difference between the largest and smallest values in a data set is called the range.
 There are four quartiles that divide the values in a data set into four equal parts.
 Suppose that a sample of 8 observations has a standard deviation of 2.50, then the sum of the squared deviations from the sample mean is 17.50.
 The median of a data set with 30 values would be the average of the 15^{th} and the 16^{th} values when the data values are arranged in ascending order.
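Several of the items above rest on short calculations: the mean of Abby's seven expenditures, the sum of squared deviations implied by a sample standard deviation, and the empirical-rule interval for a bell-shaped histogram. The sketch below works through them. It assumes the usual sample formula s^2 = SS / (n - 1); the function names are illustrative and do not come from the textbook.

```python
import statistics

# Abby's movie-rental spending over seven weeks (from the item above).
spending = [6, 4, 8, 9, 6, 12, 4]
mean_spending = statistics.mean(spending)      # sum of values / number of values
median_spending = statistics.median(spending)  # middle value once sorted


def sum_squared_deviations(n, sample_sd):
    """Recover sum((x - mean)^2) from a sample standard deviation.

    Assumes the sample formula s^2 = SS / (n - 1), so SS = (n - 1) * s^2.
    """
    return (n - 1) * sample_sd ** 2


def empirical_rule_interval(mean, sd, k=2):
    """Return (mean - k*sd, mean + k*sd).

    For a symmetric, bell-shaped histogram, k=2 covers roughly 95% of the data.
    """
    return (mean - k * sd, mean + k * sd)


print(mean_spending)                     # 7
print(median_spending)                   # 6
print(sum_squared_deviations(10, 3))     # 81
print(sum_squared_deviations(8, 2.50))   # 43.75
print(empirical_rule_interval(75, 10))   # (55, 95)
```

Comparing each printed value with the figure quoted in the corresponding statement is one way to check a true/false answer by hand.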
SHORT ANSWER
 Would you conclude that there is a difference between the salaries of women and men in this plant? Justify your answer.
 How large must a person’s salary be to qualify as an outlier on the high side? How many outliers are there in these data?
 What can you say about the shape of the distributions given the boxplots above?
 What are the mean and median scores on this exam?
 Explain why the mean and median are different.
 Find the mean, median, standard deviation, first and third quartiles, and the 95^{th} percentile for family incomes in both years.
 The Republicans claim that the country was better off in 1990 than in 1980, because the average income increased. Do you agree?
 Generate a boxplot to summarize the data. What does the boxplot indicate?
 Interpret the variance and standard deviation of this sample.
 Is the empirical rule applicable in this case? If so, apply it and interpret your results. If not, explain why the empirical rule is not applicable here.
 Explain what would cause the mean to be slightly lower than the median in this case.
 Which of the states listed paid their teachers average salaries that exceed at least 75% of all average salaries?
 Which of the states listed paid their teachers average salaries that are below 75% of all average salaries?
 What salary amount represents the second quartile?
 How would you describe the salary of Virginia’s teachers compared to those across the entire United States? Justify your answer.
 What do these statistics tell you about the shape of the distribution?
 What can you say about the relative position of each of the observations 34, 84, and 104?
 Calculate the interquartile range. What does this tell you about the data?
 Compute the mean number of children.
 Compute the median number of children.
 Is the distribution of the number of children symmetrical or skewed? Why?
 The data below represent monthly sales for two years of beanbag animals at a local retail store (Month 1 represents January and Month 12 represents December). Given the time series plot below, do you see any obvious patterns in the data? Explain.
 An operations management professor is interested in how her students performed on her midterm exam. The histogram shown below represents the distribution of exam scores (where the maximum score is 100) for 50 students.
Based on this histogram, how would you characterize the students’ performance on this exam?
 The proportion of Americans under the age of 18 who are living below the poverty line for each of the years 1959 through 2000 is used to generate the following time series plot.
How successful have Americans been recently in their efforts to win “the war against poverty” for the nation’s children?
 Indicate the type of data for each of the six variables included in this set.
 Based on the histogram shown below, how would you describe the age distribution for these data?
 Based on the histogram shown below, how would you describe the salary distribution for these data?
 What percentage of the job applicants scored between 30 and 40?
 What percentage of the job applicants scored below 60?
 How many job applicants scored between 10 and 30?
 How many job applicants scored above 50?
 Seventy percent of the job applicants scored above what value?
 Half of the job applicants scored below what value?
 A question of great interest to economists is how the distribution of family income has changed in the United States during the last 20 years. The summary measures and histograms shown below are generated for a sample of 500 family incomes, using the 1985 and 2005 income for each family in the sample.
Summary Measures:
Based on these results, discuss as completely as possible how the distribution of family income in the United States changed from 1985 to 2005.
CHAPTER 4: Probability and Probability Distributions
MULTIPLE CHOICE
 Probabilities that cannot be estimated from long-run relative frequencies of events are
 objective probabilities c. complementary probabilities
 subjective probabilities d. joint probabilities
 The probability of an event and the probability of its complement always sum to:
 1 c. any value between 0 and 1
 0 d. any positive value
 If events A and B are mutually exclusive, then the probability of both events occurring simultaneously is equal to
 0.0 c. 1.0
 0.5 d. any value between 0.5 and 1.0
 Probabilities that can be estimated from long-run relative frequencies of events are
 objective probabilities c. complementary probabilities
 subjective probabilities d. joint probabilities
 Let A and B be the events of the FDA approving and rejecting a new drug to treat hypertension, respectively. The events A and B are:
 independent c. unilateral
 conditional d. mutually exclusive
 A function that associates a numerical value with each possible outcome of an uncertain event is called a
 conditional variable c. population variable
 random variable d. sample variable
 The formal way to revise probabilities based on new information is to use:
 complementary probabilities c. unilateral probabilities
 conditional probabilities d. common sense probabilities
 [formula missing in source] is the:
 a. addition rule c. rule of complements
 b. commutative rule d. rule of opposites
 The law of large numbers is relevant to the estimation of
 objective probabilities c. both of these options
 subjective probabilities d. neither of these options
 A discrete probability distribution:
 lists all of the possible values of the random variable and their corresponding probabilities
 is a tool that can be used to incorporate uncertainty into models
 can be estimated from long-run proportions
 is the distribution of a single random variable
 Which of the following statements are true?
 Probabilities must be nonnegative
 Probabilities must be less than or equal to 1
 The sum of all probabilities for a random variable must be equal to 1
 All of these options are true.
 If P(A) = P(A|B), then events A and B are said to be
 mutually exclusive c. exhaustive
 independent d. complementary
 If A and B are mutually exclusive events with P(A) = 0.70, then P(B):
 can be any value between 0 and 1
 can be any value between 0 and 0.70
 cannot be larger than 0.30
 Cannot be determined with the information given
 If two events are collectively exhaustive, what is the probability that one or the other occurs?
 a. 0.25
 b. 0.50
 c. 1.00
 d. Cannot be determined from the information given.
 If two events are collectively exhaustive, what is the probability that both occur at the same time?
 a. 0.00
 b. 0.50
 c. 1.00
 d. Cannot be determined from the information given.
 If two events are mutually exclusive, what is the probability that one or the other occurs?
 a. 0.25
 b. 0.50
 c. 1.00
 d. Cannot be determined from the information given.
 If two events are mutually exclusive, what is the probability that both occur at the same time?
 a. 0.00
 b. 0.50
 c. 1.00
 d. Cannot be determined from the information given.
 If two events are mutually exclusive and collectively exhaustive, what is the probability that both occur?
 0.00
 0.50
 1.00
 Cannot be determined from the information given.
 The two types of random variables are
 discrete and continuous c. complementary and cumulative
 exhaustive and mutually exclusive d. real and unreal
 If P(A) = 0.25 and P(B) = 0.65, then P(A and B) is:
 0.25
 0.40
 0.90
 Cannot be determined from the information given
 If two events are independent, what is the probability that they both occur?
 a. 0
 b. 0.50
 c. 1.00
 d. Cannot be determined from the information given
 If A and B are any two events with P(A) = .8 and P(B) = .7, then P(and B) is
 .56 c. .24
 .14 d. None of the above
 Which of the following best describes the concept of marginal probability?
 a. It is a measure of the likelihood that a particular event will occur, regardless of whether another event occurs.
 b. It is a measure of the likelihood that a particular event will occur, given that another event has already occurred.
 c. It is a measure of the likelihood of the simultaneous occurrence of two or more events.
 d. None of the above.
 If A and B are mutually exclusive events with P(A) = 0.30 and P(B) = 0.40, then the probability that either A or B or both occur is:
 0.10 c. 0.70
 0.12 d. None of the above
 If A and B are any two events with P(A) = .8 and P(B|A) = .4, then the joint probability of A and B is
 .80 c. .32
 .40 d. 1.20
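The multiple-choice items above lean on a handful of probability rules: the rule of complements, the addition rule for mutually exclusive events, the multiplication rule, and the fact that P(A) and P(B) alone only bound P(A and B). The following sketch illustrates those rules with numbers taken from the items. The helper names are hypothetical, and exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction


def complement(p_a):
    """Rule of complements: P(not A) = 1 - P(A)."""
    return 1 - p_a


def union_mutually_exclusive(p_a, p_b):
    """Addition rule when A and B cannot both occur: P(A or B) = P(A) + P(B)."""
    return p_a + p_b


def joint_from_conditional(p_a, p_b_given_a):
    """Multiplication rule: P(A and B) = P(A) * P(B|A)."""
    return p_a * p_b_given_a


def joint_bounds(p_a, p_b):
    """Range of P(A and B) consistent with P(A) and P(B) when nothing else is known."""
    lower = max(Fraction(0), p_a + p_b - 1)
    upper = min(p_a, p_b)
    return lower, upper


# With P(A) = 0.70 and B mutually exclusive with A, P(B) can be at most 1 - P(A).
print(float(complement(Fraction(7, 10))))                                # 0.3

# Mutually exclusive events with P(A) = 0.30 and P(B) = 0.40.
print(float(union_mutually_exclusive(Fraction(3, 10), Fraction(4, 10))))  # 0.7

# P(A) = .8 and P(B|A) = .4.
print(float(joint_from_conditional(Fraction(8, 10), Fraction(4, 10))))    # 0.32

# P(A) = .8 and P(B) = .7 with no independence assumption: only a range is known.
print(tuple(float(x) for x in joint_bounds(Fraction(8, 10), Fraction(7, 10))))  # (0.5, 0.7)
```

The last call shows why "cannot be determined" is a sensible option whenever an item gives only P(A) and P(B) without stating independence or mutual exclusivity.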
TRUE/FALSE
 If A and B are independent events with P(A) = 0.40 and P(B) = 0.50, then P(A/B) is 0.50.
 A random variable is a function that associates a numerical value with each possible outcome of a random phenomenon.
 Two or more events are said to be exhaustive if one of them must occur.
 You think you have a 90% chance of passing your statistics class. This is an example of subjective probability.
 The number of cars produced by GM during a given quarter is a continuous random variable.
 Two events A and B are said to be independent if P(A and B) = P(A) + P(B)
 Probability is a number between 0 and 1, inclusive, which measures the likelihood that some event will occur.
 If events A and B have nonzero probabilities, then they can be both independent and mutually exclusive.
 The probability that event A will not occur is denoted as .
 If P(A and B) = 1, then A and B must be collectively exhaustive.
 Conditional probability is the probability that an event will occur, with no other events taken into consideration.
 When we wish to determine the probability that at least one of several events will occur, we would use the addition rule.
 The law of large numbers states that subjective probabilities can be estimated based on the long-run relative frequencies of events.
 Two events are said to be independent when knowledge of one event is of no value when assessing the probability of the other.
 Suppose A and B are mutually exclusive events where P(A) = 0.2 and P(B) = 0.5, then P(A or B) =
 If A and B are two independent events with P(A) = 0.20 and P(B) = 0.60, then P(A and B) = 0.80
 The relative frequency of an event is the number of times the event occurs out of the total number of times the random experiment is run.
 Marginal probability is the probability that a given event will occur, given that another event has already occurred.
 The temperature of the room in which you are writing this test is a continuous random variable.
 Two events A and B are said to be mutually exclusive if P(A and B) = 0.
 Two or more events are said to be exhaustive if at most one of them can occur.
 When two events are independent, they are also mutually exclusive.
 Two or more events are said to be mutually exclusive if at most one of them can occur.
 Given that events A and B are independent and that P(A) = 0.8 and P(B/A) = 0.4, then P(A and B) =
 The time students spend in a computer lab during one day is an example of a continuous random variable.
 The multiplication rule for two events A and B is: P(A and B) = P(A|B)P(A).
 The number of car insurance policy holders is an example of a discrete random variable.
 Suppose A and B are mutually exclusive events where P(A) = 0.3 and P(B) = 0.4, then P(A and B) =
.
 Suppose A and B are two events where P(A) = 0.5, P(B) = 0.4, and P(A and B) = 0.2, then P(B/A) =
 Suppose that after graduation you will either buy a new car (event A) or take a trip to Europe (event B). Events A and B are mutually exclusive.
 If P(A and B) = 0, then A and B must be collectively exhaustive.
 The number of people entering a shopping mall on a given day is an example of a discrete random variable.
 Football teams toss a coin to see who will get their choice of kicking or receiving to begin a game. The probability that a given team will win the toss three games in a row is 0.125.
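Many of the true/false statements above can be checked with three small calculations: a conditional probability, the product test for independence, and the probability of a run of independent successes. The sketch below shows each one using numbers from the statements. The function names are illustrative, and the coin is assumed to be fair with independent tosses.

```python
from fractions import Fraction


def conditional(p_a_and_b, p_a):
    """P(B|A) = P(A and B) / P(A), defined only when P(A) > 0."""
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return p_a_and_b / p_a


def are_independent(p_a, p_b, p_a_and_b):
    """A and B are independent exactly when P(A and B) = P(A) * P(B)."""
    return p_a_and_b == p_a * p_b


def all_succeed(p, n):
    """Probability that n independent trials, each with success probability p, all succeed."""
    return p ** n


# P(A) = 0.5, P(B) = 0.4, P(A and B) = 0.2.
print(float(conditional(Fraction(2, 10), Fraction(5, 10))))                 # 0.4
print(are_independent(Fraction(5, 10), Fraction(4, 10), Fraction(2, 10)))   # True

# Mutually exclusive events have P(A and B) = 0, so with nonzero P(A) and P(B)
# the product test fails and the events cannot also be independent.
print(are_independent(Fraction(3, 10), Fraction(4, 10), Fraction(0)))       # False

# Winning a fair coin toss three games in a row.
print(float(all_succeed(Fraction(1, 2), 3)))                                # 0.125
```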
SHORT ANSWER
 Find the probability distribution of X.
 What is the probability that this project will be completed in less than 4 months from now?
 What is the probability that this project will not be completed on time?
 (A) What is the expected completion time (in months) from now for this project?
(B) How much variability (in months) exists around the expected value found in (A)?
 Find the marginal distribution of X. What does this distribution tell you?
 Find the marginal distribution of Y. What does this distribution tell you?
 (A) Calculate the conditional distribution of X given Y.
(B) What is the practical benefit of knowing the conditional distribution in (A)?
 Calculate the conditional distribution of Y given X.
 What is the probability that no one is waiting or being served in the regular checkout line?
 What is the probability that no one is waiting or being served in the express checkout line?
 What is the probability that no more than two customers are waiting in both lines combined?
 On average, how many customers would you expect to see in each of these two lines at the grocery store?
ANS:
Expected number of customers in regular line = E(X) = 1.46
Expected number of customers in express line = E(Y) = 1.60
 Find the expected price and demand level for the upcoming quarter.
 What is the probability that the price of this product will be above its mean in the upcoming quarter?
 What is the probability that the demand of this product will be below its mean in the upcoming quarter?
 What is the probability that the demand of this product will exceed 2500 units in the upcoming quarter, given that its price will be less than $30?
 What is the probability that the demand of this product will be less than 3500 units in the upcoming quarter, given that its price will be greater than $20?
 Calculate the joint probabilities of
.
 Determine the marginal probability distribution of .
 What is the probability of observing the sale of at least one brand 1 bat and at least one brand 2 bat on the same day at this sporting goods store?
 What is the probability of observing the sale of at least one brand 1 bat on a given day at this sporting goods store?
 What is the probability of observing the sale of no more than two brand 2 bats on a given day at this sporting goods store?
 Given that no brand 2 bats are sold on a given day, what is the probability of observing the sale of at least one brand 1 bat at this sporting goods store?
 Set up a 2 × 2 contingency table for this situation.
 Give an example of a simple event.
 Give an example of a joint event.
 What is the probability that a respondent chosen at random is a male?
 What is the probability that a respondent chosen at random enjoys shopping for clothing?
 What is the probability that a respondent chosen at random is a male and enjoys shopping for clothing?
 What is the probability that a respondent chosen at random is a female and enjoys shopping for clothing?
 What is the probability that a respondent chosen at random is a male and does not enjoy shopping for clothing?
 What is the probability that a respondent chosen at random is a female or enjoys shopping for clothing?
 What is the probability that a respondent chosen at random is a male or does not enjoy shopping for clothing?
 What is the probability that a respondent chosen at random is a male or a female?
 What is the probability that a respondent chosen at random enjoys or does not enjoy shopping for clothing?
 Does consumer behavior depend on the gender of consumer? Explain using probabilities.
 Construct the joint probability table.
 What is the probability a randomly selected patron prefers wine?
 What is the probability a randomly selected patron is a female?
 What is the probability a randomly selected patron is a female who prefers wine?
 What is the probability a randomly selected patron is a female who prefers beer?
 Suppose a randomly selected patron prefers wine. What is the probability the patron is a male?
 Suppose a randomly selected patron prefers beer. What is the probability the patron is a male?
 Suppose a randomly selected patron is a female. What is the probability the patron prefers beer?
 Suppose a randomly selected patron is a female. What is the probability that the patron prefers wine?
 Are gender of patrons and drinking preference independent? Explain.
 Find the probability distribution of X, the number of oil wells that will be successful.
 What is the probability that none of the oil wells will be successful?
 If a new pipeline will be constructed in the event that all three wells are successful, what is the probability that the pipeline will be constructed?
 How many of the wells can the company expect to be successful?
 Suppose the first well to be completed is successful. What is the probability that one of the two remaining wells is successful?
 If it costs $200,000 to drill each well and a successful well will produce $1,000,000 worth of oil over its lifetime, what is the expected net value of this three-well program?
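The oil-well questions above follow a common pattern: build the distribution of X, take its expected value, then convert that into an expected net payoff. The sketch below shows the pattern only. The success probability is not given in this excerpt, so P_SUCCESS is a hypothetical placeholder, and the wells are assumed to succeed independently (a binomial model), which the original problem may not intend. The drilling cost and revenue figures are the ones stated in the last question.

```python
from math import comb, sqrt


def binomial_pmf(n, p):
    """Distribution of the number of successes in n independent trials."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}


def expected_value(pmf):
    """E(X) = sum of x * P(X = x) over all possible values x."""
    return sum(x * prob for x, prob in pmf.items())


def standard_deviation(pmf):
    """Square root of the probability-weighted squared deviations from E(X)."""
    mu = expected_value(pmf)
    return sqrt(sum((x - mu) ** 2 * prob for x, prob in pmf.items()))


P_SUCCESS = 0.3                  # HYPOTHETICAL: not stated in this excerpt
N_WELLS = 3                      # three-well program (from the question)
COST_PER_WELL = 200_000          # drilling cost per well (from the question)
REVENUE_PER_SUCCESS = 1_000_000  # lifetime oil value of a successful well (from the question)

pmf = binomial_pmf(N_WELLS, P_SUCCESS)
expected_successes = expected_value(pmf)

# Every well is drilled whether or not it succeeds, so the cost is fixed.
expected_net = REVENUE_PER_SUCCESS * expected_successes - COST_PER_WELL * N_WELLS

print(pmf[0])              # P(no well succeeds) under the assumed model
print(pmf[N_WELLS])        # P(all wells succeed), the pipeline condition
print(expected_successes)
print(standard_deviation(pmf))
print(expected_net)
```

Replacing P_SUCCESS with the value from the full problem statement, or replacing binomial_pmf with the distribution the problem actually specifies, leaves the expected-value and net-payoff steps unchanged.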