- The decision-making concepts covered in the Data Analysis & Decision Making book include which of the following?
- Optimization techniques
- Decision analysis with uncertainty
- Structured sensitivity analysis
- All of these options
- Which of the following statements is not true?
- Dealing with uncertainty includes measuring uncertainty
- Dealing with uncertainty includes modeling uncertainty explicitly into the analysis.
- Dealing with uncertainty includes eliminating uncertainty by using the normal probability distribution
- Uncertainty is a key aspect of most business problems, and dealing with uncertainty requires a basic understanding of probability
- Which of the following is not one of the important themes of your Data Analysis & Decision Making book?
- Data analysis
- Dealing with uncertainty
- Decision making
- Data mining
- Data analysis includes
- data description
- data inference
- the search for relationships in data
- All of these options
- Which of the following is not one of the steps in the modeling process?
- Select scale for model
- Collect and summarize data
- Verify the model
- Present the results
- Implement the model and update it through time
- Which of the following would not be included under data analysis?
- Measuring uncertainty
- Data description
- Data inference
- Search for relationships
- The decision making process includes
- optimization techniques for problems with no uncertainty
- decision analysis for problems with uncertainty
- sensitivity analysis
- All of the above
- Which of the following is not one of the types of models described in the Data Analysis & Decision Making book?
- Algebraic model
- Spreadsheet model
- Scale model
- Graphical model
- The modeling process discussed in the Data Analysis & Decision Making book is a
- seven-step process
- six-step process
- five-step process
- four-step process
- three-step process
- Which of the following is an Excel add-in for performing what-if analyses?
- PrecisionTree
- TopRank
- Solver
- @Risk
- StatTools
- Which of the following statements are false?
- The modeling process discussed in the Data Analysis & Decision Making book is a five-step process
- Dealing with uncertainty requires a basic understanding of probability
- Uncertainty is a key aspect of most business problems
- Data description and data inference are included under data analysis
- Which of the following statements are false?
- Decision-making includes optimization techniques for problems with certainty, decision analysis for problems with certainty, and structured sensitivity analysis.
- Graphical models can be very helpful for simple problems. For complex problems, however, graphical models usually fail to show the important elements of a problem and how they are related.
- Dealing with uncertainty includes measuring uncertainty and modeling uncertainty explicitly into the analysis.
- All of these options
- Which of the following statements are true?
- A fairly recent alternative to algebraic modeling is spreadsheet modeling. Instead of relating various quantities with algebraic equations and inequalities, we relate them in a spreadsheet with cell formulas.
- Data are usually meaningless until they are analyzed for trends, patterns, relationships, and other useful information.
- Algebraic models, by means of algebraic equations and inequalities, specify a set of relationships in a very precise way. Their main drawback is that they require an ability to work with abstract mathematical symbols.
- When we make inferences from data and search for relationships in data, or when we use decision trees to help make decisions, we must deal with uncertainty.
- All of these options
ANS: E
- Which of the following statements are true?
- Three important themes run through the book. Two of them are in the title: data analysis and decision making. The third is dealing with uncertainty.
- Data analysis includes data description, data inference, and the search for relationships in data
- Decision making includes optimization techniques for problems with no uncertainty, decision analysis for problems with uncertainty, and structured sensitivity analysis.
- Dealing with uncertainty includes measuring uncertainty and modeling uncertainty explicitly into the analysis.
- All of these options
ANS: E
- Which of the following is an Excel add-in for simulation?
- PrecisionTree
- TopRank
- Solver
- @Risk
- StatTools
TRUE/FALSE
- Data analysis includes data description, data inference, and the search for relationships in data.
- Decision-making includes optimization techniques for problems with certainty, decision analysis for problems with certainty, and structured sensitivity analysis.
- Dealing with uncertainty includes measuring uncertainty and modeling uncertainty explicitly into the analysis.
- The authors of the Data Analysis & Decision Making book described three types of models: graphical models, algebraic models, and spreadsheet models.
- Graphical models are the least intuitive type of model. Their purpose is simply to provide enough quantitative details to enable us to solve the problem of interest.
- Three important themes run through this book: data analysis, decision-making, and dealing with uncertainty.
- Graphical models can be very helpful for simple problems. For complex problems, however, graphical models usually fail to show the important elements of a problem and how they are related.
- The overall modeling process typically done in the business world always requires seven steps: define the problem, collect and summarize data, formulate a model, verify the model, select one or more suitable decisions, present the results to the organization, and finally implement the model and update it through time.
- Algebraic models, by means of algebraic equations and inequalities, specify a set of relationships in a very precise way. Their main drawback is that they require an ability to work with abstract mathematical symbols.
- Data are usually meaningless until they are analyzed for trends, patterns, relationships, and other useful information.
- A fairly recent alternative to algebraic modeling is spreadsheet modeling. Instead of relating various quantities with algebraic equations and inequalities, we relate them in a spreadsheet with cell formulas.
- When we use simulation models to help make decisions, we do not deal with uncertainty at all, since we often must make inferences from the simulated data.
- When we make inferences from data and search for relationships in data, or when we use decision trees to help make decisions, we must deal with uncertainty.
- @Risk is an Excel add-in that can be used to run replications of a simulation, keep track of outputs, create useful charts, and perform sensitivity analyses.
- Graphical models are probably the least intuitive and most quantitative type of model.
CHAPTER 2: Describing the Distribution of a Single Variable
MULTIPLE CHOICE
- A sample of a population taken at one particular point in time is categorized as:
- categorical
- discrete
- cross-sectional
- time-series
- If data is stored in a database package, which of the following terms are typically used?
- Fields and records
- Cases and columns
- Variables and samples
- Variables and observations
- Researchers may gain insight into the characteristics of a population by examining a
- mathematical model describing the population
- sample of the population
- description of the population
- replica
- Numerical variables can be subdivided into which two types?
- Diverse and categorical
- Discrete and continuous
- Nominal and progressive
- Cross-sectional and discrete
- Gender and State are examples of which type of data?
- Discrete data
- Continuous data
- Categorical data
- Ordinal data
- Which of the following indicates how many observations fall into various categories?
- The Likert scale
- The frequency table
- The sample table
- The tabulation scale
- Data that arise from counts are called:
- continuous data
- nominal data
- counted data
- discrete data
- A histogram that is positively skewed is also called
- skewed to the right
- skewed to the left
- balanced
- symmetric
- A histogram that has exactly two peaks is called a
- unimodal distribution
- bimodal distribution
- skewed distribution
- scatterplot
- A histogram that has a single peak and looks approximately the same to the left and right of the peak is called:
- bimodal
- symmetric
- balanced
- proportional
- A variable is classified as ordinal if:
- there is a natural ordering of categories
- there is no natural ordering of categories
- the data arise from continuous measurements
- we track the variable through a period of time
- In order for the characteristics of a sample to be generalized to the entire population, it should be:
- symbolic of the population
- typical of the population
- representative of the population
- illustrative of the population
- When we look at a time series plot, we usually look for which two things?
- “Is there an observable trend?” and “Is there a seasonal pattern?”
- “Is there an observable trend?” and “Can we make predictions?”
- “Is the sample representative?” and “Is there a seasonal pattern?”
- “Is there an observable trend?” and “Is the trend symmetric?”
- Which of the following are possible categorizations of data type?
- Numerical versus categorical (with subcategories nominal, ordinal)
- Discrete versus continuous
- Cross-sectional versus time series
- All of these options
- Two of these options
- Which of the following are the two most commonly used measures of variability?
- Variance and median
- Variance and standard deviation
- Mean and variance
- Mean and range
- First quartile and third quartile
- The median can also be described as:
- the middle observation when the data values are arranged in ascending order
- the second quartile
- the 50th percentile
- All of these options
- The difference between the first and third quartile is called the
- interquartile range
- interdependent range
- unimodal range
- bimodal range
- mid range
- If a value represents the 95th percentile, this means that
- 95% of all values are below this value
- 95% of all values are above this value
- 95% of the time you will observe this value
- there is a 5% chance that this value is incorrect
- there is a 95% chance that this value is correct
- For a boxplot, the point inside the box indicates the location of the
- mean
- median
- minimum value
- maximum value
- For a boxplot, the vertical line inside the box indicates the location of the
- mean
- median
- mode
- minimum value
- maximum value
- Which of the following are the three most common measures of central location?
- Mean, median, and mode
- Mean, variance, and standard deviation
- Mean, median, and variance
- Mean, median, and standard deviation
- First quartile, second quartile, and third quartile
- The length of the box in the boxplot portrays the
- mean
- median
- range
- interquartile range
- third quartile
- Suppose that a histogram of a data set is approximately symmetric and "bell shaped". Approximately what percent of the observations are within two standard deviations of the mean?
- 50%
- 68%
- 95%
- 99.7%
- 100%
- The mode is best described as the
- middle observation
- same as the average
- 50th percentile
- most frequently occurring value
- third quartile
- For a boxplot, the box itself represents what percent of the observations?
- lower 25%
- middle 50%
- upper 75%
- upper 90%
- 100%
- Which of the following statements is true for the following data values: 7, 5, 6, 4, 7, 8, and 12?
- The mean, median and mode are all equal
- Only the mean and median are equal
- Only the mean and mode are equal
- Only the median and mode are equal
- In a histogram, the percentage of the total area which must be to the left of the median is:
- exactly 50%
- less than 50% if the distribution is skewed to the left
- more than 50% if the distribution is skewed to the right
- between 25% and 50% if the distribution is symmetric and unimodal
- The average score for a class of 30 students was 75. The 20 male students in the class averaged 70. The 10 female students in the class averaged:
- 75
- 85
- 60
- 70
- 80
- Which of the following statements is true?
- The sum of the deviations from the mean is always zero
- The sum of the squared deviations from the mean is always zero
- The range is always smaller than the variance
- The standard deviation is always smaller than the variance
- Expressed in percentiles, the interquartile range is the difference between the
- 10th and 60th percentiles
- 15th and 65th percentiles
- 20th and 70th percentiles
- 25th and 75th percentiles
- 35th and 85th percentiles
- A sample of 20 observations has a standard deviation of 4. The sum of the squared deviations from the sample mean is:
- 400
- 320
- 304
- 288
- 180
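The arithmetic behind several of the questions above can be checked with a short Python sketch that uses only the standard library. It assumes the sample standard deviation uses the n - 1 divisor (as Excel's STDEV.S does), and it computes the normal-curve coverage behind the empirical rule instead of quoting it. The helper function names are illustrative, not from the book.

```python
import math
import statistics

def normal_coverage(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Empirical rule: about 68%, 95%, and 99.7% within 1, 2, and 3 standard deviations.
coverage = {k: round(normal_coverage(k) * 100, 1) for k in (1, 2, 3)}
print(coverage)  # {1: 68.3, 2: 95.4, 3: 99.7}

# Mean, median, and mode of 7, 5, 6, 4, 7, 8, 12.
values = [7, 5, 6, 4, 7, 8, 12]
center = (statistics.mean(values), statistics.median(values), statistics.mode(values))
print(center)  # (7, 7, 7)

def remaining_group_mean(n_total, mean_total, n_known, mean_known):
    """Mean of the other group: (overall total - known group's total) / other group's size."""
    return (n_total * mean_total - n_known * mean_known) / (n_total - n_known)

female_mean = remaining_group_mean(30, 75, 20, 70)
print(female_mean)  # 85.0

def sum_squared_deviations(n, s):
    """Sample variance is s**2 = SS / (n - 1), so SS = (n - 1) * s**2."""
    return (n - 1) * s ** 2

ss = sum_squared_deviations(20, 4)
print(ss)  # 304
```

Under these assumptions the calculations point to about 95% within two standard deviations, mean = median = mode = 7, a female class average of 85, and a sum of squared deviations of 304.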
TRUE/FALSE
- Age, height, and weight are examples of numerical data.
- Data can be categorized as cross-sectional or time series.
- All nominal data may be treated as ordinal data.
- Four different shapes of histograms are commonly observed: symmetric, positively skewed, negatively skewed, and bimodal.
- Categorical variables can be classified as either discrete or continuous.
- A skewed histogram is one with a long tail extending either to the right or left. The former is called negatively skewed, and the latter is called positively skewed.
CHAPTER 3: Finding Relationships Among Variables
MULTIPLE CHOICE
- To examine relationships between two categorical variables, we can use
- Counts and corresponding charts of the counts
- Scatterplots
- Histograms
- None of these options
- Tables used to display counts of a categorical variable are called
- Crosstabs
- Contingency tables
- Both of these options
- Neither of these options
- The Excel function that allows you to count using more than one criterion is
- COUNTIF
- COUNTIFS
- SUMPRODUCT
- VLOOKUP
- HLOOKUP
- Examples of comparison problems include
- Salary broken down by male and female subpopulations
- Cost of living broken down by region of a country
- Recovery rate for a disease broken down by patients who have taken a drug and patients who have taken a placebo
- Starting salary of recent graduates broken down by academic major
- All of these options
- The most common data format is
- Long
- Short
- Stacked
- Unstacked
- A useful way of comparing the distribution of a numerical variable across categories of some categorical variable is
- Side-by-side boxplots
- Side-by-side histograms
- Both of these options
- Neither of these options
- We study relationships among numerical variables using
- Correlation
- Covariance
- Scatterplots
- All of these options
- None of these options
- Scatterplots are also referred to as
- Crosstabs
- Contingency charts
- X-Y charts
- All of these options
- None of these options
- Correlation and covariance measure
- The strength of a linear relationship between two numerical variables
- The direction of a linear relationship between two numerical variables
- The strength and direction of a linear relationship between two numerical variables
- The strength and direction of a linear relationship between two categorical variables
- None of these options
- We can infer that there is a strong relationship between two numerical variables when
- The points on a scatterplot cluster tightly around an upward sloping straight line
- The points on a scatterplot cluster tightly around a downward sloping straight line
- Either of these options
- Neither of these options
- The limitation of covariance as a descriptive measure of association is that it
- Only captures positive relationships
- Does not capture the units of the variables
- Is very sensitive to the units of the variables
- Is invalid if one of the variables is categorical
- None of these options
- If the correlation is close to 0, then we expect to see
- An upward sloping cluster of points on the scatterplot
- A downward sloping cluster of points
- A cluster of points around a trendline
- A cluster of points with no apparent relationship
- We cannot say what the scatterplot should look like based on the correlation
- We are usually on the lookout for large correlations near
- +1
- -1
- Either of these options
- Neither of these options
- The correlation is best interpreted
- By itself
- Along with the covariance
- Along with the corresponding scatterplot
- Along with the corresponding contingency chart
- Along with the mean and standard deviation
- Which of the following are considered measures of association?
- Mean and variance
- Variance and correlation
- Correlation and covariance
- Covariance and variance
- First quartile and third quartile
- Generally speaking, if two variables are unrelated (as one increases, the other shows no pattern), the covariance will be
- a large positive number
- a large negative number
- a positive or negative number close to zero
- a positive or negative number close to +1 or -1
- A perfect straight line sloping downward would produce a correlation coefficient equal to
- +1
- –1
- 0
- +2
- –2
- If Cov(X, Y) = -16.0, the variance of X is 25, and the variance of Y is 16, then the sample coefficient of correlation r is
- + 1.60
- – 1.60
- – 0.80
- + 0.80
- Cannot be determined from the given information
- A scatterplot allows one to see:
- whether there is any relationship between two variables
- what type of relationship there is between two variables
- Both options are correct
- Neither option is correct
- The tool that provides useful information about a data set by breaking it down into subpopulations is the:
- histogram
- scatterplot
- pivot table
- spreadsheet
- The tables that result from pivot tables are called:
- samples
- sub-tables
- specimens
- crosstabs
- Which of the following statements are false?
- Contingency tables are traditional statistical terms for pivot tables that list counts.
- A time series plot is a chart showing the behavior over time of a time series variable.
- A pivot table is a table in Excel that summarizes data broken down by one or more numerical variables.
- None of these options
- Which of the following statements about pivot tables are true?
- They allow us to “slice and dice” data in a variety of ways.
- Statisticians often refer to them as contingency tables or crosstabs.
- Pivot tables can list counts, averages, sums, and other summary measures, whereas contingency tables list only counts.
- All of these options
TRUE/FALSE
- Counts for a categorical variable are often expressed as percentages of the total.
- An example of a joint category of two variables is the count of all non-drinkers who are also nonsmokers.
- Joint categories for categorical variables cannot be used to make inferences about the relationship between the individual categorical variables.
- Problems in data analysis where we want to compare a numerical variable across two or more subpopulations are called comparison problems.
- Side-by-side boxplots allow you to quickly see how two or more categories of a numerical variable compare
- We must specify appropriate bins for side-by-side histograms in order to make fair comparisons of distributions by category.
- Correlation and covariance can be used to examine relationships between numerical variables and categorical variables that have been coded numerically.
- A trend line on a scatterplot is a line or a curve that fits the scatter as well as possible
- To form a scatterplot of X versus Y, X and Y must be paired
- Correlation has the advantage of being in the same original units as the X and Y variables
- Correlation is a single-number summary of a scatterplot
- We do not even try to interpret correlations numerically except possibly to check whether they are positive or negative
- The cutoff for defining a large correlation is >0.7 or <-0.7.
- Generally speaking, if two variables are unrelated, the covariance will be a positive or negative number close to zero
- The correlation between two variables is unitless and is always between –1 and +1.
- If the standard deviations of X and Y are 15.5 and 10.8, respectively, and the covariance of X and Y is 128.8, then the coefficient of correlation r is approximately 0.77.
- It is possible that the data points are close to a curve and have a correlation close to 0, because correlation is relevant only for measuring linear relationships.
- If the coefficient of correlation r = 0.80, the standard deviations of X and Y are 20 and 25, respectively, then Cov(X, Y) must be 400.
- The advantage that the coefficient of correlation has over the covariance is that the former has a set lower and upper limit.
- If the standard deviation of X is 15, the covariance of X and Y is 94.5, and the coefficient of correlation r = 0.90, then the variance of Y is 7.0.
- The scatterplot is a graphical technique used to describe the relationship between two numerical variables.
- Statisticians often refer to the pivot tables as contingency tables or crosstabs.
- If we draw a straight line through the points in a scatterplot and most of the points fall close to the line, there is a strong positive linear relationship between the two variables.
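The three numerical correlation/covariance statements in the list above all rest on the identity r = Cov(X, Y) / (s_X s_Y). The Python sketch below simply rearranges that identity for each statement, using only the numbers the statements give.

```python
import math

# Statement: s_X = 15.5, s_Y = 10.8, Cov(X, Y) = 128.8 -> r is about 0.77.
r1 = 128.8 / (15.5 * 10.8)

# Statement: r = 0.80, s_X = 20, s_Y = 25 -> Cov(X, Y) = 400.
cov2 = 0.80 * 20 * 25

# Statement: s_X = 15, Cov(X, Y) = 94.5, r = 0.90 -> the variance of Y is 7.0.
s_y3 = 94.5 / (0.90 * 15)  # solve r = Cov / (s_X * s_Y) for s_Y
var_y3 = s_y3 ** 2

print(round(r1, 2))                      # 0.77
print(cov2)                              # 400.0
print(round(s_y3, 2), round(var_y3, 2))  # 7.0 49.0
```

By this check the first two statements hold (r is about 0.769, and Cov(X, Y) = 400), while the third is false: 7.0 is the standard deviation of Y, so its variance is 49.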
SHORT ANSWER
Table of Correlations

| | Gender | Age | Prior Exp | Gamma Exp | Education | Salary |
|---|---|---|---|---|---|---|
| Gender | 1.000 | | | | | |
| Age | -0.111 | 1.000 | | | | |
| Prior Exp | 0.054 | 0.800 | 1.000 | | | |
| Gamma Exp | -0.203 | 0.916 | 0.587 | 1.000 | | |
| Education | -0.039 | 0.518 | 0.434 | 0.342 | 1.000 | |
| Salary | -0.154 | 0.923 | 0.723 | 0.870 | 0.617 | 1.000 |

Table of Covariances (variances on the diagonal)

| | Gender | Age | Prior Exp | Gamma Exp | Education | Salary |
|---|---|---|---|---|---|---|
| Gender | 0.259 | | | | | |
| Age | -0.633 | 134.051 | | | | |
| Prior Exp | 0.117 | 39.060 | 19.045 | | | |
| Gamma Exp | -0.700 | 72.047 | 17.413 | 49.421 | | |
| Education | -0.033 | 9.951 | 3.140 | 3.987 | 2.947 | |
| Salary | -1825.97 | 249702.35 | 73699.75 | 143033.29 | 24747.68 | 584640062 |

- Which two variables have the strongest linear relationship with annual salary?
- For which of the two variables, number of years of prior work experience or number of years of post-secondary education, is the relationship with salary stronger? Justify your answer.
- How would you characterize the relationship between gender and annual salary?
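A minimal Python sketch for the salary questions: it ranks the variables by the absolute value of their correlation with Salary, since the strength of a linear relationship depends on |r| while the sign only gives its direction. The values are copied by hand from the Salary row of the table of correlations; the variable names are illustrative.

```python
# Correlations with Salary, copied from the Salary row of the table of correlations.
salary_corr = {
    "Gender": -0.154,
    "Age": 0.923,
    "Prior Exp": 0.723,
    "Gamma Exp": 0.870,
    "Education": 0.617,
}

# Rank by |r|: the strength of a linear relationship ignores its sign.
ranked = sorted(salary_corr, key=lambda name: abs(salary_corr[name]), reverse=True)
print(ranked)
# ['Age', 'Gamma Exp', 'Prior Exp', 'Education', 'Gender']

# Prior experience vs. education: which is more strongly related to salary?
print(abs(salary_corr["Prior Exp"]) > abs(salary_corr["Education"]))  # True
```

Read this way, Age and Gamma Exp have the strongest linear relationships with salary, prior experience is more strongly related to salary than education (0.723 vs. 0.617), and gender shows only a weak linear association (r = -0.154) whose sign depends on how gender was coded.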
- The percentage of the US population without health insurance coverage for samples from the 50 states and District of Columbia for both 2003 and 2004 produced the following table of correlations.
Table of Correlations:
Percent 2003
Percent 2003 Percent 2004 Percent 2004
What does the table for the two given sets of percentages tell you in this case?
The correlations for each pair of variables are shown in the table below:
Table of correlations
- Which of the variables have a positive linear relationship with the household’s average monthly expenditure on utilities?
- Which of the variables have a negative linear relationship with the household’s average monthly expenditure on utilities?
- Which of the variables have essentially no linear relationship with the household’s average monthly expenditure on utilities?
- Three samples, regarding the ages of teachers, are selected randomly as shown below:
Sample A: 17 22 20 18 23
Sample B: 30 28 35 40 25
Sample C: 44 39 54 21 52
How is the value of the correlation coefficient r affected in each of the following cases?
- Each X value is multiplied by 4.
- Each X value is switched with the corresponding Y value.
- Each X value is increased by 2.
- The students at a small community college in Iowa apply to study either English or Business. Some administrators at the college are concerned that women are being discriminated against in admissions, particularly in the business program. Below, you will find two pivot tables that show the percentage of students admitted by gender to the English program and the Business school. The data have also been presented graphically. What do the data and graphs indicate?
- A sample of 30 schools produced the pivot table shown below for the average percentage of students graduating from high school. Use this table to determine how the type of school (public or Catholic) that students attend affects their chance of graduating from high school.
- A data set from a sample of 399 Michigan families was collected. The characteristics of the data include family size (large or small), number of cars owned by the family (1, 2, 3, or 4), and whether the family owns a foreign car. Excel produced the pivot table shown below.
Use this pivot table to determine how family size and number of cars owned influence the likelihood that a family owns a foreign car.
- Of those in the sample who went partying the weekend before the final exam, what percentage of them did well in the exam?
- Of those in the sample who did well on the final exam, what percentage of them went partying the weekend before the exam?
- What percentage of the students in the sample went partying the weekend before the final exam and did well in the exam?
- What percentage of the students in the sample spent the weekend studying and did well in the final exam?
- What percentage of the students in the sample went partying the weekend before the final exam and did poorly on the exam?
- If the sample is a good representation of the population, what percentage of the students in the population should we expect to spend the weekend studying and do poorly on the final exam?
- If the sample is a good representation of the population, what percentage of those who spent the weekend studying should we expect to do poorly on the final exam?
- If the sample is a good representation of the population, what percentage of those who did poorly on the final exam should we expect to have spent the weekend studying?
- Of those in the sample who went partying the weekend before the final exam, what percentage of them did poorly in the exam?
- Of those in the sample who did well in the final exam, what percentage of them spent the weekend before the exam studying?
- A health magazine reported that a man’s weight at birth has a significant impact on the chance that the man will suffer a heart attack during his life. A statistician analyzed a data set for a sample of 798 men, and produced the pivot table and histogram shown below. Determine how birth weight influences the chances that a man will have a heart attack.
- The table shown below contains information technology (IT) investment as a percentage of total investment for eight countries during the 1990s. It also contains the average annual percentage change in employment during the 1990s. Explain how these data shed light on the question of whether IT investment creates or costs jobs. (Hint: Use the data to construct a scatterplot.)

| Country | IT investment (% of total investment) | Avg. annual change in employment (%) |
|---|---|---|
| Netherlands | 2.5% | 1.6% |
| Italy | 4.1% | 2.2% |
| Germany | 4.5% | 2.0% |
| France | 5.5% | 1.8% |
| Canada | 8.3% | 2.7% |
| Japan | 8.3% | 2.7% |
| Britain | 8.3% | 3.3% |
| U.S. | 12.4% | 3.7% |

|
- There are two scatterplots shown below. The first chart shows the relationship between the size of the home and the selling price. The second chart examines the relationship between the number of bedrooms in the home and its selling price. Which of these two variables (the size of the home or the number of bedrooms) seems to have the stronger relationship with the home’s selling price? Justify your answer.
- The following scatterplot compares the selling price and the appraised value.
Is there a linear relationship between these two variables? If so, how would you characterize the relationship?
- Approximate the percentage of these Internet users who are men under the age of 30.
- Approximate the percentage of these Internet users who are single with no formal education beyond high school.
- Approximate the percentage of these Internet users who are currently employed.
- What is the average annual salary of the employed Internet users in this sample?
- Approximate the percentage of these Internet users who are married with formal education beyond high school.
- What percentage of these Internet users are married?
- Approximate the percentage of these Internet users who are in the 58-71 age group.
- Approximate the percentage of these Internet users who are women.
- What percentage of these Internet users have formal education beyond high school?
- Approximate the percentage of these Internet users who are women in the 30-43 age group.
- Explain why the ratio of the average wage of the top 10% of all wage earners to the median measures income inequality.
- Do these data help to confirm or contradict the hypothesis that increased wage inequality leads to lower unemployment levels? [Hint: construct a scatterplot]
- What other data would you need to be more confident that increased income inequality leads to lower unemployment?
- A car dealer collected the following information about a sample of 448 Grand Rapids residents:
  - Exact salaries of these Grand Rapids residents
  - Education level (completed high school only or completed college)
  - Income level (low or high)
  - Car finance (whether or not the last purchased car was financed)
Using the education level, income level, and car finance data, he created the three pivot tables shown below. Based on these tables, determine how education and income influence the likelihood that a family finances a car.
- Some histograms have two or more peaks. This is often an indication that the data come from two or more distinct populations.
- A population includes all elements or objects of interest in a study, whereas a sample is a subset of the population used to gain insights into the characteristics of the population.
- A frequency table indicates how many observations fall within each category, and a histogram is its graphical analog.
- In the term “frequency table,” frequency refers to the number of data values falling within each category.
- Time series data are often graphically depicted on a line chart, which is a plot of the variable of interest over time.
- The number of car insurance policy holders is an example of a discrete random variable
- A variable (or field) is an attribute, or measurement, on members of a population, whereas an observation (or case or record) is a list of all variable values for a single member of a population.
- Phone numbers, Social Security numbers, and zip codes are examples of numerical variables.
- Cross-sectional data are data on a population at a distinct point in time, whereas time series data are data collected across time.
- Distribution is a general term used to describe the way data are distributed, as indicated by a frequency table or histogram.
- Both ordinal and nominal variables are categorical.
- A histogram is said to be symmetric if it has a single peak and looks approximately the same to the left and right of the peak.
- Suppose that a sample of 10 observations has a standard deviation of 3, then the sum of the squared deviations from the sample mean is 30.
- If a histogram has a single peak and looks approximately the same to the left and right of the peak, we should expect no difference in the values of the mean, median, and mode.
- The mean is a measure of central location.
- The length of the box in the boxplot portrays the interquartile range.
- In a positively skewed distribution, the mean is smaller than the median and the median is smaller than the mode.
- The value of the standard deviation always exceeds that of the variance.
- The difference between the first and third quartiles is called the interquartile range.
- The standard deviation is measured in original units, such as dollars and pounds.
- The median is one of the most frequently used measures of variability.
- Assume that the histogram of a data set is symmetric and bell shaped, with a mean of 75 and standard deviation of 10. Then, approximately 95% of the data values were between 55 and 95.
- Abby has been keeping track of what she spends to rent movies. The last seven weeks' expenditures, in dollars, were 6, 4, 8, 9, 6, 12, and 4. The mean amount Abby spends on renting movies is $7.
- Expressed in percentiles, the interquartile range is the difference between the 25th and 75th percentiles.
- The value of the mean times the number of observations equals the sum of all of the data values.
- The difference between the largest and smallest values in a data set is called the range.
- There are four quartiles that divide the values in a data set into four equal parts.
- Suppose that a sample of 8 observations has a standard deviation of 2.50, then the sum of the squared deviations from the sample mean is 17.50.
- The median of a data set with 30 values would be the average of the 15th and the 16th values when the data values are arranged in ascending order.
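The arithmetic behind several of the items above can be checked with a short Python sketch: the mean of Abby's spending, the link between a sample standard deviation and the sum of squared deviations, the position of the median, and the empirical-rule interval. The helper names are illustrative, not from the book, and the sketch assumes the sample variance divides by n - 1.

```python
import statistics

# Abby's last seven weeks of movie-rental spending, in dollars.
spend = [6, 4, 8, 9, 6, 12, 4]

mean = statistics.mean(spend)   # sum of the values divided by n
total = mean * len(spend)       # mean times n gives back the sum of the values


def sum_sq_dev_from_sd(n, s):
    """Sum of squared deviations from the sample mean, given n and sample sd s.

    The sample variance is s**2 = sum((x - xbar)**2) / (n - 1),
    so the sum of squared deviations is (n - 1) * s**2.
    """
    return (n - 1) * s ** 2


def median_positions(n):
    """1-based position(s) of the median after sorting n values ascending."""
    if n % 2 == 1:
        return ((n + 1) // 2,)
    return (n // 2, n // 2 + 1)   # even n: the median averages these two values


def empirical_95(mean, sd):
    """Empirical rule: roughly 95% of a bell-shaped data set lies within 2 sd."""
    return (mean - 2 * sd, mean + 2 * sd)


print(mean, total)                 # 7 49
print(sum_sq_dev_from_sd(10, 3))   # 81
print(sum_sq_dev_from_sd(8, 2.5))  # 43.75
print(median_positions(30))        # (15, 16)
print(empirical_95(75, 10))        # (55, 95)
```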
SHORT ANSWER
- Would you conclude that there is a difference between the salaries of women and men in this plant? Justify your answer.
- How large must a person's salary be to qualify as an outlier on the high side? How many outliers are there in these data?
- What can you say about the shape of the distributions given the boxplots above?
- What are the mean and median scores on this exam?
- Explain why the mean and median are different.
- Find the mean, median, standard deviation, first and third quartiles, and the 95th percentile for family incomes in both years.
- The Republicans claim that the country was better off in 1990 than in 1980, because the average income increased. Do you agree?
- Generate a boxplot to summarize the data. What does the boxplot indicate?
- Interpret the variance and standard deviation of this sample.
- Is the empirical rule applicable in this case? If so, apply it and interpret your results. If not, explain why the empirical rule is not applicable here.
- Explain what would cause the mean to be slightly lower than the median in this case.
- Which of the states listed paid their teachers average salaries that exceed at least 75% of all average salaries?
- Which of the states listed paid their teachers average salaries that are below 75% of all average salaries?
- What salary amount represents the second quartile?
- How would you describe the salary of Virginia’s teachers compared to those across the entire United States? Justify your answer.
- What do these statistics tell you about the shape of the distribution?
- What can you say about the relative position of each of the observations 34, 84, and 104?
- Calculate the interquartile range. What does this tell you about the data?
- Compute the mean number of children.
- Compute the median number of children.
- Is the distribution of the number of children symmetrical or skewed? Why?
- The data below represent two years of monthly sales of beanbag animals at a local retail store (Month 1 represents January and Month 12 represents December). Given the time series plot below, do you see any obvious patterns in the data? Explain.
- An operations management professor is interested in how her students performed on her midterm exam. The histogram shown below represents the distribution of exam scores (where the maximum score is 100) for 50 students.
Based on this histogram, how would you characterize the students’ performance on this exam?
- The proportion of Americans under the age of 18 who are living below the poverty line for each of the years 1959 through 2000 is used to generate the following time series plot.
How successful have Americans been recently in their efforts to win “the war against poverty” for the nation’s children?
- Indicate the type of data for each of the six variables included in this set.
- Based on the histogram shown below, how would you describe the age distribution for these data?
- Based on the histogram shown below, how would you describe the salary distribution for these data?
- What percentage of the job applicants scored between 30 and 40?
- What percentage of the job applicants scored below 60?
- How many job applicants scored between 10 and 30?
- How many job applicants scored above 50?
- Seventy percent of the job applicants scored above what value?
- Half of the job applicants scored below what value?
- A question of great interest to economists is how the distribution of family income has changed in the United States during the last 20 years. The summary measures and histograms shown below are generated for a sample of 500 family incomes, using the 1985 and 2005 income for each family in the sample.
Summary Measures:
Based on these results, discuss as completely as possible how the distribution of family income in the United States changed from 1985 to 2005.
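For the boxplot questions above (for example, how large a salary must be to count as a high outlier), a common convention flags values more than 1.5 × IQR beyond the quartiles. The sketch below assumes that convention and uses Python's `statistics.quantiles` with `method="inclusive"`; other quartile definitions, including the one a particular software package uses, can shift the fences slightly, so treat it as an illustration rather than the book's exact procedure.

```python
import statistics


def outlier_fences(data, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" treats the minimum and maximum as the 0th and 100th percentiles.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def high_outliers(data, k=1.5):
    """Values strictly above the upper fence."""
    _, upper = outlier_fences(data, k)
    return [x for x in data if x > upper]


# Hypothetical data for illustration only (not the salary data from the exercise).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(outlier_fences(sample))  # (-3.0, 13.0)
print(high_outliers(sample))   # [100]
```

Passing `k=3` instead gives the wider fences sometimes used to separate "extreme" outliers from mild ones.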
CHAPTER 4: Probability and Probability Distributions
MULTIPLE CHOICE
- Probabilities that cannot be estimated from long-run relative frequencies of events are
- objective probabilities c. complementary probabilities
- subjective probabilities d. joint probabilities
- The probability of an event and the probability of its complement always sum to:
- 1 c. any value between 0 and 1
- 0 d. any positive value
- If events A and B are mutually exclusive, then the probability of both events occurring simultaneously is equal to
- 0.0 c. 1.0
- 0.5 d. any value between 0.5 and 1.0
- Probabilities that can be estimated from long-run relative frequencies of events are
- objective probabilities c. complementary probabilities
- subjective probabilities d. joint probabilities
- Let A and B be the events of the FDA approving and rejecting a new drug to treat hypertension, respectively. The events A and B are:
- independent c. unilateral
- conditional d. mutually exclusive
- A function that associates a numerical value with each possible outcome of an uncertain event is called a
- conditional variable c. population variable
- random variable d. sample variable
- The formal way to revise probabilities based on new information is to use:
- complementary probabilities c. unilateral probabilities
- conditional probabilities d. common sense probabilities
is the:
- addition rule c. rule of complements
- commutative rule d. rule of opposites
- The law of large numbers is relevant to the estimation of
- objective probabilities c. both of these options
- subjective probabilities d. neither of these options
- A discrete probability distribution:
- lists all of the possible values of the random variable and their corresponding probabilities
- is a tool that can be used to incorporate uncertainty into models
- can be estimated from long-run proportions
- is the distribution of a single random variable
- Which of the following statements are true?
- Probabilities must be nonnegative
- Probabilities must be less than or equal to 1
- The sum of all probabilities for a random variable must be equal to 1
- All of these options are true.
- If P(A) = P(A|B), then events A and B are said to be
- mutually exclusive c. exhaustive
- independent d. complementary
- If A and B are mutually exclusive events with P(A) = 0.70, then P(B):
- can be any value between 0 and 1
- can be any value between 0 and 0.70
- cannot be larger than 0.30
- Cannot be determined with the information given
- If two events are collectively exhaustive, what is the probability that one or the other occurs?
- 0.25
- 0.50
- 1.00
- Cannot be determined from the information given.
- If two events are collectively exhaustive, what is the probability that both occur at the same time?
- 0.00
- 0.50
- 1.00
- Cannot be determined from the information given.
- If two events are mutually exclusive, what is the probability that one or the other occurs?
- 0.25
- 0.50
- 1.00
- Cannot be determined from the information given.
- If two events are mutually exclusive, what is the probability that both occur at the same time?
- 0.00
- 0.50
- 1.00
- Cannot be determined from the information given.
- If two events are mutually exclusive and collectively exhaustive, what is the probability that both occur?
- 0.00
- 0.50
- 1.00
- Cannot be determined from the information given.
- There are two types of random variables, they are
- discrete and continuous c. complementary and cumulative
- exhaustive and mutually exclusive d. real and unreal
- If P(A) = 0.25 and P(B) = 0.65, then P(A and B) is:
- 0.25
- 0.40
- 0.90
- Cannot be determined from the information given
- If two events are independent, what is the probability that they both occur?
- 0
- 0.50
- 1.00
- Cannot be determined from the information given
- If A and B are any two events with P(A) = .8 and P(B|Ā) = .7, then P(Ā and B) is
- .56 c. .24
- .14 d. None of the above
- Which of the following best describes the concept of marginal probability?
- It is a measure of the likelihood that a particular event will occur, regardless of whether another event occurs.
- It is a measure of the likelihood that a particular event will occur, given that another event has already occurred.
- It is a measure of the likelihood of the simultaneous occurrence of two or more events.
- None of the above.
- If A and B are mutually exclusive events with P(A) = 0.30 and P(B) = 0.40, then the probability that either A or B or both occur is:
- 0.10 c. 0.70
- 0.12 d. None of the above
- If A and B are any two events with P(A) = .8 and P(B|A) = .4, then the joint probability of A and B is
- .80 c. .32
- .40 d. 1.20
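The complement, addition, and multiplication rules used in the multiple-choice items above are short enough to write out directly. This is a minimal Python sketch with illustrative function names; results are rounded only in the printed checks because binary floating point cannot represent values such as 0.8 exactly.

```python
def complement(p_a):
    """P(not A) = 1 - P(A)."""
    return 1 - p_a


def addition_rule(p_a, p_b, p_a_and_b=0.0):
    """P(A or B) = P(A) + P(B) - P(A and B).

    For mutually exclusive events P(A and B) = 0, which is the default.
    """
    return p_a + p_b - p_a_and_b


def multiplication_rule(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B|A)."""
    return p_a * p_b_given_a


# P(A) = .8 and P(B|A) = .4: joint probability of A and B
print(round(multiplication_rule(0.8, 0.4), 4))   # 0.32
# Mutually exclusive A and B with P(A) = .3 and P(B) = .4
print(round(addition_rule(0.3, 0.4), 4))         # 0.7
# If A and B are mutually exclusive, P(B) can be at most P(not A)
print(round(complement(0.7), 4))                 # 0.3
```

Note that `multiplication_rule` needs P(B|A); from P(A) and P(B) alone the joint probability is known only if the events are independent, in which case P(B|A) = P(B).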
TRUE/FALSE
- If A and B are independent events with P(A) = 0.40 and P(B) = 0.50, then P(A|B) is 0.50.
- A random variable is a function that associates a numerical value with each possible outcome of a random phenomenon.
- Two or more events are said to be exhaustive if one of them must occur.
- You think you have a 90% chance of passing your statistics class. This is an example of subjective probability.
- The number of cars produced by GM during a given quarter is a continuous random variable.
- Two events A and B are said to be independent if P(A and B) = P(A) + P(B).
- Probability is a number between 0 and 1, inclusive, which measures the likelihood that some event will occur.
- If events A and B have nonzero probabilities, then they can be both independent and mutually exclusive.
- The probability that event A will not occur is denoted as P(Ā).
- If P(A and B) = 1, then A and B must be collectively exhaustive.
- Conditional probability is the probability that an event will occur, with no other events taken into consideration.
- When we wish to determine the probability that at least one of several events will occur, we would use the addition rule.
- The law of large numbers states that subjective probabilities can be estimated based on the long-run relative frequencies of events.
- Two events are said to be independent when knowledge of one event is of no value when assessing the probability of the other.
- Suppose A and B are mutually exclusive events where P(A) = 0.2 and P(B) = 0.5, then P(A or B) =
- If A and B are two independent events with P(A) = 0.20 and P(B) = 0.60, then P(A and B) = 0.80
- The relative frequency of an event is the number of times the event occurs out of the total number of times the random experiment is run.
- Marginal probability is the probability that a given event will occur, given that another event has already occurred.
- The temperature of the room in which you are writing this test is a continuous random variable.
- Two events A and B are said to be mutually exclusive if P(A and B) = 0.
- Two or more events are said to be exhaustive if at most one of them can occur.
- When two events are independent, they are also mutually exclusive.
- Two or more events are said to be mutually exclusive if at most one of them can occur.
- Given that events A and B are independent and that P(A) = 0.8 and P(B|A) = 0.4, then P(A and B) =
- The time students spend in a computer lab during one day is an example of a continuous random variable.
- The multiplication rule for two events A and B is: P(A and B) = P(A|B)P(A).
- The number of car insurance policy holders is an example of a discrete random variable.
- Suppose A and B are mutually exclusive events where P(A) = 0.3 and P(B) = 0.4, then P(A and B) =
- Suppose A and B are two events where P(A) = 0.5, P(B) = 0.4, and P(A and B) = 0.2, then P(B|A) =
- Suppose that after graduation you will either buy a new car (event A) or take a trip to Europe (event B). Events A and B are mutually exclusive.
- If P(A and B) = 0, then A and B must be collectively exhaustive.
- The number of people entering a shopping mall on a given day is an example of a discrete random variable.
- Football teams toss a coin to see who will get their choice of kicking or receiving to begin a game. The probability that a given team will win the toss three games in a row is 0.125.
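Several true/false items above turn on the definitions of conditional probability and independence. A minimal Python sketch of those definitions follows; the function names and the floating-point tolerance are illustrative choices, not taken from the book.

```python
def conditional(p_a_and_b, p_a):
    """P(B|A) = P(A and B) / P(A); undefined when P(A) = 0."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive to condition on A")
    return p_a_and_b / p_a


def independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """A and B are independent exactly when P(A and B) = P(A) * P(B)."""
    return abs(p_a_and_b - p_a * p_b) <= tol


def all_of_k_independent(p, k):
    """Probability that an event with probability p occurs on each of k independent trials."""
    return p ** k


print(conditional(0.2, 0.5))          # 0.4, i.e. P(B|A) when P(A) = 0.5 and P(A and B) = 0.2
print(independent(0.3, 0.4, 0.0))     # False: mutually exclusive events with nonzero probabilities
print(all_of_k_independent(0.5, 3))   # 0.125: winning a fair coin toss three times in a row
```

Because mutually exclusive events with nonzero probabilities have P(A and B) = 0 while P(A)P(B) > 0, such events can never be independent.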
SHORT ANSWER
- Find the probability distribution of X.
- What is the probability that this project will be completed in less than 4 months from now?
- What is the probability that this project will not be completed on time?
- (A) What is the expected completion time (in months) from now for this project?
(B) How much variability (in months) exists around the expected value found in (A)?
- Find the marginal distribution of X. What does this distribution tell you?
- Find the marginal distribution of Y. What does this distribution tell you?
- (A) Calculate the conditional distribution of X given Y.
(B) What is the practical benefit of knowing the conditional distribution in (A)?
- Calculate the conditional distribution of Y given X.
- What is the probability that no one is waiting or being served in the regular checkout line?
- What is the probability that no one is waiting or being served in the express checkout line?
- What is the probability that no more than two customers are waiting in both lines combined?
- On average, how many customers would you expect to see in each of these two lines at the grocery store?
ANS:
Expected number of customers in regular line = E(X) = 1.46
Expected number of customers in express line = E(Y) = 1.60
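The expected values in the answer above come from E(X) = Σ x·P(X = x), and the variability asked about in the project-completion item is usually summarized by the standard deviation. The sketch below applies both formulas to a distribution stored as a Python dictionary; the three-point distribution in the usage lines is hypothetical, since the exercise's own tables are not reproduced here.

```python
import math


def check_distribution(dist, tol=1e-9):
    """Raise if probabilities are negative or do not sum to 1."""
    if any(p < 0 for p in dist.values()):
        raise ValueError("probabilities must be nonnegative")
    if abs(sum(dist.values()) - 1) > tol:
        raise ValueError("probabilities must sum to 1")


def expected_value(dist):
    """E(X) = sum of x * P(X = x) over all values x."""
    check_distribution(dist)
    return sum(x * p for x, p in dist.items())


def std_dev(dist):
    """Square root of Var(X) = sum of (x - E(X))**2 * P(X = x)."""
    mu = expected_value(dist)
    return math.sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))


# Hypothetical completion-time distribution (months -> probability).
completion = {3: 0.2, 4: 0.5, 5: 0.3}
print(round(expected_value(completion), 4))  # 4.1
print(round(std_dev(completion), 4))         # 0.7
```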
- Find the expected price and demand level for the upcoming quarter.
- What is the probability that the price of this product will be above its mean in the upcoming quarter?
- What is the probability that the demand of this product will be below its mean in the upcoming quarter?
- What is the probability that the demand of this product will exceed 2500 units in the upcoming quarter, given that its price will be less than $30?
- What is the probability that the demand of this product will be less than 3500 units in the upcoming quarter, given that its price will be greater than $20?
- Calculate the joint probabilities of
.
- Determine the marginal probability distribution of
.
- What is the probability of observing the sale of at least one brand 1 bat and at least one brand 2 bat on the same day at this sporting goods store?
- What is the probability of observing the sale of at least one brand 1 bat on a given day at this sporting goods store?
- What is the probability of observing the sale of no more than two brand 2 bats on a given day at this sporting goods store?
- Given that no brand 2 bats are sold on a given day, what is the probability of observing the sale of at least one brand 1 bat at this sporting goods store?
- Set up a 2 × 2 contingency table for this situation.
- Give an example of a simple event.
- Give an example of a joint event.
- What is the probability that a respondent chosen at random is a male?
- What is the probability that a respondent chosen at random enjoys shopping for clothing?
- What is the probability that a respondent chosen at random is a male and enjoys shopping for clothing?
- What is the probability that a respondent chosen at random is a female and enjoys shopping for clothing?
- What is the probability that a respondent chosen at random is a male and does not enjoy shopping for clothing?
- What is the probability that a respondent chosen at random is a female or enjoys shopping for clothing?
- What is the probability that a respondent chosen at random is a male or does not enjoy shopping for clothing?
- What is the probability that a respondent chosen at random is a male or a female?
- What is the probability that a respondent chosen at random enjoys or does not enjoy shopping for clothing?
- Does consumer behavior depend on the gender of the consumer? Explain using probabilities.
- Construct the joint probability table.
- What is the probability a randomly selected patron prefers wine?
- What is the probability a randomly selected patron is a female?
- What is the probability a randomly selected patron is a female who prefers wine?
- What is the probability a randomly selected patron is a female who prefers beer?
- Suppose a randomly selected patron prefers wine. What is the probability the patron is a male?
- Suppose a randomly selected patron prefers beer. What is the probability the patron is a male?
- Suppose a randomly selected patron is a female. What is the probability the patron prefers beer?
- Suppose a randomly selected patron is a female. What is the probability that the patron prefers wine?
- Are gender of patrons and drinking preference independent? Explain.
- Find the probability distribution of X, the number of oil wells that will be successful.
- What is the probability that none of the oil wells will be successful?
- If a new pipeline will be constructed in the event that all three wells are successful, what is the probability that the pipeline will be constructed?
- How many of the wells can the company expect to be successful?
- Suppose the first well to be completed is successful. What is the probability that one of the two remaining wells is successful?
- If it costs $200,000 to drill each well and a successful well will produce $1,000,000 worth of oil over its lifetime, what is the expected net value of this three-well program?
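The oil-well items can be sketched in Python under an assumption that is not stated in the text above: that the three wells succeed independently, each with a common probability p, which makes X binomial. The expected net value uses only the stated figures ($200,000 drilling cost per well, $1,000,000 of oil per successful well) and linearity of expectation, so it depends on p but not on independence. The value p = 0.5 below is a placeholder, not the exercise's probability.

```python
from math import comb


def well_distribution(p, n=3):
    """Binomial P(X = k) for k successful wells out of n.

    Assumes wells succeed independently, each with probability p.
    """
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}


def expected_successes(p, n=3):
    """E(X) = n * p for a binomial count."""
    return n * p


def expected_net_value(p, n=3, cost=200_000, payoff=1_000_000):
    """Expected oil revenue minus drilling cost: n * (p * payoff - cost).

    Every well is drilled (its cost paid) whether or not it succeeds.
    """
    return n * (p * payoff - cost)


p = 0.5  # placeholder success probability, for illustration only
dist = well_distribution(p)
print(dist)                    # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(dist[3])                 # 0.125, P(all three succeed), i.e. pipeline built
print(expected_successes(p))   # 1.5
print(expected_net_value(p))   # 900000.0
```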