Social Science Statistics STA2122.501 - ONLINE

University of South Florida

Instructor: Dr. Erica L. Toothman

Email: USFstats@gmail.com

Project Report 4: Exploring national values through bivariate analyses

Contents:

Description and Required Materials
Project Instructions
Grading Rubric
APPENDIX A: Variable Codebook
APPENDIX B: SPSS Analysis Instructions

For this project, you will be asked to detect, and critically reflect upon, patterns in social data using a representative sample of United States residents (i.e., General Social Survey). First, you will perform a cross-tabulation of two ordinal/nominal variables. The second test, an ANOVA, will have you examine a causal relationship between an ordinal independent variable and an interval-ratio dependent variable. And for your third test, you will examine the relationship between two interval-ratio variables. This project covers learning goals and objectives #1, 2, 3, and 4 as listed in the syllabus (and below).

the vocabulary and symbols used in social statistics
how to measure variables and test relationships at different levels
the basics of descriptive and inferential statistics
to become critical consumers of statistical information
about global systems and issues and associated dimensions (e.g., historical, political, economic, social, cultural, environmental, technological)
to analyze global interrelationships and interdependencies across place and time
to formally report findings from statistical analyses

REQUIRED MATERIALS

For this project, you will need the following items:

Your NOTES from lessons 9, 10, and 11.
A word processing program to type up your final REPORT (e.g., Microsoft Word)
The SPSS data analysis program
DATA FILE: Project 4 STA2122 GSS Data Spring 2021.sav

PROJECT INSTRUCTIONS

NOTE: The variables you choose to examine in this project cannot be related to those you presented in Projects #1 through #3. This project requires the use of a new and unique set of data.

Bivariate analyses assess the relationship between two variables (e.g., variable 1: test scores and variable 2: number of hours spent studying). Here, you will be using data from the General Social Survey (GSS) to examine the values of adults in the United States. First, you will perform a cross-tabulation of two ordinal/nominal variables. The second test, an ANOVA, will have you examine a causal relationship between an ordinal independent variable and an interval-ratio dependent variable. And for your third test, you will examine the relationship between two interval-ratio variables.

Project Organization:

PART ONE: CROSS-TABULATION and CHI-SQUARE

Introduction and descriptive statistics (10 points)
Hypothesis statements (5 points)
Tables (2.5 points)
Annotated test statistics (5 points)
Hypothesis decision (5 points)
Statement of test strength (2.5 points)
Reflection (5 points)

PART TWO: ANALYSIS OF VARIANCE (ANOVA)

Introduction and descriptive statistics (10 points)
Hypothesis statements (5 points)
Annotated test statistics (5 points)
Hypothesis decision (5 points)
Examination of multiple relationships (5 points)
Reflection (5 points)

PART THREE: REGRESSION AND CORRELATION

Introduction and descriptive statistics (10 points)
Statement and explanation of regression equation (5 points)
Statement of determination (5 points)
Statement of correlation (5 points)
Reflection (5 points)

PART ONE: CROSS-TABULATION and CHI-SQUARE

For part one, I’d like you to perform and report a cross-tabulation between two variables, test a hypothesis with a chi-square, and reflect on that relationship. You will need to select the following types of variables for this analysis (see the codebook in Appendix A for a list of variables):

• one nominal or ordinal independent variable, and
• one nominal or ordinal dependent variable.

Introduction and descriptive statistics. This section should be designed to introduce the reader to your analysis.
1. First, describe your two variables, their categories, and levels of measurement.
2. Second, provide us with the following descriptive statistics for each variable: the number of observations (n), the best measure of central tendency, the best measure of variability. HINT: Use SPSS to generate the descriptive statistics about your variables; you can refer back to the instructions from Project 2 for how to do this.
3. Third, critically reflect on these statistics and consider the following questions: What do your descriptive statistics say about the population of the US? Why would your independent variable influence your dependent variable?
Hypothesis statement. For this test, write a null and one-tailed research hypothesis using appropriate statistical language.
Tables. Perform a cross-tabulation of your variables in SPSS. Make sure that you have SPSS produce column percentages. In Word or Excel, produce your own cross-tabulation table with column percentages AND row/column totals using the information gathered from SPSS. Include a title.
1. For instructions on how to perform a chi-square analysis, please refer to Appendix B.
2. HINT: Your independent variable should be reported as the column variable and your dependent variable should be the row variable.
Annotated test statistics. Report and describe (in a list) the obtained Pearson chi-square statistic, the degrees of freedom, the P-value, and the chi-square critical value.
Hypothesis decision. Using statistical language, state your decision regarding your null hypothesis in one full sentence. Assume your α (alpha) is set at a 0.05 level.
Statement of test strength. Select and report the most appropriate measure of association for your variables. You can report lambda, Cramer’s V, gamma, Kendall’s tau-b or Kendall’s tau-c, depending on the nature of your variables. What does this score indicate about your test?
Reflection. In three to five sentences, explain your findings making reference to your cross-tabulation, chi-square test, and measure of association. Make sure you use appropriate statistical language and describe what the sample statistics indicate about the US population. You are not required to use resources, but they will strengthen your arguments.

PART TWO: ANALYSIS OF VARIANCE (ANOVA)

For part two, you will be asked to conduct a hypothesis test using an ANOVA and Tukey’s HSD (Tukey’s HSD is only covered in the class memo). You will need to select the following types of variables for this analysis (see the codebook in Appendix A for a list of variables):

• one nominal or ordinal independent variable, and
• one interval-ratio dependent variable.

Introduction and descriptive statistics. This section is designed to introduce the reader to your analysis.
1. First, describe your two variables, their categories, and levels of measurement.
2. Second, provide us with the following descriptive statistics for each variable: the number of observations (n), the best measure of central tendency, the best measure of variability. HINT: Use SPSS to generate the descriptive statistics about your variables; you can refer back to the instructions from Project 2 for how to do this.
3. Third, critically reflect on these statistics and consider the following questions: What do your descriptive statistics say about the population of the US? Why would your independent variable influence your dependent variable?
Hypothesis statement. For this test, write a null and one-tailed research hypothesis using symbols AND using appropriate statistical language.
Annotated test statistics. List and describe the SSB, SSW, dfb, dfw, MSb, MSw, the obtained F-statistic, the P-value, and the F-critical value. For instructions on how to perform an ANOVA in SPSS, please refer to Appendix B.

Hypothesis decision. Using statistical language, state your decision regarding your null hypothesis in one full sentence. Assume your α (alpha) is set at a 0.05 level.
Examination of multiple relationships. Examine the table created from Tukey’s HSD. In a list, report all statistically significant differences. Report the mean difference and the P-value for each statistically significant difference. HINT: If there are none, you must clearly state there are no statistically significant differences between groups in order to receive credit for this portion of the assignment.

Reflection. In three to five sentences, explain your findings making reference to your ANOVA and Tukey’s HSD. Make sure you use appropriate statistical language and describe what the sample statistics indicate about the US population. You are not required to use resources, but they will strengthen your arguments.
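
The ANOVA quantities named above (SSB, SSW, dfb, dfw, MSb, MSw, and F) can be sketched by hand as a check on the SPSS output. This is a minimal illustration, not the course's required method: the three groups and their scores are hypothetical, and the function name `one_way_anova` is invented for this example. SPSS remains the source for the P-value and F-critical value.

```python
def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, mean squares, and F.

    `groups` is a list of lists: one list of interval-ratio scores
    per category of the independent variable.
    """
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: group size times squared distance
    # of each group mean from the grand mean.
    ssb = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-group sum of squares: squared distance of each score
    # from its own group mean.
    ssw = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    dfb = len(groups) - 1                # number of groups minus one
    dfw = len(all_values) - len(groups)  # total n minus number of groups
    msb = ssb / dfb
    msw = ssw / dfw
    return {"SSB": ssb, "SSW": ssw, "dfb": dfb, "dfw": dfw,
            "MSb": msb, "MSw": msw, "F": msb / msw}


# Hypothetical data: three categories of an ordinal independent variable.
hypothetical_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(hypothetical_groups)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The obtained F is then compared with the F-critical value for (dfb, dfw) degrees of freedom at α = 0.05.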

PART THREE: REGRESSION AND CORRELATION

For part three, you will be asked to perform a regression and correlation. You will need to select the following types of variables for this analysis (see the codebook in Appendix A for a list of variables):

• one interval-ratio independent variable, and
• one interval-ratio dependent variable.

Introduction and descriptive statistics. This section is designed to introduce the reader to your analysis.
1. First, describe your two variables, their categories, and levels of measurement.
2. Second, provide us with the following descriptive statistics for each variable: the number of observations (n), the best measure of central tendency, the best measure of variability. HINT: Use SPSS to generate the descriptive statistics about your variables; you can refer back to the instructions from Project 2 for how to do this.
3. Third, critically reflect on these statistics and consider the following questions: What do your descriptive statistics say about the population of the US? Why would your independent variable influence your dependent variable?
Statement and explanation of regression equation. Using SPSS, conduct the regression and write out the complete regression equation. Using appropriate statistical language, explain what the regression equation means in two to five sentences.
Statement of determination. Report and label the coefficient of determination (r²). Using appropriate statistical language, interpret the coefficient of determination.
Statement of correlation. Report and label the correlation coefficient (Pearson’s r). Using appropriate statistical language, describe both the strength and direction of the relationship in one to two sentences.
Reflection. In three to five sentences, explain your findings making reference to your regression line and statements of determination and correlation. Make sure you use appropriate statistical language and describe what the sample statistics indicate about the US population. You are not required to use resources, but they will strengthen your arguments.

PROJECT 4 RUBRIC

PART ONE: CROSS-TABULATION and CHI-SQUARE

1. Introduction and descriptive statistics (10 points)

Exceeds expectations. Two appropriate variables are selected and fully described. All required statistics (i.e., sample size, central tendency, variability) are reported for each variable. These statistics are described in a way that demonstrates mastery of descriptive statistics. Includes critical discussion of how variables represent the US population. Also includes description of the causal relationship between variables.
Meets expectations. Two appropriate variables are selected and described. All required statistics (i.e., sample size, central tendency, variability) are reported for each variable. Includes some discussion of how variables represent the US population. Also includes description of the causal relationship between variables.
Approaches/below expectations. Two inappropriate variables are selected and may not be described. Required statistics incomplete or missing. Includes some or no discussion of how variables represent the US population. May include description of the causal relationship between variables; relationship may be implausible.

2. Hypothesis statements (5 points)

FULL CREDIT: Null and research hypotheses are logical and clearly labeled. Statistical language is used.
HALF CREDIT: Null and research hypotheses contain muddled language or may confuse a symbol or word.
NO CREDIT: Erroneous or incorrect.

3. Tables (2.5 points)

FULL CREDIT: Cross-tabulation created using Word or Excel. Column percentages reported. Total row and total column reported. Title included.
HALF CREDIT: Table may be missing a component. Column/row inappropriate.
NO CREDIT: Table is entirely inaccurate, missing, or copied.

4. Annotated test statistics (5 points)

FULL CREDIT: Statistics reported and adequately explained.
HALF CREDIT: Statistics reported and inadequately explained.
NO CREDIT: Missing, incorrect, or mislabeled statistics.

5. Hypothesis decision (5 points)

FULL CREDIT: Correct decision stated with respect to the null hypothesis.
HALF CREDIT: Logical decision stated with respect to research hypothesis.
NO CREDIT: Missing/inappropriate.

6. Statement of test strength (2.5 points)

FULL CREDIT: Reported at least one applicable measure of association for variables, OR, if none was appropriate, none was reported. In that case, you must state one you would have reported if it were appropriate.
NO CREDIT: Reports inappropriate measure of association or none at all.

7. Reflection (5 points)

Exceeds expectations. Above-and-beyond effort. Response demonstrates mastery of chi-square test. Excellent explanation of accurate results in line with the information provided.
Meets expectations. Typical effort. Response demonstrates adequate understanding of chi-square test. Appropriate explanation of the results in line with the information provided.

Approaches/below expectations. Below-average effort. Response demonstrates incomplete understanding of chi-square test. The explanation demonstrates significant misunderstandings of the results.

PART TWO: ANALYSIS OF VARIANCE (ANOVA)

8. Introduction and descriptive statistics (10 points)

Exceeds expectations. Two appropriate variables are selected and fully described. All required statistics (i.e., sample size, central tendency, variability) are reported for each variable. These statistics are described in a way that demonstrates mastery of descriptive statistics. Includes critical discussion of how variables represent the US population. Also includes description of the causal relationship between variables.
Meets expectations. Two appropriate variables are selected and described. All required statistics (i.e., sample size, central tendency, variability) are reported for each variable. Includes some discussion of how variables represent the US population. Also includes description of the causal relationship between variables.
Approaches/below expectations. Two inappropriate variables are selected and may not be described. Required statistics incomplete or missing. Includes some or no discussion of how variables represent the US population. May include description of the causal relationship between variables; relationship may be implausible.

9. Hypothesis statements (5 points)

FULL CREDIT: Null and research hypotheses are logical and clearly labeled. Statistical language is used.
HALF CREDIT: Null and research hypotheses contain muddled language or may confuse a symbol or word.
NO CREDIT: Erroneous or incorrect.

10. Annotated test statistics (5 points)

FULL CREDIT: Statistics reported and adequately explained.
HALF CREDIT: Statistics reported and inadequately explained.
NO CREDIT: Missing, incorrect, or mislabeled statistics.

11. Hypothesis decision (5 points)

FULL CREDIT: Correct decision stated with respect to the null hypothesis.
HALF CREDIT: Logical decision stated with respect to research hypothesis.
NO CREDIT: Missing/inappropriate.

12. Examination of multiple relationships (5 points)

FULL CREDIT: All groups with statistically significant mean differences and their corresponding P-values have been reported and labeled. If none of the groups significantly differed, this is stated.
HALF CREDIT: All statistically significant differences have been reported, but issues with labeling OR P-values were omitted.

13. Reflection (5 points)

Exceeds expectations. Above-and-beyond effort. Response demonstrates mastery of ANOVA. Excellent explanation of accurate results in line with the information provided. Connects results to population.
Meets expectations. Typical effort. Response demonstrates adequate understanding of ANOVA. Appropriate explanation of the results in line with the information provided.
Approaches/below expectations. Below-average effort. Response demonstrates incomplete understanding of ANOVA. The explanation demonstrates significant misunderstandings of the results.

PART THREE: REGRESSION AND CORRELATION

14. Introduction and descriptive statistics (10 points)

Exceeds expectations. Two appropriate variables are selected and fully described. All required statistics (i.e., sample size, central tendency, variability) are reported for each variable. These statistics are described in a way that demonstrates mastery of descriptive statistics. Includes critical discussion of how variables represent the US population. Also includes description of the causal relationship between variables.
Meets expectations. Two appropriate variables are selected and described. All required statistics (i.e., sample size, central tendency, variability) are reported for each variable. Includes some discussion of how variables represent the US population. Also includes description of the causal relationship between variables.
Approaches/below expectations. Two inappropriate variables are selected and may not be described. Required statistics incomplete or missing. Includes some or no discussion of how variables represent the US population. May include description of the causal relationship between variables; relationship may be implausible.

15. Statement and explanation of regression equation (5 points)

FULL CREDIT: The regression equation is effectively explained in two to five sentences.
HALF CREDIT: Regression equation explained superficially or using muddled language.

16. Statement of determination (5 points)

FULL CREDIT: Interpretation of the coefficient of determination is effective and correct.
HALF CREDIT: Interpretation, though correct, does not demonstrate statistical literacy.
NO CREDIT: The interpretation is not effective and does not use appropriate statistical language.

17. Statement of correlation (5 points)

FULL CREDIT: Interpretation of the correlation coefficient is effective and correct.
HALF CREDIT: Interpretation, though correct, does not demonstrate statistical literacy.
NO CREDIT: The interpretation is not effective and does not use appropriate statistical language.

18. Reflection (5 points)

Exceeds expectations. Above-and-beyond effort. Response demonstrates mastery of OLS. Excellent explanation of accurate results in line with the information provided. Connects results to population.
Meets expectations. Typical effort. Response demonstrates adequate understanding of OLS. Appropriate explanation of the results in line with the information provided.
Approaches/below expectations. Below-average effort. Response demonstrates incomplete understanding of OLS. The explanation demonstrates significant misunderstandings of the results.

APPENDIX A: VARIABLE CODEBOOK

A codebook will contain all possible variables in a given data set. Below is a list containing a subset of questions (i.e., variables) appearing in the most recent wave of the GENERAL SOCIAL SURVEY (GSS) from 2018. The data has been cleaned (e.g., many cases omitted from the “original data”) and the response categories edited for ease of use. For more on GSS, please visit: http://gss.norc.org/

Sample: n=698 randomly sampled people who live in America and answered all items from the GSS in 2018.

Code	Variable Description	Response Categories
marital	Respondent marital status	1=Married; 2=Widowed; 3=Divorced; 4=Separated; 5=Never Married (Nominal)
vote16	Respondent voted in 2016 Election	1=yes; 2=no; 3=ineligible (Nominal)
partyid	Respondent political spectrum ID	0=Strong Democrat… 3=Independent… 6=Strong Republican (Ordinal or Interval-ratio)
happy	Respondent general happiness	1=very happy; 2=pretty happy; 3=not too happy (Ordinal)
satfin	Respondent satisfaction with financial situation	1=pretty satisfied; 2=more or less; 3=not at all (Ordinal)
tvhours	Respondent hours per day watching TV	0,1,2,3,4,5…24 (Interval-ratio)
wwwhr	Number of hours respondent spends on internet per week	0,1,2,3,4,5…168 (Interval-ratio)
age	Age of respondent	18-99 (Interval-ratio)
educ	Respondent highest year of education	0,1,2,3,4,5…20 (Interval-ratio)
abany	A woman should be able to get an abortion for any reason.	1=yes; 2=no; 3=don’t know/no answer (Nominal)

Questions as they appear on the interview instrument:

[marital] Are you currently--married, widowed, divorced, separated, or have you never been married?

[vote16] In 2016, you remember that Clinton ran for President on the Democratic ticket against Trump for the Republicans. Do you remember for sure whether or not you voted in that election?

[partyid] Generally speaking, do you usually think of yourself as a Republican, Democrat, Independent, or what?

[happy] Taken all together, how would you say things are these days--would you say that you are very happy, pretty happy, or not too happy?

[satfin] We are interested in how people are getting along financially these days. So far as you and your family are concerned, would you say that you are pretty well satisfied with your present financial situation, more or less satisfied, or not satisfied at all?

[tvhours] On the average how many hours of TV do you watch per day?

[wwwhr] Not counting e-mail, about how many minutes or hours per week do you use the Web? (Include time you spend visiting regular web sites and time spent using interactive Internet services like chat rooms, Usenet groups, discussion forums, bulletin boards, and the like.)

[age] Age of respondent at time of survey.

[educ] Highest year of education. Note: 1-12 = grades in school, 13-20 = indicate years spent in college.

[abany] Please tell me whether or not you think it should be possible for a pregnant woman to obtain a legal abortion if. . . The woman wants it for any reason?

[eqwlth] Some people think that the government in Washington ought to reduce the income differences between the rich and the poor, perhaps by raising the taxes of wealthy families or by giving income assistance to the poor. Others think that the government should not concern itself with reducing this income difference between the rich and the poor. Here is a card with a scale from 1 to 7. Think of a score of 1 as meaning that the government ought to reduce the income differences between rich and poor, and a score of 7 meaning that the government should not concern itself with reducing income differences. What score between 1 and 7 comes closest to the way you feel?

APPENDIX B: ANALYSIS INSTRUCTIONS

Cross-Tabulation and Chi-Square: Background
Chi-Square and Measures of Association for Two Ordinal Variables
Regression and Correlation
ANALYSIS OF VARIANCE (ANOVA)

Cross-Tabulation and Chi-Square: Background

Once your data are open, click Analyze, then Descriptive Statistics, and finally Crosstabs to create your cross-tabulation and analyze your data using chi-square (figure 1). This will open the following dialogue (figure 2):

Next, select the variables you will use to create your crosstab. You should select your independent variable to be your column variable, and the dependent variable to be your row variable. Select the variable and then click the arrow next to the appropriate column or row box to move it to the appropriate section.

In this example, I selected SEXORNT as my column variable. SEXORNT is a nominal level variable created from the question, “Which of the following best describes you?” with the following categories: “gay, lesbian, or homosexual,” “bisexual,” and “heterosexual or straight.” Respondents who reported “don’t know,” “refused,” or “not applicable” were coded as missing.

For my dependent variable, I selected marhomo as my row variable. marhomo is an ordinal level variable created from responses to the statement, “homosexuals should have the right to marry.” Respondents reported the following valid responses: “strongly agree,” “agree,” “neither agree nor disagree,” “disagree,” and “strongly disagree.” Respondents who reported “cannot choose” or “not applicable” were coded as missing.

Figure 1

I will test the following hypotheses. The alpha has been set at 0.05.

H₀: Sexual orientation and attitudes toward same sex marriage are statistically independent.
H₁: Sexual orientation and attitudes toward same sex marriage are statistically dependent.

Next, you will need to click on the box labeled Statistics to tell SPSS which statistics you will calculate for your cross-tabulation. The following dialogue will appear:

Figure 2

Next, you will click on the checkbox next to each of the statistics you will calculate. At bare minimum, you will select the box for chi-square.

You will also select a check-box to calculate an appropriate measure of association to test the strength of the relationship between your two variables. For our purposes, I will check the boxes for lambda and phi and Cramer’s V. Later in the guide, I will show you an example using two ordinal level variables so you can effectively read the output.

Figure 3

Click Continue and you will return to this screen (figure 6). Click Cells. The following cell display dialogue will appear (figure 7).

Figures: crosstabs screen and cell display dialogue

By default, only Observed should be checked. This will report the number of people who fall into each cell. In addition to observed, you should also check the box for column percentages (Figure 8). When you create your own table in Word or Excel, I only want to see the column percentages in your new table. Click Continue. This will return you to the crosstabs screen (figure 9).
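
For the table you build yourself in Word or Excel, the column percentages and totals can be double-checked with a short script. The sketch below is only an illustration: the counts are hypothetical (they are not taken from the GSS file), and `column_percentages` is a name made up for this example.

```python
def column_percentages(counts):
    """Convert observed counts to column percentages.

    `counts` is a list of rows, one count per column
    (columns = independent variable, rows = dependent variable).
    """
    column_totals = [sum(column) for column in zip(*counts)]
    return [
        [100.0 * cell / column_totals[j] for j, cell in enumerate(row)]
        for row in counts
    ]


# Hypothetical observed counts: 2 row categories by 2 column categories.
observed = [
    [30, 60],
    [20, 10],
]
percentages = column_percentages(observed)
row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]

for row, total in zip(percentages, row_totals):
    cells = "  ".join(f"{value:5.1f}%" for value in row)
    print(f"{cells}  (row total n = {total})")
print("column totals:", column_totals)
```

Each column of percentages should add up to 100%, which is a quick way to confirm that the independent variable really is in the columns.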

Figure

Now you are ready to click OK. This will produce the output.

The output screen will show several boxes to you. Let’s go through them one by one.

Output

The first box simply shows you how many valid and missing cases you have. For your purposes, the Valid percent should be 100%, and missing should be 0%.

Figure 8

The next box shows you the cross-tabulation. We’ll go through it one by one.

Figure 9

First, just look at the table and make sure your data are displayed appropriately. Ours are good! Notice the cells:

Figure 10

Just from looking at the cells, it looks like people who identify as gay, lesbian, or homosexual seem more likely than bisexuals or heterosexuals to agree that LGBT people should be able to marry. Bisexuals seem more divided, but lean toward agreeing that they should be allowed to marry. Heterosexuals seem even more divided: about half agree, 11.8% neither agree nor disagree, and more than a third disagree.

Take a look at the totals:

Figure 11

By now you should also notice that the sample is disproportionately heterosexual. If we only looked at the raw totals, we would not have been able to infer much about the relationship between sexual orientation and attitudes toward same sex marriage. You will need to replicate this table on your own, using either Excel or Word to create the table for you.

The next box shows you the chi-square statistics. We are interested in the information contained in the row labeled Pearson Chi-Square.

Figure

Our obtained chi-square statistic is 22.355. We have eight degrees of freedom. The P-value (labeled asymp. sig. (2-sided)) is 0.004. Because 0.004 is smaller than our alpha of 0.05, we can reject the null hypothesis. Sexual orientation and attitudes toward same sex marriage are statistically dependent.

The next box shows us the directional measures. These are the first set of measures of association we calculated. In this box, we can find lambda. Since our dependent variable is marhomo, we need to refer to that section of the table.
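
The chi-square decision reported above can be checked by hand. The sketch below assumes a 5 x 3 table (five marhomo categories by three SEXORNT categories), which gives the eight degrees of freedom shown in the output. The helper `chi_square_upper_tail_even_df` is a name invented for this example and uses a closed-form formula that only works when the degrees of freedom are even; the critical value 15.507 is the standard table value for df = 8 at α = 0.05.

```python
import math


def chi_square_upper_tail_even_df(statistic, df):
    """P-value (upper-tail area) of a chi-square statistic.

    Closed form that is valid only for even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut requires a positive, even df")
    half = statistic / 2.0
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series


n_rows, n_cols = 5, 3                # marhomo categories, SEXORNT categories
df = (n_rows - 1) * (n_cols - 1)     # degrees of freedom for a cross-tabulation
obtained = 22.355                    # Pearson chi-square from the output
alpha = 0.05
critical_value = 15.507              # chi-square table, df = 8, alpha = 0.05

p_value = chi_square_upper_tail_even_df(obtained, df)
reject_null = p_value < alpha

print(f"df = {df}, P-value = {p_value:.3f}")
print("obtained exceeds critical value:", obtained > critical_value)
print("reject the null hypothesis:", reject_null)
```

Comparing the P-value with alpha and comparing the obtained statistic with the critical value are two ways of making the same decision, so they should always agree.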

Figure

Lambda is zero! Why is that?

Lambda will always be zero when the mode for each category of the independent variable falls into the same category of the dependent variable, even if other measures of association tell us that the two variables actually are related. If the two variables seem related, based on the chi-square statistic or observations of the differences in percentages, we need to try a different measure of association to measure the strength of the relationship.
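
To see why lambda can be zero even when the percentages differ, here is a minimal sketch of lambda with the row variable treated as dependent. Both tables are hypothetical, and `lambda_rows_dependent` is a name made up for this illustration. Lambda is computed as the proportional reduction in prediction errors: (errors made ignoring the independent variable minus errors made using it) divided by the errors made ignoring it.

```python
def lambda_rows_dependent(table):
    """Lambda for a cross-tabulation whose rows are the dependent variable."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    # Errors when we always guess the overall modal row category.
    errors_without_iv = n - max(row_totals)
    # Errors when we guess the modal row category within each column.
    errors_with_iv = sum(sum(column) - max(column) for column in zip(*table))
    return (errors_without_iv - errors_with_iv) / errors_without_iv


# Hypothetical table A: the modal row is the first row in BOTH columns,
# even though the column percentages differ (60% vs. about 86%).
same_mode = [
    [30, 60],
    [20, 10],
]
# Hypothetical table B: the modal row changes from one column to the next.
different_mode = [
    [30, 10],
    [20, 60],
]

print("lambda, same mode in every column:", lambda_rows_dependent(same_mode))
print("lambda, mode differs by column:", lambda_rows_dependent(different_mode))
```

In the first table, knowing the column never changes the best guess for the row, so no prediction errors are saved and lambda is zero.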

Lambda is not an adequate measure of association for our relationship. Here, we need to look at the row labeled Cramer’s V. So let’s take a look at the final table in our output:

Figure 16 & Figure 17

Do not worry about information in the column labeled approx. sig. Our Cramer’s V is 0.10. Interpret appropriately.
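
For reference, Cramer’s V is built from the chi-square statistic, the sample size, and the smaller of (rows - 1) and (columns - 1). The sketch below shows the formula only. The sample size used here is a hypothetical placeholder, because the number of valid cases in this example is not stated in the text, so the printed value is not expected to match the SPSS output.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V: sqrt(chi-square / (n * min(rows - 1, columns - 1)))."""
    smaller_dimension = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (n * smaller_dimension))


hypothetical_n = 1000  # assumption for illustration; use the valid N from your output
v = cramers_v(22.355, hypothetical_n, n_rows=5, n_cols=3)
print(f"Cramer's V with n = {hypothetical_n}: {v:.3f}")
```

V ranges from 0 (no association) to 1 (perfect association), so values near 0.10 sit at the weak end of the scale.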

Chi-Square and Measures of Association for Two Ordinal Variables

If you select two ordinal level variables to complete this portion of the assignment, you will need to select a different measure of association. For this next test, I will continue to use marhomo as the dependent variable. The variable fund16 is the independent variable. fund16 is an ordinal level variable that reports the “fundamentalism/liberalism of religion [the] respondent was raised in.” The category “fundamentalist” (1) means the religion was categorized as a very conservative denomination. “Moderate” (2) means the religion was categorized as being a more moderate religion. “Liberal” (3) means the religion was considered to be a liberal religious group. We can think of this as a continuum. Higher scores indicate more liberal religious beliefs at 16; lower scores indicate more conservative religious beliefs.

We will test the following hypotheses:

H₀: Religious fundamentalism at 16 and attitudes toward same sex marriage are statistically independent.
H₁: Religious fundamentalism at 16 and attitudes toward same sex marriage are statistically dependent.

Figure 18

We set up our chi-square test similarly to the previous example, until we get to the statistics box. We will still select a chi-square test to test the significance of the relationship, but with respect to the association, we need to select gamma and either Kendall’s tau-b or Kendall’s tau-c. Since there are five categories on our row (dependent) variable, and only three on our column (independent) variable, we should select Kendall’s tau-c in addition to gamma. This is because we use Kendall’s tau-c when the cross-tabulation is rectangular, that is, when the two variables have different numbers of categories. If we had the same number of categories on both variables, we would use Kendall’s tau-b.

Figure 19

Output

Let’s take a look at the cross-tabulation:

Figure 20

It seems that something may be going on here. A little over one-third of people raised in fundamentalist religions agree that same-sex couples should be allowed to marry. Twelve and a half percent do not agree or disagree, while over half disagree with the statement that same-sex couples should be allowed to marry. Over one-half of those raised in moderate and liberal denominations agreed that same-sex couples should be allowed to marry. So there might be something going on here. Keep in mind that on the fund16 variable, lower scores = more conservative beliefs; higher scores = more liberal beliefs. On our marhomo variable, lower scores = agreement that same-sex couples should be allowed to marry and higher scores = disagreement that same-sex couples should be allowed to marry. With this in mind, and with the evidence presented above, it seems that if the two variables are statistically dependent, we are likely to have a negative relationship. As the independent variable score increases, the dependent variable score decreases.

The chi-square test suggests that we should reject our null hypothesis. Religious fundamentalism and attitudes toward same-sex marriage are statistically dependent. Now we can examine the box labeled symmetric measures. Here we can find our gamma and the Kendall’s tau-c.

Figure

We are only interested in the information contained in the value column. Our Kendall’s tau-c is -0.143 and our gamma is -0.193. Interpret appropriately.

Regression and Correlation

To begin the regression and correlation portion of Project 4, click Analyze, then Regression, and finally Linear. The following linear regression dialogue will appear:

Figure

You will need to select two appropriate variables to obtain your regression line equation.

Remember, your independent variable is the one you think will predict a change in the dependent variable.

In this example, weekswrk is an interval-ratio level variable reporting the number of weeks a respondent worked last year. It is the independent variable. Vislib is an interval-ratio level variable reporting the number of times a respondent visited a public library last year. It is the dependent variable.

Figure 25

All we need to do to get the information to report our linear regression equation, the correlation coefficient, and the coefficient of determination, is click OK. Your output will appear! I’ll go through each box one by one.

Figure 26

The first box just shows the variables in the equation and the method used to enter them (you don’t need to worry about this). Double check that the variable listed under Variables Entered is your independent variable and that the dependent variable named below the box is the one you intended. We’re all good!

The Model Summary box reports Pearson’s correlation coefficient (R) and the coefficient of determination (R²).

Figure 27

Hold off on interpreting the correlation coefficient for now. The coefficient of determination shows that only 0.1% of the variation in visits to the library last year is explained by how many weeks the respondent worked last year.

The next box shows us ANOVA. ANOVA and regression are related! For this portion of the assignment, do not worry about this box. This box shows us if there is a significant relationship in our regression (we did not cover this in our class – we can conduct hypothesis tests with regression, too!). In short: there isn’t a statistically significant relationship.

Figure 28

The final box, labeled coefficients, contains the information you need to report your regression line equation.

Figure 29

The information contained under the column labeled B shows us both the slope (b) and the y-intercept (a).

The row labeled (Constant) shows us the constant of the regression line. This is different language than you are already familiar with. The number in the cell where B and (Constant) meet is the y-intercept.

The row labeled WEEKS R WORKED LAST YEAR shows us various statistics relevant to how the independent variable is related to the dependent variable. In the cell where B and WEEKS R WORKED LAST YEAR meet, we are shown the slope of the line (b). For each unit increase in the number of weeks someone worked last year, we expect a decrease in the number of library visits last year of 0.019.
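If it helps to see where the numbers in the B column come from, here is a small Python sketch of the least-squares formulas for the slope (b) and y-intercept (a). The data points are hypothetical, and the function name `least_squares` is my own. The only number taken from the output above is the slope of -0.019.

```python
# Hypothetical data for illustration only (NOT the GSS file).
weeks_worked = [0, 10, 20, 30, 40, 52]   # X, independent variable
library_visits = [5, 3, 4, 2, 3, 1]      # Y, dependent variable


def least_squares(x, y):
    """Return (a, b): the y-intercept and slope of the line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # y-intercept
    return a, b


a, b = least_squares(weeks_worked, library_visits)
print(f"Y = {a:.3f} + ({b:.3f})X")

# Reading a slope: with the slope from the output above, ten additional
# weeks worked predicts this change in library visits.
SLOPE_FROM_OUTPUT = -0.019
print(f"Predicted change for 10 more weeks: {SLOPE_FROM_OUTPUT * 10:.2f} visits")
```

The slope is always read per one-unit increase in X, so multiplying it by 10 gives the predicted change for ten more weeks worked.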

Let’s return to the correlation box I showed you earlier.

Figure 30

The correlation coefficient presented here is only going to show a positive figure (this is what you should expect; the explanation is beyond the scope of this class). However, based on the slope of our regression line, we know that we actually have a negative relationship: as X increases, Y decreases. When you report your correlation coefficient, make sure you report the appropriate sign. In this case, SPSS shows R = 0.036, so we know we have a very weak negative relationship (r = -0.036).
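The sign rule above can be written out as a short Python sketch. The two numbers are the ones reported in this example's output. The helper name `signed_r` is my own and is not anything SPSS provides.

```python
import math

R_FROM_MODEL_SUMMARY = 0.036   # unsigned R shown in the Model Summary box
SLOPE_B = -0.019               # slope (b) from the Coefficients box


def signed_r(unsigned_r, slope):
    """Give the correlation coefficient the same sign as the slope."""
    return math.copysign(abs(unsigned_r), slope)


r = signed_r(R_FROM_MODEL_SUMMARY, SLOPE_B)
r_squared = r ** 2

print(f"r = {r:.3f}")
print(f"r squared = {r_squared:.4f}, about {r_squared * 100:.1f}% of the variation explained")
```

Squaring r removes the sign, which is why the coefficient of determination is the same whether the relationship is positive or negative.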

Just to confirm, you can (but do not have to) calculate the bivariate correlation (click Analyze, then Correlate, and then Bivariate). Take a look at what we find:

Figure 31

There it is! We can see there is a negative correlation of -0.036!

ANALYSIS OF VARIANCE (ANOVA)

To begin our ANOVA, you will first click Analyze, then Compare Means, and finally One-way ANOVA. The following One-way ANOVA dialogue will appear:

Figure

Under dependent list, you will select your dependent variable. For our purposes, I’ll use our weekswrk variable from earlier. Under factor, you will select your grouping variable. I’m selecting marital, which is a nominal level variable reporting respondents’ marital status: married, widowed, divorced, separated, or never married. This is the variable you will use to see if there are differences in the mean number of weeks worked last year by marital status.

I’ll test the following hypotheses:

H₀: There are no differences in mean weeks worked last year by marital status.
H₁: There is at least one difference in the mean number of weeks worked last year by marital status.

Figure 34

From here, click on the box that says Post Hoc. Your book doesn’t discuss post-hoc tests, but they are very useful in figuring out which groups differ and which do not. The following Post Hoc Multiple Comparisons dialogue will appear:

Figure

There are lots of different tests we can use to see which groups differ. Click the box labeled simply Tukey. Then click Continue. This will bring you back to the One-Way ANOVA dialogue. Click OK.

Figure 37

Now your output will appear. We’ll go through it one by one. The first box shows you the sum of squares, mean squares, degrees of freedom, the obtained F-statistic, and the p-value.

Everything you need is here, and you are already comfortable calculating it by hand.

Figure 38

In Project 4, I’ve asked you to report df_b, df_w, MSB, MSW, the obtained F-statistic, and the p-value. It’s all right here! We can reject the null hypothesis because our ANOVA demonstrates there is at least one difference in mean number of weeks worked last year. If you did not find a significant relationship, you’re pretty much done. If you did find a significant relationship, you need to take a look at the next bit of output. The Post Hoc Tests output comes next.
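As a refresher on how the ANOVA table values fit together, here is a minimal Python sketch of the by-hand calculation. The three groups and their values are hypothetical (not the GSS data), and the function name `one_way_anova` is my own. It stops at the F-statistic, because the p-value needs an F table or software.

```python
# Hypothetical weeks-worked values for three groups (NOT the GSS data).
groups = {
    "married": [50, 48, 52, 40, 45],
    "widowed": [10, 0, 20, 15, 5],
    "never married": [44, 52, 36, 40, 48],
}


def one_way_anova(samples):
    """Return df_b, df_w, MSB, MSW and F for a list of groups."""
    all_values = [v for g in samples for v in g]
    n = len(all_values)
    k = len(samples)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in samples]
    # Sum of squares between groups and within groups.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(samples, means))
    ssw = sum((v - m) ** 2 for g, m in zip(samples, means) for v in g)
    df_b = k - 1
    df_w = n - k
    msb = ssb / df_b
    msw = ssw / df_w
    return {"df_b": df_b, "df_w": df_w, "MSB": msb, "MSW": msw, "F": msb / msw}


result = one_way_anova(list(groups.values()))
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

You would compare the obtained F to the critical F for (df_b, df_w) at your alpha level, just as in the lessons.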

The first column shows us all five relationship status categories. Notice that it is labeled (I) MARITAL STATUS. The next column shows us each of the other four relationship status categories, relative to the category reported in the first column. It is labeled (J) MARITAL STATUS. The third column, labeled Mean Difference (I-J), shows us the mean difference in weeks worked last year, subtracting the mean number of weeks worked for the second category (labeled J) from the mean number of weeks worked for the first category (labeled I). Let’s examine the first row:

First, we are looking at the mean difference in the number of weeks worked last year between married respondents and widowed respondents. The value reported in the first cell under the Mean Difference (I-J) column is calculated using the following formula:

$$\text{Mean Difference}_{(I-J)} = \bar{X}_I - \bar{X}_J$$

In this equation, I = Married and J = Widowed. The mean difference is 23.317 weeks. This means that married respondents worked an average of 23.317 more weeks than widowed respondents did last year. Now take a look at the value under the sig. column. This is the p-value! SPSS displays it as 0.000, which means p < 0.001. At the alpha = 0.05 level, we can conclude that married respondents worked significantly more weeks last year than widowed respondents. You need to do this for each row. Notice there is repeated information in the table. Let’s look at the row where I = WIDOWED and J = MARRIED.

From this, we can see that widowed respondents worked significantly fewer weeks last year than married respondents did. They worked 23.317 weeks fewer, on average. Hey! That’s the same value with the opposite sign!
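The repeated information in the post hoc table comes from listing every (I, J) pair in both orders. Here is a short Python sketch of just the Mean Difference (I-J) column. The group means are hypothetical round numbers (not the GSS output), and the sketch does not compute the Tukey p-values, which SPSS handles for you.

```python
from itertools import permutations

# Hypothetical group means in weeks worked (NOT the GSS output).
group_means = {
    "married": 40.0,
    "widowed": 17.0,
    "divorced": 38.0,
}

# Mean Difference (I-J): mean of group I minus mean of group J,
# for every ordered pair of different groups.
mean_differences = {
    (i, j): group_means[i] - group_means[j]
    for i, j in permutations(group_means, 2)
}

for (i, j), diff in mean_differences.items():
    print(f"(I) {i:<9} (J) {j:<9} I-J = {diff:+.1f}")
```

Swapping I and J flips the sign of the difference but not its size, so each comparison shows up twice in the table.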

Let’s identify each set that significantly differed:

Notice anything? Widowed respondents worked significantly fewer weeks last year than married, divorced, separated, and never married respondents. Perhaps even more interesting, the only significant differences involved widowed respondents. What might explain this? It is probable that most of the widowed respondents are elderly, and thus at retirement age already. Make sure you report all significant differences appropriately.
