Social Science Statistics STA2122.501 - ONLINE
University of South Florida. Instructor: Dr. Erica L. Toothman. Email: USFstats@gmail.com
Project Report 4: Exploring national values through bivariate analyses
Contents:
For this project, you will be asked to detect, and critically reflect upon, patterns in social data using a representative sample of United States residents (i.e., General Social Survey). First, you will perform a cross-tabulation of two ordinal/nominal variables. The second test, an ANOVA, will have you examine a causal relationship between an ordinal independent variable and an interval-ratio dependent variable. And for your third test, you will examine the relationship between two interval-ratio variables. This project covers learning goals and objectives #1, 2, 3, and 4 as listed in the syllabus (and below).
REQUIRED MATERIALS
For this project, you will need the following items:
PROJECT INSTRUCTIONS
NOTE: The variables you choose to examine in this project cannot be related to those you presented in Projects #1 through #3. This project requires the use of a new and unique set of data.
Bivariate analyses assess the relationship between two variables (e.g., variable 1: test scores and variable 2: number of hours spent studying). Here, you will be using data from the General Social Survey (GSS) to examine the values of adults in the United States. First, you will perform a cross-tabulation of two ordinal/nominal variables. The second test, an ANOVA, will have you examine a causal relationship between an ordinal independent variable and an interval-ratio dependent variable. And for your third test, you will examine the relationship between two interval-ratio variables.
Project Organization:
PART ONE: CROSS-TABULATION and CHI-SQUARE (page 3)
PART TWO: ANALYSIS OF VARIANCE (ANOVA) (page 4)
PART THREE: REGRESSION AND CORRELATION (page 5)
PART ONE: CROSS-TABULATION and CHI-SQUARE
For part one, I’d like you to perform and report a cross-tabulation between two variables, test a hypothesis with a chi-square, and reflect on that relationship. You will need to select the following types of variables for this analysis (see the codebook on pg. 8 for a list of variables): • one nominal or ordinal independent variable, and • one nominal or ordinal dependent variable.
your variables -- you can refer back to instructions from project 2 for how to do this.
the US population. You are not required to use resources, but they will strengthen your arguments.
PART TWO: ANALYSIS OF VARIANCE (ANOVA)
For part two, you will be asked to conduct a hypothesis test using an ANOVA and Tukey’s HSD (Tukey’s HSD is only covered in the class memo). You will need to select the following types of variables for this analysis (see the codebook on pg. 8 for a list of variables): • one nominal or ordinal independent variable, and
• one interval-ratio dependent variable.
your variables -- you can refer back to instructions from project 2 for how to do this.
perform an ANOVA in SPSS, please refer to page 10.
clearly state there are no statistically significant differences between groups in order to receive credit for this portion of the assignment.
PART THREE: REGRESSION AND CORRELATION
For part three, you will be asked to perform a regression and correlation. You will need to select the following types of variables for this analysis (see the codebook on pg. 8 for a list of variables):
• one interval-ratio independent variable, and • one interval-ratio dependent variable.
your variables -- you can refer back to instructions from project 2 for how to do this.
PROJECT 4 RUBRIC
PART ONE: CROSS-TABULATION and CHI-SQUARE
1. Introduction and descriptive statistics (10 points)
2. Hypothesis statements (5 points)
3. Tables (2.5 points)
4. Annotated test statistics (5 points)
5. Hypothesis decision (5 points)
6. Statement of test strength (2.5 points)
7. Reflection (5 points)
Appropriate explanation of the results in line with the information provided.
PART TWO: ANALYSIS OF VARIANCE (ANOVA)
8. Introduction and descriptive statistics (10 points)
• Exceeds expectations. Two appropriate variables are selected and fully described. All required statistics (i.e., sample size, central tendency, variability) are reported for each variable. These statistics are described in a way that demonstrates mastery of descriptive statistics. Includes critical discussion of how variables represent the US population. Also includes description of the causal relationship between variables.
9. Hypothesis statements (5 points)
10. Annotated test statistics (5 points)
11. Hypothesis decision (5 points)
12. Examination of multiple relationships (5 points)
13. Reflection (5 points)
PART THREE: REGRESSION AND CORRELATION
14. Introduction and descriptive statistics (10 points)
15. Statement and explanation of regression equation (5 points)
16. Statement of determination (5 points)
17. Statement of correlation (5 points)
18. Reflection (5 points)
APPENDIX A: VARIABLE CODEBOOK
A codebook will contain all possible variables in a given data set. Below is a list containing a subset of questions (i.e., variables) appearing in the most recent wave of the General Social Survey (GSS), from 2018. The data have been cleaned (e.g., many cases omitted from the “original data”) and the response categories edited for ease of use. For more on GSS, please visit: http://gss.norc.org/
Sample: n=698 Randomly sampled people who live in America and answered all items from the GSS in 2018.
Code | Variable Description | Response Categories
marital | Respondent marital status | 1=Married; 2=Widowed; 3=Divorced; 4=Separated; 5=Never Married (Nominal)
vote16 | Respondent voted in 2016 Election | 1=yes; 2=no; 3=ineligible (Nominal)
partyid | Respondent political spectrum ID | 0=Strong Democrat… 3=Independent… 6=Strong Republican (Ordinal or Interval-ratio)
happy | Respondent general happiness | 1=very happy; 2=pretty happy; 3=not too happy (Ordinal)
satfin | Respondent satisfaction with financial situation | 1=pretty satisfied; 2=more or less; 3=not at all (Ordinal)
tvhours | Respondent hours per day watching TV | 0,1,2,3,4,5…24 (Interval-ratio)
wwwhr | Number of hours respondent spends on the internet per week | 0,1,2,3,4,5…168 (Interval-ratio)
age | Age of respondent | 18-99 (Interval-ratio)
educ | Respondent highest year of education | 0,1,2,3,4,5…20 (Interval-ratio)
abany | A woman should be able to get an abortion for any reason. | 1=yes; 2=no; 3=don’t know/no answer (Nominal)
Questions as they appear on the interview instrument:
[marital] Are you currently--married, widowed, divorced, separated, or have you never been married?
[vote16] In 2016, you remember that Clinton ran for President on the Democratic ticket against Trump for the Republicans. Do you remember for sure whether or not you voted in that election?
[partyid] Generally speaking, do you usually think of yourself as a Republican, Democrat, Independent, or what?
[happy] Taken all together, how would you say things are these days--would you say that you are very happy, pretty happy, or not too happy?
[satfin] We are interested in how people are getting along financially these days. So far as you and your family are concerned, would you say that you are pretty well satisfied with your present financial situation, more or less satisfied, or not satisfied at all?
[tvhours] On the average how many hours of TV do you watch per day?
[wwwhr] Not counting e-mail, about how many minutes or hours per week do you use the Web? (Include time you spend visiting regular web sites and time spent using interactive Internet services like chat rooms, Usenet groups, discussion forums, bulletin boards, and the like.)
[age] Age of respondent at time of survey.
[educ] Highest year of education. Note: 1-12 = grades in school, 13-20 = indicate years spent in college.
[abany] Please tell me whether or not you think it should be possible for a pregnant woman to obtain a legal abortion if. . . The woman wants it for any reason?
[eqwlth] Some people think that the government in Washington ought to reduce the income differences between the rich and the poor, perhaps by raising the taxes of wealthy families or by giving income assistance to the poor. Others think that the government should not concern itself with reducing this income difference between the rich and the poor. Here is a card with a scale from 1 to 7. Think of a score of 1 as meaning that the government ought to reduce the income differences between rich and poor, and a score of 7 meaning that the government should not concern itself with reducing income differences. What score between 1 and 7 comes closest to the way you feel?
APPENDIX B: ANALYSIS INSTRUCTIONS
Cross-Tabulation and Chi-Square: Background
Chi-Square and Measures of Association for Two Ordinal Variables
ANALYSIS OF VARIANCE (ANOVA)
Cross-Tabulation and Chi-Square: Background
Once your data are open, click Analyze, then Descriptive Statistics, and finally Crosstabs to create your cross-tabulation and analyze your data using chi-square (figure 1). This will open the following dialogue (figure 2):
Next, select the variables you will use to create your crosstab. You should select your independent variable to be your column variable, and the dependent variable to be your row variable. Select the variable and then click the arrow next to the appropriate column or row box to move it to the appropriate section.
In this example, I selected SEXORNT as my column variable. SEXORNT is a nominal level variable created from the question, “Which of the following best describes you?” with the following categories: “gay, lesbian, or homosexual,” “bisexual,” and “heterosexual or straight.” Respondents who reported “don’t know,” “refused,” or “not applicable” were coded as missing.
For my dependent variable, I selected marhomo as my row variable. marhomo is an ordinal level variable created from responses to the statement, “homosexuals should have the right to marry.” Respondents reported the following valid responses: “strongly agree,” “agree,” “neither agree nor disagree,” “disagree,” and “strongly disagree.” Respondents who reported “cannot choose” or “not applicable” were coded as missing.
Figure 1
I will test the following hypotheses. The alpha has been set at 0.05.
Next, you will need to click on the box labeled Statistics to tell SPSS which statistics you will calculate for your cross-tabulation. The following dialogue will appear:
Figure 2
Next, you will click on the checkbox next to each of the statistics you will calculate. At bare minimum, you will select the box for chi-square.
You will also select a check-box to calculate an appropriate measure of association to test the strength of the relationship between your two variables. For our purposes, I will check the boxes for lambda and phi and Cramer’s V. Later in the guide, I will show you an example using two ordinal level variables so you can effectively read the output.
Figure 3
Click Continue and you will return to this screen (figure 6). Click Cells. The following cell display dialogue will appear (figure 7).
Figure 4 & Figure 5
By default, only Observed should be checked. This will report the number of people who fall into each cell. In addition to observed, you should also check the box for column percentages (Figure 8). When you create your own table in Word or Excel, I only want to see the column percentages in your new table. Click Continue. This will return you to the crosstabs screen (figure 9).
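If it helps to see what the column percentages are doing, here is a minimal Python sketch of the same calculation. It is not part of the SPSS workflow, and the responses below are invented for illustration (they are not from the GSS file). Each cell count is divided by the total for its column, i.e., for its category of the independent variable.

```python
from collections import Counter

# Hypothetical (independent, dependent) responses, invented for illustration.
responses = [
    ("married", "very happy"), ("married", "very happy"),
    ("married", "pretty happy"), ("married", "not too happy"),
    ("never married", "very happy"), ("never married", "pretty happy"),
    ("never married", "pretty happy"), ("never married", "not too happy"),
]

def column_percentages(pairs):
    """Return {column: {row: percent}}; the percentages in each column sum to 100."""
    cell_counts = Counter(pairs)                      # observed count per cell
    column_totals = Counter(col for col, _ in pairs)  # n for each column category
    table = {}
    for (col, row), count in cell_counts.items():
        table.setdefault(col, {})[row] = 100 * count / column_totals[col]
    return table

percentages = column_percentages(responses)
for col, rows in percentages.items():
    for row, pct in rows.items():
        print(f"{col:>14} | {row:<13} | {pct:5.1f}%")
```

Because each column is scaled to 100%, groups of very different sizes can be compared directly.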
Figure 6 & Figure 7
Now you are ready to click OK. This will produce the output.
The output screen will show several boxes to you. Let’s go through them one by one.
Output
The first box simply shows you how many valid and missing cases you have. For your purposes, the Valid percent should be 100%, and missing should be 0%.
Figure 8
The next box shows you the cross-tabulation. We’ll go through it one by one.
Figure 9
First, just look at the table and make sure your data are displayed appropriately. Ours are good! Notice the cells:
Figure 10
Just from looking at the cells, it looks like people who identify as gay, lesbian, or homosexual seem more likely than bisexuals or heterosexuals to agree that LGBT people should be able to marry. Bisexuals seem more divided, but lean more heavily toward agreeing that they should be allowed to marry. Heterosexuals seem even more divided: about half agree, 11.8% neither agree nor disagree, and more than a third disagree.
Take a look at the totals:
Figure 11
By now you should also notice that the sample is disproportionately heterosexual. If we only looked at the raw totals, we would not have been able to infer much about the relationship between sexual orientation and attitudes toward same sex marriage. You will need to replicate this table on your own, using either Excel or Word to create the table for you. The next box shows you the chi-square statistics. We are interested in the information contained in the row labeled Pearson Chi-Square.
Figure 12 & Figure 13
Our obtained chi-square statistic is 22.355. We have eight degrees of freedom. The P-value (labeled asymp. sig. (2-sided)) is 0.004. With this information, we can reject the null hypothesis. Sexual orientation and attitudes toward same sex marriage are statistically dependent.
The next box shows us the directional measures. These are the first set of measures of association we calculated. In this box, we can find lambda. Since our dependent variable is marhomo, we need to refer to that section of the table.
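As an optional check on the chi-square output, the reported P-value can be reproduced from the obtained statistic and the degrees of freedom. The sketch below is only an illustration and not something SPSS asks you to do. It relies on a closed-form expression for the upper-tail chi-square probability that holds only when the degrees of freedom are even, which happens to be the case here (5 row categories and 3 column categories give 8). The function name is made up for this example.

```python
import math

def chi_square_p_value_even_df(chi_square, df):
    """Upper-tail p-value for a chi-square statistic; valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = chi_square / 2
    # For df = 2k the upper tail equals exp(-x/2) * sum_{i<k} (x/2)^i / i!
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series

rows, columns = 5, 3                 # marhomo categories, SEXORNT categories
df = (rows - 1) * (columns - 1)      # degrees of freedom for the crosstab
p_value = chi_square_p_value_even_df(22.355, df)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(df, round(p_value, 3), decision)
```

For odd degrees of freedom this shortcut does not apply, and a statistics package or a chi-square table would be needed instead.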
Figure 14 & Figure 15
Lambda is zero! Why is that? Lambda will always be zero when the mode for each category of the independent variable falls into the same category of the dependent variable, even if other measures of association tell us that the two variables actually are related. If the two variables seem related, based on the chi-square statistic or observations of the differences in percentages, we need to try a different measure of association to measure the strength of the relationship.
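To see why lambda collapses to zero in this situation, here is a small Python sketch of the usual proportional-reduction-in-error calculation with the dependent variable in the rows. The counts are hypothetical and were chosen so that the first row is the modal response in every column, even though the column percentages clearly differ.

```python
def lambda_rows_dependent(table):
    """Lambda with the dependent variable in the rows.

    table[r][c] is the observed count for row r (dependent variable)
    and column c (independent variable).
    """
    n = sum(sum(row) for row in table)
    columns = list(zip(*table))
    # E1: errors made when always guessing the overall modal row.
    e1 = n - max(sum(row) for row in table)
    # E2: errors made when guessing the modal row within each column.
    e2 = sum(sum(col) - max(col) for col in columns)
    return (e1 - e2) / e1

# Hypothetical counts: rows = agree / neutral / disagree, columns = three groups.
same_mode = [
    [45, 30, 500],
    [3, 10, 120],
    [2, 10, 380],
]
# Hypothetical counts where the modal row changes from one column to the next.
different_modes = [
    [40, 10],
    [10, 40],
]
print(lambda_rows_dependent(same_mode))
print(lambda_rows_dependent(different_modes))
```

In the first table, knowing the column never changes the best guess for the row, so no prediction errors are saved and lambda is zero.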
Lambda is not an adequate measure of association for our relationship. Here, we need to look at the row labeled Cramer’s V. So let’s take a look at the final table in our output:
Figure 16 & Figure 17
Do not worry about information in the column labeled approx. sig. Our Cramer’s V is 0.10. Interpret appropriately.
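For reference, Cramer’s V is built directly from the chi-square statistic. The sketch below computes both by hand for a hypothetical 2x2 table of observed counts (not the SEXORNT by marhomo table, whose counts appear only in the figures). Expected counts are taken as row total times column total divided by N.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square for a table of observed counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * column_totals[c] / n
            chi_square += (observed - expected) ** 2 / expected
    return chi_square

def cramers_v(table):
    """Cramer's V = sqrt(chi-square / (N * min(rows - 1, columns - 1)))."""
    n = sum(sum(row) for row in table)
    smaller_dimension = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square_statistic(table) / (n * smaller_dimension))

observed = [[30, 10], [20, 40]]   # hypothetical observed counts
chi_square = chi_square_statistic(observed)
v = cramers_v(observed)
print(round(chi_square, 3), round(v, 3))
```

V runs from 0 to 1, so it can be compared across tables with different sample sizes and different numbers of categories.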
Chi-Square and Measures of Association for Two Ordinal Variables
If you select two ordinal level variables to complete this portion of the assignment, you will need to select a different measure of association. For this next test, I will continue to use marhomo as the dependent variable. The variable fund16 is the independent variable. Fund16 is an ordinal level variable that reports the “fundamentalism/liberalism of religion [the] respondent was raised in.” The category “fundamentalist” (1) means the religion was categorized as a very conservative denomination. “Moderate” (2) means the religion was categorized as being a more moderate religion. “Liberal” (3) means the religion was considered to be a liberal religious group. We can think of this as a continuum. Higher scores indicate more liberal religious beliefs at 16; lower scores indicate more conservative religious beliefs.
We will test the following hypotheses:
Figure 18
We set up our chi-square test similarly to the previous example, until we get to the statistics box. We will still select a chi-square test to test the significance of the relationship, but with respect to the association, we need to select gamma and either Kendall’s tau-b or Kendall’s tau-c. Since there are five categories on our dependent variable, and only three for our column variable, we should select Kendall’s tau-c in addition to gamma. This is because we use Kendall’s tau-c when the cross-tabulation is a rectangle. If we had the same number of categories on both variables, we would use Kendall’s tau-b.
Figure 19
Output
Let’s take a look at the cross-tabulation:
Figure 20
It seems that something may be going on here. A little over one-third of people raised in fundamentalist religions agree that same-sex couples should be allowed to marry. Twelve and a half percent do not agree or disagree, while over half disagree with the statement that same-sex couples should be allowed to marry. Over one-half of those raised in moderate and liberal denominations agreed that same-sex couples should be allowed to marry. So there might be something going on here. Keep in mind that on the fund16 variable, lower scores = more conservative beliefs; higher scores = more liberal beliefs. On our marhomo variable, lower scores = agreement that same-sex couples should be allowed to marry and higher scores = disagreement that same-sex couples should be allowed to marry. With this in mind, and with the evidence presented above, it seems that if the two variables are statistically dependent, we are likely to have a negative relationship. As the independent variable score increases, the dependent variable score decreases.
The chi-square test suggests that we should reject our null hypothesis. Religious fundamentalism and attitudes toward same-sex marriage are statistically dependent. Now we can examine the box labeled symmetric measures. Here we can find our gamma and the Kendall’s tau-c.
Figure 21 & Figure 22
We are only interested in the information contained in the value column. Our Kendall’s tau-c is -0.143 and our gamma is -0.193. Interpret appropriately.
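Both gamma and Kendall’s tau-c are built from counts of concordant and discordant pairs. The Python sketch below shows one way to compute them for a hypothetical 3x2 table (not the fund16 by marhomo table). It assumes rows and columns are both ordered from the lowest to the highest score, and it uses the common definition of tau-c, 2m(C - D) / (n²(m - 1)), where m is the smaller of the number of rows and columns.

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs in a table with ordered rows and columns."""
    concordant = discordant = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                # Cells below and to the right rank higher on both variables.
                concordant += count * sum(lower_row[j + 1:])
                # Cells below and to the left rank higher on one, lower on the other.
                discordant += count * sum(lower_row[:j])
    return concordant, discordant

def goodman_kruskal_gamma(table):
    c, d = concordant_discordant(table)
    return (c - d) / (c + d)

def kendall_tau_c(table):
    c, d = concordant_discordant(table)
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return 2 * m * (c - d) / (n ** 2 * (m - 1))

# Hypothetical counts: 3 ordered rows (dependent), 2 ordered columns (independent).
observed = [
    [10, 20],
    [15, 15],
    [25, 5],
]
gamma_value = goodman_kruskal_gamma(observed)
tau_c_value = kendall_tau_c(observed)
print(round(gamma_value, 3), round(tau_c_value, 3))
```

A negative value means discordant pairs outnumber concordant ones: as scores on one variable rise, scores on the other tend to fall.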
To begin, click Analyze, then Regression, and finally Linear to complete the regression and correlation portion of Project 4. The following linear regression dialogue will appear:
Figure 23 & Figure 24
You will need to select two appropriate variables to obtain your regression line equation. Remember, your independent variable is the one you think will predict a change in the dependent variable.
In this example, weekswrk is an interval-ratio level variable reporting the number of weeks a respondent worked last year. It is the independent variable. VISLIB is an interval-ratio level variable reporting the number of times a respondent visited a public library last year. It is the dependent variable.
Figure 25
All we need to do to get the information to report our linear regression equation, the correlation coefficient, and the coefficient of determination, is click OK. Your output will appear! I’ll go through each box one by one.
Figure 26
The first box just shows the variables in the equation and the method used to enter them (you don’t need to worry about this). Double check that the variable entered matches your independent variable and that the dependent variable listed below the box matches what you intended to do. We’re all good!
The Model Summary box reports Pearson’s correlation coefficient (R) and the coefficient of determination (R²).
Figure 27
Hold off on interpreting the correlation coefficient for now. We can also see that our regression equation demonstrates that only 0.1% of the variation in visits to the library last year is explained by how many weeks worked last year.
The next box shows us ANOVA. ANOVA and regression are related! For this portion of the assignment, do not worry about this box. This box shows us if there is a significant relationship in our regression (we did not cover this in our class – we can conduct hypothesis tests with regression, too!). In short: there isn’t a statistically significant relationship.
Figure 28
The final box, labeled coefficients, contains the information you need to report your regression line equation.
Figure 29
The information contained under the column labeled B shows us both the slope (b) and the y-intercept (a).
The row labeled (Constant) shows us the constant of the regression line. This is different language from what you are already familiar with. The number in the cell where B and (Constant) meet is the y-intercept.
The row labeled WEEKS R WORKED LAST YEAR shows us various statistics relevant to how the independent variable is related to the dependent variable. In the cell where B and WEEKS R WORKED LAST YEAR meet, we are shown the slope of the line (b). For each unit increase in the number of weeks someone worked last year, we expect a decrease in the number of library visits last year of 0.019.
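To connect the B column to the regression line Y = a + bX, here is a minimal Python sketch of the least-squares calculation. The five data points are hypothetical (they are not the weekswrk and VISLIB values), and the function name is made up for this example.

```python
def least_squares(x, y):
    """Return (a, b) for the line Y = a + bX fitted by ordinary least squares."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Slope: sum of cross-products of deviations over sum of squared X deviations.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x   # the line passes through (mean_x, mean_y)
    return a, b

weeks_worked = [10, 20, 30, 40, 50]   # hypothetical X values
library_visits = [6, 5, 5, 3, 4]      # hypothetical Y values
a, b = least_squares(weeks_worked, library_visits)
predicted_at_52 = a + b * 52          # predicted Y for someone who worked 52 weeks
print(round(a, 2), round(b, 2), round(predicted_at_52, 2))
```

The slope is read the same way as in the SPSS output: each additional unit of X changes the predicted Y by b units.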
Let’s return to the correlation box I showed you earlier.
Figure 30
The correlation coefficient presented here is only going to show a positive figure (this is what you should expect; the explanation is beyond the scope of this class). However, based on the slope of our regression line, we know that we actually have a negative relationship: as X increases, Y decreases. When you report your correlation coefficient, make sure you report the appropriate sign. In this case, we know we have a very weak negative relationship (r = -0.036).
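The sign rule above can be written as a two-line calculation. This small Python sketch takes the R reported in the Model Summary box and the slope from the Coefficients box, attaches the slope’s sign to R, and squares R to recover the percentage of variation explained. The variable names are made up for this example.

```python
import math

model_summary_r = 0.036   # R from the Model Summary box (never negative)
slope_b = -0.019          # B for the independent variable in the Coefficients box

# Give R the same sign as the slope to get the signed correlation coefficient.
signed_r = math.copysign(model_summary_r, slope_b)
# The coefficient of determination is R squared, reported here as a percentage.
r_squared = model_summary_r ** 2
print(signed_r, f"{r_squared:.1%}")
```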
Just to confirm, you can but do not have to calculate the bivariate correlation (click Analyze, then Correlate, and then Bivariate). Take a look at what we find:
Figure 31
There it is! We can see there is a negative correlation of -0.036!
To begin our ANOVA, you will first click Analyze, then Compare Means, and finally One-way ANOVA. The following One-way ANOVA dialogue will appear:
Figure 32 & Figure 33
Under dependent list, you will select your dependent variable. For our purposes, I’ll use our weekswrk variable from earlier. Under factor, you will select your grouping variable. I’m selecting marital, which is a nominal level variable reporting respondents’ marital status: married, widowed, divorced, separated, or never married. This is the variable you will use to see if there are differences in the mean number of weeks worked last year by marital status.
I’ll test the following hypotheses:
Figure 34
From here, click on the box that says Post Hoc. Your book doesn’t discuss post-hoc tests, but they are very useful in figuring out which groups differ and which do not. The following Post Hoc Multiple Comparisons dialogue will appear:
Figure 35 & Figure 36
There are lots of different tests we can use to see which groups differ. Click the box labeled simply Tukey. Then click Continue. This will bring you back to the One-Way ANOVA dialogue. Click OK.
Figure 37
Now your output will appear. We’ll go through it one by one. The first box shows you the sum of squares, mean squares, degrees of freedom, the obtained F-statistic, and the p-value. Everything you need and are already comfortable with calculating by hand.
Figure 38
In Project 4, I’ve asked you to report dfb, dfw, MSB, MSW, the obtained F-statistic, and the p-value. It’s all right here! We can reject the null hypothesis because our ANOVA demonstrates there is at least one difference in mean number of weeks worked last year. If you did not find a significant relationship, you’re pretty much done. If you did find a significant relationship, you need to take a look at the next bit of output. The Post Hoc Tests output comes next.
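If you want to check how the pieces of the ANOVA box fit together, the sketch below rebuilds dfb, dfw, MSB, MSW, and F by hand in Python. The three groups are hypothetical (not the weekswrk by marital data), and the p-value is left out because it requires an F-distribution table or a statistics package.

```python
def one_way_anova(groups):
    """Return dfb, dfw, MSB, MSW, and F for a list of groups of scores."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    # Between-groups sum of squares: group size times squared distance of each mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: squared distance of each score from its group mean.
    ssw = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    dfb = len(groups) - 1                # k - 1
    dfw = len(all_values) - len(groups)  # N - k
    msb = ssb / dfb
    msw = ssw / dfw
    return {"dfb": dfb, "dfw": dfw, "MSB": msb, "MSW": msw, "F": msb / msw}

hypothetical_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(hypothetical_groups)
print(result)
```

A large F means the group means are spread out relative to the variation inside each group.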
Figure 39
The first column shows us all five relationship status categories. Notice that it is labeled (I) MARITAL STATUS. The next column shows us each of the other four relationship status categories, relative to the category reported in the first column. It is labeled (J) MARITAL STATUS. The third column, labeled Mean Difference (I-J), shows us the mean difference in weeks worked last year, subtracting the mean number of weeks worked for the second category (labeled J) from the mean number of weeks worked for the first category (labeled I). Let’s examine the first row:
Figure 40
First, we are looking at the mean difference in the number of weeks worked last year between married respondents and widowed respondents. The value reported in the first cell under the Mean Difference (I-J) column is calculated using the following formula:
$$\text{Mean Difference}_{(I-J)} = \bar{X}_I - \bar{X}_J$$
In this equation, I = Married and J = Widowed. The mean difference is 23.317 weeks. This means that married respondents worked an average of 23.317 more weeks than widowed respondents did last year. Now take a look at the value under the sig. column. This is the P-value! The P-value is 0.000. This means that at the alpha = 0.05 level, we can confirm that married respondents worked significantly more weeks last year than widowed respondents. You need to do this for each row. Notice there is repeated information in the table. Let’s look at the row where I = WIDOWED and J = MARRIED.
Figure 41
From this, we can see that widowed respondents worked significantly fewer weeks last year than married respondents did. They worked 23.317 weeks fewer, on average. Hey! That’s the same value with the opposite sign!
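The repeated information in the post hoc table follows directly from how the Mean Difference (I-J) column is defined. This short Python sketch builds every (I, J) pair for three hypothetical groups (the weeks below are invented, not the GSS values) and shows that swapping I and J only flips the sign. It covers the mean differences only, not the Tukey significance test itself.

```python
from itertools import permutations
from statistics import mean

# Hypothetical weeks worked last year, grouped by marital status.
weeks_by_status = {
    "married": [50, 48, 52],
    "widowed": [20, 30, 25],
    "divorced": [45, 47, 49],
}

# Mean Difference (I-J): mean of group I minus mean of group J, for every ordered pair.
differences = {
    (i, j): mean(weeks_by_status[i]) - mean(weeks_by_status[j])
    for i, j in permutations(weeks_by_status, 2)
}

for (i, j), diff in differences.items():
    print(f"I = {i:<8} J = {j:<8} mean difference = {diff:+.3f}")
```

Because each pair appears twice, you only need to interpret one row from each pair when you report significant differences.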
Let’s identify each set that significantly differed:
Figure 42
Notice anything? Widowed respondents worked significantly fewer weeks last year than married, divorced, separated, and never married respondents. Perhaps even more interesting, the only significant differences involved widowed respondents. What might explain this? It is probable that most of the widowed respondents are elderly, and thus at retirement age already. Make sure you report all significant differences appropriately.