---
title: 'Session 4: Homework 2'
author: "Study group and members' names go here"
date: '`r Sys.Date()`'
output:
  pdf_document:
    toc: yes
  html_document:
    theme: flatly
    highlight: zenburn
    number_sections: yes
    toc: yes
    toc_float: yes
    code_folding: show
  word_document:
    toc: yes
---
```{r, setup, include=FALSE}
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE,
tidy=FALSE, # display code as typed
size="small") # slightly smaller font for code
options(digits = 3)
# default figure size
knitr::opts_chunk$set(
fig.width=6.75,
fig.height=6.75,
fig.align = "center"
)
```
```{r load-libraries, include=FALSE}
library(tidyverse) # Load ggplot2, dplyr, and all the other tidyverse packages
library(mosaic)
library(ggthemes)
library(lubridate)
library(here)
library(skimr)
library(janitor)
library(httr)
library(readxl)
library(vroom)
```
# Climate change and temperature anomalies
If we want to study climate change, we can find data on the *Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies* in the Northern Hemisphere at [NASA's Goddard Institute for Space Studies](https://data.giss.nasa.gov/gistemp). The [tabular data of temperature anomalies can be found here](https://data.giss.nasa.gov/gistemp/tabledata_v4/NH.Ts+dSST.txt).
To define temperature anomalies you need a reference, or base, period, which NASA defines as the period between 1951 and 1980.
Run the code below to load the file:
```{r weather_data, cache=TRUE}
weather <- 
  read_csv("https://data.giss.nasa.gov/gistemp/tabledata_v4/NH.Ts+dSST.csv", 
           skip = 1, 
           na = "***")
```
Notice that, when using this function, we added two options: `skip` and `na`.
1. The `skip=1` option is there as the real data table only starts in Row 2, so we need to skip one row.
1. The `na = "***"` option informs R how missing observations in the spreadsheet are coded. Looking at the spreadsheet, you can see that missing data is coded as "***". It is best to specify this here, as otherwise some of the data is not recognized as numeric data.
Once the data is loaded, notice that there is an object titled `weather` in the `Environment` panel. If you cannot see the panel (usually on the top-right), go to `Tools` > `Global Options` > `Pane Layout` and tick the checkbox next to `Environment`. Click on the `weather` object, and the dataframe will pop up in a separate tab. Inspect the dataframe.
For each month and year, the dataframe shows the deviation of temperature from the normal (expected). Further, the dataframe is in wide format.
You have two objectives in this section:
1. Select the year and the twelve month variables from the `weather` dataset. We do not need the others (J-D, D-N, DJF, etc.) for this assignment. Hint: use `select()` function.
1. Convert the dataframe from wide to 'long' format. Hint: use `gather()` or `pivot_longer()` function. Name the new dataframe as `tidyweather`, name the variable containing the name of the month as `month`, and the temperature deviation values as `delta`.
```{r tidyweather}
tidyweather <- weather %>% 
  select(1:13) %>% 
  pivot_longer(cols = 2:13,
               names_to = "month", 
               values_to = "delta")

tidyweather
```
Inspect your dataframe. It should have three variables now, one each for
1. year,
1. month, and
1. delta, or temperature deviation.
## Plotting Information
Let us plot the data using a time-series scatter plot, and add a trendline. To do that, we first need to create a new variable called `date` in order to ensure that the `delta` values are plotted chronologically.
```{r scatter_plot}
tidyweather <- tidyweather %>%
mutate(date = ymd(paste(as.character(Year), month, "1")),
month = month(date, label=TRUE),
year = year(date))
ggplot(tidyweather, aes(x=date, y = delta))+
geom_point()+
geom_smooth(color="red") +
theme_bw() +
labs (
title = "Weather Anomalies"
)
```
Is the effect of increasing temperature more pronounced in some months? Use `facet_wrap()` to produce a separate scatter plot for each month, again with a smoothing line. Your chart should have human-readable labels; that is, each month should be labelled "Jan", "Feb", "Mar" (full or abbreviated month names are fine), not `1`, `2`, `3`.
```{r facet_wrap, echo=FALSE}
ggplot(tidyweather, aes(x=date, y=delta))+
geom_point()+
geom_smooth(color="red")+
facet_wrap(~ month, nrow = 4)
```
It is sometimes useful to group data into different time periods to study historical data. For example, we often refer to decades such as the 1970s, 1980s, and 1990s to refer to a period of time. NASA calculates a temperature anomaly as the difference from the base period of 1951-1980. The code below creates a new data frame called `comparison` that groups data in five time periods: 1881-1920, 1921-1950, 1951-1980, 1981-2010 and 2011-present.
We remove data before 1881 using `filter`. Then, we use the `mutate` function to create a new variable `interval` which contains information on which period each observation belongs to. We can assign the different periods using `case_when()`.
```{r intervals}
comparison <- tidyweather %>%
filter(Year>= 1881) %>% #remove years prior to 1881
#create new variable 'interval', and assign values based on criteria below:
mutate(interval = case_when(
# case_when() is a multiple ifelse()
Year %in% c(1881:1920) ~ "1881-1920",
Year %in% c(1921:1950) ~ "1921-1950",
Year %in% c(1951:1980) ~ "1951-1980",
Year %in% c(1981:2010) ~ "1981-2010",
TRUE ~ "2011-present"
))
```
Inspect the `comparison` dataframe by clicking on it in the `Environment` pane.
Now that we have the `interval` variable, we can create a density plot to study the distribution of monthly deviations (`delta`), grouped by the different time periods we are interested in. Set `fill` to `interval` to group and colour the data by different time periods.
```{r density_plot}
ggplot(comparison, aes(x=delta, fill=interval))+
geom_density(alpha=0.2) + #density plot with transparency set to 20%
theme_bw() + #theme
labs (
title = "Density Plot for Monthly Temperature Anomalies",
y = "Density" #changing y-axis label to sentence case
)
```
So far, we have been working with monthly anomalies. However, we might be interested in average annual anomalies. We can do this by using `group_by()` and `summarise()`, followed by a scatter plot to display the result.
```{r averaging}
#creating yearly averages
average_annual_anomaly <- tidyweather %>%
group_by(Year) %>% #grouping data by Year
# creating summaries for mean delta
# use `na.rm=TRUE` to eliminate NA (not available) values
summarise(annual_average_delta = mean(delta, na.rm=TRUE))
#plotting the data:
ggplot(average_annual_anomaly, aes(x=Year, y= annual_average_delta))+
geom_point()+
#Fit the best fit line, using LOESS method
geom_smooth() +
#change to theme_bw() to have white background + black frame around plot
theme_bw() +
labs (
title = "Average Yearly Anomaly",
y = "Average Annual Delta"
)
```
## Confidence Interval for `delta`
[NASA points out on their website](https://earthobservatory.nasa.gov/world-of-change/decadaltemp.php) that
> A one-degree global change is significant because it takes a vast amount of heat to warm all the oceans, atmosphere, and land by that much. In the past, a one- to two-degree drop was all it took to plunge the Earth into the Little Ice Age.
Your task is to construct a confidence interval for the average annual delta since 2011, both using a formula and using a bootstrap simulation with the `infer` package. Recall that the dataframe `comparison` has already grouped temperature anomalies according to time intervals; we are only interested in what is happening between 2011-present.
Recall that the standard error of the mean is $SE = s / \sqrt{n}$, so the 95% confidence interval is $\bar{x} \pm t_{0.975,\,n-1} \times SE$.
```{r, calculate_CI_using_formula}
# choose the interval 2011-present, then calculate summary statistics for
# temperature deviation (delta): mean, SD, count, SE, lower/upper 95% CI
formula_ci <- comparison %>%
  filter(interval == "2011-present") %>%
  summarise(mean_delta = mean(delta, na.rm = TRUE),
            sd_delta = sd(delta, na.rm = TRUE),
            count = n(),
            se_delta = sd_delta / sqrt(count),
            lower_ci = mean_delta - qt(0.975, count - 1) * se_delta,
            upper_ci = mean_delta + qt(0.975, count - 1) * se_delta)

# print out formula_ci
formula_ci
```
```{r, calculate_CI_using_bootstrap}
```
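One way to fill the empty chunk above is a percentile bootstrap with the `infer` package (a sketch, not a definitive solution; `infer` is not attached in the setup chunk, so it is loaded here):
```{r, bootstrap_ci_sketch}
# Sketch: bootstrap the mean delta for 2011-present using infer.
library(infer)

set.seed(1234)                                   # reproducible resamples

boot_delta <- comparison %>%
  filter(interval == "2011-present",
         !is.na(delta)) %>%
  specify(response = delta) %>%                  # variable of interest
  generate(reps = 1000, type = "bootstrap") %>%  # 1,000 bootstrap resamples
  calculate(stat = "mean")                       # mean delta in each resample

boot_delta %>%
  get_confidence_interval(level = 0.95, type = "percentile")
```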
> What is the data showing us? Please type your answer after (and outside!) this blockquote. You have to explain what you have done, and the interpretation of the result. One paragraph max, please!
# Global warming and political views
[A 2010 Pew Research poll](https://www.pewresearch.org/2010/10/27/wide-partisan-divide-over-global-warming/) asked 1,306 Americans, "From what you've read and heard, is there solid evidence that the average temperature on earth has been getting warmer over the past few decades, or not?"
In this exercise we analyze whether there are any differences between the proportion of people who believe the earth is getting warmer and their political ideology. As usual, from the **survey sample data**, we will use the proportions to estimate values of *population parameters*. The file has 2253 observations on the following 2 variables:
- `party_or_ideology`: a factor (categorical) variable with levels Conservative Republican, Liberal Democrat, Mod/Cons Democrat, Mod/Lib Republican
- `response` : whether the respondent believes the earth is warming or not, or Don't know/ refuse to answer
```{r, read_global_warming_pew_data}
global_warming_pew <- read_csv(here::here("data", "global_warming_pew.csv"))
```
You will also notice that many responses should not be taken into consideration, like "No Answer", "Don't Know", "Not applicable", "Refused to Answer".
```{r}
global_warming_pew %>%
count(party_or_ideology, response)
```
We will be constructing four 95% confidence intervals to estimate population parameters, for the % who believe that **Earth is warming**, according to their party or ideology. You can create the CIs using the formulas 'by hand'-- just remember to exclude the Don't know / refuse to answer responses!
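A sketch of the 'by hand' approach, assuming the response categories are labelled "Earth is warming" and "Don't know / refuse to answer" (an assumption -- check the `count()` output above for the exact labels):
```{r, prop_ci_sketch}
# Sketch: 95% CI for the proportion who believe the Earth is warming,
# by party/ideology, using p +/- 1.96 * sqrt(p * (1 - p) / n).
global_warming_pew %>%
  filter(response != "Don't know / refuse to answer") %>%   # exclude non-answers
  group_by(party_or_ideology) %>%
  summarise(n = n(),
            prop_warming = mean(response == "Earth is warming"),
            se = sqrt(prop_warming * (1 - prop_warming) / n),
            lower_95 = prop_warming - 1.96 * se,
            upper_95 = prop_warming + 1.96 * se)
```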
Does it appear that whether or not a respondent believes the earth is warming is independent of their party ideology?
You may want to read [The challenging politics of climate change](https://www.brookings.edu/research/the-challenging-politics-of-climate-change/).
# Biden's Approval Margins
As we saw in class, fivethirtyeight.com has detailed data on [all polls that track the president's approval](https://projects.fivethirtyeight.com/biden-approval-rating).
```{r, cache=TRUE}
# Import approval polls data directly off fivethirtyeight website
approval_polllist <- read_csv('https://projects.fivethirtyeight.com/biden-approval-data/approval_polllist.csv')
glimpse(approval_polllist)
# Use `lubridate` to fix dates, as they are given as characters.
```
## Create a plot
What I would like you to do is to calculate the average net approval rate (approve - disapprove) for each week since he got into office. I want you to plot the net approval, along with its 95% confidence interval. There are various dates given for each poll; please use `enddate`, i.e., the date the poll ended.
Also, please add an orange line at zero. Your plot should look like this:
```{r biden_margins, echo=FALSE, out.width="100%"}
knitr::include_graphics(here::here("images", "biden_approval_margin.png"), error = FALSE)
```
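A sketch of one way to do this, assuming the columns `enddate`, `approve`, and `disapprove` exist, that `enddate` is stored as a character in month/day/year format (check the `glimpse()` output above), and that the polls of interest fall in 2021:
```{r, net_approval_sketch}
# Sketch: weekly average net approval with a 95% confidence interval.
biden_weekly <- approval_polllist %>%
  mutate(enddate = mdy(enddate),                 # lubridate: character -> Date
         week = week(enddate),
         net_approval = approve - disapprove) %>%
  filter(year(enddate) == 2021) %>%              # assumes polls of interest are from 2021
  group_by(week) %>%
  summarise(mean_net = mean(net_approval, na.rm = TRUE),
            sd_net = sd(net_approval, na.rm = TRUE),
            count = n(),
            se_net = sd_net / sqrt(count),
            lower = mean_net - qt(0.975, count - 1) * se_net,
            upper = mean_net + qt(0.975, count - 1) * se_net)

ggplot(biden_weekly, aes(x = week, y = mean_net)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80", alpha = 0.5) +
  geom_point() +
  geom_line() +
  geom_hline(yintercept = 0, colour = "orange") +
  theme_bw() +
  labs(title = "Estimating net approval (approve - disapprove) for Joe Biden",
       x = "Week of 2021",
       y = "Average net approval (%)")
```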
## Compare Confidence Intervals
Compare the confidence intervals for `week 3` and `week 25`. Can you explain what's going on? One paragraph would be enough.
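For instance, assuming the `biden_weekly` dataframe from the sketch above, you could compare the widths of the two intervals directly:
```{r, compare_weeks_sketch}
# width of the 95% CI in weeks 3 and 25 (a wider interval reflects more
# uncertainty, e.g. fewer polls or more disagreement between them)
biden_weekly %>%
  filter(week %in% c(3, 25)) %>%
  mutate(ci_width = upper - lower) %>%
  select(week, count, mean_net, lower, upper, ci_width)
```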
# Challenge 1: How has the CPI and its components changed over the last few years?
Remember how we used the tidyquant package to download CPI data. In this exercise, I would like you to do the following:
1. You can find [CPI components at FRED](https://fredaccount.stlouisfed.org/public/datalist/843). You should adapt the code from German polls to scrape the FRED website and pull all of the CPI components into a vector. FYI, the list of components is the second table on that webpage.
```{r}
# read off CSV directly
cpi_components <- read_csv(here::here("data", "cpi_components.csv"))
```
1. Once you have a vector of components, you can then pass it to `tidyquant::tq_get(get = "economic.data", from = "2000-01-01")` to get all data since January 1, 2000
```{r}
library(tidyquant)  # for tq_get()

# download each CPI component from FRED, since January 1, 2000
cpi_yoy <- cpi_components %>%
  pull(component) %>%
  tq_get(get = "economic.data", from = "2000-01-01") %>%
  rename(component = symbol,
         value = price)

my_data <- cpi_yoy %>%
  left_join(cpi_components, by = "component")
```
1. Since the data you download is an index with various starting dates, you need to calculate the yearly, or 12-month, change. To do this you need to use the `lag` function, and specifically, `year_change = value/lag(value, 12) - 1`; this means you are comparing the current month's value with the value 12 months earlier, `lag(value, 12)`.
1. I want you to order components so that the higher the yearly change, the earlier that component appears.
1. You should also make sure that the **All Items** CPI (CPIAUCSL) appears first.
1. Add a `geom_smooth()` for each component to get a sense of the overall trend.
1. You may want to colour the points according to whether the yearly change was positive or negative (a sketch covering these steps follows this list).
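A minimal sketch of these steps, assuming `my_data` from the chunk above exists with the columns `component`, `date`, and `value`, and that the example graph starts in 2016:
```{r, cpi_yoy_sketch}
# Sketch only: 12-month change per component, facets ordered by the size of
# the yearly change, with the All Items index (CPIAUCSL) first.
cpi_yearly <- my_data %>%
  group_by(component) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(year_change = value / lag(value, 12) - 1) %>%   # vs 12 months earlier
  ungroup() %>%
  filter(!is.na(year_change),
         date >= as.Date("2016-01-01")) %>%              # example graph starts in 2016
  mutate(component = fct_reorder(component, year_change, .fun = max, .desc = TRUE),
         component = fct_relevel(component, "CPIAUCSL"))

ggplot(cpi_yearly, aes(x = date, y = year_change, colour = year_change > 0)) +
  geom_point(size = 0.5) +
  geom_smooth(se = FALSE) +                              # overall trend per component
  facet_wrap(~ component, scales = "free_y") +
  # first colour = negative change, second = positive; swap if needed
  scale_colour_manual(values = c("#DB5E61", "#7abaf3")) +
  scale_y_continuous(labels = scales::percent) +
  theme_bw() +
  theme(legend.position = "none") +
  labs(x = NULL, y = "12-month change")
```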
Having done this, you should get something similar to this graph.
```{r cpi_all_components_since_2016, echo=FALSE, out.width="100%"}
knitr::include_graphics(here::here("images", "cpi_components_since_2016.png"), error = FALSE)
```
I used the following colour palette
```
my_colour_palette <- c("#7abaf3", "#DB5E61")
```
This graph is fine, but perhaps has too many sub-categories. You can find the [relative importance of components in the Consumer Price Indexes: U.S. city average, December 2020](https://www.bls.gov/cpi/tables/relative-importance/2020.htm) on the BLS website. Can you choose a smaller subset of the components you have and only list the major categories (Housing, Transportation, Food and beverages, Medical care, Education and communication, Recreation, and Apparel), sorted according to their relative importance?
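A possible way to narrow this down, assuming `cpi_yearly` from the sketch above and that the components file carries a human-readable `title` column naming each category (an assumption -- adapt the column name and the matching strings to your data):
```{r, cpi_major_categories_sketch}
# Sketch: keep only the major categories, faceted in the order they are
# listed above, i.e. by relative importance.
major_categories <- c("Housing", "Transportation", "Food and beverages",
                      "Medical care", "Education and communication",
                      "Recreation", "Apparel")

cpi_yearly %>%
  filter(title %in% major_categories) %>%                 # `title` is assumed
  mutate(title = factor(title, levels = major_categories)) %>%
  ggplot(aes(x = date, y = year_change, colour = year_change > 0)) +
  geom_point(size = 0.5) +
  geom_smooth(se = FALSE) +
  facet_wrap(~ title, nrow = 2) +
  scale_colour_manual(values = c("#DB5E61", "#7abaf3")) +
  scale_y_continuous(labels = scales::percent) +
  theme_bw() +
  theme(legend.position = "none") +
  labs(x = NULL, y = "12-month change")
```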
# Challenge 2: Share of renewable energy production in the world
The National Bureau of Economic Research (NBER) has a very interesting dataset on the adoption of about 200 technologies in more than 150 countries since 1800. This is the [Cross-country Historical Adoption of Technology (CHAT) dataset](https://www.nber.org/research/data/cross-country-historical-adoption-technology).
The following is a description of the variables:
| **variable** | **class** | **description** |
|--------------|-----------|--------------------------------|
| variable | character | Variable name |
| label | character | Label for variable |
| iso3c | character | Country code |
| year | double | Year |
| group | character | Group (consumption/production) |
| category | character | Category |
| value | double | Value (related to label) |
```{r,load_technology_data}
technology <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-07-19/technology.csv')
#get all technologies
labels <- technology %>%
distinct(variable, label)
# Get country names using the 'countrycode' package
library(countrycode)

technology <- technology %>%
  filter(iso3c != "XCD") %>%
  mutate(iso3c = recode(iso3c, "ROM" = "ROU"),
         country = countrycode(iso3c, origin = "iso3c", destination = "country.name"),
         country = case_when(
           iso3c == "ANT" ~ "Netherlands Antilles",
           iso3c == "CSK" ~ "Czechoslovakia",
           iso3c == "XKX" ~ "Kosovo",
           TRUE ~ country))

# make a smaller dataframe on energy
energy <- technology %>%
  filter(category == "Energy")

# download CO2 per capita from the World Bank using the {wbstats} package
# https://data.worldbank.org/indicator/EN.ATM.CO2E.PC
library(wbstats)

co2_percap <- wb_data(country = "countries_only",
                      indicator = "EN.ATM.CO2E.PC",
                      start_date = 1970,
                      end_date = 2022,
                      return_wide = FALSE) %>%
  filter(!is.na(value)) %>%
  # drop unwanted variables
  select(-c(unit, obs_status, footnote, last_updated))

# get a list of countries and their characteristics;
# we just want the region a country is in and its income level
countries <- wb_cachelist$countries %>%
  select(iso3c, region, income_level)
```
This is a very rich data set, not just for energy and CO2 data, but for many other technologies. In our case, we just need to produce a couple of graphs-- at this stage, the emphasis is on data manipulation, rather than making the graphs gorgeous.
First, produce a graph with the countries with the highest and lowest % contribution of renewables in energy production. This is made up of `elec_hydro`, `elec_solar`, `elec_wind`, and `elec_renew_other`. You may want to use the *patchwork* package to assemble the two charts next to each other.
```{r min-max_renewables, echo=FALSE, out.width="100%"}
knitr::include_graphics(here::here("images", "renewables.png"), error = FALSE)
```
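One possible approach is sketched below. It assumes a reference year of 2019 and that total electricity production is stored under the variable `elecprod`; both are assumptions you should verify against `labels` before relying on the result.
```{r, renewables_share_sketch}
# Sketch, not a definitive solution: compute each country's share of
# renewables in electricity production for one year (2019 assumed) and show
# the 20 highest and 20 lowest shares side by side with patchwork.
library(patchwork)

renewables_share <- energy %>%
  filter(year == 2019) %>%
  group_by(country) %>%
  summarise(
    renewables = sum(value[variable %in% c("elec_hydro", "elec_solar",
                                           "elec_wind", "elec_renew_other")],
                     na.rm = TRUE),
    total = sum(value[variable == "elecprod"], na.rm = TRUE),   # assumed name
    .groups = "drop") %>%
  filter(total > 0) %>%
  mutate(share = renewables / total)

p_high <- renewables_share %>%
  slice_max(share, n = 20) %>%
  ggplot(aes(x = share, y = fct_reorder(country, share))) +
  geom_col() +
  scale_x_continuous(labels = scales::percent) +
  labs(title = "Highest share of renewables", x = NULL, y = NULL) +
  theme_bw()

p_low <- renewables_share %>%
  slice_min(share, n = 20) %>%
  ggplot(aes(x = share, y = fct_reorder(country, share))) +
  geom_col() +
  scale_x_continuous(labels = scales::percent) +
  labs(title = "Lowest share of renewables", x = NULL, y = NULL) +
  theme_bw()

p_high + p_low   # patchwork: place the two charts next to each other
```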
Second, you can produce an animation to explore the relationship between CO2 per capita emissions and the deployment of renewables. As the % of energy generated by renewables goes up, do CO2 per capita emissions seem to go down?
```{r animation, echo=FALSE, out.width="100%"}
knitr::include_graphics(here::here("images", "animation.gif"), error = FALSE)
```
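One possible way to prepare the data and the base plot for such an animation is sketched below; the `gganimate` layers to add on top are described next. The sketch assumes the `energy`, `co2_percap`, and `countries` dataframes from the earlier chunk, that total electricity production is stored under the variable `elecprod` (verify against `labels`), and it facets by World Bank income level as one possible grouping.
```{r, co2_vs_renewables_prep_sketch}
# Sketch of the data manipulation and the static ggplot only.
co2_vs_renewables <- energy %>%
  group_by(iso3c, country, year) %>%
  summarise(
    renewables = sum(value[variable %in% c("elec_hydro", "elec_solar",
                                           "elec_wind", "elec_renew_other")],
                     na.rm = TRUE),
    total = sum(value[variable == "elecprod"], na.rm = TRUE),   # assumed name
    .groups = "drop") %>%
  filter(total > 0) %>%
  mutate(percent_renewables = renewables / total) %>%
  inner_join(co2_percap %>% select(iso3c, year = date, co2_per_cap = value),
             by = c("iso3c", "year")) %>%
  left_join(countries, by = "iso3c")

# base plot; the gganimate layers shown below are then added to this call
ggplot(co2_vs_renewables, aes(x = percent_renewables, y = co2_per_cap)) +
  geom_point() +
  facet_wrap(~ income_level) +
  theme_bw()
```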
Creating this animation is actually straightforward. You manipulate your data and create the graph in the normal ggplot way. The only `gganimate` layers you need to add to your graphs are:
```
  labs(title = 'Year: {frame_time}', 
       x = '% renewables', 
       y = 'CO2 per cap') +
  transition_time(as.integer(year)) +
  ease_aes('linear')