### Problem 4: ddply() practice

This problem uses the Adult dataset, which we load below.  The main variable of interest here is `high.income`, which indicates whether the individual's income was over $50K.  Anyone for whom `high.income == 1` is considered a "high earner".

```{r}
library(plyr)  # provides ddply() and mutate()

# Load the UCI Adult data; the raw file has no header row
adult.data <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
                       header = FALSE, fill = FALSE, strip.white = TRUE,
                       col.names = c("age", "type_employer", "fnlwgt", "education",
                                     "education_num", "marital", "occupation", "relationship",
                                     "race", "sex", "capital_gain", "capital_loss",
                                     "hr_per_week", "country", "income"))
# high.income is 1 when income is over $50K, 0 otherwise
adult.data <- mutate(adult.data,
                     high.income = as.numeric(income == ">50K"))
```

##### (a) Income by education level

Use the `ddply()` function to produce a summary table showing how many individuals there are in each `education_num` bin, and how the proportion of high earners varies across `education_num` levels.  Your table should have column names: `education_num`, `count` and `high.earn.rate`.  

```{r}
# Edit me
```
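For reference, here is a minimal sketch of the `ddply()` + `summarise` pattern on a small hypothetical stand-in for `adult.data` (the rows below are made up for illustration and are not taken from the Adult data). It assumes `plyr` is installed; swap in `adult.data` to apply it to the real dataset.

```r
library(plyr)

# Hypothetical toy stand-in for adult.data: made-up rows, illustration only
toy.adult <- data.frame(
  education_num = c(9, 9, 9, 13, 13, 16),
  high.income   = c(0, 0, 1, 1, 0, 1)
)

# Split by education_num, then summarise each piece:
#   count          = number of individuals in the bin
#   high.earn.rate = mean of a 0/1 indicator, i.e. the proportion of high earners
edu.summary <- ddply(toy.adult, .(education_num), summarise,
                     count = length(high.income),
                     high.earn.rate = mean(high.income))
edu.summary
```

Because `high.income` is coded 0/1, its mean within a bin is exactly the proportion of high earners, so no explicit division is needed.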

##### (b) Constructing a bar chart

Use the `ggplot` and `geom_bar` commands, along with your data summary from part **(a)**, to create a bar chart showing the high earning rate on the y axis and `education_num` on the x axis.  Specify that the fill color of the bars should be determined by the number of individuals in each bin.

```{r}
# Edit me
```
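A minimal sketch of such a plot, assuming `ggplot2` is installed and using a hypothetical summary table with the same columns as part **(a)** (values made up). Because the rates are already computed, `geom_bar()` needs `stat = "identity"` so that bar heights come from the `y` aesthetic rather than from row counts. The bar color is mapped through `fill`, which controls a bar's interior (`colour` only sets its outline).

```r
library(ggplot2)

# Hypothetical summary table shaped like the part (a) output (made-up values)
edu.summary <- data.frame(
  education_num  = c(9, 13, 16),
  count          = c(3L, 2L, 1L),
  high.earn.rate = c(1/3, 1/2, 1)
)

# stat = "identity": use high.earn.rate directly as the bar height
# fill = count: shade each bar by the number of individuals in its bin
edu.plot <- ggplot(edu.summary,
                   aes(x = education_num, y = high.earn.rate, fill = count)) +
  geom_bar(stat = "identity") +
  labs(x = "education_num", y = "High earning rate")
edu.plot
```

In ggplot2 2.2.0 and later, `geom_col()` is a shorthand for `geom_bar(stat = "identity")`.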

##### (c) Summary table with multiple splitting variables

Use the `ddply()` function to produce a summary table showing how the proportion of high earners varies across all combinations of the following variables: `sex`, `race`, and `marital` (marital status).  In addition to showing the proportion of high earners, your table should also show the number of individuals in each bin.  Your table should have column names: `sex`, `race`, `marital`, `count` and `high.earn.rate`.  

```{r}
# Edit me
```
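A minimal sketch with several splitting variables, again on a hypothetical toy data frame (made-up rows; the category labels merely mimic those in the Adult data). Passing all three variables to `.()` makes `ddply()` summarise every combination that occurs in the data. Combinations with no rows are dropped by default (`.drop = TRUE`).

```r
library(plyr)

# Hypothetical toy stand-in for adult.data: made-up rows, illustration only
toy.adult <- data.frame(
  sex         = c("Female", "Female", "Male", "Male", "Male"),
  race        = c("White", "White", "Black", "White", "White"),
  marital     = c("Never-married", "Never-married", "Married-civ-spouse",
                  "Married-civ-spouse", "Married-civ-spouse"),
  high.income = c(0, 1, 0, 1, 1)
)

# One output row per observed (sex, race, marital) combination,
# sorted by sex first, then race, then marital
group.summary <- ddply(toy.adult, .(sex, race, marital), summarise,
                       count = length(high.income),
                       high.earn.rate = mean(high.income))
group.summary
```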

##### (d) Nicer table output using `kable()`

Use the `kable()` function from the `knitr` library to display the table from part **(c)** with nicer formatting.  Use the `digits` argument to round the values in your table to a reasonable number of decimal places.  

```{r}
# Edit me
```
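A minimal `kable()` sketch, assuming `knitr` is installed. The summary table here is hypothetical (made-up values in the shape of part **(c)**). `digits = 3` rounds the numeric columns to three decimal places in the printed table without changing the underlying data frame.

```r
library(knitr)

# Hypothetical summary table shaped like the part (c) output (made-up values)
group.summary <- data.frame(
  sex            = c("Female", "Male", "Male"),
  race           = c("White", "Black", "White"),
  marital        = c("Never-married", "Married-civ-spouse", "Married-civ-spouse"),
  count          = c(3L, 1L, 2L),
  high.earn.rate = c(1/3, 0, 1)
)

# digits = 3 rounds numeric columns to three decimal places for display
tab <- kable(group.summary, digits = 3)
tab
```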

### Problem 5: Getting the right plot

##### (a) A more complex bar chart

Using the table you created in Problem 4(c), use ggplot graphics to construct a plot that looks like [the one at this link](http://www.andrew.cmu.edu/user/achoulde/94842/homework/target_fig.png).


**Hint:** You may find the following layers useful: `facet_grid`, `coord_flip` (for horizontal bar charts), `theme` (for rotating the x axis text) and `guides` (for removing the fill legend). 

```{r, fig.height = 4, fig.width = 8}
# Edit me
```
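One possible sketch of how these layers combine, assuming `ggplot2` is installed and using a hypothetical table in the shape of Problem 4(c) (made-up values). The linked target figure is not reproduced here, so the layout choices below (`race` on the bar axis, `sex ~ marital` as the facet grid, a 45° label angle) are assumptions to adapt rather than the intended answer.

```r
library(ggplot2)

# Hypothetical table shaped like the Problem 4(c) output (made-up values)
group.summary <- data.frame(
  sex            = c("Female", "Female", "Male", "Male"),
  race           = c("Black", "White", "Black", "White"),
  marital        = c("Never-married", "Never-married",
                     "Married-civ-spouse", "Married-civ-spouse"),
  count          = c(2L, 3L, 1L, 2L),
  high.earn.rate = c(0, 1/3, 0, 1)
)

faceted.plot <- ggplot(group.summary,
                       aes(x = race, y = high.earn.rate, fill = race)) +
  geom_bar(stat = "identity") +
  facet_grid(sex ~ marital) +   # rows = sex, columns = marital status
  coord_flip() +                # horizontal bars
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  # rotate bottom-axis labels
  guides(fill = "none")         # fill duplicates the race axis, so drop its legend
faceted.plot
```

Older code often writes `guides(fill = FALSE)`; recent ggplot2 releases deprecate the logical form in favor of `"none"`.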

##### (b) Hiding code with `echo`

Repeat part **(a)**, but this time set the `echo` argument of the code chunk in such a way that the code is not printed, but the plot is still displayed.

```{r, fig.height = 4, fig.width = 8}
# Edit me
```
