Fill This Form To Receive Instant Help

Help in Homework
trustpilot ratings
google ratings


Homework answers / question archive / Lab 1: R Basics and Working with Data Stats 10: Introduction to Statistical Reasoning Spring 2022 All rights reserved, Adam Chaffee, Michael Tsiang, and Maria Cha 2017-2022

Lab 1: R Basics and Working with Data Stats 10: Introduction to Statistical Reasoning Spring 2022 All rights reserved, Adam Chaffee, Michael Tsiang, and Maria Cha 2017-2022

Statistics

Lab 1: R Basics and Working with Data
Stats 10: Introduction to Statistical Reasoning
Spring 2022
All rights reserved, Adam Chaffee, Michael Tsiang, and Maria Cha 2017-2022.
Do not post, share, or distribute anywhere or with anyone without explicit permission.
Main exercises based on labs by Nicolas Christou and Robert Gould.
Objectives
1. Demonstrate basic R Skills (create vectors, perform vector operations, install packages)
2. Use basic statistical functions (mean, min, max, median, sd)
3. Visualize data (dot plot, histogram, box plot, contingency tables, scatter plot, bubble plot)
Before Coming to Lab
1. If you want to download the software on your desktop,
o Step 1: Download the latest version of R at
§ If you are using Mac: https://cran.r-project.org/bin/macosx/
§ If you are using Windows: https://cran.r-project.org/bin/windows/base/
o Step 2: After R is installed, download the latest version of RStudio (the free one
on the left): https://www.rstudio.com/products/rstudio/download/
2. If you do not want to download the software on your desktop, you are able to have access
to our department's server running RStudio Server software. Rstudio Server can be found
at https://rstudio.stat.ucla.edu/auth-sign-in?appUri=%2F An account will automatically
be created for you using your Google email address (xxxxxx@g.ucla.edu or
yyyyyy@gmail.com).
o Type in your Google email account into the textbox then select the "Sign in with
Google" Button.
§ If you used a @g.ucla.edu account then you will need to authenticate with
your UCLA Login credentials.
§ If you used a @gmail.com account then you will encounter a textbox that
asks for your password. Submit your password.
o If your official UCLA profile does NOT list a google email account, then you will
need to change your profile. To check your official email profile, please visit
https://my.ucla.edu and log in. Then click on "Settings" which is in the top light
blue bar just to the right of where you will see your first name. You will need to
update the email address if the one indicated is not an @g.ucla.edu or
@gmail.com one. Please complete this task immediately.
3. (Optional) Download “Introductory Statistics with R” by Peter Dalgaard. The book is
available for free on springerlink.com. You must access it via a campus computer or
using the BOL VPN. Read 1.1 and 1.2.1-1.2.3 of the Dalgaard book.
A Primer on labs
In Stats 10 the lab courses are designed to help you explore the capabilities of computer
programming to assist you in statistical analysis. This quarter we will be using R, a high-level
statistical programming language. We will cover several basic topics in R to help you carry out
analysis that you learn in lecture. R is a popular program in universities because it is free and
open-source, which means anyone can access the program’s source code and suggest
improvements.
Collaboration Policy
In lab you are encouraged to work in pairs or small groups to discuss the concepts on the
assignments. It is also allowed to follow and copy TA’s demonstration to complete the
assignments. However, DO NOT copy each other’s work as this constitutes cheating. The work
you submit must be entirely your own. If you have a question in lab, feel free to reach out to
other groups or talk to your TA if you get stuck.
What to include in the Lab report
Each lab contains multiple sections. Each section includes instruction and examples about
different topics. Read carefully and try the examples demonstrated in the examples. At the end of
each section, there are exercise questions that are similar to the examples. You need to include
1) R commands and 2) the results, including graphs and tables, and answers for all exercise
questions in the lab report and submit the report in pdf to CCLE turnitin link in every
other week. See the lab report template as an example. Note that your lab assignment is graded
based on TA’s instruction.
Section 1 – R and RStudio Basics
Instructions
When you open RStudio, you’ll notice the window is partitioned into four sections:
• The top left section is your working code file. This is where you should type and run
your commands. It serves two main purposes:
o You can edit and save this section just like a word document. In fact, let’s save
right now. Go to file>save, and save this in a convenient location on your
computer.
o You will run lines of code by highlighting the code chunk you want to run, and
clicking the run button (shortcut: ctrl+enter for PC, cmd+enter for Mac)
• The bottom left section is the “console”. This is essentially the R engine that runs your
commands and produces output. You should not need to type anything in this section, but
you will need to copy and paste output for exercises.
• The top right section defines your environment. The environment contains any objects
you create and data you import. You won’t use it much but it’s a quick way to see what
data and objects you have to work with.
• The bottom left section contains three important tabs in this course:
o Plots: when you create plots, they will show up here. You should copy and paste
these to answer some of the lab exercises.
o Packages: R contains packages that you can install and run to use cool features
and functions that the basic R install did not give you.
o Help: If you are unsure about what a function does, try searching for it in the help
tab. You will find information about the function arguments and examples.
Top left section Top right section
Bottom left section Bottom right section
Working with packages
To work with packages, we must do two things:
1. Install the package by using the packages tab mentioned above
2. Type and run library(“package_name”) to tell R to load the package we installed
Reading Data into RStudio
R needs to be told which folder to look in to retrieve data. To read in data, first download it into
a convenient folder on your machine. Then, try one of these methods:
1. At the top of RStudio, go to Session> set working directory> choose directory. Then
select the folder the data is saved in, and click “Select Folder”. Then, create a new data
object by using the syntax object_name <- read.csv(file = "filename.csv")
2. Use the syntax object_name <- read.csv(file.choose()) and use the file explorer to
manually select the downloaded data.
Vectors – Examples
Try running the commands below. The first creates a numeric vector object called numbers that
contains the numbers 1 to 5. The second creates a character vector object called schools that
contains the names of two top-tier colleges, and the name of a terrible school for spoiled
children. Note that when you create a vector of names, you must put quotes around each element.
The c() function is used to create both vectors. Try typing c() in the help menu to get a better
sense of what this function does.
numbers <- c(1, 2, 3, 4, 5)
schools <- c("UCLA", "UC Berkeley", "USC")
Note that when you run these commands, there is no output. That is because whenever you
define an object, there is not output. However, once you have made the object, you can print the
contents of an object by typing the name of the object. Example: type numbers
If the vector is numeric, we can do math on it. For example, try typing numbers * 2
We can also use square brackets to create subsets of our data. For example, if you type
schools[2] you will retrieve only the second element from the schools vector.
Object classes
Vectors, matrices, and data frames are three common object classes you will work with. Matrices
and data frames can be thought of as tables of data. However, matrices contain only numeric
data. Data frames can contain several types of data. The file you will work with in this lab is a
sample of information about babies born in North Carolina. It is considered a data frame because
it contains numeric information about each baby, as well as various pieces of categorical data
such as the race of the parents, and whether the mom has a smoking habit.
Section 1 exercises
1. - Vectors:
a. Create a vector named pinky that contains the pinky lengths, in inches, of yourself and
two of your friends or family members’. Print the contents of this vector.
b. Create a vector named cities that contains the names of cities where these people live in.
Print the contents of this vector.
c. Try typing cbind(pinky, cities). What did this command do? What ‘class’ is this new
object in R?
Hint: Try the class() function.
2. – Downloading data:
a. Download the data set births2022.csv from the CCLE site and upload it into RStudio.
Name the data frame NCbirths.
b. Demonstrate that you have been successful by typing head(NCbirths) and copying and
pasting the output into your word processing document.
3. – Load the maps package
a. Install the maps package. Verify its installation by typing find.package("maps") and
include the output in your answer.
b. Type library(maps) to load up the package. Type map("state") and include the plot output
in your answer.
4. – Perform vector operations
a. Extract the weight variable as a vector from the data frame by typing
weights <- NCbirths$weight
b. What units do you think the weights are in?
c. Create a new vector named weights_in_pounds which are the weights of the babies in
pounds. You can look up conversion factors on the internet.
d. Demonstrate your success by typing weights_in_pounds[1:10] and including the output in
your word processing document. This will print the first 10 weights of the babies in the
dataset.
Section 2 – Summarizing Data (one variable)
Useful functions
The functions summary(), mean(), sd(), max(), and min() all produce helpful results to help us
understand how quantitative data is distributed. These functions will take in a numeric vector
inside the parentheses. As an example, max(NCbirths$weight) will give us the maximum weight
of all the babies in out sample.
We can also use the tally() function in the mosaic package to help us summarize categorical data.
Tally requires a categorical vector inside the parentheses. For example, after you have installed
and loaded the mosaic package, try typing tally(NCbirths$Racemom).
Section 2 exercises
1. What is the mean and standard deviation of the mothers’ ages in the dataset? Hint : use
the variable “Mage.”
2. How many of mothers in the sample smoke? Also, what percentage of the mothers in the
sample smoke? Hint: use the tally function with the format argument. Use the help screen
for guidance.
3. According to the Centers for Disease Control, nearly 17 of every 100 U.S. females aged
18 years or older (17.0%) smoked cigarettes in 2022. How far off is the percentage you
found in 2 from the CDC’s report in 2022?
Section 3 – Visualizing Data (one quantitative variable)
Useful functions
The histogram, dot plot, and box plot are useful tools for visualizing quantitative data. The
functions you can use in R are histogram(), dotPlot(), and boxplot(). Browse the help screen for
these functions to get a sense of what these functions require.
Section 3 exercises
1. Produce a dot plot of the weights in pounds. Describe what you find from the dop plot.
2. Produce three different histograms of the weights in pounds. Use 5 bins, 20 bins, and 100
bins. Which histogram seems to give the best visualization, and why?
3. We can use the syntax boxplot(vector1, vector2) to make a side by side box plot. Create a
side by side boxplot of the mother’s ages and the father’s ages. Which gender tends to be
older?
4. Try typing histogram(~ weight | Habit, data = NCbirths, layout = c(1, 2)). Describe what
this code does. Based on the graph, do you see any major differences between baby
weights from smoking moms vs. non-smoking moms?
Section 4 – Visualizing Data (two categorical)
Useful functions
With two categorical variables we can again create two-way tables using the tally() function. For
example, try tally(~Habit | MomPriorCond, data = NCbirths, format = "proportion")
Section 4 exercises
1. Consider the other categorical variables in this data. Of those that record the health of the
baby, which do you think will be associated with the mother’s smoking and why? Make a
two-way Summary Table to check your hypothesis. Do you have evidence that this
variable associated with smoking? Why?
2. Create a segmented bar chart of the two variables of your choice in 1. Hint: use the
barplot function.
Section 5 – Visualizing Data (two quantitative)
Useful functions
To visualize a potential relationship between two quantitative variables, we typically use scatter
plots. The syntax is plot(variable1 ~ variable2)
Also, for most plotting functions we can add some arguments to make the plots look more
presentable.
• The col argument changes the color of the data.
• cex changes the size of the points
• pch should be an integer from 1 to 25 changes the style of the points
• xlab and ylab define the labels for the x and y axes
• main gives us the title for the plot
As an example, run and compare the following lines of code:
plot(NCbirths$weight ~ NCbirths$Gained)
plot(NCbirths$weight ~ NCbirths$Gained, col = "red", cex = 1.5, pch = 3,
xlab = "Weight gained during pregnancy", ylab = "Baby weight (oz.)",
main = "Baby weight vs. pregnancy weight gain")
Most of these arguments work for histograms, box plots, and dot plots too!
Section 5 exercises
1. Produce a nicely formatted scatter plot of the weight of the baby vs. the mother’s age.
Describe what you observe from the scatter plot.

Option 1

Low Cost Option
Download this past answer in few clicks

22.99 USD

PURCHASE SOLUTION

Already member?


Option 2

Custom new solution created by our subject matter experts

GET A QUOTE