chapter 1: visualization
1.1 functions used
1.4 setting up
1.5 visualization
1.6 customizing plots
1.7 other plot types
1.8 more visualizations
chapter 2: workflow basics
2.1 functions used
2.2 workflow basics
R project
code style
2.3 set up
2.4 import data: csv files
2.5 problems
2.6 import other data types
2.7 variable type
logical vectors
numbers
factors
missing values
chapter 3: data transformation
3.1 functions used
3.3 the basics of dplyr
3.4 the pipe operator: |>
3.5 operations on rows
3.6 operations on columns
3.7 groups
3.9 saving data
chapter 4: reproducible reports
4.2 functions used
4.3 required packages
4.4 why use reproducible reports?
4.5 projects
4.5.1 start a project in Rstudio
4.6 introduction to quarto
4.6.1 what is quarto?
4.6.2 why use it?
4.6.3 getting started with quarto
4.6.4 source versus visual editor
4.7 quarto syntax and structure
4.7.1 metadata (YAML)
4.7.2 markdown
4.7.3 code chunks
4.7.4 running code
4.7.5 inline code
4.7.6 rendering your file
4.8 writing a report
4.8.1 setup chunk
4.8.2 chunks options
4.8.3 online sources
4.8.4 local data files
4.8.5 data analysis
4.8.6 code comments
4.8.7 images
4.8.8 tables
4.8.9 cross references
4.9 exploring the palmerpenguins dataset
chapter 5: data tidying
5.2 functions used
5.3 set-up
5.4 tidy and untidy data
5.4.1 untidy data
5.4.2 tidy data
5.5 reshaping data
5.5.1 wide to long
5.5.2 long to wide
chapter 6: planning a study
6.1 introduction
6.2 factors influencing statistical power
6.3 methods for conduction power analysis
6.4 the superpower package
6.4.1 specifying the design
6.5 one-way ANOVA
6.6 two-way ANOVA
6.7 additional considerations
chapter 7: data relations
7.1 functions used
7.2 set-up
7.3 loading data
7.4 mutating joins
7.4.1 left_join()
7.4.2 right_join()
7.4.3 inner_join()
7.4.4 full_join()
7.5 filtering joins
7.5.1 semi_join()
7.5.2 anti_join()
7.6 multiple joins
7.7 binding joins
7.7.1 bind_rows()
7.7.2 bind_cols()
7.8 set operations
7.8.1 intersect()
7.8.2 union()
7.8.3 setdiff()
SUMMARY STATISTICS 5
CHAPTER 1: VISUALIZATION
1.1 FUNCTIONS USED
• built-in (you can always use these without loading any packages)
o base:: c(), seq(), read.csv()
• tidyverse (you can use all these with library(tidyverse))
o readr:: read_csv(), read_csv2(), read_tsv(), read_delim()
o dplyr:: count(), glimpse()
o ggplot2:: aes(), coord_cartesian(), element_blank(), geom_bar(), geom_boxplot(),
geom_col(), geom_histogram(), geom_jitter(), geom_point(), geom_smooth(), ggplot(),
ggtitle(), guides(), scale_fill_manual(), scale_x_continuous(), scale_x_discrete(),
scale_y_continuous(), theme(), theme_bw(), theme_minimal(), theme_set(), ggsave()
• other (you need to load each package to use these)
o ggthemes:: theme_gdocs()
o patchwork:: plot_layout()
o ggdist:: geom_dots(), stat_halfeye()
1.4 SETTING UP
1. create a working directory
- save data and code
2. save the data
3. open and save an R script
- file → new file → R script
4. specify the working directory
- session → set working directory → to source file location
5. install and load the necessary packages
6. read in the data
- the R script and the data must be in the same folder (= working directory)
- dfsatisf <- read_csv("OWID_gdp-vs-happiness.csv")
o returns a tibble
- if subfolder
o dfsatisf <- read_csv("./data/lecture1/OWID_gdp-vs-happiness.csv")
o ./ = current folder ; data/lecture1 = subfolder
7. Check your data
- Call the data (type its name):
o Tibble view (↓ = rows, → = columns/variables)
o Variables can be <chr> = character variable; <dbl> = variable with real numbers;
<int> = integers (whole numbers); <dttm> = date and time
- glimpse():
o Transposed view (↓ = columns/variables, → = rows)
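The reading and checking steps can be sketched as below. The data values, the object name dfdemo and the file name are invented for illustration; the example writes its own small csv file to a temporary folder so that it runs without the course data.

```r
library(readr)  # read_csv(), write_csv()
library(dplyr)  # glimpse()

# Hypothetical stand-in data (invented values, illustration only)
demo <- tibble::tibble(
  Country   = c("A", "B", "C"),
  GDPpercap = c(1200.5, 15000, 48000),
  Happiness = c(4.1, 5.9, 7.2)
)

# Write it to a temporary file so the example runs anywhere
csv_path <- file.path(tempdir(), "demo_gdp-vs-happiness.csv")
write_csv(demo, csv_path)

# Read it back: read_csv() returns a tibble
dfdemo <- read_csv(csv_path, show_col_types = FALSE)

dfdemo           # printing: one line per row, one column per variable
glimpse(dfdemo)  # transposed: one line per variable, values run across
```

In a real project the path would be relative to the working directory, as in the read_csv() calls above.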
1.5 VISUALIZATION
ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ x)
# or, instead of the straight line:
# geom_smooth(method = "gam")
• ggplot() → start function
• dfsatisf → tibble or data frame
• aes(x,y) → mapping
o which columns in the data should be mapped to different aspects (aesthetics) of the plot
o x and y → names of columns that should be plotted
• + → add layers (= geoms) onto base plot
o Layers display in the order you set them up
• geom_point() → add scatterplot
• geom_smooth(method = lm, formula = y ~ x) → add line of best fit
• geom_smooth(method = "gam") → add flexible smooth regression line
Save a plot as an object by assigning it to a name (e.g., point_first <- ggplot(...) + geom_point())
point_first + line_first + plot_layout(nrow = 1)
• Add plots together in 1 row (requires the patchwork package)
ggsave("filename.png", plot = plot_object)
• Save the plot as a file in the current working directory
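A minimal sketch of saving plots as objects, combining them and writing the result to a file. The data frame dfdemo and its values are invented, and the file is written to a temporary folder instead of the working directory so the example leaves nothing behind.

```r
library(ggplot2)
library(patchwork)  # lets `+` combine whole plots; provides plot_layout()

# Hypothetical data standing in for dfsatisf
dfdemo <- data.frame(
  GDPpercap = c(1000, 5000, 20000, 50000),
  Happiness = c(4.0, 5.2, 6.5, 7.1)
)

base_plot <- ggplot(dfdemo, aes(x = GDPpercap, y = Happiness))

# Layer order matters: later layers are drawn on top of earlier ones
point_first <- base_plot +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
line_first <- base_plot +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point()

# Put both plots side by side in one row
combined <- point_first + line_first + plot_layout(nrow = 1)

# Save to a file; width and height are in inches by default
out_file <- file.path(tempdir(), "combined_demo.png")
ggsave(out_file, plot = combined, width = 8, height = 4)
```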
1.6 CUSTOMIZING PLOTS
See appendix A for detailed instructions
Change overall style of geom:
• Color
• Alpha (= transparency)
• Shape
• Size
• Linetype
Change style depending on which category:
• Set argument inside the aes() function
• Example:
ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness, color = Continent)) +
  geom_point(
    alpha = 0.4, # 40% opaque (0 = invisible, 1 = solid)
    shape = 20, # solid circle
    size = 2
  ) +
  geom_smooth(
    method = lm,
    formula = y ~ x, # formula used to draw line,
    # setting method & formula avoids an annoying message
    linetype = 1
  )
Change axes → scale_functions
• Use a scale function that matches the type of variable you're plotting (here: continuous)
scale_x_continuous(
name = "GDP per capita (in dollars)",
trans = "log10",
breaks = c(1000, 2000, 5000, 10000, 20000, 50000, 100000),
labels = c("$1000", "$2000", "$5000", "$10000", "$20000", "$50000", "$100000")
)
o name = … → change axis label
o trans = "log10" → put the axis on a logarithmic (base 10) scale
o breaks = … → set the major tick positions; takes a vector of values
o labels = … → labels for the breaks (one label per break)
• Axis limits
coord_cartesian(ylim = c(0, 10))
o Change the minimum and maximum of the y-axis (zooms in without removing data)
Making dots proportional to a size variable
geom_point(aes(size = PopSize),
  alpha = 0.4, # 40% opaque (0 = invisible, 1 = solid)
  shape = 20 # solid circle
) +
scale_size_continuous(
  name = "Population Size",
  range = c(2, 15),
  breaks = c(10e6, 100e6, 1e9),
  labels = c("10 million", "100 million", "1 billion")
)
o Insert a new aesthetic mapping aes(size = variable) into geom_point()
o scale_size_continuous() → control how the variable is mapped to dot size
▪ name → title of the legend
▪ range → smallest and largest point size
▪ breaks → values shown in the legend
▪ labels → labels for the breaks
Add show.legend = FALSE to any geom that you don't want to appear in the legend
Themes:
• Built into ggplot2
o Example: theme_bw(), theme_minimal()
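A small sketch of the two ways to apply a theme, using theme_bw(), theme_minimal() and theme_set() from the function list in 1.1. The data frame is invented for illustration.

```r
library(ggplot2)

dfdemo <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))  # hypothetical data

p <- ggplot(dfdemo, aes(x = x, y = y)) + geom_point()

# Option 1: add a theme to a single plot
p_bw <- p + theme_bw()

# Option 2: set a default theme for all plots drawn from now on;
# theme_set() returns the previous default so it can be restored
old_theme <- theme_set(theme_minimal())
print(p)              # drawn with theme_minimal()
theme_set(old_theme)  # restore the previous default
```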
Add text to the plot:
geom_text(aes(label = Country),
  hjust = 1.1,
  vjust = 1.1,
  size = 3,
  check_overlap = TRUE, show.legend = FALSE
)
o geom_text() → add text labels (e.g., country names)
o hjust, vjust → adjust label position
o size → size of the text
o check_overlap → avoid overlapping labels
o show.legend → text should not appear in legend
Full code:
ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness, color = Continent)) +
  geom_point(aes(size = PopSize),
    alpha = 0.4, # 40% opaque (0 = invisible, 1 = solid)
    shape = 20 # solid circle
  ) +
  scale_size_continuous(
    name = "Population Size",
    range = c(2, 15), # Adjust the size range of the points
    breaks = c(10e6, 100e6, 1e9), # Define specific breaks for the legend
    labels = c("10 million", "100 million", "1 billion") # Labels for the breaks
  ) +
  geom_text(aes(label = Country), # Add text labels (e.g., country names)
    hjust = 1.1, vjust = 1.1, # Adjust label position
    size = 3, # Size of the text
    check_overlap = TRUE, # Avoid overlapping labels
    show.legend = FALSE # text should not appear in legend
  ) +
  geom_smooth(
    method = lm,
    formula = y ~ x, # formula used to draw line,
    # setting method & formula avoids an annoying message
    color = rgb(0, .5, .8),
    linetype = 1
  ) +
  scale_x_continuous(
    name = "GDP per capita (in dollars)",
    trans = "log10",
    breaks = c(1000, 2000, 5000, 10000, 20000, 50000, 100000),
    labels = c("$1000", "$2000", "$5000", "$10000", "$20000", "$50000", "$100000")
  ) +
  scale_y_continuous(
    name = "Life satisfaction (country average; 0-10)"
  )
1.7 OTHER PLOT TYPES
See cheat sheet
• geom_point()
o Scatterplot
• geom_bar()
o Bar plot (counts the rows per category)
• geom_col()
o Bar plot where the bar heights come from a numeric column mapped to y
• geom_histogram()
o Distribution of one continuous variable
o Set binwidth to something meaningful
• geom_density()
o Density plot
• geom_boxplot()
o Boxplot
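The plot types listed above can be sketched as follows, on a small invented data frame (the values and the binwidth of 0.5 are assumptions chosen only for illustration).

```r
library(ggplot2)

# Hypothetical data
dfdemo <- data.frame(
  Continent = c("Africa", "Africa", "Asia", "Europe", "Europe", "Europe"),
  Happiness = c(4.2, 4.8, 5.5, 6.9, 7.3, 6.4)
)

# geom_bar(): counts the rows per category itself (no y aesthetic)
p_bar <- ggplot(dfdemo, aes(x = Continent)) + geom_bar()

# geom_col(): bar heights come from a column you supply as y
dfmeans <- aggregate(Happiness ~ Continent, data = dfdemo, FUN = mean)
p_col <- ggplot(dfmeans, aes(x = Continent, y = Happiness)) + geom_col()

# geom_histogram(): choose binwidth in the units of the variable
p_hist <- ggplot(dfdemo, aes(x = Happiness)) + geom_histogram(binwidth = 0.5)

# geom_density() and geom_boxplot()
p_dens <- ggplot(dfdemo, aes(x = Happiness)) + geom_density()
p_box <- ggplot(dfdemo, aes(x = Continent, y = Happiness)) + geom_boxplot()
```

The difference between geom_bar() and geom_col() is only where the bar height comes from: a count computed by ggplot2, or a value already present in the data.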
1.8 MORE VISUALIZATIONS
• Comparing distributions across different levels of a categorical variable
o ggridges package (load it with library(ggridges))
ggplot(dfsatisf, aes(x = Happiness, y = Continent, fill = Continent, color = Continent)) +
  geom_density_ridges(alpha = 0.5, show.legend = FALSE)
CHAPTER 2: WORKFLOW BASICS
2.1 FUNCTIONS USED
• built-in (you can always use these without loading any packages)
o base:: c(), library(), sum(), mean(), any(), all(), median(), sd(), var(), quantile(), log(),
log10(), exp(), round(), floor(), ceiling(), min(), max(), pmin(), pmax()
• tidyverse (you can use all these with library(tidyverse))
o readr:: read_csv(), read_csv2(), read_tsv(), read_delim(), write_csv(), write_csv2(),
write_tsv(), write_delim()
o dplyr:: glimpse(), near(), if_else(), case_when(), count()
o forcats:: fct(), fct_relevel()
o ggplot2:: ggplot(), geom_line(), scale_x_continuous()
o tidyr:: complete()
o tibble:: tibble()
• other (you need to load each package to use these)
o janitor:: clean_names()
o haven:: read_spss()
o readxl:: read_excel()
o afex:: aov_ez()
2.2 WORKFLOW BASICS
1. Create a separate folder per project on your computer. Make sure that this folder is backed
up.
2. Use subfolders with meaningful names for data, code, results, etc.
3. The main folder (in the example: Master_thesis) should contain a file called “readme.txt”. This is
a plain text file that contains some general information on the folder structure and what the
content is. On a Windows computer, a plain text file can be created with Notepad.
4. The data folder can be further divided into a subfolder for the raw data (as they have been
generated by the measurement device such as an experiment computer) and the processed data
(after having processed the data using the tools and methods discussed in this course).
5. Provide metadata = data about the data
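A possible way to set up such a folder structure from R, as a sketch only. The subfolder names and the readme text are assumptions, and the project is created inside a temporary folder so that nothing is written to your real files.

```r
# Hypothetical project root; "Master_thesis" is the example name from the notes
project_dir <- file.path(tempdir(), "Master_thesis")

# Illustrative subfolder names (assumptions, not prescribed by the course)
subfolders <- c("data/raw", "data/processed", "code", "results")
for (folder in subfolders) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Plain-text readme describing the folder structure
writeLines(
  c("Master_thesis: project folder",
    "data/raw        raw data as generated by the measurement device",
    "data/processed  data after processing",
    "code            R scripts",
    "results         output such as figures and tables"),
  file.path(project_dir, "readme.txt")
)

# Show what was created
list.files(project_dir, recursive = TRUE, include.dirs = TRUE)
```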
Rules:
1. Names of files and folders should reflect content and should be understood also without
knowing the folder it belongs to. So, rather use “Data_Exp1_FearGeneralization.csv” than
“data.csv”.
2. Use 3-4 elements to make up the name and order these elements from general to specific.
The creation date can be added at the end in the following format: YYYYMMDD or YYYY-MM-DD.
3. Use hyphens (-) or underscores (_) to separate elements.
4. Avoid using spaces (blanks) or special characters (&“’%$^<>/).
5. Avoid too long names (over 30 characters).
6. If you need to keep track of several versions of a document (common for a file with text of a
manuscript), do not use suffixes such as “final” or “draft” but rather use the date or a version
indicator (Version1, Version2, etc.).
7. If there is a natural ordering of files, then start names by a number so that they are sorted and
presented in the correct order.
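The naming rules can be illustrated with two small helper functions. Both functions (make_file_name, is_valid_name) are hypothetical helpers written for this sketch, not part of any package, and counting the 30 characters without the file extension is an assumption.

```r
# Build a name from general to specific, with the creation date last (rule 2),
# using underscores to separate the elements (rule 3)
make_file_name <- function(..., date = Sys.Date(), ext = "csv") {
  elements <- c(..., format(date, "%Y%m%d"))
  paste0(paste(elements, collapse = "_"), ".", ext)
}

# Check a name against rules 4 and 5:
# only letters, digits, hyphens and underscores, and not too long
is_valid_name <- function(name, max_chars = 30) {
  stem <- tools::file_path_sans_ext(name)
  only_safe <- grepl("^[A-Za-z0-9_-]+$", stem)
  only_safe && nchar(stem) <= max_chars
}

make_file_name("Data", "Exp1", "FearGen", date = as.Date("2024-03-15"))
# "Data_Exp1_FearGen_20240315.csv"

is_valid_name("Data_Exp1_FearGen_20240315.csv")  # TRUE
is_valid_name("my data & results.csv")           # FALSE (spaces and &)
```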
R PROJECT
1. Prevent RStudio from saving your workspace when you close your R session
2.
3. Create and save an R script
At the top right (in red) you see the current project name. Bottom right (in orange) you see the
various files that are part of the project. At the bottom left (in green) you see the current working
directory. At the top left there is the current script.