Summary

Summary Statistics for Psychologists, part 5: Introduction to Data Science

Uploaded on: 07-04-2026

Written in: 2025/2026

Summary of Statistics for Psychologists, part 5: Introduction to Data Science. Useful during open-book tests, since it is a handy overview for finding everything quickly!

Preview of the content

CONTENTS

chapter 1: visualization
1.1 functions used
1.4 setting up
1.5 visualization
1.6 customizing plots
1.7 other plot types
1.8 more visualizations
chapter 2: workflow basics
2.1 functions used
2.2 workflow basics
R project
code style
2.3 set up
2.4 import data: csv files
2.5 problems
2.6 import other data types
2.7 variable type
logical vectors
numbers
factors
missing values
chapter 3: data transformation
3.1 functions used
3.3 the basics of dplyr
3.4 the pipe operator: |>
3.5 operations on rows
3.6 operations on columns
3.7 groups
3.9 saving data
chapter 4: reproducible reports
4.2 functions used
4.3 required packages
4.4 why use reproducible reports?
4.5 projects
4.5.1 start a project in RStudio
4.6 introduction to quarto
4.6.1 what is quarto?
4.6.2 why use it?
4.6.3 getting started with quarto
4.6.4 source versus visual editor
4.7 quarto syntax and structure
4.7.1 metadata (YAML)
4.7.2 markdown
4.7.3 code chunks
4.7.4 running code
4.7.5 inline code
4.7.6 rendering your file
4.8 writing a report
4.8.1 setup chunk
4.8.2 chunk options
4.8.3 online sources
4.8.4 local data files
4.8.5 data analysis
4.8.6 code comments
4.8.7 images
4.8.8 tables
4.8.9 cross references
4.9 exploring the palmerpenguins dataset
chapter 5: data tidying
5.2 functions used
5.3 set-up
5.4 tidy and untidy data
5.4.1 untidy data
5.4.2 tidy data
5.5 reshaping data
5.5.1 wide to long
5.5.2 long to wide
chapter 6: planning a study
6.1 introduction
6.2 factors influencing statistical power
6.3 methods for conducting power analysis
6.4 the superpower package
6.4.1 specifying the design
6.5 one-way ANOVA
6.6 two-way ANOVA
6.7 additional considerations
chapter 7: data relations
7.1 functions used
7.2 set-up
7.3 loading data
7.4 mutating joins
7.4.1 left_join()
7.4.2 right_join()
7.4.3 inner_join()
7.4.4 full_join()
7.5 filtering joins
7.5.1 semi_join()
7.5.2 anti_join()
7.6 multiple joins
7.7 binding joins
7.7.1 bind_rows()
7.7.2 bind_cols()
7.8 set operations
7.8.1 intersect()
7.8.2 union()
7.8.3 setdiff()

SUMMARY STATISTICS 5
CHAPTER 1: VISUALIZATION

1.1 FUNCTIONS USED

• built-in (you can always use these without loading any packages)
o base:: c(), seq(), read.csv()
• tidyverse (you can use all these with library(tidyverse))
o readr:: read_csv(), read_csv2(), read_tsv(), read_delim()
o dplyr:: count(), glimpse()
o ggplot2:: aes(), coord_cartesian(), element_blank(), geom_bar(), geom_boxplot(),
geom_col(), geom_histogram(), geom_jitter(), geom_point(), geom_smooth(), ggplot(),
ggtitle(), guides(), scale_fill_manual(), scale_x_continuous(), scale_x_discrete(),
scale_y_continuous(), theme(), theme_bw(), theme_minimal(), theme_set(), ggsave()
• other (you need to load each package to use these)
o ggthemes:: theme_gdocs()
o patchwork:: plot_layout()
o ggdist:: geom_dots(), stat_halfeye()

1.4 SETTING UP

1. create a working directory
- save data and code
2. save the data
3. open and save an R script
- file → new file → R script
4. specify the working directory
- session → set working directory → to source file location
5. install and load the necessary packages
6. read in the data
- R script and data must be in the same folder (= working directory)
- dfsatisf <- read_csv("OWID_gdp-vs-happiness.csv")
o returns a tibble
- if the data are in a subfolder
o dfsatisf <- read_csv("./data/lecture1/OWID_gdp-vs-happiness.csv")
o ./ = current folder ; data/lecture1 = subfolder
7. check your data
- call the data (type its name, e.g. dfsatisf):
o tibble (↓ = rows, → = columns/variables)
o variables can be <chr> = character variable; <dbl> = variable with real numbers;
<int> = integers (whole numbers); <dttm> = date and time
- glimpse():
o transposed view of the tibble (↓ = columns/variables, → = rows)

1.5 VISUALIZATION
ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ x)
# alternative for the last layer: geom_smooth(method = "gam")
• ggplot() → start function
• dfsatisf → tibble or data frame
• aes(x,y) → mapping
o which columns in the data should be mapped to different aspects (aesthetics) of the plot
o x and y → names of columns that should be plotted
• + → add layers (= geoms) onto base plot
o Layers display in the order you set them up
• geom_point() → add scatterplot
• geom_smooth(method = lm, formula = y ~ x) → add line of best fit
• geom_smooth(method = "gam") → add flexible smooth regression line

Save a plot as an object by assigning it to a name (e.g. plot <- ggplot(...))

point_first + line_first + plot_layout(nrow = 1)
• Add plots together in 1 row (patchwork package)

ggsave("name.png", plot_name)
• Save the plot as a file in the current working directory; the file extension (e.g. .png, .pdf) sets the format
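
Storing, combining and saving plots can be sketched end to end as follows. This assumes the ggplot2 and patchwork packages are installed; the toy data, the object names (p_points, p_line, p_both) and the output file name are made up for illustration.

```r
library(ggplot2)
library(patchwork)

# Made-up toy data (not the course dataset)
dftoy <- data.frame(
  GDPpercap = c(1000, 5000, 20000, 50000),
  Happiness = c(4.1, 5.0, 6.2, 7.1)
)

# Store each plot as an object
p_points <- ggplot(dftoy, aes(x = GDPpercap, y = Happiness)) +
  geom_point()
p_line <- ggplot(dftoy, aes(x = GDPpercap, y = Happiness)) +
  geom_smooth(method = lm, formula = y ~ x)

# patchwork: `+` combines plots; plot_layout() controls the grid
p_both <- p_points + p_line + plot_layout(nrow = 1)

# Save to a temp folder here; the file extension decides the output format
out_file <- file.path(tempdir(), "toy_plots.png")
ggsave(out_file, plot = p_both, width = 8, height = 4)
```

Passing the plot object explicitly to ggsave() avoids relying on its default of saving the last plot that was displayed.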

1.6 CUSTOMIZING PLOTS

See appendix A for detailed instructions

Change overall style of geom:

• Color
• Alpha (= transparency)
• Shape
• Size
• Linetype

Change style depending on a category:

• Set the argument inside the aes() function
• Example:

ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness, color = Continent)) +
  geom_point(
    alpha = 0.4, # 40% opacity (60% transparent)
    shape = 20, # solid circle
    size = 2
  ) +
  geom_smooth(
    method = lm,
    formula = y ~ x, # formula used to draw line,
    # setting method & formula avoids an annoying message
    linetype = 1
  )

Change axes → scale functions
• Use a scale function that matches the type of variable you're plotting (here: continuous)
scale_x_continuous(
  name = "GDP per capita (in dollars)",
  trans = "log10",
  breaks = c(1000, 2000, 5000, 10000, 20000, 50000, 100000),
  labels = c("$1000", "$2000", "$5000", "$10000", "$20000", "$50000", "$100000")
)
o name = … → change axis label
o trans = "log10" → logarithmically scales the axis
o breaks = … → set the major units; needs a vector of values
o labels = … → add labels to the breaks
• Axis limits
coord_cartesian(ylim = c(0, 10))
o Change the minimum and maximum of the y-axis
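
Why the breaks 1000, 2000, 5000, ... are a sensible choice for a log10 axis: positions on such an axis are the base-10 logarithms of the values, so every tenfold increase covers the same distance. A quick check in base R (the variable names are only for illustration):

```r
breaks <- c(1000, 2000, 5000, 10000, 20000, 50000, 100000)

# Position of each break on a log10 axis
pos <- log10(breaks)
round(pos, 2)

# Each tenfold step (1000 -> 10000 -> 100000) spans one unit on the axis
diff(log10(c(1000, 10000, 100000)))
```

On the original (linear) scale the same breaks would bunch up near zero, which is why the log scale is used for GDP per capita here.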
Making dots proportional to a size variable
geom_point(aes(size = PopSize),
  alpha = 0.4, # 40% opacity (60% transparent)
  shape = 20 # solid circle
) +
scale_size_continuous(
  name = "Population Size",
  range = c(2, 15),
  breaks = c(10e6, 100e6, 1e9),
  labels = c("10 million", "100 million", "1 billion")
)
o Insert a new aesthetic mapping aes(size = variable) into geom_point()
o scale_size_continuous() → adjust (here: increase) the size of the dots
▪ name → name of legend
▪ range → adjust size range of the points
▪ breaks → define specific breaks for the legend
▪ labels → labels for the breaks
Add show.legend = FALSE to the layers that you don't want to appear in the legend
Themes:
• Built into ggplot2
o Example: theme_bw(), theme_minimal()

Add text to the plot:
geom_text(aes(label = Country),
  hjust = 1.1,
  vjust = 1.1,
  size = 3,
  check_overlap = TRUE, show.legend = FALSE
)
o geom_text() → add text labels (e.g., country names)
o hjust, vjust → adjust label position
o size → size of the text
o check_overlap → avoid overlapping labels
o show.legend → text should not appear in legend
Full code:
ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness, color = Continent)) +
geom_point(aes(size = PopSize),
alpha = 0.4, # 40% opacity (60% transparent)
shape = 20 # solid circle
)+
scale_size_continuous(
name = "Population Size",
range = c(2, 15), # Adjust the size range of the points
breaks = c(10e6, 100e6, 1e9), # Define specific breaks for the legend
labels = c("10 million", "100 million", "1 billion") # Labels for the breaks
)+
geom_text(aes(label = Country), # Add text labels (e.g., country names)
hjust = 1.1, vjust = 1.1, # Adjust label position
size = 3, # Size of the text
check_overlap = TRUE, # Avoid overlapping labels
show.legend = FALSE # text should not appear in legend
)+
geom_smooth(
method = lm,
formula = y ~ x, # formula used to draw line,
# setting method & formula avoids an annoying message
color = rgb(0, .5, .8),
linetype = 1
)+
scale_x_continuous(
name = "GDP per capita (in dollars)",
trans = "log10",
breaks = c(1000, 2000, 5000, 10000, 20000, 50000, 100000),
labels = c("$1000", "$2000", "$5000", "$10000", "$20000", "$50000", "$100000")
)+
scale_y_continuous(
name = "Life satisfaction (country average; 0-10)"
)

1.7 OTHER PLOT TYPES

See cheat sheet

• geom_point()
o Scatterplot
• geom_bar()
o Bar plot (counts the rows in each category)
• geom_col()
o Bar plot where the bar heights come from a numeric column mapped to y
• geom_histogram()
o Distribution of one continuous variable
o Set binwidth to something meaningful
• geom_density()
o Density plot
• geom_boxplot()
o Boxplot
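
The difference between geom_bar() and geom_col() is easiest to see on toy data. A minimal sketch, assuming ggplot2 is installed; both data frames and their values are made up for illustration.

```r
library(ggplot2)

# Made-up raw data: one row per country
dfraw <- data.frame(
  Continent = c("Africa", "Asia", "Asia", "Europe", "Europe", "Europe")
)

# geom_bar() counts the rows in each category itself
p_bar <- ggplot(dfraw, aes(x = Continent)) +
  geom_bar()

# Made-up summary data: the bar heights are already stored in a column
dfsum <- data.frame(
  Continent = c("Africa", "Asia", "Europe"),
  n = c(1, 2, 3)
)

# geom_col() draws bars whose heights are the y values
p_col <- ggplot(dfsum, aes(x = Continent, y = n)) +
  geom_col()

# Both plots end up with the same bar heights
layer_data(p_bar)$count
layer_data(p_col)$y
```

So geom_bar() suits raw data with one row per case, and geom_col() suits data that have already been summarised.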

1.8 MORE VISUALIZATIONS

• Comparing distributions across different levels of a categorical variable
o ggridges package
ggplot(dfsatisf, aes(x = Happiness, y = Continent, fill = Continent, color = Continent)) +
geom_density_ridges(alpha = 0.5, show.legend = FALSE)

CHAPTER 2: WORKFLOW BASICS

2.1 FUNCTIONS USED

• built-in (you can always use these without loading any packages)
o base:: c(), library(), sum(), mean(), any(), all(), median(), sd(), var(), quantile(), log(),
log10(), exp(), round(), floor(), ceiling(), min(), max(), pmin(), pmax()
• tidyverse (you can use all these with library(tidyverse))
o readr:: read_csv(), read_csv2(), read_tsv(), read_delim(), write_csv(), write_csv2(),
write_tsv(), write_delim()
o dplyr:: glimpse(), near(), if_else(), case_when(), count()
o forcats:: fct(), fct_relevel()
o ggplot2:: ggplot(), geom_line(), scale_x_continuous()
o tidyr:: complete()
o tibble:: tibble()
• other (you need to load each package to use these)
o janitor:: clean_names()
o haven:: read_spss()
o readxl:: read_excel()
o afex:: aov_ez()

2.2 WORKFLOW BASICS

1. Create a separate folder per project on your computer. Make sure that this folder has some
backup.

2. Use subfolders with meaningful names for data, code, results, etc.

3. The main folder (in the example: Master_thesis) should contain a file called “readme.txt”. This is
a plain text file that contains some general information on the folder structure and what the
content is. On a Windows computer, a plain text file can be created with Notepad.

4. The data folder can be further divided into a subfolder for the raw data (as they have been
generated by the measurement device such as an experiment computer) and the processed data
(after having processed the data using the tools and methods discussed in this course).

5. Provide metadata = data about the data
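
The folder layout described above can also be created from R. This is a sketch in base R under stated assumptions: the subfolder names and the readme text are illustrative choices, and the project is created inside a temporary folder so that nothing on your computer is overwritten.

```r
# Hypothetical project root inside a temporary folder
project <- file.path(tempdir(), "Master_thesis")

# Subfolders with meaningful names; raw and processed data are kept apart
subfolders <- c("data/raw", "data/processed", "code", "results")
for (folder in file.path(project, subfolders)) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
}

# Plain-text readme describing the folder structure
writeLines(
  c("Master_thesis: project folder",
    "data/raw       - data as generated by the measurement device",
    "data/processed - data after processing",
    "code           - R scripts",
    "results        - figures and tables"),
  file.path(project, "readme.txt")
)

# Show what was created
list.files(project, recursive = TRUE, include.dirs = TRUE)
```

For a real project, replace tempdir() with the location where the project should live, ideally one that is backed up.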

Rules:

1. Names of files and folders should reflect content and should be understood also without
knowing the folder it belongs to. So, rather use “Data_Exp1_FearGeneralization.csv” than
“data.csv”.

2. Use 3-4 elements to make up the name and order these elements from general to specific.
The creation date can be added at the end in the following format: YYYYMMDD or YYYY-MM-DD.

3. Use hyphens (-) or underscores (_) to separate elements.

4. Avoid using spaces (blanks) or special characters (&“’%$^<>/).

5. Avoid overly long names (more than 30 characters).

6. If you need to keep track of several versions of a document (common for a file with text of a
manuscript), do not use suffixes such as “final” or “draft” but rather use the date or a version
indicator (Version1, Version2, etc.).

7. If there is a natural ordering of files, then start names by a number so that they are sorted and
presented in the correct order.
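
Several of these naming rules can be applied and checked mechanically. The helper below is hypothetical (make_file_name() is not part of any package) and only a sketch in base R: it joins the elements with underscores, optionally appends the date as YYYYMMDD, and then checks the character and length rules. The element "FearGen" is a shortened, made-up variant so that the name stays within 30 characters.

```r
# Hypothetical helper: join name elements from general to specific,
# optionally adding a date in YYYYMMDD format at the end
make_file_name <- function(elements, ext, date = NULL) {
  if (!is.null(date)) {
    elements <- c(elements, format(as.Date(date), "%Y%m%d"))
  }
  paste0(paste(elements, collapse = "_"), ".", ext)
}

file_name <- make_file_name(c("Data", "Exp1", "FearGen"), "csv", date = "2026-04-07")
file_name

# Rule check 1: only letters, digits, hyphens, underscores and dots
grepl("^[A-Za-z0-9_.-]+$", file_name)

# Rule check 2: at most 30 characters
nchar(file_name) <= 30
```

Putting the date last and in YYYYMMDD form means that files sharing the same leading elements sort chronologically.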

R PROJECT
1. Prevent RStudio from saving your workspace after you close your R session
2.

3. Create and save an R script

At the top right (in red) you see the current project name. Bottom right (in orange) you see the
various files that are part of the project. At the bottom left (in green) you see the current working
directory. At the top left there is the current script.

Written for

Institution: Katholieke Universiteit Leuven (KU Leuven)
Study: Master In De Psychologie
Course: Statistics for Psychologists, part 5 (P0X94A)

Document information

Uploaded on: 7 April 2026
File last updated on: 21 April 2026
Number of pages: 34
Written in: 2025/2026
Type: SUMMARY

Topics

statistics
statistics part 5
mathematics
first-year master's
data analysis
data science
r
statistics for psychologists

Seller: emmatheunis (Katholieke Universiteit Leuven)
