4 types of graphical summaries (across subgroups) - 1. bar plots (categorical data)
2. histograms
3. box plots
4. scatter plots
3 major systems for plotting - 1. base R (built-in functions)
2. lattice
3. ggplot2 (a core tidyverse package)
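A rough side-by-side of the three systems, drawing the same bar plot in each. This is only a sketch: the `toy` data frame and the `outcome` column are made up for illustration and are not the Titanic data used in these notes.

```r
library(lattice)   # ships with standard R installations
library(ggplot2)   # install.packages("ggplot2") if it is missing

# Hypothetical toy data
toy <- data.frame(
  outcome = factor(c("Died", "Survived", "Survived", "Died", "Survived"))
)
counts <- table(toy$outcome)

# 1. base R: draws immediately on the active graphics device
barplot(counts)

# 2. lattice: builds a "trellis" object that is drawn when printed
p_lattice <- barchart(Freq ~ Var1, data = as.data.frame(counts))

# 3. ggplot2: builds a plot object from a data frame plus layers
p_gg <- ggplot(data = toy, aes(x = outcome)) + geom_bar()
```

Printing `p_lattice` or `p_gg` draws the corresponding plot.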
lattice - well suited to multi-panel plots (not part of the tidyverse)
which major system for plotting will we be using - ggplot2
what ggplot2 code will create a plot instance - ggplot(data = data_frame)
which ggplot2 functions add layers (visualizations) to the plot - geom_*() or stat_*() functions
which function in ggplot2 allows us to modify layer "mapping" args - aes()
what does it mean to modify layer mappings - to map variables in the data to visual attributes of the plot
ex:
size, color, x variable, y variable
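A small sketch of mapping several variables at once with aes(). It assumes the built-in `mtcars` data set as a stand-in, since it has convenient numeric columns; it is not part of the original notes.

```r
library(ggplot2)

# mtcars is built into R and is used here only as a stand-in data set.
p <- ggplot(data = mtcars,
            aes(x = wt, y = mpg, color = factor(cyl), size = hp)) +
  geom_point()

# layer_data() shows the values ggplot2 computed for each mapped aesthetic.
head(layer_data(p)[, c("x", "y", "colour", "size")])
```

Each argument inside aes() ties one column of the data frame to one attribute of the points.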
how do you create a ggplot2 barplot - ggplot() + geom_bar()
how do you specify the categories that go across the x axis in a bar plot - aes(x = ...)
ex:
ggplot(data = titanicData, aes(x = survived))
what must you add to the bar plot? - either a geom or a stat layer
ex:
ggplot(data = titanicData, aes(x = survived)) + geom_bar()
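To illustrate the "geom or stat" choice, here is a minimal sketch on a made-up `toy` data frame (hypothetical values, not the real titanicData). For a bar plot of counts, geom_bar() and stat_count() should produce the same bars.

```r
library(ggplot2)

toy <- data.frame(survived = c(0, 1, 1, 0, 1))  # hypothetical values

# Same base object, once with a geom layer and once with a stat layer
p_geom <- ggplot(data = toy, aes(x = survived)) + geom_bar()
p_stat <- ggplot(data = toy, aes(x = survived)) + stat_count()

# Counts behind each bar (one row per bar)
layer_data(p_geom)$count
layer_data(p_stat)$count
```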
general instructions for making a ggplot2 bar plot - 1. save a base object with global aes() assignments
2. add layers
what does the ggplot function itself create? - a base plotting object with global
aesthetics
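The two-step recipe (save a base object, then add layers) might look like the sketch below. The `toy` data frame and the axis label text are assumptions made for illustration.

```r
library(ggplot2)

toy <- data.frame(survived = c(0, 1, 1, 0, 1))  # hypothetical values

# Step 1: save the base object with its global aes() assignments
g <- ggplot(data = toy, aes(x = survived))

# Step 2: add layers; the saved base object can be reused
p_plain   <- g + geom_bar()
p_labeled <- g + geom_bar() +
  labs(x = "Survived (0 = died, 1 = survived)", y = "Passengers")
```

Printing `g` on its own shows empty axes, because no layer has been added yet.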
What symbol is used to add another layer to a ggplot object? - +
What geom layer below would be used in creating a bar plot? - geom_bar()
factor - special class of vector with a levels attribute
levels - define all possible values for that variable
ex:
define possible values 1-7 for a variable like Day, which runs from Monday through
Sunday
- prevents R from treating a categorical variable as numerical
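The Day example can be sketched in base R as follows. The `day_codes` values are made up for illustration.

```r
# Hypothetical day codes: 1 = Monday, ..., 7 = Sunday
day_codes <- c(1, 3, 7, 3)

day <- factor(
  day_codes,
  levels = 1:7,
  labels = c("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")
)

levels(day)  # all seven possible values, including days not observed
table(day)   # counts per level; unobserved days appear with a count of 0
```

Because `day` is a factor, R treats it as categorical, so taking its mean does not give a meaningful number.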
why are factors great for plotting - you can order the levels and give them nicer labels
example of creating a new factor version of the "survived" variable -
library(dplyr)  # provides %>% and mutate()
titanicData <- titanicData %>%
  mutate(mySurvived = as.factor(survived))
str(titanicData$mySurvived)
- "survived" has 2 levels: "0" (died) and "1" (survived).
example of creating better labels for the 2 levels of "survived" variable -
levels(titanicData$mySurvived) <- c("Died", "Survived")
levels(titanicData$mySurvived)
- turns "0" into "Died" and "1" into "Survived"
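Assigning to levels() relies on the existing levels being in the order "0", "1". An alternative sketch, not from the original notes, states the pairing explicitly through the levels and labels arguments of factor(). The `survived` vector here is hypothetical.

```r
survived <- c(0, 1, 1, 0)  # hypothetical values

# Pair each raw value with its label explicitly: 0 -> "Died", 1 -> "Survived"
my_survived <- factor(survived,
                      levels = c(0, 1),
                      labels = c("Died", "Survived"))

levels(my_survived)
table(my_survived)
```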
example of changing the ordering for the 2 levels of "survived" variable -
titanicData <- titanicData %>%
  mutate(mySurvived = factor(mySurvived, levels = c("Survived", "Died")))
table(titanicData$mySurvived)
- lists "Survived" before "Died" (so its bar is drawn on the left), reversing the original order
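A quick check on made-up data that re-specifying the levels changes only their order, not the values stored in the vector:

```r
# Hypothetical factor with the original level order
f <- factor(c("Died", "Survived", "Survived"),
            levels = c("Died", "Survived"))

# Same values, new level order
f2 <- factor(f, levels = c("Survived", "Died"))

table(f)   # "Died" column first
table(f2)  # "Survived" column first
```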
2 ways to prepare our data - 1. convert another categorical variable to a factor for
better plotting
2. drop any rows with missing values for any of these variables
example of converting another categorical variable to a factor for better plotting -
titanicData <- titanicData %>%
  mutate(myEmbarked = as.factor(embarked))
levels(titanicData$myEmbarked) <- c("Cherbourg", "Queenstown", "Southampton")
example of dropping any rows with missing values for any of these variables -
library(tidyr)  # provides drop_na()
titanicData <- titanicData %>%
  drop_na(mySurvived, sex, myEmbarked)
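To see what drop_na() does with a column list, here is a sketch on a small made-up data frame. The values and the extra `age` column are hypothetical.

```r
library(tidyr)

toy <- data.frame(
  mySurvived = factor(c("Died", "Survived", NA, "Survived", "Died")),
  sex        = c("male", NA, "female", "female", "male"),
  myEmbarked = factor(c("Southampton", "Cherbourg", "Queenstown", NA, "Cherbourg")),
  age        = c(NA, 30, 25, 40, 52)
)

# Drop rows with NA in any of the three listed columns only
cleaned <- drop_na(toy, mySurvived, sex, myEmbarked)

# With no columns listed, every column is checked
cleaned_all <- drop_na(toy)

nrow(cleaned)
nrow(cleaned_all)
```

The first row survives the first call even though `age` is missing, because `age` is not among the listed columns.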
aes() - defines visual properties of objects in the plot
map variables in the data frame to plot elements:
x = , y = , size = , shape = , color = , alpha = , ...
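A sketch of the difference between mapping a variable inside aes() and setting a fixed value outside it. The `toy` data frame is made up, and the color "steelblue" and alpha value are arbitrary choices.

```r
library(ggplot2)

toy <- data.frame(
  survived = factor(c("Died", "Survived", "Survived", "Died", "Survived")),
  sex      = c("male", "female", "female", "male", "male")
)

# Mapped: fill varies with the sex column, so it goes inside aes()
p_mapped <- ggplot(data = toy, aes(x = survived, fill = sex)) + geom_bar()

# Set: one constant fill and alpha for every bar, outside aes()
p_fixed <- ggplot(data = toy, aes(x = survived)) +
  geom_bar(fill = "steelblue", alpha = 0.6)
```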
most common properties for a given geom - d + geom_bar()