Unit 1+2
Vector (1d): A <- c(2,4,6,8) & Replace value in a vector: A[2] <- 1 & Naming vectors: names(file) <- c("name1", "name2", "name3")
Factors are a special type of vector. Factors are used to store categorical data:
example1 <- factor(c("right", "left", "left", "right")) & levels(example1) # with the function levels() you can check the factor levels
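A minimal sketch of the vector and factor lines above. The element names "w" to "z" are made up for illustration and are not part of the original notes.

```r
# Create a numeric vector and replace its second value
A <- c(2, 4, 6, 8)
A[2] <- 1

# Name the elements (hypothetical names), then select by name
names(A) <- c("w", "x", "y", "z")
A["x"]

# A factor stores categorical data; levels are sorted alphabetically by default
example1 <- factor(c("right", "left", "left", "right"))
levels(example1)   # "left" "right"
```

Selecting by name (A["x"]) and by position (A[2]) return the same element here.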
Matrix (2d), 1 data type: A <- matrix(c(1:9), nrow=3, ncol=3, byrow=TRUE) & Replace elements in a matrix: Q["Rafael", ] <- 1 or File["1", ] <- 1, changes the entire row "1" to 1
Retrieve element in matrix: my_first_matrix[2, 3], row 2, column 3
Naming rows and columns of a matrix:
rownames(file) <- c("", "", "") or to numbers: rownames(IUCN_mammals) <- c(1:nrow(IUCN_mammals))
colnames(file) <- c("", "") or if you want to change only some columns: colnames(iris)[c(1:3)] <- c("", "", "")
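The matrix steps above (create, name, retrieve, replace) can be sketched as follows. The row and column names are placeholders chosen for this example.

```r
# Fill a 3 x 3 matrix row by row
A <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

# Placeholder names for rows and columns
rownames(A) <- c("r1", "r2", "r3")
colnames(A) <- c("c1", "c2", "c3")

# Retrieve row 2, column 3
A[2, 3]          # 6

# Replace a whole row, selected by its name
A["r1", ] <- 1
A["r1", ]
```

With byrow = TRUE the second row holds 4, 5, 6, which is why A[2, 3] is 6.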
Data frame (2d), can hold more than one data type (one type per column): Nile_data <- data.frame(Year = c(1871:1970), Flow = Nile) or datfr <- data.frame(x = c(1:3), y = c("A", "B", "C"))
Replace elements in a data frame: b[19,1] <- "Sara"
Array (3D+), more than two dimensions, but only one data type:
height1 <- c(160, 182, 183, NA, 201) # height in 2000
weight1 <- c(60, 80, 76, 72, 95) # weight in 2000
height2 <- c(160, 182, 183, NA, 201) # height in 2020
weight2 <- c(63, 78, 77, 71, 90) # weight in 2020
dataset <- array(c(height1, height2, weight1, weight2), dim=c(5, 2, 2)), because there are 5 measurements per vector, 2 years per variable and 2 variables (height and weight)
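To see how dim = c(5, 2, 2) arranges the four vectors, here is a small sketch using the values from the notes. The dimnames are an illustrative addition, not part of the original.

```r
height1 <- c(160, 182, 183, NA, 201)  # height in 2000
height2 <- c(160, 182, 183, NA, 201)  # height in 2020
weight1 <- c(60, 80, 76, 72, 95)      # weight in 2000
weight2 <- c(63, 78, 77, 71, 90)      # weight in 2020

# array() fills column by column, then matrix by matrix:
# matrix 1 holds the two height columns, matrix 2 the two weight columns
dataset <- array(c(height1, height2, weight1, weight2), dim = c(5, 2, 2))

# Illustrative labels (an assumption, added only for readability)
dimnames(dataset) <- list(paste0("person", 1:5),
                          c("y2000", "y2020"),
                          c("height", "weight"))

dim(dataset)                            # 5 2 2
dataset[3, 2, 1]                        # 183: person 3, year 2020, height
dataset["person3", "y2020", "weight"]   # 77
```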
List, can hold different types and sizes: shortv <- c(1, 2, 3) & shortm <- matrix(c(1:4), nrow = 2, ncol = 2, byrow = TRUE) & shortf <- factor(c("medium", "high", "low"))
somelist <- list(vector = shortv, matrix = shortm, factor = shortf) & somelist (to see the list)
Retrieving elements
Data frame: df[3,1]: row 3, column 1 & df[c(2,4), c(1,2)]: rows 2 & 4, columns 1 & 2 & df$var2: select column var2 & df$var2[1:3]: first 3 values of var2
Array: arr[3,2,1]: row 3, column 2, matrix 1 & arr[3:5,1:2,2]: rows 3–5, columns 1–2, matrix 2
List: [[ ]]: select a component & [ ] after [[ ]]: select an element inside that component
lst[[1]]: first component & lst[[1]][5]: element 5 inside the first component & lst[[3]][3,2,1]: element inside a matrix/array in the third component
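The difference between [[ ]] and [ ] on a list can be checked with the small list defined earlier in these notes. This is a sketch, reusing shortv, shortm and shortf.

```r
shortv <- c(1, 2, 3)
shortm <- matrix(1:4, nrow = 2, ncol = 2, byrow = TRUE)
shortf <- factor(c("medium", "high", "low"))
somelist <- list(vector = shortv, matrix = shortm, factor = shortf)

class(somelist[1])     # "list": single brackets on a list return a smaller list
class(somelist[[1]])   # "numeric": double brackets return the component itself

somelist[[1]][2]       # 2: second element of the vector component
somelist[[2]][1, 2]    # 2: row 1, column 2 of the matrix component
somelist$factor        # components can also be selected by name
```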
Class / Types / Conversion (look for converting numeric, character and logical on page 23)
Convert vector to matrix: m <- as.matrix(data.frame(Diameter = x[1:10], Height = x[11:20], Volume = x[21:30]))
Convert to a data frame: as.data.frame(my_data)
Convert to a factor: data_savanna$SPP <- as.factor(data_savanna$SPP)
Convert to a character: trees <- as.character(unique(Orange$Tree))
Convert between data types, example: b_numeric <- as.numeric(file), and likewise as.character() etc.
trees_converted_df[, 1] <- as.numeric(trees_converted_df[, 1]) # here you select variable 1 in the data frame and change it from character to numeric
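A short sketch of the character-to-numeric conversion described above. The data frame contents are invented for illustration; only the name trees_converted_df comes from the notes.

```r
# Hypothetical data frame whose first column was read in as text
trees_converted_df <- data.frame(age = c("118", "484", "664"),
                                 stringsAsFactors = FALSE)
class(trees_converted_df[, 1])   # "character"

# Convert column 1 to numeric so that arithmetic works
trees_converted_df[, 1] <- as.numeric(trees_converted_df[, 1])
class(trees_converted_df[, 1])   # "numeric"
mean(trees_converted_df[, 1])    # 422

# Converting a character vector to a factor
spp <- as.factor(c("oak", "beech", "oak"))
levels(spp)                      # "beech" "oak"
```

Checking class() before and after is a quick way to confirm that the conversion did what was intended.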
Unit 3+4
Data information: data(iris) & ?iris & str(iris) & head(iris_width) & length(unique(rice$region)), use unique() inside length() to count the distinct values
Add/modify data: Add column: iris$Flower_ID <- rownames(iris)
Replace elements of a column:
students_clean$lang_correct[students_clean$class==10980] <- students_clean$lang_correct[students_clean$class==10980]+1
Paste:
rownames(iris) <- paste("flower", rownames(iris), sep="_"), this adds "flower_" before every row name
Change to lower case letters:
mammals_data_combined$kingdom_name <- tolower(mammals_data_combined$kingdom_name)
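What paste() and tolower() return can be seen on small made-up inputs. The values below are assumptions for illustration only.

```r
# paste() joins its arguments element by element, separated by sep
flower_ids <- paste("flower", 1:3, sep = "_")
flower_ids          # "flower_1" "flower_2" "flower_3"

# tolower() converts every letter to lower case
kingdom <- tolower(c("ANIMALIA", "Plantae"))
kingdom             # "animalia" "plantae"
```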
Missing values (NA): Check: any(is.na(iris)) & Count: sum(is.na(airquality$Ozone)) & Remove: iris_no_na <- iris[complete.cases(iris),]
If you want to calculate the mean of a column that contains NAs: mean(Home_range$Body_mass_kg, na.rm = TRUE)
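The NA functions above, sketched on a tiny invented data frame (the names df and mass are assumptions, not from the course data).

```r
# Hypothetical data with one missing body mass
df <- data.frame(id = 1:4, mass = c(2.5, NA, 4.0, 3.5))

any(is.na(df))                # TRUE: at least one NA somewhere
sum(is.na(df$mass))           # 1: number of NAs in the column

mean(df$mass)                 # NA: a single NA makes the mean NA
mean(df$mass, na.rm = TRUE)   # mean of the three remaining values

# Keep only rows without any NA
df_no_na <- df[complete.cases(df), ]
nrow(df_no_na)                # 3
```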
Duplicates: duplicated(co2) & anyDuplicated(IUCN_combined_unique) & any(duplicated(students$studentID))
Remove duplicates: unique_iris <- unique(iris)
Remove: unique_co2 <- unique(co2)
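A sketch of how the three duplicate checks differ, using an invented students table. The column names follow the notes; the values are made up.

```r
# Hypothetical data: row 3 repeats row 2
students <- data.frame(studentID = c(1, 2, 2, 3), score = c(7, 8, 8, 6))

duplicated(students)                  # FALSE FALSE TRUE FALSE, one value per row
any(duplicated(students$studentID))   # TRUE: some ID occurs more than once
anyDuplicated(students)               # 3: index of the first duplicated row, 0 if none

# unique() drops the repeated rows
unique_students <- unique(students)
nrow(unique_students)                 # 3
```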
Merge / Join / Bind
Merge: mammals_data_combined <- merge(x=IUCN_mammals, y=mammalsFunctionalData, by="scientific_name", all=FALSE)
Columns: cbind(countries_iris, head_iris_width)
Rows: IUCN_combined <- data.frame(rbind(IUCN_mammals, IUCN_primates_not_lemuridae))
rbind(x, y): stack datasets vertically (add rows); datasets must have the same columns & cbind(x, y): combine datasets horizontally (add columns); datasets must have the same number of rows
merge(x, y, ...) options: by = "Girth" when the matching column has the same name in both datasets, or by.x = and by.y = to name the matching column in x and in y separately
all = FALSE → only matching rows (default) & all = TRUE → all rows from both datasets & all.x = TRUE → all rows from x & all.y = TRUE → all rows from y
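The effect of the all, all.x and by.x / by.y options can be sketched with two small invented tables. All names and values below are assumptions for illustration.

```r
# Two hypothetical tables whose key columns have different names
x <- data.frame(species = c("a", "b", "c"), mass = c(1, 2, 3),
                stringsAsFactors = FALSE)
y <- data.frame(scientific_name = c("b", "c", "d"),
                diet = c("herb", "carn", "omni"),
                stringsAsFactors = FALSE)

# Default (all = FALSE): only keys present in both tables
inner <- merge(x, y, by.x = "species", by.y = "scientific_name")

# all.x = TRUE: keep every row of x, fill missing y values with NA
left <- merge(x, y, by.x = "species", by.y = "scientific_name", all.x = TRUE)

# all = TRUE: keep every row of both tables
full <- merge(x, y, by.x = "species", by.y = "scientific_name", all = TRUE)

nrow(inner)   # 2
nrow(left)    # 3
nrow(full)    # 4
```

The merged key column takes its name from by.x, so it is called species in all three results.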
Subset / Filtering
Equal: esoph[esoph$ncases==0,] & More than: subset(esoph,ncontrols>2*ncases) & More or equal to: subset(esoph,ncontrols>=10) & Less than: < & Less or equal to: <= & Not: subset(esoph,!ncontrols<2*ncases) & And: esoph[esoph$tobgp=="30+"&esoph$ncases==0,]
Or: esoph[esoph$tobgp=="0-9g/day"|esoph$alcgp=="0-39g/day",]
Or and: USArrests_subset <- USArrests[(USArrests$Murder >= 10 | USArrests$Assault < 100) & (!USArrests$US_region =="South"),]
Select rows/ columns: esoph[c(1:4),c(4:5)]
Not select rows/ columns: IUCN_mammals_removed<-IUCN_mammals[-seq(from=10,to=nrow(IUCN_mammals),by=10),-c(3,6,9)]
Inside: students_clean[students_clean$lang%in%box_lang$out,]
esoph[(esoph$ncases > 10 | esoph$ncases < 1), ], using multiple conditions (logical operations)
Subset with several numeric conditions: subset_trees <- subset(Nijmegen_trees, Plant_year >= 1970 & Plant_year <= 1980 & Height_in_m >= 15)
To select on a character value using subset: IUCN_mammals <- subset(IUCN_species, class_name == "MAMMALIA")
Select rows, question: Retrieve the home range size and body mass of the individuals in rows 3, 250-273 and 394-395 using one line of code:
subset_home_range <- subset(Home_range, select = c(Home_Range_km2, Body_mass_kg))[c(3, 250:273, 394:395), ]
Ratio (in%): ratio <- nrow(IUCN_primates) / nrow(IUCN_mammals) & print(ratio * 100)
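As a sketch of the filtering patterns above, the built-in esoph dataset can be filtered with brackets or with subset(). The object names on the left are chosen for this example.

```r
data(esoph)

# Bracket notation and subset() select the same rows
zero_cases <- esoph[esoph$ncases == 0, ]
same_zero  <- subset(esoph, ncases == 0)

# OR: keep rows that satisfy at least one condition
either <- esoph[esoph$ncases > 10 | esoph$ncases < 1, ]

# %in%: keep rows whose value occurs in a set of allowed values
selected <- esoph[esoph$agegp %in% c("25-34", "35-44"), ]

# Share of rows (in %) that have zero cases
ratio <- nrow(zero_cases) / nrow(esoph)
print(ratio * 100)
```

Inside subset() the column names can be written without the esoph$ prefix, which keeps long conditions readable.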
Sequence: esoph[seq(from=1,to=4,by=1),seq(from=1,to=5,by=2)]
Omitting data: To omit data you can use either - or !: esoph[-c(20:88), ] # Omit rows 20 - 88 & esoph[!esoph$ncases > 5, ] # All observations with ncases equal to 6 or more are removed & subset(esoph, !ncases >= 1) # Only observations with ncases less than 1 are retained
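A small sketch of omitting rows with - and !, on an invented data frame (df and its values are assumptions). Note that ! is applied after the comparison, so !x > 5 means "not (x > 5)".

```r
# Hypothetical data
df <- data.frame(id = 1:6, ncases = c(0, 2, 6, 9, 1, 0))

# Negative indexes drop rows by position
dropped_rows <- df[-c(2:3), ]
dropped_rows$id                    # 1 4 5 6

# ! negates the whole comparison: keeps rows with ncases <= 5
at_most_5 <- df[!df$ncases > 5, ]
at_most_5$id                       # 1 2 5 6

# The same idea inside subset(): keeps rows with ncases < 1
no_cases <- subset(df, !ncases >= 1)
no_cases$id                        # 1 6
```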
Order / Sorting: esoph[order(esoph$agegp,esoph$ncases,decreasing = TRUE),] & IUCN_species_ordered <- IUCN_species[order(IUCN_species$order_name), ], for alphabetical order & trees_ordered <- trees[order(trees$Height, decreasing = FALSE),] & retrieve the first 7: trees_ordered <- trees_ordered[1:7,]
sort(esoph$tobgp) & sort(esoph$ncases, decreasing = TRUE)
The order() function returns the indexes that put the elements of a vector in sorted order; sort() returns the sorted values themselves.
Same ordering as sort, but for the whole dataset: esoph[order(esoph$ncases, decreasing = TRUE), ]
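The difference between sort() and order() is easiest to see on a small example. The data frame trees_df below is invented for illustration and is not the built-in trees dataset.

```r
# Hypothetical data
trees_df <- data.frame(name = c("a", "b", "c", "d"),
                       Height = c(12, 7, 20, 9),
                       stringsAsFactors = FALSE)

sort(trees_df$Height)     # 7 9 12 20: the sorted values
order(trees_df$Height)    # 2 4 1 3: the row indexes in sorted order

# Use the indexes from order() to reorder the whole data frame
trees_ordered <- trees_df[order(trees_df$Height), ]
trees_ordered$name        # "b" "d" "a" "c"

# Keep only the two shortest trees
trees_ordered[1:2, ]
```

sort() works on a single vector, so it cannot keep the other columns aligned; indexing with order() can.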