Managing Data Frames with the dplyr package
The dplyr package provides a small set of verbs for manipulating data frames in R.
Each verb handles one common task, and verbs can be chained together with the pipe
operator (%>%). The key functions for managing data frames are:
1. select(): Choose specific columns from a data frame
The select() function keeps only the columns you name. Columns can be given by name,
by position, or through helper functions such as starts_with().
library(dplyr)
# Create an example data frame
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Salary = c(50000, 60000, 70000))
# Keep only the Name and Age columns
df %>%
  select(Name, Age)
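Beyond listing names, select() can also drop columns, match them by prefix, and rename them as it selects. The following is a minimal sketch of those three variations on the same example data; the result names (no_salary, by_prefix, renamed) and the new column name Employee are chosen here only for illustration.

```r
library(dplyr)

# Same example data frame as above
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Salary = c(50000, 60000, 70000))

# Drop a column by negating its name
no_salary <- df %>%
  select(-Salary)

# Keep columns whose names start with a given prefix
by_prefix <- df %>%
  select(starts_with("Sa"))

# Rename a column while selecting (new_name = old_name)
renamed <- df %>%
  select(Employee = Name, Age)

names(no_salary)  # "Name" "Age"
names(by_prefix)  # "Salary"
names(renamed)    # "Employee" "Age"
```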
2. filter(): Filter rows based on conditions
The filter() function keeps only the rows that satisfy the given conditions.
# Keep rows where Age is greater than 30
df %>%
  filter(Age > 30)
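Several conditions can be combined in one filter() call. As a rough sketch on the same example data: conditions separated by commas must all hold, | means "or", and %in% tests membership in a set of values. The thresholds and result names below are arbitrary choices for illustration.

```r
library(dplyr)

df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Salary = c(50000, 60000, 70000))

# Comma-separated conditions are combined with AND
both <- df %>%
  filter(Age >= 30, Salary < 70000)

# Use | when either condition is enough
either <- df %>%
  filter(Age < 30 | Salary > 65000)

# Use %in% to match against a set of values
named <- df %>%
  filter(Name %in% c("Alice", "Bob"))

both$Name    # "Bob"
either$Name  # "Alice" "Charlie"
named$Name   # "Alice" "Bob"
```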
3. mutate(): Create new variables or modify existing ones
The mutate() function adds new columns or overwrites existing ones, usually computing
them from columns already in the data frame.
# Add a YearlySalary column (this assumes Salary holds a monthly amount)
df %>%
  mutate(YearlySalary = Salary * 12)
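mutate() evaluates its arguments in order, so a later expression sees the result of an earlier one in the same call. The sketch below overwrites Age and then derives a new column from the updated values; the Senior column and the cutoff of 30 are made up for illustration.

```r
library(dplyr)

df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 35),
                 Salary = c(50000, 60000, 70000))

updated <- df %>%
  mutate(Age = Age + 1,      # overwrite the existing Age column
         Senior = Age > 30)  # uses the updated Age, not the original

updated$Age     # 26 31 36
updated$Senior  # FALSE TRUE TRUE
```

Note that the original df is unchanged; assign the result back (df <- df %>% mutate(...)) to keep the new columns.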
4. arrange(): Sort the data frame by one or more columns
The arrange() function reorders rows by the values of the specified columns. The
default order is ascending.
# Arrange rows by Age in ascending order
df %>%
  arrange(Age)
To sort in descending order, wrap the column in desc():
df %>%
  arrange(desc(Age))
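When arrange() receives more than one column, later columns break ties in earlier ones. A minimal sketch, using a hypothetical staff data frame with a Team column (not part of the example above) so that ties actually occur:

```r
library(dplyr)

# Hypothetical data with repeated Team values
staff <- data.frame(Name = c("Alice", "Bob", "Charlie", "Dana"),
                    Team = c("B", "A", "B", "A"),
                    Age = c(25, 30, 35, 28))

# Sort by Team ascending, then by Age descending within each team
sorted <- staff %>%
  arrange(Team, desc(Age))

sorted$Name  # "Bob" "Dana" "Charlie" "Alice"
```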
5. summarise() and group_by(): Aggregate data
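As a minimal sketch of aggregation: group_by() marks which rows belong together, and summarise() then collapses each group to a single row. The Department column and its values are assumptions added for illustration, since the example data frame above has no grouping column.

```r
library(dplyr)

# Hypothetical data with a Department column to group on
staff <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                    Department = c("Sales", "Sales", "IT"),
                    Salary = c(50000, 60000, 70000))

# One output row per department: mean salary and number of rows
dept_summary <- staff %>%
  group_by(Department) %>%
  summarise(MeanSalary = mean(Salary),
            Headcount = n())

dept_summary$Department  # "IT" "Sales"
dept_summary$MeanSalary  # 70000 55000
dept_summary$Headcount   # 1 2
```

Calling summarise() without group_by() treats the whole data frame as one group and returns a single row.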