RSTUDIO
FUNCTION DESCRIPTION
x <- 0 assigns value 0 to x
v <- c(1,2,3) creates vector v and assigns values [1,2,3] to it
mean(v) calculates the mean of vector v
max(v) calculates the maximum value of vector v
median(v) calculates the median of vector v
var(v) calculates the variance of vector v
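A minimal sketch of the four summary functions above applied to the same vector. The manual formula is added only to illustrate that var() computes the sample variance (dividing by n - 1); the name manual_var is an assumption made for this example.

```r
# Descriptive statistics for the vector used in the rows above
v <- c(1, 2, 3)

mean(v)    # arithmetic mean: 2
max(v)     # largest value: 3
median(v)  # middle value: 2
var(v)     # sample variance (divides by n - 1): 1

# The same variance computed by hand (manual_var is a name chosen for this sketch)
manual_var <- sum((v - mean(v))^2) / (length(v) - 1)
all.equal(manual_var, var(v))
```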
rm(list=ls()) clears the environment (all assigned objects are removed)
# text used to write comments in a script (not part of the code; the text after # is not executed)
Command + Enter (Ctrl + Enter on Windows/Linux) used to execute a single line from a script
data("iris") loads the dataset "iris" into the environment ("iris" is a dataset that already ships with R)
iris when run from the script, prints the whole dataset in the console
head(iris) shows the first rows of a dataset (six by default)
iris[1,3] shows the entry in the 1st row, 3rd column of the dataset "iris"
iris[,3] shows all of the entries in the 3rd column of the dataset "iris"
Petal.Length = iris[,3] assigns the 3rd column to the variable Petal.Length (it is also possible to use Petal.Length <- iris[,3])
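The indexing rows above can be tried out as follows. This is a small sketch using the built-in iris data; the names petal_length and by_name are chosen only for illustration.

```r
data("iris")

iris[1, 3]                 # single entry: row 1, column 3 (1.4)
petal_length <- iris[, 3]  # whole 3rd column as a numeric vector

# Selecting the column by name gives the same values as selecting it by position
by_name <- iris$Petal.Length
identical(petal_length, by_name)

length(petal_length)       # 150 observations
```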
hist(Petal.Length) plots a histogram from a vector of values
hist(Petal.Length, breaks = 30) plots a histogram from a vector of values with about 30 bars (R treats a single number of breaks as a suggestion)
plot(x=Petal.Length, y=Petal.Width) plots a scatter diagram from data with the x and y coordinates specified (Petal.Width is assumed to be assigned like Petal.Length, e.g. from iris[,4])
plot(x=Petal.Length, y=Petal.Width, col="green", pch=5) plots a scatter diagram from data with the x and y coordinates specified; you can also set parameters such as the color (col= ) and the point symbol (pch= )
qplot(x=Petal.Length, y=Petal.Width) plots a scatter diagram from data with the x and y coordinates specified (qplot comes from the ggplot2 package); the outcome is usually nicer than from the simple plot function
Dataset_Movie_v4$movie_title displays the whole column "movie_title" from the dataset "Dataset_Movie_v4"
summary(Dataset_Movie_v4$oscar_won_n) provides the 5 number summary (min, Q1, median, Q3, max) plus the mean of the data in column "oscar_won_n" in the dataset "Dataset_Movie_v4"
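A short sketch of what summary() returns for a numeric vector. Since Dataset_Movie_v4 is not available here, the vector x is a made-up stand-in for a column such as oscar_won_n; its values are an assumption for illustration only.

```r
# Hypothetical stand-in for a numeric column such as oscar_won_n
x <- c(0, 0, 1, 2, 3, 4, 11)

s <- summary(x)   # six values: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
names(s)

q <- quantile(x)  # the five quantiles only: 0%, 25%, 50%, 75%, 100%
q
```

summary() reports the mean alongside the five quantiles, which makes it easy to compare mean and median at a glance.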
abs.freq = table(Dataset_Movie_v4$main_genre) defines a variable "abs.freq" which, when displayed in the console, shows a table with the absolute frequencies of the variable "main_genre"
rel.freq = prop.table(abs.freq) defines a variable "rel.freq" which, when displayed in the console, shows a table with the relative frequencies of the variable "main_genre"
rbind(abs.freq, rel.freq) displays absolute and relative frequencies together
cum.freq = cumsum(rel.freq) defines a variable "cum.freq" which, when displayed in the console, shows a table with the cumulative relative frequencies of the variable "main_genre"
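The three frequency tables above can be sketched on a small example. The vector main_genre below is a made-up stand-in for Dataset_Movie_v4$main_genre (the real dataset is not available here), so the categories and counts are assumptions.

```r
# Hypothetical stand-in for Dataset_Movie_v4$main_genre
main_genre <- c("Drama", "Comedy", "Drama", "Action", "Drama", "Comedy")

abs.freq <- table(main_genre)     # counts per category, in alphabetical order
rel.freq <- prop.table(abs.freq)  # each count divided by the total
cum.freq <- cumsum(rel.freq)      # running total of the relative frequencies

# One row per kind of frequency, one column per category
rbind(abs.freq, rel.freq, cum.freq)
```

The last entry of cum.freq is always 1, which is a quick check that the relative frequencies were computed correctly.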
pie(abs.freq) plots a pie chart of the absolute frequencies
barplot(abs.freq) plots a bar chart of the absolute frequencies
barplot(abs.freq, cex.names=0.8, las=2, col="pink") plots a bar chart of the absolute frequencies; you can also set parameters such as the color of the chart (col= ), the size of the category names (cex.names= ) and the orientation of the names (las= )
length(Dataset_Movie_v4$runtime_minutes) used to find the sample size n of the variable "runtime_minutes" in the dataset "Dataset_Movie_v4"
BREAKS = c(60,80,100,120,140,160,180,200,220) defines a variable "BREAKS" which contains the boundaries of the classes that we want to group our data into
hist(Dataset_Movie_v4$runtime_minutes, breaks=BREAKS) plots a histogram with the classes specified in the variable "BREAKS"
hist(Dataset_Movie_v4$runtime_minutes, freq = FALSE) plots a histogram with frequency density displayed on the y axis (if we don't specify it and the classes are of the same width, the histogram displays absolute frequency on the y axis)
hist(Dataset_Movie_v4$runtime_minutes, right = FALSE) plots a histogram in which the right endpoint of each class does not belong to its bar, e.g. [0,5[
h = hist(Dataset_Movie_v4$runtime_minutes, breaks=BREAKS) if we define a variable "h" as a histogram we can later explore some variables inside the histogram, e.g.
- "h$counts" will display the absolute frequency of each class
- "h$density" will display the density of each class
- "h$mids" will display the middle value of each class
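A sketch of inspecting a stored histogram. It uses iris$Petal.Length in place of the movie runtimes, and the BREAKS values below are chosen for this example only. plot = FALSE is used so that the object is returned without drawing the chart; drop it to see the plot as well.

```r
data("iris")

# Class boundaries assumed for this example (they must cover the whole data range)
BREAKS <- c(1, 2, 3, 4, 5, 6, 7)

# plot = FALSE returns the histogram object without drawing it
h <- hist(iris$Petal.Length, breaks = BREAKS, plot = FALSE)

h$counts   # absolute frequency of each class
h$density  # frequency density: counts / (n * class width)
h$mids     # midpoint of each class
```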
dbinom(3, size=10, prob=0.6) or dbinom(3, 10, 0.6) calculates P(X=3) for the Binomial distribution X∼B(10, 0.6)
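As a quick check, the value returned by dbinom can be compared with the binomial formula P(X=k) = C(n,k) p^k (1-p)^(n-k). The names p_direct and p_formula are chosen only for this sketch.

```r
# P(X = 3) for X ~ B(10, 0.6), computed two ways
p_direct  <- dbinom(3, size = 10, prob = 0.6)
p_formula <- choose(10, 3) * 0.6^3 * (1 - 0.6)^(10 - 3)

p_direct                        # about 0.0425
all.equal(p_direct, p_formula)  # both approaches agree
```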
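dnorm(3, mean=2, sd=2) or dnorm(3, 2, 2) calculates the density f(3) of the Normal distribution X∼N(2, 4), i.e. mean 2 and variance 4 (sd = 2); for a continuous distribution this is the height of the density curve, not the probability P(X=3), which is 0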
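For a continuous distribution, dnorm returns the height of the density curve, while probabilities of intervals come from the cumulative distribution function pnorm. The sketch below illustrates the difference; the interval [0, 4] and the variable names are chosen only for this example.

```r
# Density of X ~ N(mean = 2, sd = 2) at x = 3 (a curve height, not a probability)
d <- dnorm(3, mean = 2, sd = 2)

# The same value from the normal density formula with variance 4
d_formula <- exp(-(3 - 2)^2 / (2 * 4)) / sqrt(2 * pi * 4)
all.equal(d, d_formula)

# Probability of an interval: P(0 <= X <= 4), i.e. within one sd of the mean
p_interval <- pnorm(4, mean = 2, sd = 2) - pnorm(0, mean = 2, sd = 2)
p_interval  # about 0.683
```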