Business Analytics - Answers The process of gathering, inspecting, cleaning, transforming, and
modeling data with the goal of discovering useful information to suggest conclusions and support
decision making
data cleaning - Answers The process of detecting and correcting coding errors in a machine-readable
dataset
EDA Process - Answers the process of reorganizing the data to make it easy to read and understand
(EDA = exploratory data analysis)
Descriptive Statistics - Answers uses data to provide descriptions of the population either through
charts, tables, or numerical calculations; describes the KNOWN data; draws insights from past data -->
more meaningful
measures of centre - Answers Mean
Median
Mode
Geometric Mean
Midrange
Trimmed Mean
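A minimal sketch of how the measures of centre listed above could be computed in base R. The sample values and variable names are made up for illustration; base R has no built-in statistical mode, so a frequency table is used instead.

```r
x <- c(2, 4, 4, 5, 7, 9, 11)              # made-up sample values

mean_x    <- mean(x)                      # arithmetic mean
median_x  <- median(x)                    # middle value of the sorted data
counts    <- table(x)                     # frequency of each distinct value
mode_x    <- as.numeric(names(counts)[counts == max(counts)])  # most frequent value(s)
geo_mean  <- exp(mean(log(x)))            # geometric mean (positive values only)
midrange  <- (min(x) + max(x)) / 2        # halfway between smallest and largest
trim_mean <- mean(x, trim = 0.2)          # mean after dropping 20% of values from each end
```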
measures of spread - Answers Range
Variance
Standard Deviation
Coefficient of Variation
Mean Absolute Deviation
Interquartile Range (IQR)
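The measures of spread above can be sketched in base R as follows. The data are made up; note that R's own mad() function computes the median absolute deviation, so the mean absolute deviation is written out by hand here.

```r
x <- c(2, 4, 4, 5, 7, 9, 11)          # made-up sample values

range_x <- max(x) - min(x)            # range as one number; range(x) returns c(min, max)
var_x   <- var(x)                     # sample variance (n - 1 denominator)
sd_x    <- sd(x)                      # sample standard deviation
cv_x    <- sd_x / mean(x)             # coefficient of variation
mad_x   <- mean(abs(x - mean(x)))     # mean absolute deviation about the mean
iqr_x   <- IQR(x)                     # interquartile range (Q3 - Q1)
```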
Skewness vs Kurtosis - Answers skewness: the extent of asymmetry (lack of symmetry) in a distribution
kurtosis: how peaked or flat a distribution is around its center (it also reflects how heavy the tails are)
-platykurtic: relatively flat
-leptokurtic: relatively peaked
-mesokurtic: in between, similar to a normal distribution
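One way to put numbers on skewness and kurtosis is the moment-based definition, sketched below in base R. The function names skewness_m and excess_kurtosis_m are hypothetical helpers written for this example, and other textbooks or packages may use slightly different formulas.

```r
# Moment-based skewness: third central moment divided by variance^(3/2)
skewness_m <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Excess kurtosis: fourth central moment divided by variance^2, minus 3
# (negative = platykurtic, near 0 = mesokurtic, positive = leptokurtic)
excess_kurtosis_m <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

symmetric    <- c(1, 2, 3, 4, 5)      # made-up symmetric, flat data
right_skewed <- c(1, 1, 2, 2, 10)     # made-up data with a long right tail

skew_sym   <- skewness_m(symmetric)
skew_right <- skewness_m(right_skewed)
kurt_sym   <- excess_kurtosis_m(symmetric)
```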
Examples of Descriptive Statistics - Answers mean, median, mode, range, standard deviation
Inferential Statistics - Answers making inferences by comparing and predicting future outcomes (in
the form of probability scores) --> make conclusions beyond the available data
Data Visualization - Answers The representation of data in the form of charts, dashboards, etc
Descriptive BA - Answers "What happened?"; understanding underlying trends and causes
Predictive BA - Answers "What will happen and when?"; accurate projections of future events and
outcomes by looking at past data
Prescriptive BA - Answers "What should I do?"; makes decisions to achieve the best performance
possible; uses descriptive and predictive analytics to create alternatives --> find the best one
EX: Netflix creates personalized movie recommendations
Script - Answers Top Left Corner; where the code is written
Console - Answers bottom left corner; shows the output of the code that has been run
Environment - Answers top right corner; displays the objects (variables, datasets, functions) that have been created in the current session
EX: x=3
Graphical Output - Answers bottom right corner; displays the graphs created during exploratory data
analysis
<- OR = - Answers symbols used to assign values to objects in R
Numeric - Answers real numbers
Integer - Answers whole numbers
Logical - Answers true/false
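A short sketch of assignment and the three basic types above. The variable names are made up for illustration.

```r
x <- 3          # assignment with <-
y = 2.5         # assignment with = also works at the top level
n <- 7L         # the L suffix stores the value as an integer
flag <- x > y   # a comparison produces a logical value

class_x    <- class(x)      # type of a real number
class_n    <- class(n)      # type of a whole number written with L
class_flag <- class(flag)   # type of a TRUE/FALSE value
```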
Vectors - Answers OBJECTS which represent one-dimensional arrays that can hold NUMERIC data,
character data, or logical data
c() - Answers what is the function to create a vector
class() - Answers what is the function for the class of an object
vector() - Answers what is the function to create an empty vector of a given type and length, e.g. vector("numeric", 3)
as.integer() - Answers what is the function to convert a value to an integer in R
"as." command - Answers what is the command to convert the class of a vector
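The vector functions above can be tried out as in this small sketch. All values and names are made up.

```r
scores  <- c(90.5, 72, 88)            # numeric vector built with c()
names_v <- c("Ann", "Bo", "Cy")       # character vector
passed  <- c(TRUE, FALSE, TRUE)       # logical vector

empty   <- vector("numeric", 3)       # numeric vector of length 3, filled with zeros

class_scores <- class(scores)         # class of the vector
as_int <- as.integer(scores)          # converts to integer, dropping the decimal part
as_chr <- as.character(passed)        # converts logical values to text
```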
Data Frame - Answers an important object type used to store tabular data
data.frame() - Answers what is the function for a data frame
Factors - Answers categorize values using a fixed set of levels, stored as a vector of integers; each
integer has a label (its level) assigned to it
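A minimal sketch of a data frame with a factor column. The column names and values are invented for illustration.

```r
students <- data.frame(
  last_name = c("Lee", "Diaz", "Okoye"),
  age       = c(19, 21, 20),
  rank      = factor(c("junior", "senior", "junior"),
                     levels = c("junior", "senior")),
  stringsAsFactors = FALSE            # keep last_name as plain character data
)

rank_levels <- levels(students$rank)      # the labels of the factor
rank_codes  <- as.integer(students$rank)  # the integer code stored for each value
```

Printing students shows the table; rank_codes shows that each label is stored internally as the position of its level.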
Data - Answers raw facts, figures, measurements, and amounts that we gather for analysis or
reference; includes descriptive information; typically collected and stored through observations
Categorical Data - Answers can be put into groups/categories using names/labels; can be assigned to
only one category based on its qualities (each is mutually exclusive)
EX: hair color, smoking status, rank, major
Nominal Data - Answers type of categorical data in which objects fall into unordered categories
(order is NOT important)
Ordinal Data - Answers type of categorical data in which order is important
EX: year in school, level of illness
Binary Data - Answers only two categories exist (there is no in between)
EX: attendance (present OR absent)
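In R, the categorical types above might be represented as sketched here: an unordered factor for nominal data, an ordered factor for ordinal data, and a logical vector for binary data. The values are made up.

```r
hair <- factor(c("brown", "black", "brown"))            # nominal: no order among categories

year <- factor(c("sophomore", "freshman", "senior"),
               levels  = c("freshman", "sophomore", "junior", "senior"),
               ordered = TRUE)                          # ordinal: levels have a defined order

present <- c(TRUE, FALSE, TRUE)                         # binary: present or absent

later_year <- year[1] > year[2]   # comparisons are only meaningful for ordered factors
```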
Measurement - Answers numerical data "measured" based on some quantitative trait
Discrete Measurement - Answers only certain values are possible; there are gaps between the possible values
EX: SAT scores
Continuous Measurement - Answers any value within an interval
EX: height, age, GPA
Dataset - Answers data obtained through observations, measurements, study, analysis; it is an
ordered collection of data usually presented in a tabular pattern
Attributes/Variables (in a dataset) - Answers "the category names"; last name, age, rank
Observations/Data Records (in a dataset) - Answers "the actual input for the category"
EX: people, age
Dataset Columns - Answers each has its own characteristic, name, unit, and format
Dataset Rows - Answers "observation or record"; each row holds one (and only one) observation
EX: topic, subject, person
Numerical Data - Answers age, GPA
Textual Data - Answers Last and first name
data(datasetname) - Answers what is the function to load a built-in dataset
data(iris) - Answers what is the function to load the iris dataset
head() - Answers returns the first few rows of the dataset
tail() - Answers returns the last few rows of a dataset
str() - Answers returns the structure of the dataset
is.na() - Answers identifies any missing values in the dataset
summary() - Answers returns summary statistics (min, quartiles, median, mean, max) for each variable in the dataset
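The inspection functions above can be tried on the built-in iris dataset, as in this short sketch. The variable names are made up.

```r
data(iris)                         # load the built-in iris dataset

first_rows <- head(iris)           # first 6 rows by default
last_rows  <- tail(iris)           # last 6 rows by default
str(iris)                          # prints column types and a preview of the values
n_missing  <- sum(is.na(iris))     # is.na() flags missing cells; sum() counts them
stats      <- summary(iris)        # per-column summary statistics
```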
Exploratory Data Analysis - Answers The critical process of performing initial investigations on data
to... with the help of graphical representations and summary statistics
EDA Step 1 - Answers Load the Data (import, load, read)
EDA Step 2 - Answers Data Visualization; data is represented in a chart, graph, etc
EDA Step 3 - Answers Data Imputation; deal with missing data
EDA Step 4 - Answers Statistical Analysis; provide descriptions of the data characteristics through
some numerical calculations
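The four EDA steps above could be walked through in base R roughly as follows. This is only a sketch under stated assumptions: the missing values are inserted artificially for the demo, and simple mean imputation stands in for a dedicated imputation package.

```r
# Step 1: load the data
data(iris)
flowers <- iris
flowers$Sepal.Length[c(5, 50)] <- NA        # artificial gaps, for demonstration only

# Step 2: data visualization
hist(flowers$Sepal.Length, main = "Sepal length", xlab = "cm")

# Step 3: data imputation (mean imputation as a simple stand-in)
missing <- is.na(flowers$Sepal.Length)
flowers$Sepal.Length[missing] <- mean(flowers$Sepal.Length, na.rm = TRUE)

# Step 4: statistical analysis
sepal_stats <- summary(flowers$Sepal.Length)
```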
Comma-Separated Values - Answers what does CSV stand for
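A small sketch of what a CSV round trip looks like, using base R's write.csv() and read.csv() with a temporary file. The file contents are made up.

```r
path <- tempfile(fileext = ".csv")          # temporary file so nothing is left behind

write.csv(data.frame(name = c("Lee", "Diaz"), age = c(19, 21)),
          path, row.names = FALSE)          # each row becomes one comma-separated line

raw_lines <- readLines(path)                # the plain text stored in the file
people    <- read.csv(path)                 # read the file back into a data frame
```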
head(dataset_name, 10) - Answers returns the first 10 rows of a dataset
tail(dataset_name, 15) - Answers returns the last 15 rows of a dataset
View() - Answers opens the entire dataset in a spreadsheet-style viewer
?datasetname - Answers opens the help page, which describes the variables in a built-in dataset
ncol() - Answers returns the number of columns in the dataset
nrow() - Answers returns the number of rows in the dataset
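The row and column functions above, sketched on the built-in mtcars dataset (chosen only as an example). View() and ? are left as comments because they open interactive windows.

```r
data(mtcars)                       # built-in dataset used as an example

first10 <- head(mtcars, 10)        # first 10 rows
last15  <- tail(mtcars, 15)        # last 15 rows
n_rows  <- nrow(mtcars)            # number of rows
n_cols  <- ncol(mtcars)            # number of columns

# View(mtcars)                     # opens the data viewer (interactive sessions)
# ?mtcars                          # opens the help page describing the variables
```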
readr package - Answers used for importing external datasets
ggplot2 package - Answers used for data visualization
fBasics package - Answers used for statistical analysis
mice package - Answers used for data imputation
install.packages() - Answers function used to install a package
library() - Answers function used to load a package
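A hedged sketch of the install-then-load pattern. It only checks which of the packages named above are already installed; the install and load lines are commented out because installing needs an internet connection.

```r
pkgs <- c("ggplot2", "fBasics", "mice")

# TRUE for each package that is already installed on this machine
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# install.packages(pkgs[!installed])   # install whatever is missing (run once)
# library(ggplot2)                     # load a package in every new session
```

Installing is a one-time step per machine, while library() has to be called again in each new R session.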