analytics
the process of developing actionable decisions or recommendations for actions based upon
insights generated from historical data
primary data
data collected specifically for the research problem at hand (ex: survey, interviews)
secondary data
data collected for some purpose other than the problem at hand (ex: firm's proprietary data,
internet data, stock/capital market data)
simulated data
data based on assumption and simulation
importance of data visualization
1. visual elements allow us to see and understand trends, outliers, & patterns in data
2. can comprehend difficult concepts or identify new patterns more easily
3. humans LOVE visuals
3 main principles of data visualization
1. chart should tell a story / yield insight beyond text
2. chart should have graphical integrity (Tufte's "Lie" factor)
3. chart should minimize graphical complexity (Tufte's "data-ink" ratio)
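Tufte's two measures in principles 2 and 3 are simple ratios, so a minimal sketch may help. The function names, argument names, and numbers below are hypothetical, and "ink" is assumed to be measured in any consistent unit (for example, pixel counts).

```python
def lie_factor(shown_start, shown_end, data_start, data_end):
    """Size of the effect shown in the graphic divided by the size of the effect in the data."""
    shown_effect = (shown_end - shown_start) / shown_start
    data_effect = (data_end - data_start) / data_start
    return shown_effect / data_effect


def data_ink_ratio(data_ink, total_ink):
    """Share of the chart's ink that presents data rather than decoration."""
    return data_ink / total_ink


# Hypothetical chart: the data rise from 100 to 150 (a 50% change),
# but the bars are drawn growing from 1.0 to 3.0 units (a 200% change).
lf = lie_factor(1.0, 3.0, 100, 150)

# Hypothetical ink measurement: 30 units of data ink out of 120 in total.
ratio = data_ink_ratio(30, 120)

print(f"lie factor = {lf}, data-ink ratio = {ratio}")
```

A lie factor near 1 suggests the graphic represents the data faithfully; a data-ink ratio closer to 1 means less non-data clutter.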
statistics
science concerned with developing and studying methods for collecting, analyzing, interpreting,
and presenting empirical data to assist in making effective decisions
descriptive statistics
describe and summarize the data in its entirety
3 principles of describing data: center, spread, shape
inferential statistics
utilize random sample of data taken from population to describe and make INFERENCES about
the population
- reliability of the conclusion depends on the confidence level (CL)
3 principles of descriptive statistics
1. Data centrality (mean, median, mode)
2. Data spread / variability (range, MAD, variance, standard deviation)
3. Data shape (kurtosis)
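The centrality and spread measures listed above can be computed with Python's standard library. This is a sketch on a made-up sample; "MAD" is assumed here to mean the mean absolute deviation (it can also stand for median absolute deviation), and the variance and standard deviation are the sample versions (dividing by n - 1).

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

# 1. Centrality
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# 2. Spread / variability
data_range = max(data) - min(data)
mad = sum(abs(x - mean) for x in data) / len(data)  # mean absolute deviation (assumed meaning of MAD)
variance = statistics.variance(data)                # sample variance, divides by n - 1
stdev = statistics.stdev(data)                      # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, MAD={mad:.3f}, variance={variance}, stdev={stdev:.3f}")
```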
kurtosis
measure of whether the data are peaked or flat relative to a normal distribution
High kurtosis = data is peaked near mean, declines rather rapidly, and has heavy tails
Low kurtosis = data is flat near mean
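One common way to put a number on this is excess kurtosis: the fourth standardized moment minus 3, so that a normal distribution scores about 0. The sketch below assumes the simple population-moment formula (no small-sample correction); the helper name and both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (population-moment version)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]          # evenly spread, no peak
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 10]     # piled up at the mean, with extreme tails

print(excess_kurtosis(flat))    # negative: flatter than a normal distribution
print(excess_kurtosis(peaked))  # positive: peaked with heavy tails
```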
covariance
measure of the DIRECTION of linear association between two variables
scaled between negative infinity and positive infinity
aka how two variables vary together
correlation
measure of the strength and direction of the linear relationship between two variables which does
NOT depend on units of measurement
Scaled between -1 (perfect negative correlation) and 1 (perfect positive correlation) w/ 0 = no
linear correlation
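The difference between the covariance and correlation cards shows up when the units change. In this sketch (hypothetical helper functions and made-up height/weight data, using the sample formulas with n - 1), converting height from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data, for illustration only.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weight_kg = [50, 58, 65, 74, 83]
height_cm = [h * 100 for h in height_m]  # same heights, different unit

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
r_m = correlation(height_m, weight_kg)
r_cm = correlation(height_cm, weight_kg)

print(f"covariance:  {cov_m:.2f} (metres) vs {cov_cm:.2f} (centimetres)")
print(f"correlation: {r_m:.4f} (metres) vs {r_cm:.4f} (centimetres)")
```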
margin of error
the amount (often given as a percentage) by which a sample estimate is expected to DIFFER from the true population value, at a given confidence level
sampling error
an error that occurs when a sample does not perfectly represent the target population, so the sample statistic differs from the population value
Central Limit Theorem (CLT)
states that when the sample size n is large (rule of thumb: n >= 30), the sampling distribution of
the sample mean is approximately Normal, regardless of the shape of the population distribution
standard error
the standard deviation of a sampling distribution
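The CLT and standard error cards can be checked together with a small simulation. This sketch assumes a skewed population (exponential with mean 1 and standard deviation 1) and draws many samples of size 30; the seed, the number of repetitions, and the helper name are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30     # the n >= 30 rule of thumb
REPETITIONS = 2000   # number of samples drawn


def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, standard deviation 1)."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])


means = [sample_mean(SAMPLE_SIZE) for _ in range(REPETITIONS)]

# Centre of the sampling distribution: should be close to the population mean of 1.
centre = statistics.mean(means)

# Standard error: the standard deviation of the sampling distribution.
observed_se = statistics.stdev(means)
theoretical_se = 1.0 / math.sqrt(SAMPLE_SIZE)  # population sd / sqrt(n)

print(f"mean of sample means = {centre:.3f}")
print(f"observed SE = {observed_se:.3f}, theoretical SE = {theoretical_se:.3f}")
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the underlying population is strongly skewed.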
Confidence Interval (CI)
roughly speaking, the RANGE of values (that is, the values between a lower and an upper
bound) that is likely to include the true population mean
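A confidence interval for a mean can be sketched as: sample mean plus or minus a critical value times the standard error. The example below assumes a 95% confidence level, a made-up sample of 36 values, and the normal (z) critical value; for small samples a t critical value would usually be used instead.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(1)

# Hypothetical sample of 36 scores, generated only for illustration.
sample = [random.gauss(100, 15) for _ in range(36)]

confidence_level = 0.95
n = len(sample)
mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# z critical value leaving (1 - CL) / 2 in each tail, about 1.96 for 95%.
z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

margin_of_error = z * standard_error
ci = (mean - margin_of_error, mean + margin_of_error)

print(f"mean = {mean:.2f}, margin of error = {margin_of_error:.2f}")
print(f"{confidence_level:.0%} CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Raising the confidence level widens the interval; a larger sample narrows it because the standard error shrinks.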
confidence level (CL)