BADM 211 Data Visualization
1. Know the uses of data exploration, particularly data visualization.
Answer: the 3 steps of exploratory data analysis
  1. Data inspection
     • Check the raw data
  2. Data visualization
     • Inspect relationships and conditions visually
  3. Data exploration
     • Use charts and tables to explore relationships further
- Visualization is typically done through graphical representations of the data (e.g. histograms, box plots, violin plots, scatter plots, line graphs)
- Visualization is used to help preprocess data: to support data cleaning, to guide variable derivation and feature selection, to choose bin sizes, to combine categories in data reduction, and to determine which variables and metrics are useful
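
As a rough illustration of the inspection step that comes before any plotting, the sketch below checks a handful of raw records for missing values, inconsistent category labels, and suspicious extremes. The records, the field names `region` and `sales`, and the values are all invented for this example; they are not from the course material.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration only.
rows = [
    {"region": "East", "sales": 120.0},
    {"region": "West", "sales": 95.0},
    {"region": "east", "sales": None},    # inconsistent label and a missing value
    {"region": "West", "sales": 101.0},
    {"region": "East", "sales": 9999.0},  # suspiciously large value
]

# Data inspection: count missing values per field in the raw data.
missing = {key: sum(r[key] is None for r in rows) for key in rows[0]}

# Category counts show which labels should be combined during cleaning.
region_counts = Counter(r["region"] for r in rows)
cleaned_counts = Counter(r["region"].title() for r in rows)

# The range of a numeric field hints at extreme values worth plotting.
sales = [r["sales"] for r in rows if r["sales"] is not None]
value_range = (min(sales), max(sales))

print("missing per field:", missing)
print("raw region labels:", dict(region_counts))
print("after combining labels:", dict(cleaned_counts))
print("sales range:", value_range)
```

Checks like these usually decide what to plot next: here the `east`/`East` split and the 9999.0 value are the kind of issues a bar chart or box plot would make visible at a glance.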
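2. Understand how to interpret a box plot, line graph, scatter plot, and histogram regarding distributions, relationships, and extreme values.
Answer:
box plot:
- top of the box represents the 75th percentile (third quartile)
- bottom of the box represents the 25th percentile (first quartile)
- box captures the middle 50% of the data (75 - 25), i.e. the interquartile range
- line in the box is the median, or 50th percentile
- lines (whiskers) above and below the box cover the rest of the data range, excluding outliers
- outliers appear as individual points (often circles) beyond the whiskers
line graph:
- connected points, usually ordered by time; look for a trend in the data
scatter plot:
- unconnected points, one per observation; look for a relationship between the two variables
histogram:
- the range of a numeric variable is split into bins
- bar height depicts the amount of data that falls into each bin, so the overall shape shows the distribution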
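
The numbers behind a box plot and a histogram can be worked out by hand, which may make the plots easier to read. The sketch below is only an illustration: the sample values are invented, the quartiles use Python's `statistics.quantiles` with `method="inclusive"` (plotting tools may interpolate slightly differently), and the 1.5 × IQR rule for flagging outliers is a common convention assumed here, not something stated in the notes above.

```python
import statistics
from collections import Counter

# Hypothetical sample, invented for illustration.
data = [2, 3, 4, 5, 5, 6, 7, 8, 9, 30]

# Box plot: bottom of box, line in box, top of box.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # height of the box, holding the middle 50% of the data

# Assumed convention: points more than 1.5 * IQR beyond the box are outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

# Whiskers end at the most extreme values that are not outliers.
inside = [x for x in data if low_fence <= x <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Histogram: assign each value to a bin of width 5 and count per bin.
bin_width = 5
bin_counts = Counter((x // bin_width) * bin_width for x in data)

print("box:", q1, median, q3, "IQR:", iqr)
print("whiskers:", whisker_low, whisker_high, "outliers:", outliers)
for start in sorted(bin_counts):
    # One '#' per observation gives a text version of the histogram bars.
    print(f"[{start:>2}, {start + bin_width:>2}) {'#' * bin_counts[start]}")
```

With this sample the value 30 falls outside the upper fence, so a box plot would draw it as a separate point, while the histogram shows it as a lone bar far to the right of the main cluster.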
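3. Know the methods of adding a third variable to a 2D scatter plot, whether categorical or numeric.
Answer:
- categorical: color code the data points by category
- numeric: use a scatter plot matrix to compare all pairwise scatter plots (a numeric third variable can also be encoded on a single plot as marker size or a color gradient)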
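
Both approaches can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the records, the variable names (`height`, `weight`, `age`, `group`), and the helper `draw` are hypothetical, and the plotting part assumes matplotlib is installed. The import sits inside `draw` so the color mapping and pair listing run without it.

```python
from itertools import combinations

# Hypothetical records: three numeric variables plus one categorical variable.
points = [
    {"height": 150, "weight": 52, "age": 31, "group": "A"},
    {"height": 160, "weight": 61, "age": 45, "group": "B"},
    {"height": 170, "weight": 68, "age": 27, "group": "A"},
    {"height": 180, "weight": 80, "age": 52, "group": "B"},
]

# Categorical third variable: map each category to a color.
palette = ["tab:blue", "tab:orange", "tab:green"]
categories = sorted({p["group"] for p in points})
color_of = {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}
colors = [color_of[p["group"]] for p in points]

# Numeric variables: list every pairwise combination to plot.
numeric_vars = ["height", "weight", "age"]
pairs = list(combinations(numeric_vars, 2))

def draw(show=True):
    """Draw one color-coded scatter plot per pair (assumes matplotlib)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(pairs), figsize=(4 * len(pairs), 4))
    for ax, (x_var, y_var) in zip(axes, pairs):
        ax.scatter(
            [p[x_var] for p in points],
            [p[y_var] for p in points],
            c=colors,  # one color per point, taken from its category
        )
        ax.set_xlabel(x_var)
        ax.set_ylabel(y_var)
    if show:
        plt.show()
    return fig

print(color_of)
print(pairs)
```

Calling `draw()` shows the three unique pairs side by side; a full scatter plot matrix arranges the same pairwise plots in a grid with one row and one column per variable.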