MISY 5330 ACTUAL EXAM QUESTIONS AND
COMPLETE STUDY GUIDE 2026
▶ Clustered column (or bar) chart? Answer:An alternative to the stacked
column chart for comparing quantitative variables
▶ Scatter chart matrix? Answer:A grid of scatter charts that displays the
pairwise relationships among multiple variables.
▶ Geographic Information Systems (GIS)? Answer:A system that merges
maps and statistics to present data collected over different geographies
▶ Data dashboard? Answer:Data visualization tool that illustrates multiple
metrics and automatically updates these metrics as new data become
available
▶ Supervised learning? Answer:Data Mining approach for prediction and
classification
▶ Unsupervised learning? Answer:Data Mining approach used to detect
patterns and relationships in the data
▶ Data Sampling? Answer:When dealing with large volumes of data, it is
best practice to extract a representative sample for analysis.
A sample is representative if the analyst can draw the same conclusions
from it as from the entire population of data.
The sample of data must be large enough to contain significant information,
yet small enough to be manipulated quickly.
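A minimal Python sketch of drawing a representative sample, assuming simple random sampling without replacement (the guide does not name a sampling scheme). The function name draw_sample and the synthetic population are hypothetical, used only for illustration. If the sample is representative, its summary statistics should sit close to the population's.

```python
import random
import statistics

def draw_sample(population, n, seed=None):
    """Return a simple random sample of n observations, drawn without replacement."""
    rng = random.Random(seed)  # seeded generator so the sample is reproducible
    return rng.sample(population, n)

# Hypothetical population: 100,000 synthetic values (illustrative only)
gen = random.Random(0)
population = [gen.gauss(50, 10) for _ in range(100_000)]

# A sample small enough to work with quickly
sample = draw_sample(population, 1_000, seed=42)

# A representative sample should support the same conclusions, e.g. a similar mean
print(round(statistics.mean(population), 1), round(statistics.mean(sample), 1))
```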
▶ Data Preparation? Answer:The data in a data set are often said to be
"dirty" and "raw" before they have been preprocessed.
We need to put them into a form that is best suited for a data-mining
algorithm.
Data preparation makes heavy use of descriptive statistics and data
visualization methods.
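The guide does not name specific preparation steps, so median imputation of missing values and a z-score screen for outliers are assumed here as two common examples of descriptive statistics at work. The function prepare and the data are hypothetical. With only ten values no z-score can exceed (n - 1)/sqrt(n) ≈ 2.85, so the example uses a 2.5 cutoff rather than the usual 3.

```python
import statistics

def prepare(values, z_threshold=3.0):
    """Fill missing values with the median, then flag outliers by z-score."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    # Impute: replace each missing entry (None) with the median of the observed values
    filled = [median if v is None else v for v in values]
    mean = statistics.mean(filled)
    sd = statistics.stdev(filled)
    # Flag values more than z_threshold standard deviations from the mean
    outliers = [v for v in filled if abs(v - mean) / sd > z_threshold]
    return filled, outliers

raw = [12, 15, None, 14, 13, 200, 16, None, 15, 14]  # hypothetical "dirty" data
filled, outliers = prepare(raw, z_threshold=2.5)
print(filled)    # [12, 15, 14.5, 14, 13, 200, 16, 14.5, 15, 14]
print(outliers)  # [200]
```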
▶ Unsupervised learning application? Answer:The goal is to use the
variable values to identify relationships between observations.
Qualitative assessments, such as how well the results match expert
judgment, are used to assess unsupervised learning methods.
▶ Cluster Analysis? Answer:The goal of this unsupervised learning method
is to segment observations into similar groups based on the observed
variables
Can be employed during the data preparation step to identify variables or
observations that can be aggregated or removed from consideration
▶ Types of Clustering Methods? Answer:Hierarchical and k-means
▶ Euclidean distance? Answer:Most common method to measure
dissimilarity between observations when the observations include continuous
variables
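For two observations u = (u1, ..., uq) and v = (v1, ..., vq) measured on q continuous variables, the Euclidean distance is the square root of (u1 - v1)^2 + ... + (uq - vq)^2. A small Python sketch follows; the function name euclidean is hypothetical, and Python's standard math.dist (3.8+) computes the same quantity. In practice, variables on very different scales are usually standardized first so that no single variable dominates the distance.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two observations with continuous variables."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    # Square root of the sum of squared differences across all variables
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

print(euclidean((1.0, 2.0), (4.0, 6.0)))  # 5.0
print(math.dist((1.0, 2.0), (4.0, 6.0)))  # 5.0, standard-library equivalent
```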
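▶ Hierarchical clustering? Answer:Bottom-up (agglomerative) approach: starts with each
observation in its own cluster and repeatedly merges the two most similar clusters.
Determines the similarity of two clusters by considering the similarity
between the observations composing either cluster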
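The bottom-up procedure can be sketched in Python as follows. Distance stands in for dissimilarity, and single linkage is assumed as the default merge criterion; any other linkage rule in this guide could be passed in its place. The names agglomerate and single_link are hypothetical, and ties are broken by taking the first pair found.

```python
import math
from itertools import combinations

def single_link(c1, c2):
    """Distance between the closest pair of observations, one from each cluster."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def agglomerate(points, k, linkage=single_link):
    """Bottom-up clustering: merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]  # each observation starts as its own cluster
    merges = []  # (cluster_a, cluster_b, distance) for every merge performed
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest linkage distance
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        clusters[i] = clusters[i] + clusters[j]  # merged cluster replaces cluster i
        del clusters[j]  # combinations yields i < j, so index i is unaffected
    return clusters, merges

points = [(1, 1), (1.5, 1), (5, 5), (5, 6), (9, 9)]
clusters, merges = agglomerate(points, k=3)
print(clusters)                   # [[(1, 1), (1.5, 1)], [(5, 5), (5, 6)], [(9, 9)]]
print([d for _, _, d in merges])  # [0.5, 1.0]
```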
▶ Single linkage? Answer:The similarity between two clusters is defined by
the similarity of the pair of observations (one from each cluster) that is the
most similar
▶ Complete linkage? Answer:This clustering method defines the similarity
between two clusters as the similarity of the pair of observations (one from
each cluster) that is the most different
▶ Average linkage? Answer:Defines the similarity between two clusters to
be the average similarity computed over all pairs of observations between
the two clusters
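The three linkage rules can be compared on two small clusters with the Python sketch below. Euclidean distance is assumed as the dissimilarity measure, so the "most similar" pair is the one with the smallest distance and the "most different" pair the one with the largest. The function names and the clusters A and B are hypothetical.

```python
import math
from statistics import mean

def single_linkage(c1, c2):
    """Distance of the most similar (closest) cross-cluster pair."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def complete_linkage(c1, c2):
    """Distance of the most different (farthest) cross-cluster pair."""
    return max(math.dist(a, b) for a in c1 for b in c2)

def average_linkage(c1, c2):
    """Average distance over all cross-cluster pairs."""
    return mean(math.dist(a, b) for a in c1 for b in c2)

A = [(0, 0), (0, 4)]
B = [(3, 0), (3, 4)]
# Cross-cluster distances are 3, 5, 5 and 3
print(single_linkage(A, B), complete_linkage(A, B), average_linkage(A, B))  # 3.0 5.0 4.0
```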
▶ Ward's method? Answer:Merges the two clusters whose union increases dissimilarity the least.
Computes dissimilarity as the sum of the squared distances between each individual observation in the
union of the two clusters and the centroid of the resulting merged cluster
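A Python sketch of the Ward calculation, assuming Euclidean distance: sse gives the sum of squared distances from each observation to its cluster centroid, and ward_increase gives how much that total grows when two clusters are merged, which is the quantity Ward's method keeps as small as possible. All names and clusters here are hypothetical.

```python
def centroid(cluster):
    """Mean of each variable across the observations in the cluster."""
    n = len(cluster)
    return tuple(sum(values) / n for values in zip(*cluster))

def sse(cluster):
    """Sum of squared distances from each observation to the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def ward_increase(c1, c2):
    """Growth in total within-cluster SSE if c1 and c2 were merged."""
    return sse(c1 + c2) - sse(c1) - sse(c2)

A = [(0, 0), (2, 0)]
B = [(10, 0), (12, 0)]
C = [(3, 0)]
# Merging A with the nearby C costs far less than merging A with the distant B
print(round(ward_increase(A, C), 3), round(ward_increase(A, B), 3))  # 2.667 100.0
```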
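▶ k-Means clustering? Answer:Given a value of k, the k-means algorithm
randomly partitions the observations into k clusters.
After all observations have been assigned to a cluster, the resulting cluster
centroids are calculated.
Each observation is then reassigned to the cluster with the closest centroid, and the
centroid calculation and reassignment repeat until no observation changes clusters.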
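The k-means steps can be sketched in Python as below. The random initial partition and the centroid calculation follow the description above; reassigning each observation to its nearest centroid (Euclidean distance), stopping once no assignment changes, and re-seeding an empty cluster with a random observation are standard choices assumed here. The function kmeans, its parameters, and the data are hypothetical.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Sketch of k-means: random initial partition, then centroid/reassign iterations."""
    rng = random.Random(seed)
    # Randomly partition the observations into k clusters
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        # Compute the centroid of each cluster from its current members
        centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:  # assumed rule: re-seed an empty cluster with a random observation
                members = [rng.choice(points)]
            centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
        # Reassign every observation to the cluster with the nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # stop once no observation changes cluster
            break
        labels = new_labels
    return labels, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the starting partition is random, cluster numbering (and, on less well-separated data, the final clusters themselves) can vary with the seed; running the algorithm from several seeds and keeping the best result is a common safeguard.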