HBX CORE CERTIFICATION EVALUATION
2026 COMPLETE REVIEW GRADED A+
⩥ adjusted R-squared.
Answer: A measure of the explanatory power of a regression analysis.
Adjusted R-squared adjusts R-squared with a penalty that grows
slightly as each independent variable is added to a
regression model. Unlike R-squared, which can never decrease when a
new independent variable is added to a regression model, Adjusted R-
squared drops when an independent variable is added that does not
improve the model's true explanatory power. Adjusted R-squared should always
be used when comparing the explanatory power of regression models
that have different numbers of independent variables.
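The penalty described above can be sketched in a few lines. The glossary does not give a formula, so the common textbook form, 1 - (1 - R-squared) * (n - 1) / (n - k - 1), is assumed here, with n observations and k independent variables. The function name and the sample numbers are hypothetical.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Assumed textbook form: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical models fitted to the same 30 observations.
# Adding a third variable nudges R-squared up from 0.800 to 0.801.
two_vars = adjusted_r_squared(0.800, n_obs=30, n_predictors=2)
three_vars = adjusted_r_squared(0.801, n_obs=30, n_predictors=3)

print(f"2 variables: {two_vars:.4f}")
print(f"3 variables: {three_vars:.4f}")
```

In this made-up case R-squared rises slightly while adjusted R-squared falls, which is the signal that the extra variable probably adds no real explanatory power.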
⩥ alternative hypothesis.
Answer: An alternative hypothesis is the theory or claim we are trying to
substantiate, and is stated as the opposite of a null hypothesis. When our
data allow us to reject the null hypothesis, we substantiate the
alternative hypothesis.
⩥ asymmetric distribution.
Answer: A probability distribution that is not symmetric around the
mean.
⩥ average.
Answer: The most common statistic used to describe the center of the
values in a data set. The mean is also known as the average. For a
distribution that has discrete values, the mean is equal to the sum of the
values of all the data points in the set, divided by the number of data
points.
⩥ base case.
Answer: The category of a categorical variable for which a dummy
variable is NOT included in a regression model. A regression model with
a categorical variable that has n categories should have n-1 dummy
variables. The coefficients of the dummy variables included in the
regression model are interpreted in relation to the base case. The analyst
can select any category to be excluded from the regression model;
however, different base cases lead to different interpretations of the
dummy variables' coefficients. For example, suppose we are trying to
determine the average difference in height between men and women in a
sample, and suppose that on average men are 5 inches taller than women
in the sample. If we use Female as the base case then the coefficient for
the dummy variable for Male would be +5. If we use Male as the base
case, the coefficient for the dummy variable for Female would be -5.
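The height example can be checked with a small sketch. The six heights below are invented so that men average 5 inches taller than women, and simple_ols is a hypothetical helper that fits a one-variable least-squares line by hand.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical sample: three women (mean 65 in) and three men (mean 70 in).
heights = [63, 65, 67, 68, 70, 72]
is_male = [0, 0, 0, 1, 1, 1]            # Female is the base case
is_female = [1 - m for m in is_male]    # Male is the base case

intercept_f_base, male_coef = simple_ols(is_male, heights)
intercept_m_base, female_coef = simple_ols(is_female, heights)

print(f"Female base case: intercept {intercept_f_base:.1f}, Male coefficient {male_coef:+.1f}")
print(f"Male base case: intercept {intercept_m_base:.1f}, Female coefficient {female_coef:+.1f}")
```

In both fits the intercept is the average height of the base case, and the dummy coefficient is the difference from it. Only the sign and the reference point change.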
⩥ bias.
Answer: The tendency of a measurement process to over- or under-
estimate the value of a population parameter. Although a sample statistic
will almost always differ from the population parameter, for an unbiased
sample, the difference will be random. In contrast, for a biased sample,
the statistic will differ in a systematic way (e.g., tend to be too high).
Some common reasons for bias include non-random sampling methods
and non-neutral question phrasing.
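A short simulation can illustrate the difference between random error and systematic error. Everything here is a made-up assumption: the population is the integers 1 to 1000, and the biased procedure can only reach values above 250.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

population = list(range(1, 1001))
true_mean = mean(population)

def average_sample_mean(frame, sample_size=50, repeats=1000):
    """Average the sample mean over many samples drawn from `frame`."""
    return mean(mean(random.sample(frame, sample_size)) for _ in range(repeats))

# Unbiased: every member of the population can be selected.
unbiased_avg = average_sample_mean(population)

# Biased: the sampling method never reaches the lowest quarter of the population.
reachable = [v for v in population if v > 250]
biased_avg = average_sample_mean(reachable)

print(f"true mean: {true_mean}")
print(f"average of unbiased sample means: {unbiased_avg:.1f}")
print(f"average of biased sample means: {biased_avg:.1f}")
```

Individual unbiased samples still miss the true mean, but their errors average out. The biased samples are too high on average no matter how many are drawn.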
⩥ biased sample.
Answer: A sample that is not representative of the population from
which it is collected. Sampling practices that can introduce bias include
poorly phrased survey questions and non-random sampling.
⩥ bimodal distribution.
Answer: A multi-modal distribution with two clearly discernible peaks.
The two peaks may be of the same height (that is, have equal frequency),
or one may be the true mode while the other has a very high (but not the
highest) frequency.
⩥ bin.
Answer: A range of values used to categorize data. In a histogram,
observations are divided into a set of non-overlapping bins, each
corresponding to a range of values. The bins are constructed to ensure
that the set of bins contains all observations in the data set. The height of
the bar corresponding to a bin is equal to the number of observations in
the data set that fall within that bin's range. Typically, all bins in a given
histogram are the same width (i.e., the difference between the largest
value and the smallest value is the same for each bin). In an Excel
histogram, each bin is labeled by the value of the upper boundary of the
bin's range. For example, in a histogram with three bins (each of width
1), labeled 1, 2, and 3, the bin labeled 2 contains all observations greater
than 1 and less than or equal to 2. See histogram.
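The upper-boundary labeling rule can be sketched as follows. This is an illustration of the rule as described in the entry, not Excel's own implementation. The function name bin_label and the observations are hypothetical.

```python
import math
from collections import Counter

def bin_label(value, width=1.0):
    """Upper boundary of the bin (upper - width, upper] that contains `value`."""
    return math.ceil(value / width) * width

# Hypothetical observations falling into the bins labeled 1, 2, and 3.
observations = [0.2, 0.9, 1.0, 1.1, 1.5, 2.0, 2.3, 3.0]
counts = Counter(bin_label(v) for v in observations)

for label in sorted(counts):
    print(f"bin {label:g}: {counts[label]} observations")
```

An observation equal to 1.0 lands in the bin labeled 1, while 1.1 lands in the bin labeled 2, matching the "greater than 1 and less than or equal to 2" rule.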
⩥ binomial distribution.
Answer: A distribution of the possible successful outcomes in a given
number of trials, where there are only two possible outcomes for each
trial, and each trial has the same probability of success (e.g., flipping a
coin). For example, the binomial distribution for the number of "heads"
that result from flipping a coin 50 times specifies the probability for
each possible outcome, from observing 0 "heads" to observing 50
"heads". The binomial distribution is used to create confidence intervals
for proportions.
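The 50-flip example can be worked through with the standard binomial probability formula, C(n, k) * p^k * (1 - p)^(n - k). The sketch assumes a fair coin (p = 0.5), which the entry implies but does not state. The function name is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_flips = 50
pmf = [binomial_pmf(k, n_flips, 0.5) for k in range(n_flips + 1)]

print(f"P(0 heads)  = {pmf[0]:.3e}")
print(f"P(25 heads) = {pmf[25]:.4f}")
print(f"sum over all outcomes = {sum(pmf):.6f}")
```

The list pmf holds one probability for each possible outcome from 0 to 50 heads, and the probabilities sum to 1.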
⩥ Central Limit Theorem.
Answer: A theorem stating that if we take sufficiently large randomly-
selected samples from a population, the means of these samples will be
approximately normally distributed regardless of the shape of the underlying
population. (Technically, the underlying population must have a finite
variance.)
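A rough simulation shows the theorem at work. The choices below are assumptions made for illustration: the population is exponential (strongly right-skewed), each sample has 50 observations, and skewness is used as a simple symmetry check, since a normal distribution has skewness 0.

```python
import random
from statistics import mean

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    m = mean(values)
    n = len(values)
    second = sum((v - m) ** 2 for v in values) / n
    third = sum((v - m) ** 3 for v in values) / n
    return third / second**1.5

random.seed(0)  # fixed seed so the sketch is repeatable
SAMPLE_SIZE = 50
NUM_SAMPLES = 2000

# Draw many samples from a skewed population with mean 1.
samples = [
    [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    for _ in range(NUM_SAMPLES)
]
raw_values = [v for sample in samples for v in sample]
sample_means = [mean(sample) for sample in samples]

skew_raw = skewness(raw_values)
skew_means = skewness(sample_means)

print(f"skewness of individual values: {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```

The individual values stay heavily skewed, while the sample means are much closer to symmetric. Larger sample sizes bring them closer still.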
⩥ coefficient of variation (CV).
Answer: A measure of a data set's variability relative to its mean. The
coefficient of variation (CV) is particularly helpful when comparing the
variability of two data sets with different means. Calculated as the
standard deviation divided by the mean, the CV is typically expressed as
a percentage. For example, the CV of a data set with mean = 100 hours
and standard deviation = 15 hours is 15 hours/100 hours = 15%.
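The calculation is short enough to sketch directly. The first call reproduces the 100-hour example. The two small data sets are hypothetical and are there only to show the CV comparing variability across different means.

```python
from statistics import mean, stdev

def cv_from_summary(std_dev, mean_value):
    """Coefficient of variation from summary statistics."""
    return std_dev / mean_value

def cv_from_data(values):
    """Coefficient of variation from raw data (sample standard deviation)."""
    return stdev(values) / mean(values)

glossary_cv = cv_from_summary(15, 100)
print(f"CV for mean 100, standard deviation 15: {glossary_cv:.0%}")

# Hypothetical data sets: means differ tenfold, relative spread is the same.
small_scale = [98, 100, 102]
large_scale = [980, 1000, 1020]
print(f"small-scale CV: {cv_from_data(small_scale):.1%}")
print(f"large-scale CV: {cv_from_data(large_scale):.1%}")
```

The standard deviations are 2 and 20, yet both data sets have the same CV, so relative to their means they are equally variable.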