Linear Regression
ACTUAL EXAM QUESTIONS WITH
COMPLETE SOLUTION GUIDE
(A+ GRADED 100% VERIFIED)
LATEST VERSION 2026/2027
Prepared By: Dr. Evelyn Sterling, PhD, RN, FAAN
Distinguished Professor of Biostatistics & Clinical Informatics
Department of Nursing Research & Analytics, Ivy League Nursing Consortium
Date: December 14, 2025
Document Type: Comprehensive Elite Exam & Solution Guide
Course Code: NURS-STAT-503 (Advanced Quantitative Methods)
Table of Contents
1. Topic Index & Concept Map
2. Glossary of Statistical & Clinical Terms
3. Comprehensive Formula Sheet
4. Exam Section I: Foundations of Association (Questions 1–10)
5. Exam Section II: Pearson Correlation Mechanics (Questions 11–20)
6. Exam Section III: Simple Linear Regression Models (Questions 21–30)
7. Exam Section IV: Model Evaluation & Diagnostics (Questions 31–40)
8. Exam Section V: Assumptions, Outliers, & Errors (Questions 41–50)
9. Exam Section VI: Advanced Clinical Application (Questions 51–55)
Topic Index & Concept Map
This document is structured to guide the doctoral- or master's-level nursing student through the
intricate landscape of correlational analysis and linear regression. The following concept map
delineates the hierarchy of knowledge required to master Module 7.
1. The Architecture of Association
● Theoretical Basis: Understanding the distinction between deterministic relationships
(physics) and stochastic relationships (biological/clinical).
● Visual Diagnostics: The primacy of the scatterplot in identifying linearity, direction, and
magnitude before calculation.
● Covariance: The mathematical foundation of correlation—how two variables vary
together from their respective means.
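Below is a minimal sketch of the covariance idea in Python (standard library only). The function sums the products of each pair's deviations from its own mean and divides by n - 1. The BMI and blood-pressure values are hypothetical numbers chosen for illustration, not study data. Rescaling one variable shows why covariance alone is hard to interpret: its size changes with the units of measurement even though the relationship does not.

```python
from statistics import mean


def sample_covariance(x, y):
    """Sample covariance: sum of paired deviation products divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least 2 pairs")
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


# Hypothetical illustrative values (not real patient data)
bmi = [22.0, 25.0, 28.0, 31.0, 34.0]
sbp = [118.0, 121.0, 127.0, 130.0, 138.0]  # systolic blood pressure, mmHg

cov = sample_covariance(bmi, sbp)
# Same patients, BMI multiplied by 10: the association is unchanged, the covariance is not
cov_rescaled = sample_covariance([10 * b for b in bmi], sbp)

print(f"covariance = {cov:.2f}")
print(f"covariance after rescaling BMI = {cov_rescaled:.2f}")
```

The positive sign says that above-average BMI tends to pair with above-average blood pressure. Dividing by both standard deviations removes the unit dependence, and that standardized quantity is Pearson's r.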
2. Pearson Product-Moment Correlation (r)
● Quantification: The standardization of covariance into a metric between -1.0 and +1.0.
● Interpretation: Decoding the magnitude (Weak vs. Moderate vs. Strong) and direction
(Positive vs. Inverse).
● Significance Testing: The role of sample size (n) and degrees of freedom (df = n-2) in
determining the p-value of r.
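The usual test of H0: \rho = 0 converts r into a t statistic, t = r\sqrt{n-2} / \sqrt{1 - r^2}, with df = n - 2. The sketch below assumes the hypothetical value r = 0.30 and shows how the same correlation yields larger t values as n grows. It stops at the t statistic. A p-value would come from the t distribution with df degrees of freedom; if SciPy is available, one option is `2 * scipy.stats.t.sf(abs(t), df)`. That call is not part of the sketch.

```python
import math


def correlation_t_test(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0 from a sample r."""
    if n < 3:
        raise ValueError("need at least 3 pairs (df = n - 2 must be positive)")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r ** 2)
    return t, df


# The same hypothetical r = 0.30 evaluated at three sample sizes
for n in (10, 30, 100):
    t, df = correlation_t_test(0.30, n)
    print(f"n = {n:3d}   df = {df:3d}   t = {t:.2f}")
```

Because t grows with the square root of n - 2, a modest correlation can reach statistical significance in a large sample. Statistical significance says nothing about clinical importance.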
3. Simple Linear Regression (OLS)
● Prediction vs. Association: Moving from "Are they related?" to "Can we predict Y from
X?".
● The Linear Model: Y = \alpha + \beta X + \epsilon (The deterministic line plus the
stochastic error).
● Least Squares Criterion: Minimizing the sum of squared residuals (\sum (y - \hat{y})^2)
to find the line of best fit.
4. Regression Diagnostics
● Goodness of Fit: The Coefficient of Determination (R^2) as a measure of explained
variance.
● Residual Analysis: Checking assumptions of Homoscedasticity (constant variance) and
Normality of Errors.
● Influence Analysis: Identifying Outliers and High Leverage points that distort clinical
conclusions.
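Leverage is simple to compute in a one-predictor model. The hat value is h_i = 1/n + (x_i - \bar{x})^2 / \sum (x - \bar{x})^2. The sketch below assumes hypothetical BMI values in which one patient sits far from the rest. It flags points using a common rule of thumb, h_i > 2p/n, where p = 2 parameters (intercept and slope). Other cutoffs, such as 3p/n, are also used.

```python
from statistics import mean


def leverages(x):
    """Hat values for simple linear regression: h_i = 1/n + (x_i - x_bar)^2 / Sxx."""
    n = len(x)
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("leverage is undefined when all x values are equal")
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical BMI values; the last patient lies far from the others
bmi = [22.0, 25.0, 28.0, 31.0, 34.0, 55.0]
h = leverages(bmi)

# Rule-of-thumb cutoff 2p/n with p = 2 estimated parameters
cutoff = 2 * 2 / len(bmi)
flagged = [i for i, h_i in enumerate(h) if h_i > cutoff]
print([round(h_i, 3) for h_i in h], "cutoff:", round(cutoff, 3), "flagged:", flagged)
```

Leverage depends only on X, and the hat values sum to p. A high-leverage point becomes influential only if it also pulls the fitted line. Measures such as Cook's distance combine leverage with residual size to capture that.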
5. Clinical Translation
● Causality Fallacy: Navigating the "Correlation \neq Causation" trap in Evidence-Based
Practice (EBP).
● Spurious Correlations: Identifying confounding variables (e.g., patient acuity, age,
comorbidities).
● Predictive Analytics: Using regression equations for dosage calculation, risk
stratification, and resource allocation.
Glossary of Statistical & Clinical Terms
Bivariate Normal Distribution A statistical assumption underlying significance testing of Pearson’s
correlation coefficient. It implies that for any fixed value of X, Y is normally distributed, and vice
versa. In clinical terms, if we look at all patients with a specific BMI (X), their blood pressures (Y)
should form a bell curve.
Coefficient of Determination (R^2) A key statistic in regression analysis that represents the
proportion of the variance in the dependent variable (Y) that is predictable from the independent
variable (X). For example, if R^2 = 0.50 in a pain management study, 50% of the variation in pain
relief is statistically explained by its linear relationship with medication dose. The remaining 50%
is unexplained by the model (individual differences, measurement error, and other unmeasured factors).
Homoscedasticity Derived from Greek roots meaning "same scatter." This is the assumption
that the variance of the residuals (errors) is constant across all levels of the independent
variable. In a residual plot, this looks like a consistent band of dots. If the dots fan out (cone
shape), the data are heteroscedastic. Predictions are then more precise at some levels of the
predictor than at others, and the usual standard errors, confidence intervals, and p-values become unreliable.
Multicollinearity A condition in multiple regression where two or more predictor variables are
highly correlated with one another. This makes it difficult to isolate the individual effect of each
predictor on the outcome. For instance, "Heart Rate" and "Pulse" are nearly perfectly correlated in
most patients. Including both in a model would cause mathematical instability and inflated standard
errors, and with perfect correlation their separate coefficients could not be estimated at all.
Pearson’s Product-Moment Correlation Coefficient (r) A measure of the strength and direction
of the linear relationship between two continuous variables. It is sensitive to outliers and only
detects linear patterns. It is calculated by dividing the covariance of the two variables by the
product of their standard deviations.
Residual (e, the sample estimate of the Error Term \epsilon) The vertical distance between an observed
data point and the fitted regression line, which supplies the predicted value. Mathematically,
Residual = Observed Y - Predicted \hat{Y}. The error term \epsilon is its unobservable population counterpart.
Clinically, the residual represents the "unexplained" portion of a patient's outcome: the nuance that the
standard protocol (the regression line) did not capture.
Spurious Correlation A relationship between two variables that appears causal but is actually
due to coincidence or the presence of a third, unseen confounding variable. For example, a
positive correlation between "Number of Doctors" and "Patient Mortality" is spurious; the
confounder is "Hospital Size/Acuity".
Variance Inflation Factor (VIF) A diagnostic tool used to detect the severity of multicollinearity.
A VIF value of 1 indicates that a predictor is uncorrelated with the other predictors, while a VIF
exceeding 5 or 10 (common rules of thumb) suggests problematic multicollinearity that inflates the
variance of the estimated regression coefficients, making them unstable and difficult to interpret.
Formula Sheet
1. Pearson Correlation Coefficient (r)
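The definitive formula for calculating the strength of the linear relationship (raw-score form):
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
● Where: n is the sample size, x and y are individual observations.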
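2. Linear Regression Equation (The Prediction Model)
\hat{Y} = a + bX
● \hat{Y}: The Predicted Dependent Variable (Outcome)
● X: The Independent Variable (Predictor)
● b: The Slope (Rate of Change: the change in \hat{Y} for each one-unit increase in X)
● a: The Y-Intercept (Baseline: the predicted value of Y when X = 0)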
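The prediction equation \hat{Y} = a + bX uses the standard least squares estimates b = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2 and a = \bar{y} - b\bar{x}. Below is a minimal sketch that computes both estimates from hypothetical BMI and blood-pressure values and then predicts for a hypothetical new patient.

```python
from statistics import mean


def fit_line(x, y):
    """Least squares intercept a and slope b for y_hat = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx           # slope: change in y_hat per one-unit increase in x
    a = y_bar - b * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical illustrative values (not real patient data)
bmi = [22.0, 25.0, 28.0, 31.0, 34.0]
sbp = [118.0, 121.0, 127.0, 130.0, 138.0]

a, b = fit_line(bmi, sbp)
new_bmi = 30.0  # hypothetical new patient
y_hat = a + b * new_bmi
print(f"y_hat = {a:.2f} + {b:.3f} * BMI; predicted SBP at BMI {new_bmi:.0f} = {y_hat:.1f} mmHg")
```

Here the intercept corresponds to BMI = 0, which lies far outside the observed data and has no clinical meaning. Predictions are safest within the observed range of X.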