PRFA — NBM2
COMPETENCIES
4030.5.1 : Multiple Regression
The graduate employs multiple regression algorithms with categorical and numerical predictors in describing
phenomena.
4030.5.3 : Regression Implications
The graduate makes assertions based on regression modeling.
INTRODUCTION
As a data analyst, you will assess continuous data sources for their relevance to specific research questions
throughout your career.
In your previous coursework, you have performed data cleaning and exploratory data analysis on your data.
You have seen basic trends and patterns and can now start building more sophisticated statistical models. In
this course, you will use and explore both multiple regression and logistic regression models and their
assumptions.
For this task, you will select one of the Data Sets and Associated Data Dictionaries from the following link:
Data Sets and Associated Data Dictionaries
You will then review the data dictionary related to the raw data file you have chosen, and prepare the data
set file for multiple regression modeling. The organizations connected with the given data sets for this task
seek to analyze their operations and have collected variables of possible use to support decision-making
processes. You will analyze your chosen data set using multiple regression modeling, create visualizations,
and deliver the results of your analysis. It is recommended that you use the cleaned data set from your
previous course.
REQUIREMENTS
Your submission must be your original work. No more than a combined total of 30% of the submission and no
more than a 10% match to any one individual source can be directly quoted or closely paraphrased from
sources, even if cited correctly. The originality report that is provided when you submit your task can be used
as a guide.
You must use the rubric to direct the creation of your submission because it provides detailed criteria that
will be used to evaluate your work. Each requirement below may be evaluated by more than one rubric
aspect. The rubric aspect titles may contain hyperlinks to relevant portions of the course.
Tasks may not be submitted as cloud links, such as links to Google Docs, Google Slides, OneDrive, etc., unless
specified in the task requirements. All other submissions must be file types that are uploaded and submitted
as attachments (e.g., .docx, .pdf, .ppt).
Part I: Research Question
A. Describe the purpose of this data analysis by doing the following:
1. Summarize one research question that is relevant to a real-world organizational situation captured in
the data set you have selected and that you will answer using multiple regression.
2. Define the objectives or goals of the data analysis. Ensure that your objectives or goals are reasonable
within the scope of the data dictionary and are represented in the available data.
Part II: Method Justification
B. Describe multiple regression methods by doing the following:
1. Summarize the assumptions of a multiple regression model.
2. Describe the benefits of using the tool(s) you have chosen (i.e., Python, R, or both) in support of various
phases of the analysis.
3. Explain why multiple regression is an appropriate technique to analyze the research question
summarized in Part I.
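One commonly cited assumption from B1, the absence of severe multicollinearity among predictors, can be checked numerically with the variance inflation factor (VIF). The sketch below is only an illustration under stated assumptions: the data are synthetic, the helper names `r_squared` and `vif` are hypothetical, and NumPy is assumed to be available. It is not a required or prescribed method for this task.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of an ordinary least squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def vif(X):
    """VIF for each column: 1 / (1 - R^2) from regressing it on the other columns."""
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        factors.append(1.0 / (1.0 - r_squared(others, X[:, j])))
    return factors


# Synthetic predictors (assumption): x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(42)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
vifs = vif(np.column_stack([x1, x2, x3]))
print([round(v, 2) for v in vifs])
```

In this synthetic setup the two near-duplicate predictors receive large VIF values while the independent one stays close to 1, which is the pattern an analyst would look for before deciding whether to drop or combine predictors.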
Part III: Data Preparation
C. Summarize the data preparation process for multiple regression analysis by doing the following:
1. Describe your data preparation goals and the data manipulations that will be used to achieve the goals.
2. Discuss the summary statistics, including the target variable and all predictor variables that you will
need to gather from the data set to answer the research question.
3. Explain the steps used to prepare the data for the analysis, including the annotated code.
4. Generate univariate and bivariate visualizations of the distributions of variables in the cleaned data set.
Include the target variable in your bivariate visualizations.
5. Provide a copy of the prepared data set.
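The preparation steps in Part III might look roughly like the following in Python. This is a minimal sketch under assumptions: the toy rows and the column names `monthly_charge`, `tenure` and `contract` are hypothetical stand-ins rather than fields from any provided data dictionary, and pandas is assumed to be the chosen tool.

```python
import pandas as pd

# Hypothetical toy data; replace with the cleaned data set you selected.
df = pd.DataFrame({
    "monthly_charge": [120.5, 89.0, 150.25, 99.9],            # assumed target
    "tenure": [12, 40, 5, 23],                                # assumed numeric predictor
    "contract": ["monthly", "annual", "monthly", "two_year"], # assumed categorical predictor
})

# Summary statistics: describe() for numeric columns, counts for categorical ones.
numeric_summary = df[["monthly_charge", "tenure"]].describe()
contract_counts = df["contract"].value_counts()

# Dummy-encode the categorical predictor; drop_first=True leaves out one
# reference level so the dummies are not perfectly collinear with the intercept.
prepared = pd.get_dummies(df, columns=["contract"], drop_first=True, dtype=int)

print(numeric_summary)
print(contract_counts)
print(prepared)
```

The prepared frame could then be written out with `prepared.to_csv(...)` to produce the copy of the prepared data set; the file name is left to the analyst.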
Part IV: Model Comparison and Analysis
D. Compare an initial and a reduced multiple regression model by doing the following:
1. Construct an initial multiple regression model from all predictors that were identified in Part C2.
2. Justify a statistically based variable selection procedure and a model evaluation metric to reduce the
initial model in a way that aligns with the research question.
3. Provide a reduced multiple regression model that includes both categorical and continuous variables.
Note: The output should include a screenshot of each model.
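As one possible illustration of Part D, the sketch below reduces an initial model by backward elimination, using adjusted R-squared as the evaluation metric. Both choices are assumptions made for the example; other statistically based procedures (for instance, p-value-based stepwise selection or AIC comparison) may suit a given research question better. The data are synthetic, and the names `adjusted_r2`, `backward_eliminate`, `x1`, `cat_b`, `noise1` and `noise2` are hypothetical.

```python
import numpy as np


def adjusted_r2(X, y):
    """Adjusted R-squared of an OLS fit of y on X (intercept added)."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def backward_eliminate(X, y, names):
    """Repeatedly drop the predictor whose removal most improves adjusted R-squared."""
    kept = list(range(X.shape[1]))
    best = adjusted_r2(X[:, kept], y)
    while len(kept) > 1:
        scores = [(adjusted_r2(X[:, [c for c in kept if c != j]], y), j) for j in kept]
        top_score, drop = max(scores)
        if top_score <= best:  # no removal improves the metric, so stop
            break
        best = top_score
        kept.remove(drop)
    return [names[c] for c in kept], best


# Synthetic data (assumption): one continuous and one dummy-coded categorical
# predictor drive the target; two further columns are unrelated noise.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
cat_b = rng.integers(0, 2, size=n).astype(float)
noise1 = rng.normal(size=n)
noise2 = rng.normal(size=n)
y = 2.0 + 3.0 * x1 + 3.0 * cat_b + rng.normal(size=n)

names = ["x1", "cat_b", "noise1", "noise2"]
X = np.column_stack([x1, cat_b, noise1, noise2])

initial_adj = adjusted_r2(X, y)
kept, reduced_adj = backward_eliminate(X, y, names)
print("initial adjusted R^2:", round(initial_adj, 4))
print("reduced predictors:", kept, "adjusted R^2:", round(reduced_adj, 4))
```

Because a predictor is removed only when doing so raises adjusted R-squared, the reduced model never scores worse than the initial one on that metric. Whether an unrelated predictor is actually dropped depends on the sample, so the reduced model should still be inspected rather than accepted automatically.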
E. Analyze the data set using your reduced multiple regression model by doing the following:
1. Explain your data analysis process by comparing the initial and reduced multiple regression models,
including the following elements: