Journey ACTUAL QUESTIONS AND
CORRECT ANSWERS
Business Understanding (order) - CORRECT ANSWERS 1st in Data Analytics Lifecycle
Data Acquisition (order) - CORRECT ANSWERS 2nd in Data Analytics Lifecycle
Data Cleaning (order) - CORRECT ANSWERS 3rd in Data Analytics Lifecycle
Data Exploration (order) - CORRECT ANSWERS 4th in Data Analytics Lifecycle
Predictive Modeling (order) - CORRECT ANSWERS 5th in Data Analytics Lifecycle
Data Mining/Machine Learning (order) - CORRECT ANSWERS 6th in Data Analytics
Lifecycle
Reporting and Visualization (order) - CORRECT ANSWERS 7th in Data Analytics
Lifecycle
Business Understanding - CORRECT ANSWERS - Also known as the discovery phase or
planning phase
- Analyst defines the major questions of interest
- Assesses the resource constraints of the project
- Determines the needs of stakeholders
,Data Acquisition - CORRECT ANSWERS - Collecting Data
- Data retrieved from DB
- Use SQL to obtain from Data Warehouse
- If data is not available, web scraping and surveys are used to acquire it
Data Cleaning - CORRECT ANSWERS - Referred to as Data cleansing, data wrangling,
data munging, and feature engineering
- When this phase is ignored or skipped, the results from the analysis may become irrelevant.
- There is no one common tool supporting this phase. An analyst will use SQL, Python, R, or
Excel to perform various data modifications and transformations.
- Data quality is measured in terms of uniqueness and relevance.
Data Exploration - CORRECT ANSWERS - the analyst begins to understand the basic
nature of data and the relationships within it.
- This phase often relies on the use of data visualization tools and numerical summaries, such as
measures of central tendency and variability.
Predictive Modeling - CORRECT ANSWERS - These tools allow an analyst to move
beyond describing the data to creating models that enable predictions of outcomes of interest.
- Tools such as Python and R play an important role in automating the training and use of
models.
Data Mining - CORRECT ANSWERS - These tools became popular with the ability of
computers to look for patterns in large amounts of data. Tools such as Python and R play an
important role in this phase.
,- At times you may find that "machine learning" is used as a synonym for "data mining."
However, some in the industry might refer to "machine learning" as a specialized segment of
data mining techniques that continually update (i.e., "learn") to improve its modeling over time.
Reporting and Visualization - CORRECT ANSWERS - an analyst tells the story of the
data and uses graphs or interactive dashboards to inform others of the findings from the analyses.
- Interactive dashboard tools, such as Tableau, give even the novice user the ability to interact
with the data and spot trends and patterns.
- Often, the goal of this phase is to provide actionable insights for various stakeholders.
Business Understanding Problems - CORRECT ANSWERS Lack of clear focus on
stakeholders, timeline, limitations and budget could potentially derail an analysis
Data Acquisition Problems - CORRECT ANSWERS Quality and type of data may make
access more difficult
Data Cleaning Problems - CORRECT ANSWERS Some cleaning techniques could
dramatically change data/outcomes
Outliers not dealt with can cause problems with statistical models due to excessive variability.
Data Exploration Problems - CORRECT ANSWERS Skipping this step could enable
faulty perceptions of the data which hurt advanced analytics.
Predictive Modeling Problems - CORRECT ANSWERS - Too many input variables
(predictors) can cause problems
- Correlation does not imply causation.
- Time series models often need sufficient time data to offer precise trending.
, - Predictive model accuracy should be assessed using cross-validation.
Data Mining Problems - CORRECT ANSWERS Running on entire data is problematic;
need to subset data into training and testing datasets to build models.
Reporting and visualization Problems - CORRECT ANSWERS - Due to potential large
audience consumption, mistakes can cause bad business decisions and loss of revenue
- Improper scales used in graphs could push for interpretations of the story that is inaccurate
Descriptive - CORRECT ANSWERS Key focus: Observation
Main question: What happened?
Diagnostics - CORRECT ANSWERS Key focus: Explained reason
Main question: Why did it happen?
Descriptive Example - CORRECT ANSWERS In a healthcare setting, an unusually high
number of people are admitted to the emergency room in a short period of time. ____ analytics
tells you that this is happening and provides real-time data with all the corresponding statistics
(date of occurrence, volume, patient details, etc.).
Diagnostic Example - CORRECT ANSWERS In the healthcare example mentioned
earlier, ___ analytics would explore the data and make correlations. For instance, it may help you
determine that all of the patients' symptoms — high fever, dry cough, and fatigue — point to the
same infectious agent. You now have an explanation for the sudden spike in volume at the ER.
Predictive Example - CORRECT ANSWERS Back in our hospital example, ___ analytics
may forecast a surge in patients admitted to the ER in the next several weeks. Based on patterns
in the data, the illness is spreading at a rapid rate.