120 questions with verified answers
Data quality is measured in terms of this: Ans✓✓✓ Uniqueness and relevance
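Define Bayes' Theorem Ans✓✓✓ Diagnosis is based on probabilities. The probability
of a hypothesis given observed data is computed from the probability of
observing that data given the hypothesis, weighted by the prior probability of
the hypothesis.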
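A minimal sketch of the theorem in a diagnostic setting, P(H|D) = P(D|H) * P(H) / P(D). The prevalence, sensitivity, and false-positive rate are made-up illustrative numbers, and `posterior` is a hypothetical helper name, not something from the course material.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | D) for a binary hypothesis using Bayes' theorem."""
    # Total probability of the data: P(D) = P(D|H) P(H) + P(D|not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Assumed numbers: 1% of patients have the condition, the test catches 95%
# of true cases, and it wrongly flags 5% of healthy patients.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))  # 0.161
```

Even with a fairly accurate test, the low prior keeps the probability of the condition given a positive result at about 16%.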
Define decomposition and when it's used Ans✓✓✓ Breaking time series data into
components. Its procedures are used in time series to describe the reasons for
variations in trends.
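A rough sketch of an additive decomposition (series = trend + seasonal + residual), assuming an odd seasonal period so that a simple centered moving average can estimate the trend. The function name and the sample series are hypothetical.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal, and residual lists (additive model)."""
    n = len(series)
    half = period // 2
    # Trend: centered moving average; left as None where the window does not fit.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period
    # Seasonal: average detrended value at each position within the period.
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(series[i] - t)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # shift so seasonal effects sum to zero
    seasonal = [means[i % period] - offset for i in range(n)]
    # Residual: whatever trend and seasonality do not explain.
    residual = [None if t is None else series[i] - t - seasonal[i]
                for i, t in enumerate(trend)]
    return trend, seasonal, residual

# Made-up series: upward trend plus a repeating 3-step pattern.
series = [10 + i + [1, 0, -1][i % 3] for i in range(12)]
trend, seasonal, residual = decompose_additive(series, period=3)
print(seasonal[:3])
```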
Define factor analysis Ans✓✓✓ Finds the underlying common factors that give rise
to multiple indicators.
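Define hypothesis testing Ans✓✓✓ Uses sample data to decide whether there is
enough evidence to reject a null hypothesis about a population. (Predicting
binary outcomes based on a set of independent variables describes logistic
regression.)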
Define Principal Component Analysis (PCA) Ans✓✓✓ Checks if variables group in
a meaningful way and reduces the dimensionality of large data sets.
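A small sketch of the dimensionality-reduction idea: it finds only the first principal component, using power iteration on the covariance matrix rather than a full eigendecomposition or SVD. The function name and data are hypothetical, and a library routine would normally be used in practice.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the direction: 2 columns become 1 score.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Made-up data: the second column is roughly twice the first.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
direction, scores = first_principal_component(data)
print(direction)
```

Because the two columns move together, one score per row keeps most of the variation, which is the sense in which PCA reduces dimensionality.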
For example, businesses may be able to use a Twitter API to pull in Twitter data in
what kind of format?
What level of structure is the data? Ans✓✓✓ JSON
Semi-structured data
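A brief sketch of why JSON counts as semi-structured: records share some keys but not all, and values can nest. The payload and its field names are invented for illustration and are not the real Twitter API schema.

```python
import json

# Hypothetical payload; field names are illustrative only.
raw = """
[
  {"id": 1, "text": "Great service!", "user": {"name": "ana"}, "hashtags": ["support"]},
  {"id": 2, "text": "Shipping was slow", "user": {"name": "ben", "verified": true}}
]
"""
tweets = json.loads(raw)

# Flatten into uniform rows, supplying defaults for keys that may be absent.
rows = [
    {
        "id": t["id"],
        "user": t["user"]["name"],
        "n_hashtags": len(t.get("hashtags", [])),
    }
    for t in tweets
]
print(rows)
```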
ggplot2, tidyverse, and caret are essential libraries of which tool? Ans✓✓✓ R
Google Trends is an example of what kind of data? Ans✓✓✓ Open Data --> Social
Media
How and where should the safety margin (halfway amount between average time
completed and slowest possible time completed) be added? Ans✓✓✓ Spread
throughout the critical path
How is law confined? Ans✓✓✓ It's confined to the territory or the place that
created it, not the technology.
If an analyst wants to help create an online store that intelligently recommends
certain products for customers to buy, what type of analysis would they be
focusing on? Ans✓✓✓ Predictive (because they are predicting FUTURE habits)
If you need to crash a project (speed up the project to get it done on
schedule), what are three ways to do it? Ans✓✓✓ Money up
Quality down
Overlapping tasks
In what phase does the analyst deal with the following:
Central tendency/measures of center (e.g., mean, median, mode), variability
(e.g., standard deviations and quartiles), and distributions (e.g., normal,
skewed, etc.)
Identify basic correlations between variables
Pattern discovery Ans✓✓✓ Data exploration/Exploratory Data
Analysis (EDA)/Descriptive Statistics
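The measures listed for this phase can be sketched with Python's standard `statistics` module. The two samples are made up, and `pearson` is a hypothetical helper written out by hand for clarity.

```python
import statistics

# Made-up sample: exam scores and hours studied for seven students.
scores = [2, 4, 4, 5, 7, 9, 11]
hours = [1, 2, 2, 3, 4, 5, 6]

# Measures of center.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variability: sample standard deviation and the three quartile cut points.
stdev = statistics.stdev(scores)
quartiles = statistics.quantiles(scores, n=4)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(hours, scores)
print(mean, median, mode, round(stdev, 2), quartiles, round(r, 3))
```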
In what phase does the analyst deal with the following:
Creating training and testing datasets to build models from
Identify/detect patterns
Determine if groups (clusters) exist in data
Classify data into groups
Create models that "learn" and improve (e.g., machine/deep learning, AI, etc)
Ans✓✓✓ Data Mining/Machine Learning/AI/Supervised, Unsupervised Models
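A minimal sketch of the first item, splitting data into training and testing sets, followed by a toy nearest-neighbor classifier scored only on the held-out rows. All names and the data are hypothetical, and real projects typically use a library for both steps.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predict(x, train):
    """Label x with the label of the closest training value (1-nearest neighbor)."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]

# Made-up labeled data: (value, label) pairs.
rows = [(x, "high" if x >= 50 else "low") for x in range(0, 100, 5)]
train, test = train_test_split(rows)

# Accuracy is measured on rows the model never saw during training.
accuracy = sum(predict(x, train) == label for x, label in test) / len(test)
print(len(train), len(test), accuracy)
```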
In what phase does the analyst deal with the following:
Estimate/project future values or likelihood of an event.
Extend correlations found in EDA to mathematical models
Predict/determine output values based on input values
Cross-validation of predictive models to ensure accuracy. Ans✓✓✓ Predictive
Modeling/Data Modeling/Correlation based models/Regression models/Time
Series
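A short sketch tying two of these items together: a least-squares line that predicts an output from an input, checked with k-fold cross-validation. The function names, the interleaved fold assignment, and the data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def k_fold_mse(xs, ys, k=4):
    """Average held-out mean squared error over k interleaved folds."""
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        # Score the model only on the points left out of this fold's fit.
        fold_err = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        errors.append(sum(fold_err) / len(fold_err))
    return sum(errors) / k

# Made-up data: a line with a small alternating disturbance.
xs = list(range(12))
ys = [2.0 + 3.0 * x + (0.5 if x % 2 else -0.5) for x in xs]

slope, intercept = fit_line(xs, ys)
mse = k_fold_mse(xs, ys)
print(round(slope, 3), round(intercept, 3), round(mse, 3))
```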
In what phase does the analyst deal with the following:
Fixing improperly formatted values
Dealing with duplicates, missing data, and outliers