A data analyst has identified combinations of sales transactions that frequently
occur together in data over the past 5 years. Which phase of the DA life cycle is
represented by this analysis? Ans✓✓✓ Data mining
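A minimal sketch of what "combinations that frequently occur together" can mean in practice: counting how often each pair of items shows up in the same transaction. The function name `frequent_pairs`, the `min_support` cutoff, and the sample transactions are all invented for illustration; they do not come from the exam material.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # sorted(set(...)) drops duplicates and gives each pair one stable ordering
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical transactions, invented for illustration
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
pairs = frequent_pairs(transactions, min_support=3)
print(pairs)  # {('bread', 'butter'): 3}
```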
A data analyst needs to contact a specific member of the database administration
team. Which method should be used to discover the person's email address?
Ans✓✓✓ Send an email to the team member's manager
A data analyst notices the data selected for an analytics project is slightly
misaligned with the research question. How can the data analyst resolve the
situation? Ans✓✓✓ Adjust the research question to reframe the analysis.
A data analytics project team is preparing to develop a predictive model that will
be included within a business intelligence tool for upper management. Which
step should be considered for inclusion when creating the project schedule?
Ans✓✓✓ Business intelligence tool interface training
An analyst realizes that the data set has been reduced significantly, resulting in
sample sizes that are too small. In which phase of the DA life cycle would this
likely occur? Ans✓✓✓ Data mining
Anomaly Detection Ans✓✓✓ Is the identification of rare items, events or
observations in a dataset which differ from the norm or raise suspicions. It can be
used to detect fraud, intrusion, outliers, technical glitches, etc. in a dataset. Tools
include R, RStudio, Tableau, MS Excel, Editor. Techniques include local outlier
factor (LOF), alfa function, etc.
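As a rough illustration of flagging values that "differ from the norm", the sketch below uses a plain z-score rule. This is a simpler stand-in, not LOF. The function name, the threshold of 2.0, and the sample amounts are assumptions made for the example.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; 500 is the planted anomaly
amounts = [20, 22, 19, 21, 23, 20, 22, 500]
print(zscore_outliers(amounts))  # [500]
```

With small samples a single extreme value inflates the standard deviation, so the threshold usually has to be set lower than the textbook value of 3.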
API (Application Programming Interface) Ans✓✓✓ An API is a software
intermediary that allows two applications to talk to each other. In other words, an
API is a messenger that delivers your request to the provider that you are
requesting it from and then delivers the response back. Example - PayPal, SQL
A company collects and sells information on consumers. Which law
prevents the company from collecting information on European Union consumers
without their permission? Ans✓✓✓ General Data Protection Regulation (GDPR)
Artificial Intelligence (AI) Ans✓✓✓ AI is the development of smart machines
capable of performing tasks that typically require human intelligence. Examples
include visual perception, speech recognition, online check processing,
decision-making, and natural language processing (NLP).
Bayes' Theorem Ans✓✓✓ Relates the probability of a hypothesis, given the
observed data, to the probability of observing that data, given the hypothesis. It
gives you the after-the-data (posterior) probability of a hypothesis as a function
of the likelihood of the data, the prior probability of the hypothesis, and the
probability of getting the data you found.
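A small worked example may help fix the idea. The sketch below computes the posterior P(H | D) = P(D | H) * P(H) / P(D), where P(D) is expanded over "hypothesis true" and "hypothesis false". The parameter names and the numbers (1% prior, 90% likelihood, 5% false-positive rate) are hypothetical.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """Posterior P(H | D) from P(H), P(D | H), and P(D | not H)."""
    # P(D): total probability of seeing the data under both cases
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: rare hypothesis, fairly reliable evidence
p = posterior(prior=0.01, likelihood=0.9, likelihood_if_false=0.05)
print(round(p, 3))  # 0.154
```

Even with evidence that is right 90% of the time, the posterior stays low because the hypothesis was unlikely to begin with.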
Bell Curve with a Long Tail End Ans✓✓✓ The long tail is the portion of the
distribution having many occurrences far from the central part of the distribution.
In sales, it may mean more people are buying individualized niche products.
Boxplot Ans✓✓✓ Provides a concise summary of the quartiles of numerical data
(dividing the data into four segments of 25% each). This graph is also useful in
detecting outliers and skewness.
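The numbers a boxplot draws can be computed directly. This sketch returns the five-number summary and applies the common 1.5 x IQR rule for outliers. The helper names and the sample data are assumptions, and quartile conventions vary between tools; the "inclusive" method of Python's `statistics.quantiles` is used here.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

def iqr_outliers(values, k=1.5):
    """Return values beyond k * IQR from the quartiles (the usual whisker rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data with one extreme value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
print(five_number_summary(data))  # (1, 3.25, 5.5, 7.75, 40)
print(iqr_outliers(data))         # [40]
```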
Clustering Ans✓✓✓ Is a machine learning technique where groupings are
unknown, and the analyst wishes to determine if the objects belong to any group.
An example of clustering is when data on search queries are analyzed to
determine if they group in a particular way and how many groups exist. Examples:
genome patterns, Google News, point cloud processing.
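One common way to find groupings is k-means, sketched below for one-dimensional data. Note the assumptions: the number of groups is fixed in advance by the starting centers (k-means does not discover how many groups exist), and the function name, starting centers, and data are made up for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)
    groups = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its old center)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical data with two visible clumps
centers, groups = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers)  # [2.0, 11.0]
print(groups)   # [[1, 2, 3], [10, 11, 12]]
```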
Cross-validation and testing on new data are used to do what? Ans✓✓✓ Validating
models
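To make the idea concrete, here is a sketch of k-fold cross-validation: the data is split into k folds, and each fold takes a turn as the held-out test set. The "model" is deliberately trivial (it predicts the mean of the training values) and, like the function names, is an assumption chosen only to keep the example short.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_validate_mean_model(y, k):
    """Average mean-absolute-error of a 'predict the training mean' model."""
    errors = []
    for train, test in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)
        mae = sum(abs(y[i] - prediction) for i in test) / len(test)
        errors.append(mae)
    return sum(errors) / len(errors)

folds = list(k_fold_indices(5, 2))
print(folds)  # [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
```

Because every sample is tested exactly once on a model that never saw it, the averaged error is a fairer estimate of performance on new data than the training error.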
D3.js (Data-Driven Documents) Ans✓✓✓ This is a JavaScript library for
manipulating documents based on data. D3 helps bring data to life using HTML,
SVG, and CSS.
Data Reduction Ans✓✓✓ Simply reducing the amount or volume of data held in
storage or in a database. One of the goals is to optimize storage capacity.
Dealing with data types (unstructured, semi-structured, quantitative, and
qualitative) AND data quality issues (uniqueness, relevance, reliability, validity,
and accuracy) that make access difficult are POTENTIAL PROBLEMS in what phase?
Ans✓✓✓ Data Acquisition Phase
Decision Trees Ans✓✓✓ A machine learning technique where answers to yes-or-no
questions lead to additional questions until the end of the tree is reached.
Each path pairs decisions with their consequences: a sequence of binary decisions
based on your data that combine to predict an outcome, branching out from one
decision to the next.
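The branching can be sketched as a nested structure walked from the root to a leaf. The tree here is written by hand, not learned from data, and the questions, field names (`income`, `has_debt`), and outcomes are hypothetical.

```python
# Hypothetical hand-built tree: each node asks a yes/no question about a row;
# a plain string is a leaf holding the predicted outcome.
tree = {
    "question": lambda row: row["income"] > 50000,
    "yes": {
        "question": lambda row: row["has_debt"],
        "yes": "review",
        "no": "approve",
    },
    "no": "decline",
}

def predict(node, row):
    """Follow yes/no answers from the root until a leaf is reached."""
    while isinstance(node, dict):
        node = node["yes"] if node["question"](row) else node["no"]
    return node

print(predict(tree, {"income": 60000, "has_debt": False}))  # approve
```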
Decomposition Ans✓✓✓ Breaking a trend over time into components. Its
procedures are used in time series analysis to describe the reasons for variations
in the trend.
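A minimal sketch of one classical approach, assuming an additive model (series = trend + seasonal + residual) and a known seasonal period. The trend is estimated with a centered moving average, so it is undefined (None) near both ends. The function name and the synthetic series are assumptions for illustration.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal, and residual parts (additive model)."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points so weights sum to `period`
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period
    detrended = [s - tr if tr is not None else None for s, tr in zip(series, trend)]
    # Average the detrended values at each position within the period
    means = []
    for p in range(period):
        vals = [detrended[t] for t in range(p, n, period) if detrended[t] is not None]
        means.append(sum(vals) / len(vals))
    offset = sum(means) / period  # center the seasonal component on zero
    seasonal = [means[t % period] - offset for t in range(n)]
    residual = [s - tr - se if tr is not None else None
                for s, tr, se in zip(series, trend, seasonal)]
    return trend, seasonal, residual

# Synthetic series: a rising line plus a repeating 4-step pattern
pattern = [1, -1, 2, -2]
series = [t + pattern[t % 4] for t in range(12)]
trend, seasonal, residual = decompose_additive(series, period=4)
print(trend[2:10])   # [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(seasonal[:4])  # [1.0, -1.0, 2.0, -2.0]
```

The series needs to span at least two full periods so that every seasonal position has a detrended value to average.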