WGU D204 THE DATA ANALYTICS
JOURNEY LATEST 2024 ACTUAL TEST QUESTIONS AND
CORRECT DETAILED ANSWERS (100% VERIFIED
SOLUTION)
Data preparation Time - ANSWER: Data preparation takes about 80% of the time on a
project, and everything else falls into the remaining 20% or so.
GIGO - ANSWER: garbage in, garbage out. That's a truism from computer science.
The information you're going to get from your analysis is only as good as the
information that you put into it
Upside to In-house data - ANSWER: It's the fastest way to start, and you may actually be
able to talk with the people who gathered the data in the first place.
Downside to In-house data - ANSWER: If it was an ad-hoc project, it may not be well
documented. And the biggest one is that the data simply may not exist. Maybe what you
need really isn't there in your organization.
Open data - ANSWER: Data that is free in two senses: it has no cost, and you are
free to use it and integrate it into your projects. Sources: number one is
government data, number two is scientific data, and number three is data from social
media and tech companies.
APIs - ANSWER: An API, or Application Programming Interface, isn't a source of data
but rather a way of sharing data; it can take data from one application to
another. The data is commonly exchanged in JSON format.
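A minimal sketch of what reading an API response can look like, assuming the service returns JSON. The payload, its field names, and the helper parse_readings are hypothetical and made up for illustration; no real service is called.

```python
import json

# Hypothetical JSON text, standing in for the body of an API response.
SAMPLE_RESPONSE = (
    '{"station": "A1", "readings": ['
    '{"day": "Mon", "temp_c": 21.5}, {"day": "Tue", "temp_c": 19.0}]}'
)


def parse_readings(body: str) -> list[tuple[str, float]]:
    """Turn the JSON text into (day, temperature) rows."""
    payload = json.loads(body)
    return [(r["day"], r["temp_c"]) for r in payload["readings"]]


rows = parse_readings(SAMPLE_RESPONSE)
print(rows)
```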
Scraping data - ANSWER: Data scraping is, in a sense, the found art of data science.
It's when you take the data that's around you, such as tables on pages and graphs in
newspapers, and integrate that information into your data science work. Unlike the
data that's available through APIs (Application Programming Interfaces), which is
specifically designed for sharing, data scraping is for data that wasn't necessarily
created with that integration in mind.
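As a rough illustration of pulling a table off a page, here is a sketch that uses Python's standard html.parser module. The HTML fragment, the class name TableScraper, and the city figures are invented for the example; fetching a real page is left out on purpose, since that should only be done where it is permitted.

```python
from html.parser import HTMLParser

# Hypothetical page fragment used in place of a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
  <tr><td>Shelbyville</td><td>25000</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Keep only non-empty text that appears inside a cell.
        if self._in_cell and data.strip():
            self.rows[-1].append(data.strip())


scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
header, *records = scraper.rows
print(header, records)
```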
Scraping Data and Ethics - ANSWER: There are still legal and ethical constraints that you
need to be aware of. For instance, you need to respect people's privacy. If the data is
private, you still need to maintain that privacy. You need to respect copyright. Just
because something's on the web doesn't mean that you can use it for whatever you
want. The idea here is "Visible Doesn't Mean Open": just as in an open market, the fact
that something is there in front of you and doesn't have a price tag doesn't mean it's free.
There are still important elements of laws, policies, and social practices that need
to be maintained so you don't get yourself into some very serious trouble. Keep that
in mind when you're doing data scraping.
Creating data/Get your own Data - ANSWER: Natural observation. Informal
discussions with, for instance, potential clients. You can do this in person, one on
one or in a focus group setting. You can do it online through email or through chat,
and this time you're asking specific questions to get the information you need to
focus your own projects. Surveys. Words > Numbers: let people express themselves.
Start general.
Research Ethics when gathering data - ANSWER: Informed consent, and sometimes also
confidentiality or anonymity.
Passive collection of training data - ANSWER: Gathering enormous amounts of data
doesn't always involve enormous amounts of work. In certain respects, you can just
sit there and wait for it to come to you. Example: photo classification. Issue with this: one, and
this is actually a huge issue, is that you need to ensure that you have adequate
representation, in things like categorizing photos and in limit cases.
Self-generated data - ANSWER: External reinforcement learning. Generative
adversarial networks. Internal
The enumeration of explicit rules - ANSWER: Business strategies, flowcharts, or
criteria for medical diagnoses.
expert system - ANSWER: An expert system is an approach to machine decision-
making in which algorithms are designed that mimic the decision-making process of
a human domain expert.
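To make the idea of explicit, hand-written rules concrete, here is a small sketch of a rule-based decision. The function approve_discount, its thresholds, and the discount levels are hypothetical business rules invented for this example, not taken from any real system.

```python
def approve_discount(order_total: float, is_repeat_customer: bool) -> str:
    """Apply hand-written rules in a fixed order, as a human expert might state them."""
    # Rule 1 (hypothetical): large orders always get the top discount.
    if order_total >= 500:
        return "10% discount"
    # Rule 2 (hypothetical): repeat customers get a smaller discount on mid-size orders.
    if is_repeat_customer and order_total >= 100:
        return "5% discount"
    # Default rule: nothing applies.
    return "no discount"


print(approve_discount(600, False))
print(approve_discount(150, True))
print(approve_discount(150, False))
```

Every rule here can be read and explained by a person, which is what separates explicit rules from the implicit rules a trained model develops.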
linear regression - ANSWER: A common and powerful technique for combining many
variables in an equation to predict a single outcome.
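A minimal sketch of the idea, simplified to a single predictor so it fits in a few lines (the card describes the general case with many variables). The helper fit_line and the data are made up for illustration; it uses the standard closed-form least-squares formulas for slope and intercept.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data that follows y = 2x + 1 exactly.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = slope * 5 + intercept  # predicted outcome for x = 5
print(slope, intercept, prediction)
```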
decision tree - ANSWER: This is a whole series, a sequence of binary decisions, based
on your data, that can combine to predict an outcome. It's called a tree because it
branches out from one decision to the next
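The sequence of binary decisions can be sketched as a small nested structure that is walked from the top down. The tree below, its feature names (income, debt), and its thresholds are hypothetical and written by hand; a real decision tree would learn its splits from data.

```python
# Each internal node asks one yes/no question: is record[feature] >= threshold?
# Leaves are plain strings holding the predicted outcome.
TREE = {
    "question": ("income", 50_000),
    "yes": {
        "question": ("debt", 10_000),
        "yes": "decline",
        "no": "approve",
    },
    "no": "decline",
}


def predict(node, record: dict) -> str:
    """Follow one branch per decision until a leaf is reached."""
    while isinstance(node, dict):
        feature, threshold = node["question"]
        node = node["yes"] if record[feature] >= threshold else node["no"]
    return node


print(predict(TREE, {"income": 60_000, "debt": 5_000}))
```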
Neural networks - ANSWER: look at things in a different way than humans do and in
certain situations they're able to develop rules for classification, even when humans
can't see anything more than static.
implicit rules - ANSWER: Implicit rules are rules that help the algorithms
function. They are the rules that the algorithms develop by analyzing the training data. And they're
implicit because they cannot be easily described to humans.
Microsoft Excel and its many versions. Google Sheets - ANSWER: Spreadsheets, the
universal data tool. It's my untested theory that there are more datasets in
spreadsheets than in any other format in the world. The rows and columns are very
familiar to a very large number of people, and they know how to explore the data
and access it using those tools. Spreadsheets are the most common data tool by far.
MLaaS - ANSWER: Machine learning as a service. Examples: Amazon Machine Learning,
Google AutoML, and IBM Watson Analytics.
Algebra - ANSWER: Number one is that it allows you to scale up. The solution you
create to a problem should deal efficiently with many instances at once. Basically,
create it once, run it many times. The other one, closely related to that, is the
ability to generalize. Your solution should not apply to just a few specific cases with
what are called magic numbers, but to cases that vary in a wide range of arbitrary
ways, so you want to prepare for as many contingencies as possible.
Calculus - ANSWER: Used for maximization and minimization, when you're trying to
find the balance between disparate demands.
Optimization and the combinatorial explosion - ANSWER: You're trying to find an
optimum solution, but randomly going through every possibility doesn't work. This is
called the combinatorial explosion because the growth is explosive as the number of
units and the number of possibilities rises. You need to find another way that
can save you some time and still help you find an optimum solution.
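A quick way to see the explosion is to count how many orderings exist for a given number of items, for example the order in which to visit a set of stops. The function name count_orderings and the choice of orderings as the example are assumptions made for illustration.

```python
from math import factorial


def count_orderings(n_items: int) -> int:
    """Number of distinct ways to order n_items different items (n factorial)."""
    return factorial(n_items)


# The count grows far faster than the number of items does.
for n in (5, 10, 15, 20):
    print(n, count_orderings(n))
```

With 20 items there are already more than two quintillion orderings, which is why checking every possibility stops being practical.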
Bayes' theorem - ANSWER: What Bayes' Theorem does is it gives you the posterior or
after-the-data probability of a hypothesis as a function of the likelihood of the data
given the hypothesis, the prior probability of the hypothesis and the probability of
getting the data you found.
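In symbols, P(H|D) = P(D|H) * P(H) / P(D). The sketch below works through that formula for a yes/no hypothesis. The function name, the parameter names, and the numbers are hypothetical; it also assumes P(D) can be expanded over "H true" and "H false", which holds for a binary hypothesis.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Bayes' theorem for a binary hypothesis H given observed data D.

    prior               = P(H)
    likelihood          = P(D | H)
    false_positive_rate = P(D | not H)
    """
    # P(D): total probability of the data under both possibilities.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Made-up numbers: a rare hypothesis (1%) and a fairly accurate signal.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 3))
```

Even with a 90% likelihood, the posterior stays low here because the prior is so small.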
Descriptive analyses - ANSWER: Descriptive analyses are one way of making sense of your data. It's a
little like cleaning up the mess in your data to find clarity in the meaning of what you
have. There are three very general steps to descriptive
statistics. Number one, visualize your data: make a graph and look at it. Number two,
compute univariate descriptive statistics, things like the mean. It's an easy
way of looking at one variable at a time. Number three, go on to measures of association,
or the connection between the variables in your data.
Steps for Descriptive Analyses - ANSWER: Looking at your data through charts, e.g., a
histogram.
skews - ANSWER: Positively skewed distributions: think of the valuations of
companies or the cost of houses. Negative skew: most of the people are at the
high end and the trailing ones are at the low end; think of something like birth
weight. U-shaped distribution: a polarizing movie and the reviews that it gets.
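One rough way to spot skew without a chart is to compare the mean with the median: a long tail on the high side pulls the mean above the median, and a long tail on the low side pulls it below. This is a heuristic rather than a formal skewness statistic, and the two small datasets are made up for illustration.

```python
from statistics import mean, median

# Made-up values: most are small, one is very large (tail on the high side).
positively_skewed = [1, 2, 2, 3, 3, 3, 4, 50]
# Made-up values: most are large, one is very small (tail on the low side).
negatively_skewed = [1, 47, 48, 48, 49, 49, 49, 50]


def skew_direction(values: list[float]) -> str:
    """Label the likely skew by comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "none detected"


print(skew_direction(positively_skewed))
print(skew_direction(negatively_skewed))
```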
what's a univariate descriptive - ANSWER: you can look for one number that might
be able to represent the entire collection.
measures that give you a numerical description of association - ANSWER: correlation
coefficient or regression analysis
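As an illustration of a numerical measure of association, here is a sketch of the Pearson correlation coefficient computed by hand. The function name pearson_r and the data (hours and scores) are invented for the example.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and the variation of each on its own.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]          # rises in step with hours
r_up = pearson_r(hours, scores)
r_down = pearson_r(hours, scores[::-1])  # same values in reverse order
print(r_up, r_down)
```

A value near +1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means little linear association.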