QUESTIONS WITH VERIFIED ANSWERS 2025/2026
Data preparation takes about 80% of the time, and everything else falls into about 20% - CORRECT
ANSWER Data preparation Time
Garbage in, garbage out. That's a truism from computer science. The information
you're going to get from your analysis is only as good as the information that you
put into it - CORRECT ANSWER GIGO
It's the fastest way to start, and you may actually be able to talk with the people who
gathered the data in the first place. - CORRECT ANSWER Upside to In-house data
If it was an ad hoc project, it may not be well documented. And the biggest one is
the data simply may not exist. Maybe what you need really isn't there in your
organization. - CORRECT ANSWER Downside to In-house data
Basically it's data that has no cost and is free to use, so you can integrate it
into your projects. Sources: number one is government data, number
two is scientific data, and the third is data from social media and tech
companies - CORRECT ANSWER Open data
An API, or Application Programming Interface, isn't a source of data; rather, it's a
way of sharing data. It can take data from one application to another. Commonly uses
the JSON format - CORRECT ANSWER APIs
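A minimal sketch of what working with JSON from an API can look like. The payload, its field names (`city`, `readings`, `temp_c`), and the helper `mean_temperature` are all made up for illustration; a real API defines its own structure.

```python
import json

# Hypothetical JSON text, standing in for what an API might send back.
payload = (
    '{"city": "Lisbon", "readings": ['
    '{"day": "Mon", "temp_c": 21.5}, {"day": "Tue", "temp_c": 23.0}]}'
)

def mean_temperature(raw_json: str) -> float:
    """Parse a JSON string and average the temp_c field of its readings."""
    data = json.loads(raw_json)  # JSON text -> Python dicts and lists
    temps = [reading["temp_c"] for reading in data["readings"]]
    return sum(temps) / len(temps)

print(mean_temperature(payload))
```

Once the JSON is parsed, the data behaves like ordinary Python dictionaries and lists, which is what makes it easy to pass from one application to another.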
Data scraping is, in a sense, the found art of data science. It's when you take the
data that's around you, tables on pages and graphs in newspapers, and integrate
that information into your data science work. Unlike the data that's available with
APIs, or Application Programming Interfaces, which are specifically designed for
sharing, data scraping is for data that isn't necessarily created with that
integration in mind. - CORRECT ANSWER Scraping data
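A minimal sketch of scraping a table from a page, using only Python's standard `html.parser` module. The HTML snippet and the `TableScraper` class are invented for illustration; real pages are messier and usually need more careful handling.

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being built, if any
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()

# Hypothetical page content standing in for a table found on the web.
page = (
    "<table><tr><th>Year</th><th>Sales</th></tr>"
    "<tr><td>2023</td><td>120</td></tr>"
    "<tr><td>2024</td><td>150</td></tr></table>"
)

scraper = TableScraper()
scraper.feed(page)
print(scraper.rows)
```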
There are still legal and ethical constraints that you need to be aware of. For
instance, you need to respect people's privacy. If the data is private, you still need
to maintain that privacy. You need to respect copyright. Just because something's
on the web doesn't mean that you can use it for whatever you want. The idea
here is Visible Doesn't Mean Open: just like in an open market, just because something is
there in front of you and doesn't have a price tag doesn't mean it's free. There are
still these important elements of laws, policies, and social practices that need to be
maintained so you don't get yourself into some very serious trouble. And so keep that in
mind when you're doing data scraping. - CORRECT ANSWER Scraping Data and
Ethics
Natural observation, informal discussions with, for instance, potential clients. You
can do this in person, one on one or in a focus group setting. You can do it online
through email or through chat, and this time you're asking specific questions to
get the information you need to focus your own projects. Surveys. Words >
numbers. Let people express themselves. Start general - CORRECT ANSWER Creating
data/Get your own Data
Informed consent. Also sometimes confidentiality or anonymity - CORRECT
ANSWER Research Ethics when gathering data
Gathering enormous amounts of data doesn't always involve enormous amounts
of work. In certain respects, you can just sit there and wait for it to come to you.
Example: photo classification. Issue with this: one, and this is actually a huge issue, is that
you need to ensure that you have adequate representation; things like
categorizing photos / limit cases - CORRECT ANSWER Passive collection of training
data
External reinforcement learning. Generative adversarial networks. Internal -
CORRECT ANSWER Self-generated data
Business strategies, flowcharts, or criteria for medical diagnoses. - CORRECT
ANSWER The enumeration of explicit rules
An expert system is an approach to machine decision-making in which algorithms
are designed that mimic the decision-making process of a human domain expert. -
CORRECT ANSWER expert system
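A minimal sketch of explicit rules in the spirit of an expert system: each rule is written down by a person and checked in order. The ticket fields (`outage`, `customer_tier`) and the routing outcomes are hypothetical, chosen only to illustrate the idea.

```python
# Each rule is a (condition, conclusion) pair written out explicitly.
# Rules are checked in order and the first match wins.
RULES = [
    (lambda t: t["outage"], "page on-call engineer"),
    (lambda t: t["customer_tier"] == "premium", "priority queue"),
    (lambda t: True, "standard queue"),  # default rule, always matches
]

def route(ticket: dict) -> str:
    """Apply the explicit rules to one support ticket."""
    for condition, conclusion in RULES:
        if condition(ticket):
            return conclusion
    raise ValueError("no rule matched")

print(route({"outage": False, "customer_tier": "premium"}))
```

Every decision here can be traced back to a rule a human can read, which is the contrast with rules a model learns on its own.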
linear regression, which is a common and powerful technique for combining many
variables in an equation to predict a single outcome. decision tree - CORRECT
ANSWER linear regression
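A minimal sketch of linear regression in its simplest form, with one predictor, fitted by ordinary least squares in plain Python. The data points are invented, and the function name `fit_line` is just an illustrative choice; with many variables the same idea is usually handled by a library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (without dividing by n).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predict the outcome for a new x
print(slope, intercept, prediction)
```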
This is a whole series, a sequence of binary decisions, based on your data, that can
combine to predict an outcome. It's called a tree because it branches out from
one decision to the next - CORRECT ANSWER decision tree
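A minimal sketch of a decision tree as a sequence of binary decisions. The tree below is written by hand with hypothetical features (`income`, `debt`) and thresholds; in practice the splits would be learned from data.

```python
# Each internal node asks one yes/no question: is record[feature] >= threshold?
# Leaves are plain strings holding the predicted outcome.
TREE = {
    "question": ("income", 50000),
    "yes": {
        "question": ("debt", 10000),
        "yes": "review",
        "no": "approve",
    },
    "no": "decline",
}

def predict(node, record: dict) -> str:
    """Follow the branches from the root until a leaf is reached."""
    while isinstance(node, dict):
        feature, threshold = node["question"]
        node = node["yes"] if record[feature] >= threshold else node["no"]
    return node

print(predict(TREE, {"income": 60000, "debt": 5000}))
```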
They look at things in a different way than humans do, and in certain situations they're
able to develop rules for classification, even when humans can't see anything
more than static. - CORRECT ANSWER Neural networks
So the implicit rules are rules that help the algorithms function. They are the rules
that they develop by analyzing the test data. And they're implicit because they
cannot be easily described to humans. - CORRECT ANSWER implicit rules
Spreadsheets are the universal data tool. It's my untested theory that there are more
datasets in spreadsheets than in any other format in the world. The rows and
columns are very familiar to a very large number of people and they know how to
explore the data and access it using those tools. The most common by far -
CORRECT ANSWER Microsoft Excel and its many versions. Google Sheets
Machine learning as a service. Examples: Amazon Machine Learning, Google AutoML,
and IBM Watson Analytics - CORRECT ANSWER MLaaS
Number one is that it allows you to scale up. The solution you create to a problem
should deal efficiently with many instances at once. Basically create it once, run it
many times. And the other one closely related to that is the ability to generalize.
Your solution should not apply to just a few specific cases with what's called Magic
Numbers, but to cases that vary in a wide range of arbitrary ways, so you want to
prepare for as many contingencies as possible - CORRECT ANSWER Algebra
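A minimal sketch of the difference between magic numbers and a generalized solution. The quantities, prices, and tax rates are invented for illustration.

```python
def total_cost_magic() -> float:
    """Works for exactly one case: the numbers are baked in (magic numbers)."""
    return 3 * 19.99 * 1.08

def total_cost(quantity: int, unit_price: float, tax_rate: float) -> float:
    """Generalized version: the same formula, with the numbers as variables."""
    return quantity * unit_price * (1 + tax_rate)

# Create it once, run it many times: one function handles every order.
orders = [(3, 19.99, 0.08), (10, 4.50, 0.05), (1, 250.00, 0.0)]
totals = [total_cost(q, p, t) for q, p, t in orders]
print(totals)
```

The first function has to be rewritten whenever a number changes; the second scales to any number of cases and generalizes to values it was never written for.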
Used for maximization and minimization, when you're trying to find the balance
between these disparate demands. - CORRECT ANSWER Calculus
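A minimal sketch of maximization with calculus. The revenue function is hypothetical: it assumes demand falls by 2 units for each unit of price, so revenue is R(p) = p(100 - 2p). Setting the derivative R'(p) = 100 - 4p to zero gives the balance point between a high price and high sales volume.

```python
def revenue(price: float) -> float:
    """Hypothetical revenue: price times demand, where demand = 100 - 2 * price."""
    return price * (100 - 2 * price)

def d_revenue(price: float) -> float:
    """Derivative of revenue with respect to price: 100 - 4 * price."""
    return 100 - 4 * price

# Solve d_revenue(p) = 0 by hand: 100 - 4p = 0, so p = 25.
best_price = 100 / 4

print(best_price, revenue(best_price))
# Prices on either side of the optimum earn less.
print(revenue(best_price - 1), revenue(best_price + 1))
```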
You're trying to find an optimum solution, but randomly going through every
possibility doesn't work. This is called the combinatorial explosion because the
growth is explosive as the number of units and the number of possibilities rises
and so you need to find another way that can save you some time and still help