What are the 3 types of big data? - Answers structured, unstructured, semi-structured
What percent of data is structured? - Answers 5-10%
What percent of data is unstructured? - Answers 80%
What percent of data is semi-structured? - Answers 5-10%
What type of data is email? - Answers unstructured
What is structured data? - Answers Data that can be stored and processed in a fixed format. Highly
organized information that is readily accessible
An excel spreadsheet would be which type of data? - Answers Structured, due to organized rows and
columns
What are the 7 steps of the Data Science Lifecycle? - Answers Business Understanding, Data Mining,
Data Cleaning, Data Exploration, Feature Engineering, Predictive Modeling, Data Visualization
What is Business Understanding? - Answers Asking relevant questions and defining objectives for the
problem that needs to be solved.
What is Data Mining? - Answers "mining the data"
-gathering and scoping the data necessary for the project
What is Data Cleaning? - Answers "cleaning the data"
-fixing the inconsistencies in the data and handling missing values
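A minimal sketch of the data cleaning step, using only the Python standard library. The records, the field names (city, age) and the choice to fill missing ages with the median are assumptions made for illustration; real projects choose a fill strategy per column.

```python
from statistics import median

# Hypothetical raw records: inconsistent city spelling/casing and one missing age.
raw = [
    {"city": " New York", "age": 34},
    {"city": "new york", "age": None},
    {"city": "Boston ", "age": 29},
    {"city": "BOSTON", "age": 41},
]

def clean(records):
    """Normalize city names and fill missing ages with the median known age."""
    known_ages = [r["age"] for r in records if r["age"] is not None]
    fill = median(known_ages)
    return [
        {
            # Fix inconsistencies: trim whitespace, use one casing convention.
            "city": r["city"].strip().title(),
            # Handle missing values: substitute the median of the known ages.
            "age": r["age"] if r["age"] is not None else fill,
        }
        for r in records
    ]

cleaned = clean(raw)
```

After cleaning, the four spellings collapse into two city names and every record has an age.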
What is Data Exploration? - Answers "exploring the data"
-forming a hypothesis about your defined problem by visually analyzing the data
What is Feature Engineering? - Answers Selecting important features & constructing more meaningful
ones using the raw data you have.
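A short sketch of feature engineering: two more meaningful features are constructed from raw order fields. The order records and the feature names (total, is_weekend) are hypothetical and chosen only to illustrate the idea.

```python
from datetime import date

def add_features(order):
    """Return a copy of the order with two derived features."""
    day = date.fromisoformat(order["date"])
    return {
        **order,
        # Combine two raw columns into one more meaningful value.
        "total": order["price"] * order["quantity"],
        # weekday() is 0 for Monday through 6 for Sunday.
        "is_weekend": day.weekday() >= 5,
    }

# Hypothetical raw data.
orders = [
    {"date": "2024-03-02", "price": 4.0, "quantity": 3},   # a Saturday
    {"date": "2024-03-04", "price": 10.0, "quantity": 1},  # a Monday
]

featured = [add_features(o) for o in orders]
```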
What is Predictive Modeling? - Answers "using models to make predictions"
-train machine learning models, evaluate their performance and use them to make predictions.
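The train / evaluate / predict loop can be sketched with a one-variable least-squares line, written in plain Python. The data is made up and perfectly linear, and a straight line is only one of many possible models; in practice a library such as scikit-learn would usually be used.

```python
def fit_line(xs, ys):
    """Train: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, split into a training set and a held-out test set.
train_x, train_y = [1, 2, 3, 4], [2, 4, 6, 8]
test_x, test_y = [5, 6], [10, 12]

slope, intercept = fit_line(train_x, train_y)

def predict(x):
    """Predict: apply the fitted line to a new input."""
    return slope * x + intercept

# Evaluate: mean absolute error on data the model did not see during training.
mae = sum(abs(predict(x) - y) for x, y in zip(test_x, test_y)) / len(test_x)
```

Evaluating on held-out data rather than the training data is what shows whether the model generalizes.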
What is Data Visualization? - Answers "visualizing the data"
-communicate the findings to key stakeholders using plots and interactive visualizations.
What is data science? - Answers The practice of mining large sets of raw data to identify patterns
and gain insight from them.
-raw data is both structured and unstructured
What is AI? - Answers Getting a computer to mimic human behavior in some way.
What is machine learning? - Answers subset of AI
-giving the computer a brain to learn from data
-helps computers figure things out from data and deliver AI applications
What is one of the most common tools data scientists use? - Answers open-source notebooks: web
applications for writing & running code and visualizing data all in one place
ex: Jupyter, RStudio, & Zeppelin
Who oversees the data science process? - Answers Business managers, IT managers, and Data Science
managers
What is the contribution of the business manager? - Answers define the problem and develop an
analysis strategy
What is the contribution of the IT manager? - Answers may help build and update IT environments for
data science teams
-monitor operations and resource usage
What is the contribution of the data science manager? - Answers oversees the data science team
-team builders who balance team development with project planning and monitoring
Who has the most important role in the data science process? - Answers The data scientist.
What is secondary use? - Answers Information collected for one purpose used for another
How is Google's Personalized Search an example of secondary use? - Answers The information Google
collects about your search queries and Web pages can be used by companies/retailers for direct
marketing.
What is collaborative filtering? - Answers Analyzing information about preferences of large numbers
of people to predict what one person may prefer.
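A minimal sketch of collaborative filtering: one person's rating for an unseen item is predicted from the ratings of other people, weighted by how similar their tastes are. The user names, item names and ratings are hypothetical, and cosine similarity over shared items is just one common choice of similarity measure.

```python
from math import sqrt

# Hypothetical preference data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else None

# ann has not rated D; her tastes resemble bob's more than cat's,
# so the prediction leans toward bob's rating of D.
guess = predict("ann", "D")
```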
What are the explicit and implicit methods of collaborative filtering? - Answers Explicit filtering- asking
people to rank preferences,