Answers
Big Data
Large volumes of complex data that cannot be processed effectively using traditional methods or applications
5 V's Model
1) Volume
2) Velocity
3) Variety
4) Veracity
5) Value
Challenges of big data:
1) Storage
2) Transmission
3) Computation
Data Science
The scientific study of the creation, validation and transformation of data to create meaning
Data Scientist
A professional who uses scientific methods to liberate and create meaning from raw data.
Exploits the available data to derive meaningful information using computational, statistical and analytical
tools, converting it into "value"!
Steps in the Discovery of Value:
1) Data Acquisition
2) Data Preprocessing
3) Data Analysis
4) Data Interpretation
Data Acquisition
Data collection from different sources
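As a minimal sketch of acquisition, the example below assumes two hypothetical in-memory sources: a CSV export and a JSON payload, standing in for a file and a web API. It collects their records into one list and tags each record with its origin. The field names and the acquire() function are illustrative only.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export (e.g. from a spreadsheet)
CSV_SOURCE = """customer_id,amount
1,19.99
2,5.00
"""

# Hypothetical source 2: a JSON payload (e.g. from a web API)
JSON_SOURCE = '[{"customer_id": 3, "amount": 42.5}]'


def acquire() -> list[dict]:
    """Collect records from both sources into one list, tagging each with its origin."""
    records = []
    # csv.DictReader yields one dict per row; every value is a string
    for row in csv.DictReader(io.StringIO(CSV_SOURCE)):
        records.append({**row, "source": "csv"})
    # json.loads keeps JSON numbers as int/float
    for obj in json.loads(JSON_SOURCE):
        records.append({**obj, "source": "json"})
    return records


if __name__ == "__main__":
    for record in acquire():
        print(record)
```

The CSV values arrive as strings, but the JSON values keep their types. Resolving this kind of inconsistency is the job of the next step.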
Data Preprocessing
Data preparation for further processing
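The following minimal preprocessing sketch works on hypothetical customer_id/amount records. It converts types, drops rows with missing or unparseable values, rejects negative amounts (an assumed business rule) and removes duplicates. Real pipelines choose these rules per dataset, and preprocess() is an illustrative name.

```python
def preprocess(raw: list[dict]) -> list[dict]:
    """Convert types, drop incomplete or invalid rows, and remove exact duplicates."""
    cleaned = []
    seen = set()
    for row in raw:
        try:
            # Normalize types: CSV sources give strings, others may give numbers
            customer_id = int(row["customer_id"])
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # drop rows with missing or unparseable fields
        if amount < 0:
            continue  # assumed business rule: negative amounts are invalid
        key = (customer_id, amount)
        if key in seen:
            continue  # drop duplicate records
        seen.add(key)
        cleaned.append({"customer_id": customer_id, "amount": amount})
    return cleaned


raw_rows = [
    {"customer_id": "1", "amount": "19.99"},
    {"customer_id": "1", "amount": "19.99"},  # duplicate
    {"customer_id": "2", "amount": ""},       # missing value
    {"customer_id": 3, "amount": -4},         # invalid (negative)
    {"customer_id": 4, "amount": 10},
]
print(preprocess(raw_rows))
```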
Data Analysis
Data is analyzed using complex computational, analytical, and statistical techniques, such as data
mining techniques, data visualization techniques, scatter clouds, forecasting, predictive modeling,
clustering, classification, and advanced time-series analysis.
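Forecasting is one of the techniques listed above. As an illustrative sketch rather than a production method, the example fits an ordinary least-squares linear trend to a short hypothetical monthly series and extrapolates it forward. The slope is the covariance of time and value divided by the variance of time, and the intercept makes the line pass through the means. fit_linear_trend() and forecast() are hypothetical helper names.

```python
def fit_linear_trend(ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * t, with t = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    var = sum((t - t_mean) ** 2 for t in ts)
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(ys: list[float], steps: int) -> list[float]:
    """Extrapolate the fitted trend `steps` periods past the last observation."""
    intercept, slope = fit_linear_trend(ys)
    n = len(ys)
    return [intercept + slope * (n + k) for k in range(steps)]


monthly_sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical data
print(forecast(monthly_sales, 2))
```

The series above is perfectly linear, so the trend is +10 per month and the next two forecasts are 140 and 150. Real data would call for checking the fit before trusting the extrapolation.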
Data Interpretation
Obtain meaningful, data-driven results: use analytical tools and techniques to derive results and
insights that support a proposed solution. This requires in-depth knowledge of the business
and the data, and it demands common sense!
Main types of research questions:
1) Inductive
2) Deductive
3) Abductive
Inductive
Data-driven: the data tells us something new.
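The inductive, data-driven direction can be illustrated with hypothetical shopping baskets. No hypothesis is stated in advance, and the data is left to surface a pattern: the item pairs bought together most often. frequent_pairs() and min_count are illustrative names, not a library API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; no hypothesis is stated in advance
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]


def frequent_pairs(transactions, min_count=2):
    """Count every item pair and keep those that occur at least min_count times."""
    counts = Counter()
    for basket in transactions:
        # sorted() makes ("bread", "butter") and ("butter", "bread") the same key
        counts.update(combinations(sorted(basket), 2))
    return {pair: c for pair, c in counts.items() if c >= min_count}


print(frequent_pairs(baskets))
```

With these baskets only ("bread", "butter") appears at least twice (three times). The data suggested this finding rather than it being assumed up front.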
Deductive
Theory-driven: the data is used to validate a hypothesis.
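In the deductive direction the hypothesis comes first, and the data is then used to test it. The minimal sketch below assumes hypothetical session times for two page versions and the hypothesis that version B has the higher mean. It tests this with a one-sided permutation test, one of several possible tests. permutation_test() is a hypothetical helper, not a library function.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """One-sided permutation test of the hypothesis mean(b) > mean(a).

    Returns an estimated p-value: the share of random relabelings whose
    difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed for reproducible results
    observed = _mean(b) - _mean(a)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if _mean(perm_b) - _mean(perm_a) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed labeling itself
    return (hits + 1) / (n_resamples + 1)


version_a = [12, 14, 11, 13, 12]  # hypothetical session times
version_b = [18, 17, 19, 16, 20]
p = permutation_test(version_a, version_b)
print(f"p-value = {p:.4f}")
```

For the data above the p-value is well under 0.05. This means the observed difference is unlikely under random relabeling, which supports the hypothesis without proving it.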
Abductive
Mixed model: a combination of inductive and deductive research.
Categories of Data Generated:
1) Structured
2) Unstructured
3) Semi-structured
Structured Data
Used to capture relationships between different entities and is therefore most often stored in a
relational database
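The minimal relational sketch below uses Python's built-in sqlite3 module with an in-memory database. Two hypothetical tables, customers and orders, are linked by a foreign key, and a JOIN uses that relationship to total each customer's orders. Table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- the relationship
    amount      REAL NOT NULL
);
INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Lin');
INSERT INTO orders (customer_id, amount) VALUES (1, 20.0), (1, 5.5), (2, 12.0);
""")

# Follow the customer -> orders relationship and aggregate per customer
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()
print(rows)
```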
Unstructured Data
Does not conform to a data model or data schema. It is commonly estimated to make up around 80% of the data
within a typical enterprise.
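Unstructured data has no schema, so any structure must be extracted after the fact. The minimal sketch below works on a hypothetical free-text customer note. It uses regular expressions to pull out ISO-style dates and to count words.

```python
import re
from collections import Counter

# Hypothetical free-text customer note: no fields, no schema
note = """Called on 2024-03-05 about a late delivery. Customer said the
delivery was late again on 2024-03-12 and asked for a refund."""

# Structure has to be imposed after the fact, e.g. by pattern matching...
dates = re.findall(r"\d{4}-\d{2}-\d{2}", note)
# ...or by counting lowercase words (digits and punctuation are skipped)
words = Counter(re.findall(r"[a-z]+", note.lower()))

print(dates)
print(words["late"], words["delivery"])
```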
Semi-structured Data
Has a defined level of structure and consistency but is not relational in nature; the data is hierarchical or
graph-based.
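JSON is a common semi-structured format. In the hypothetical records below, keys are self-describing, values nest hierarchically, and the two records do not share the same fields. The illustrative flatten() helper walks the hierarchy and produces dotted column names such as contact.email. This is one way to move such data toward a tabular form.

```python
import json

# Hypothetical semi-structured records: self-describing keys, nesting,
# and fields that vary between records (no fixed relational schema)
doc = json.loads("""
[
  {"id": 1, "name": "Ada", "contact": {"email": "ada@example.com"}},
  {"id": 2, "name": "Lin", "contact": {"phone": "555-0100"}, "tags": ["vip"]}
]
""")


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys; lists and scalars are kept as-is."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


for record in doc:
    print(flatten(record))
```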
Categories of Data Analytics
1) Descriptive
2) Diagnostic
3) Predictive
4) Prescriptive
Descriptive Analytics
Answers questions about events that have already occurred, using contextualized data to generate
information.
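The minimal descriptive-analytics sketch below assumes hypothetical historical sales events. Grouping these past events by region or by month and reporting count, total and mean summarizes what has already happened. summarize_by() is an illustrative helper, and the (month, region, amount) layout is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical sales events: (month, region, amount)
sales = [
    ("2024-01", "north", 120.0),
    ("2024-01", "south", 80.0),
    ("2024-02", "north", 150.0),
    ("2024-02", "south", 95.0),
    ("2024-02", "south", 40.0),
]


def summarize_by(events, key_index):
    """Group past events by one field and report count, total and mean amount."""
    groups = defaultdict(list)
    for event in events:
        groups[event[key_index]].append(event[2])  # index 2 holds the amount
    return {
        key: {"count": len(v), "total": sum(v), "mean": mean(v)}
        for key, v in sorted(groups.items())
    }


print(summarize_by(sales, 1))  # by region: what happened where?
print(summarize_by(sales, 0))  # by month: what happened when?
```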