Areas and Complexities in Data Science
Data Science Field and Terminologies
Data Science: multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract
knowledge and insights from structured and unstructured data.
Big Data: large and complex datasets that cannot be easily managed or processed by traditional data-processing
software.
Data Mining: process of discovering patterns and knowledge from large datasets using statistical and
mathematical methods.
Machine Learning: subfield of artificial intelligence, widely used in data science, that deals with the design and
development of algorithms that can learn from data and make predictions or decisions based on it.
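To make "learn and make predictions" concrete, here is a minimal sketch that fits a straight line to a handful of points by ordinary least squares and then predicts an unseen value. The function names (fit_line, predict) and the hours/scores data are made up for illustration; they are not from these notes or from any particular library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: hours studied vs. test score.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

model = fit_line(hours, scores)   # "learning" step
print(predict(model, 5.0))        # "prediction" step, prints 60.0
```

The model here is just two numbers learned from the data. Real machine learning models have more parameters, but follow the same fit-then-predict pattern.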
Areas in Data Science
Data Collection: the process of gathering and measuring information on variables of interest, in an established
systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate
outcomes.
Data Pre-processing: the process of transforming raw data into an understandable format, which includes
cleaning, normalization and transformation of data.
Data Analysis: the process of inspecting, cleaning, transforming, and modeling data to discover useful
information, draw conclusions, and support decision-making.
Data Visualization: the representation of data in a graphical format. It helps to analyze and illuminate patterns,
trends and outliers in groups of data.
Data Interpretation: the process of understanding and making sense of the data and the insights generated from
it.
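The pre-processing step above (cleaning and normalization) can be sketched as follows. This is only an illustration under simple assumptions: records are tuples, a missing value is represented by None, and normalization means min-max scaling to the range 0 to 1. The helper names and sample records are hypothetical.

```python
def clean(records):
    """Drop records with a missing value (None) and remove exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        if None in rec:
            continue          # cleaning: skip incomplete records
        if rec in seen:
            continue          # cleaning: skip repeated records
        seen.add(rec)
        cleaned.append(rec)
    return cleaned


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # all values equal, nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw records: (age, income).
raw = [(18, 30000.0), (35, None), (18, 30000.0), (40, 52000.0), (62, 41000.0)]

rows = clean(raw)
ages = min_max_normalize([age for age, _ in rows])
print(rows)   # [(18, 30000.0), (40, 52000.0), (62, 41000.0)]
print(ages)   # [0.0, 0.5, 1.0]
```

Dropping incomplete records is only one possible cleaning choice. Filling in missing values is another, and which is appropriate depends on the data.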
Data Science Disciplines and Intersections
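Computer Science: provides the algorithmic and computational foundations for data science, such as data
structures, algorithms, and the systems used to store and process data.
Statistics: provides the foundations for inference, such as estimation, hypothesis testing, and quantifying
uncertainty.
Mathematics: provides the underlying theory, such as linear algebra, calculus, probability, and optimization.
Domain Expertise: knowledge and understanding of a specific field or industry, needed to ask the right questions
and to interpret results in context.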
Complexities in Data Science
Data Quality: issues in the data, such as missing values, duplicates, inconsistent formats, and errors, that limit
how effectively it can be used.
Data Security and Privacy: protecting data from unauthorized access, use, disclosure, disruption, modification, or
destruction.
Data Bias: occurs when the data used to train a machine learning model is skewed or unrepresentative, which
can lead the model to make biased predictions.
Data Scale and Complexity: the volume and variety of data being generated are growing rapidly, which makes
data harder to store, process, and analyze.
Data Interpretation: understanding and making sense of the data and the insights generated from it can be
challenging.
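Two of the complexities above, data quality and data bias, can at least be checked for with simple counts. The sketch below is a rough illustration, not a complete audit: it assumes rows are dictionaries, that a missing value is None, and that a heavily skewed label distribution is one warning sign of possible bias. The function names and sample rows are hypothetical.

```python
from collections import Counter


def quality_report(rows):
    """Count rows with missing values and rows that repeat an earlier row."""
    missing = sum(1 for row in rows if any(v is None for v in row.values()))
    # Convert each dict to a sorted tuple of items so it can be counted.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(c - 1 for c in counts.values())
    return {
        "rows": len(rows),
        "rows_with_missing": missing,
        "duplicate_rows": duplicates,
    }


def label_shares(rows, label_key):
    """Return the fraction of rows carrying each label value."""
    counts = Counter(row[label_key] for row in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical training rows for a loan-approval model.
rows = [
    {"age": 25, "approved": "yes"},
    {"age": None, "approved": "yes"},
    {"age": 25, "approved": "yes"},
    {"age": 51, "approved": "no"},
]

print(quality_report(rows))
# {'rows': 4, 'rows_with_missing': 1, 'duplicate_rows': 1}
print(label_shares(rows, "approved"))
# {'yes': 0.75, 'no': 0.25}
```

A skewed label share does not prove the data is biased, and balanced labels do not prove it is fair. The counts only point to where a closer look is needed.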
Note: Data science is a complex, multi-disciplinary field concerned with extracting insights from data. Its process
includes data collection, pre-processing, analysis, visualization, and interpretation. It draws on disciplines such as
computer science, statistics, and mathematics, together with domain expertise. It also involves several complexities:
data quality, security and privacy, bias, scale and complexity, and interpretation.