QUESTIONS AND ANSWERS FOR EXAM PREP.
2024/2025 UPDATE.
1. Statistical Model: A statistical model is a type of mathematical
model that embodies a set of assumptions about how some sample
data, and similar data from a larger population, are generated.
A statistical model represents the data-generating process, often in
considerably idealized form.
The assumptions embodied by a statistical model describe a set of
probability distributions, some of which are assumed to adequately
approximate the distribution from which a particular data set is
sampled. The probability distributions inherent in statistical models are
what distinguish statistical models from other, non-statistical,
mathematical models.
A statistical model is usually specified by mathematical equations that
relate one or more random variables and possibly other non-random
variables. As such, "a model is a formal representation of a theory".
All statistical hypothesis tests and all statistical estimators are derived
from statistical models. More generally, statistical models are part of
the foundation of statistical inference.
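As a minimal sketch of a model specified by equations, the Python below assumes a simple linear regression, y = b0 + b1*x + e, where e is drawn from a normal distribution with mean 0 and standard deviation sigma. Here x is a non-random variable, e and y are random variables, and the model is the whole family of distributions obtained by varying b0, b1 and sigma. The choice of model, the function names, and the parameter values are illustrative assumptions, not part of the definition above.

```python
import random


def simulate(n, intercept, slope, sigma, seed=0):
    """Generate data from y = intercept + slope * x + e, with e ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    # x is a fixed (non-random) design: n evenly spaced points on [0, 10].
    xs = [10 * i / (n - 1) for i in range(n)]
    # y is random because the error term e is random.
    ys = [intercept + slope * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys


def fit_least_squares(xs, ys):
    """Estimate (intercept, slope) by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Assumed "true" parameters, chosen only for illustration.
xs, ys = simulate(n=500, intercept=2.0, slope=0.5, sigma=1.0)
b0, b1 = fit_least_squares(xs, ys)
print(f"estimated intercept = {b0:.3f}, estimated slope = {b1:.3f}")
```

The least-squares estimator used here is one example of an estimator derived from a statistical model: it only makes sense once the linear form and the error term have been assumed.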
2. Data Science: Data science is an interdisciplinary field concerned
with processes and systems for extracting knowledge or insights from
data in various forms, either structured or unstructured.[1][2] It is a
continuation of data analysis fields such as statistics, machine
learning, data mining, and predictive analytics,[3] and is similar to
Knowledge Discovery in Databases (KDD).
Data science employs techniques and theories drawn from many
fields within the broad areas of mathematics, statistics, operations
research,[4] information science, and computer science, including
signal processing, probability models, machine learning, statistical
learning, data mining, databases, data engineering, pattern
recognition and learning, visualization, predictive analytics,
uncertainty modeling, data warehousing, data compression,
computer programming, artificial intelligence, and high
performance computing. Methods that scale to big data are of
particular interest in data science, although the discipline is not
generally considered to be restricted to big data, and big data
technologies often focus on organizing and preprocessing the data
rather than on analysis. The development of machine learning
has enhanced the growth and importance of data science.
Data science affects academic and applied research in many
domains, including machine translation, speech recognition,
robotics, search engines, and the digital economy, as well as the
biological sciences, medical informatics, health care, the social
sciences, and the humanities. It also heavily influences economics,
business, and finance.
From the business perspective, data science is an integral part of
competitive intelligence, a newly emerging field that encompasses
a number of activities, such as data mining and data analysis.[5]
3. Data Scientist: Data scientists use their data and analytical
ability to find and interpret rich data sources; manage large
amounts of data despite hardware, software, and bandwidth
constraints; merge data sources; ensure consistency of datasets;
create visualizations to aid in understanding data; build
mathematical models using the data; and present and
communicate the data insights and findings. They are often expected
to produce answers in days rather than months, to work by
exploratory analysis and rapid iteration, and to present results
with dashboards (displays of current values) rather than the papers
and reports that statisticians normally produce.[6]
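Two of the tasks listed above, merging data sources and ensuring consistency of datasets, can be sketched in plain Python. The records, field names, and the helpers merge_on_key and find_inconsistencies are all hypothetical and exist only for this illustration; real work would typically use a database or a data-frame library.

```python
def merge_on_key(left, right, key):
    """Inner-join two lists of dict records on a shared key field."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Fields from the right record overwrite same-named left fields.
            merged.append({**row, **match})
    return merged


def find_inconsistencies(records, required_fields):
    """Return (record index, field name) pairs where a required value is missing."""
    problems = []
    for index, record in enumerate(records):
        for field in required_fields:
            if record.get(field) is None:
                problems.append((index, field))
    return problems


# Hypothetical data sources: a customer list and an order list.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
    {"id": 3, "name": "Alan"},
]
orders = [
    {"id": 1, "total": 120.0},
    {"id": 2, "total": None},  # missing value to be caught by the check
]

merged = merge_on_key(customers, orders, "id")
problems = find_inconsistencies(merged, ["name", "total"])
print(merged)
print(problems)
```

An inner join is used here, so the customer with no matching order is dropped; whether that is the right choice depends on the analysis.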
4. Data Visualization: Data visualization (or data visualisation) is
viewed by many disciplines as a modern equivalent of visual
communication. It involves the creation and study of the visual
representation of data, meaning "information that has been abstracted
in some schematic form, including attributes or variables for the units
of information".[1]
A primary goal of data visualization is to communicate information
clearly and efficiently via statistical graphics, plots and information
graphics. Numerical data may be encoded using dots, lines, or
bars, to visually communicate a quantitative message.[2] Effective
visualization helps users analyze and reason about data and
evidence. It makes complex data more accessible, understandable
and usable. Users may have particular analytical tasks, such as
making comparisons or understanding causality, and the design
principle of the graphic (i.e., showing comparisons or showing
causality) follows the task. Tables are generally used where users
will look up a specific measurement, while charts of various types
are used to show patterns or relationships in the data for one or
more variables.
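The contrast between a table for lookup and a chart for patterns can be sketched in plain Python. The region names and sales figures are made up for illustration, and text_bar_chart is a hypothetical helper, not a library function. It encodes each value as the length of a bar and assumes the values are non-negative.

```python
def text_bar_chart(values, width=20, mark="#"):
    """Render non-negative values as horizontal bars scaled to the largest value."""
    peak = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        # Bar length is proportional to the value; the largest value fills `width`.
        length = round(width * value / peak) if peak > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {mark * length} {value}")
    return "\n".join(lines)


# Hypothetical data: units sold per region.
sales = {"North": 40, "South": 10, "East": 20}

# Table-style use: look up one specific measurement.
print("South:", sales["South"])

# Chart-style use: compare all regions at a glance.
chart = text_bar_chart(sales)
print(chart)
```

The lookup answers one precise question, while the bars make the relative sizes visible without reading any numbers.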
Data visualization is both an art and a science. It is viewed as a
branch of descriptive statistics by some, but also as a grounded
theory development tool by others. The rate at which data is
generated has increased. Data created by internet activity and an
expanding number of sensors in the environment, such as
satellites, are referred to as "Big Data". Processing, analyzing and
communicating this data present a variety of ethical and analytical
challenges for data visualization. The field of data science and
practitioners called data scientists have emerged to help address
these challenges.[3]
5. Exploratory Data Analysis: In statistics, exploratory data
analysis (EDA) is an approach to analyzing data sets to summarize
their main characteristics, often with visual methods. A statistical
model may or may not be used, but EDA is primarily for seeing what the
data can tell us beyond the formal modeling or hypothesis testing