1. What is Data Validation
Data validation means checking the accuracy and quality of source data before using, importing,
or processing it. It is a form of data cleansing. Data validation is a general term and can be
performed on any type of data, including data within a single application (such as Microsoft
Excel) or data being merged within a single data store.
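As a rough illustration of checking source data before it is processed, the sketch below validates a few rows against two simple rules. The record layout, the field names (name, age) and the 0 to 120 age range are assumptions made up for this example, not part of any particular tool.

```python
# Hypothetical rules: every row needs a non-empty name and a whole-number age in 0..120.
def validate_row(row):
    """Return a list of problems found in one row (an empty list means the row is valid)."""
    problems = []
    if not str(row.get("name", "")).strip():
        problems.append("name is missing")
    try:
        age = int(row.get("age", ""))
    except (TypeError, ValueError):
        problems.append("age is not a whole number")
    else:
        if not 0 <= age <= 120:
            problems.append("age is out of range")
    return problems


# Made-up source data, as it might arrive from a spreadsheet export.
rows = [
    {"name": "Asha", "age": "34"},
    {"name": "", "age": "29"},
    {"name": "Ravi", "age": "abc"},
]

# Keep the clean rows; record the row index and reasons for every rejected row.
valid_rows = [r for r in rows if not validate_row(r)]
rejected = {i: validate_row(r) for i, r in enumerate(rows) if validate_row(r)}

print(valid_rows)
print(rejected)
```

Only the rows that pass every rule move on to the import or processing step; the rejected rows are kept with their reasons so they can be corrected at the source.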
2. Differentiate between holdout and cross validation
Holdout                                   | Cross validation
The data is split once, at random, into   | The data is split into several folds, and
two sets: a training set and a test set   | each fold is used in turn as the test set
A single split can give a misleading      | Gives a more reliable estimate; several
estimate                                  | variants exist (for example k-fold and
                                          | leave-one-out)
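The difference between the two can be sketched with the standard library alone. The function names, the 25% test fraction and the choice of k = 4 are assumptions for illustration; real projects would normally rely on a library implementation.

```python
import random


def holdout_split(items, test_fraction=0.25, seed=0):
    """Shuffle once and cut the data into one training set and one test set."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_splits(items, k=4, seed=0):
    """Yield (train, test) pairs; each item lands in a test set exactly once."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


data = list(range(12))

# Holdout: one split, so the estimate depends on which items happen to be held out.
train, test = holdout_split(data)

# Cross validation: k splits, so every item is used for testing once.
splits = list(k_fold_splits(data, k=4))
```

With holdout a model is scored once on `test`; with cross validation it is scored once per entry in `splits` and the scores are averaged.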
3. Illustrate the flowchart for validation
4. Differentiate between qualitative and quantitative data interpretation
Qualitative                               | Quantitative
Descriptive, relating to words and        | Countable or measurable, relating to
language                                  | numbers
Analysed by grouping the data into        | Analysed using statistical analysis
meaningful themes, drawn from             |
observation and interviews                |
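A small sketch of the two styles of interpretation, using made-up survey data. The ratings, the comments and the keyword-to-theme mapping are all hypothetical; real qualitative coding is done by a person reading the responses, and the keyword match here only stands in for that step.

```python
from collections import Counter
from statistics import mean, median

# Made-up survey data for illustration only.
ratings = [4, 5, 3, 4, 2]  # quantitative: numbers
comments = [               # qualitative: words
    "delivery was slow",
    "great price",
    "slow delivery again",
    "price is fair",
    "friendly staff",
]

# Quantitative interpretation: summary statistics.
average_rating = mean(ratings)
middle_rating = median(ratings)

# Qualitative interpretation: tag each comment with a theme, then count the themes.
# The keyword-to-theme map is a hypothetical coding scheme.
themes = {"delivery": "logistics", "price": "pricing", "staff": "service"}
theme_counts = Counter(
    theme for text in comments for word, theme in themes.items() if word in text
)

print(average_rating, middle_rating)
print(theme_counts)
```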
5. What is iteration
The iterative process is an approach to continuously improving a concept, design, or product.
Creators produce a prototype, test it, tweak it, and repeat the cycle with the goal of getting
closer to the solution.
Data Modelling
Data modelling is the process of analysing and establishing the relationships between the
different types of data that you collect and process for your business. It creates visual
representations of that data using text, symbols, diagrams and data modelling concepts, showing
how the data is stored and processed to support business development.
A data model is essentially an architect's building plan. Data modelling documents a complex
software system design as a diagram that can be easily understood. The diagram is created
using text and symbols to represent how the data will flow. It is also known as the blueprint
for constructing new software or re-engineering an existing application.
There are three main types of data models that organizations use. These are produced during
the course of planning a project in analytics. They range from abstract to discrete specifications,
involve contributions from a distinct subset of stakeholders, and serve different purposes.
1. Conceptual Model
It is a visual representation of database concepts and the relationships between them, giving
the high-level user view of the data. It focuses on establishing entities, the characteristics
of each entity, and the relationships between them.
2. Logical Model
This model further defines the structure of the data entities; that is, it includes the
attributes and explains their relationships. Usually, a logical data model is used for a
specific project, since the purpose is to develop a technical map of rules and data structures.
3. Physical Model
It is created using the database language and queries. The physical data model represents
each table and column, along with constraints such as primary keys and foreign keys. The main
purpose of the physical data model is to create the database. This type of data modelling gives
a concrete picture of the database and helps in producing its graphical representation.
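A physical model can be sketched with Python's built-in sqlite3 module. The tables, columns and sample rows below are hypothetical; the point is only to show tables, columns, a primary key and a foreign key written in a database language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical schema: table and column names are illustrative.
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
""")

conn.execute("INSERT INTO department (dept_id, name) VALUES (1, 'Analytics')")
conn.execute("INSERT INTO employee (emp_id, name, dept_id) VALUES (1, 'Asha', 1)")

try:
    # Department 99 does not exist, so the foreign key constraint rejects this row.
    conn.execute("INSERT INTO employee (emp_id, name, dept_id) VALUES (2, 'Ravi', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(fk_rejected, employee_count)
```

The constraints live in the database itself, so invalid rows are refused no matter which application tries to insert them.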
Data Modeling Types
The best way to picture a data model is to think of an architect's building plan. Just as an
architectural plan guides everything that is built from it, a data model guides all subsequent
models. These data modeling examples clarify how data models and the process of data modeling
highlight essential data and the way to arrange it. Given below are four types of data modeling
used to organize data.
1. Hierarchical Model
This data model arranges the data in the form of a tree with one root, to which other data is
connected. The hierarchy begins with the root and extends like a tree. This model effectively
describes many real-world relationships in which two different kinds of data are linked by a
single one-to-many relationship.
For example, one supermarket can have different departments and many aisles. Thus, the ‘root’
node, supermarket, will have two ‘child’ nodes: (1) Pantry and (2) Packaged Food.
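The supermarket example can be sketched as a nested dictionary in which every node has exactly one parent. The aisle names are made up for illustration, and the helper functions are hypothetical.

```python
# Each key is a node; its value holds that node's children.
# The two departments come from the supermarket example; the aisles are invented.
tree = {
    "Supermarket": {
        "Pantry": {"Aisle 1": {}, "Aisle 2": {}},
        "Packaged Food": {"Aisle 3": {}},
    }
}


def path_to(node_name, subtree, trail=()):
    """Return the path from the root to node_name, or None if the node is absent."""
    for name, children in subtree.items():
        here = trail + (name,)
        if name == node_name:
            return here
        found = path_to(node_name, children, here)
        if found:
            return found
    return None


def count_nodes(subtree):
    """Count every node in the subtree, including nested children."""
    return sum(1 + count_nodes(children) for children in subtree.values())


print(path_to("Aisle 3", tree))
print(count_nodes(tree))
```

Because each child has a single parent, there is exactly one path from the root to any node, which is what `path_to` returns.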
2. Network Model
This database model enables many-to-many relationships among the connected nodes. The
data is arranged in a graph-like structure, and here ‘child’ nodes can have multiple ‘parent’
nodes. The parent nodes are known as owners, and the child nodes are called members.
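A minimal sketch of the owner/member idea, using two dictionaries of sets. The suppliers, the department and the products are hypothetical data chosen so that one member has several owners.

```python
from collections import defaultdict

# Hypothetical data: each (owner, member) pair is one link in the network.
links = [
    ("Supplier A", "Rice"),
    ("Supplier B", "Rice"),
    ("Supplier B", "Lentils"),
    ("Pantry", "Rice"),
    ("Pantry", "Lentils"),
]

owners_of = defaultdict(set)   # member -> all of its owners (parents)
members_of = defaultdict(set)  # owner  -> all of its members (children)
for owner, member in links:
    owners_of[member].add(owner)
    members_of[owner].add(member)

print(sorted(owners_of["Rice"]))
print(sorted(members_of["Supplier B"]))
```

Unlike the tree structure of the hierarchical model, "Rice" here belongs to three owners at once, which is the many-to-many relationship the network model allows.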
3. Relational Model
This popular data model arranges the data into tables. Each table has columns and rows: each
column records an attribute of the entity, and each row is one record. It makes relationships
between data points easy to identify.
For example, e-commerce websites can process purchases and track inventory using the
relational model.
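The e-commerce example can be sketched with two related tables in an in-memory SQLite database. The table names, columns, products and the record_purchase helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: one for inventory, one for purchases, linked by product_id.
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, stock INTEGER);
CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER);
INSERT INTO product VALUES (1, 'Keyboard', 10), (2, 'Mouse', 5);
""")


def record_purchase(conn, product_id, quantity):
    """Insert a purchase row and reduce the stock in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO purchase (product_id, quantity) VALUES (?, ?)",
            (product_id, quantity),
        )
        conn.execute(
            "UPDATE product SET stock = stock - ? WHERE product_id = ?",
            (quantity, product_id),
        )


record_purchase(conn, 1, 3)
record_purchase(conn, 1, 2)

# Join the two tables on the shared column to see stock left and units sold.
report = conn.execute("""
    SELECT p.name, p.stock, COALESCE(SUM(pu.quantity), 0)
    FROM product AS p
    LEFT JOIN purchase AS pu ON pu.product_id = p.product_id
    GROUP BY p.product_id
    ORDER BY p.product_id
""").fetchall()
print(report)
```

The shared product_id column is what relates a purchase to its product, so the join can answer inventory questions without duplicating product details in every purchase row.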
4. ER (Entity-Relationship) Model
The ER (entity-relationship) model is a high-level conceptual model used to define the data
elements and relationships for the entities in a system. In this model, the entire database is
represented in a diagram called an entity-relationship diagram, consisting of Entities,
Attributes, and Relationships.
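The three building blocks of an ER diagram can be sketched as plain Python dataclasses. The class names, the Customer and Order entities and the "1:N" cardinality label are assumptions chosen for illustration; a real ER diagram is drawn, not coded.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Relationship:
    name: str
    source: Entity
    target: Entity
    cardinality: str  # for example "1:N" for one-to-many


# Hypothetical entities with their attributes.
customer = Entity("Customer", ["customer_id", "name", "email"])
order = Entity("Order", ["order_id", "order_date"])

# One customer places many orders.
places = Relationship("places", customer, order, "1:N")


def describe(rel):
    """Render a relationship as a one-line text version of the diagram edge."""
    return f"{rel.source.name} --{rel.name} ({rel.cardinality})--> {rel.target.name}"


print(describe(places))
```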