Data engineering
- Data engineers are the designers, builders, and managers of the information or ‘big data’
infrastructure
o They develop the architecture that helps analyze and process data in the way the
organization needs it
o And they make sure those systems are performing smoothly
Hierarchy of needs
Stages in a Big Data pipeline
General pipeline components
Data engineering and processing:
- Underlies (necessary for) Data Science and data-driven decision making
- Has other positive effects on data processing
Data mesh
Main purpose of a database: storing data and processing it into information
Terminology
- Data: given facts, denoted e.g. by sequences of characters or numbers
- Information: the interpretation of data within a certain context
- Database: a collection of permanently and digitally stored data
Relational database
- Relationships
- Rows and columns
o Rows, records, or tuples
o Columns, or attributes
- General language: SQL
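A minimal sketch of these terms using Python's built-in sqlite3 module. The table names, column names, and values are hypothetical and chosen only for illustration; any relational database would serve equally well.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(id), "
    "amount REAL)"
)

# Each inserted tuple is one row (record); each position is one column (attribute).
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0)],
)

# The relationship between the two tables is followed with a JOIN on the key.
rows = conn.execute(
    "SELECT c.name, SUM(o.amount) "
    "FROM customer AS c JOIN orders AS o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(rows)
conn.close()
```

The query is plain SQL, so the same statements should carry over to other relational systems with little or no change.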
Database Management System
- Provides one logical structure for everyone
- Lets multiple applications access the data at the same time
Different models for organizing data
- A database model is a collection of rules with which it is possible to describe the structure,
the consistency rules, and the behavior of a database
- The database model describes how data are to be structured in a database system and, thus,
in a database management system
NoSQL databases: common classifications
- Column store or column-oriented database
o Data is structured in columns
o Each column consists of a name, a value, and a timestamp
- Document store or document-oriented database
o Data is structured in documents
o Typically in some standard format or encoding
- Key-value store/database
o Data is structured into an associative array
o Like a dictionary or hash table
o A collection of objects, which in turn have many different fields within them, each
containing data
- Graph database
o Data is structured in nodes, edges and properties describing the nodes
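The four classifications above can be sketched with plain Python data structures. This is only an illustration of how the data is shaped, not how any particular NoSQL product stores it; every key, field, and value is hypothetical.

```python
import json

# All names and values below are hypothetical, chosen only for illustration.

# Key-value store: an associative array; the value is opaque to the store.
key_value = {"user:1": '{"name": "Ada", "city": "London"}'}

# Document store: the value is a document in a standard encoding (here JSON),
# so individual fields inside it can be inspected and queried.
document = json.loads(key_value["user:1"])

# Column store: each row key maps to columns, and each column is a
# (name, value, timestamp) triple.
column_store = {
    "user:1": [
        ("name", "Ada", 1700000000),
        ("city", "London", 1700000050),
    ]
}

# Graph database: nodes with properties, plus edges connecting the nodes.
nodes = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
edges = [(1, 2, "KNOWS")]


def neighbours(node_id):
    """Return the names of nodes reachable over one outgoing edge."""
    return [nodes[dst]["name"] for src, dst, _label in edges if src == node_id]


print(document["city"])
print([name for name, _value, _ts in column_store["user:1"]])
print(neighbours(1))
```

The key-value and document examples hold the same bytes; the difference is whether the store treats the value as opaque or as a structure it can look into.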
Structured vs. unstructured data
- Unstructured data
o Text files
- Structured data
o XML, databases
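A small sketch of the practical difference, using Python's standard library. The XML snippet and the sentence are made-up examples: with structured data each field can be addressed by name, while with unstructured text the same fact has to be guessed with a pattern.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical inputs carrying the same fact in two forms.
xml_text = "<person><name>Ada</name><born>1815</born></person>"
free_text = "Ada was born in 1815 and wrote about the Analytical Engine."

# Structured: every value is labelled by its tag, so it can be read directly.
root = ET.fromstring(xml_text)
structured = {child.tag: child.text for child in root}
print(structured["born"])

# Unstructured: there are no labels, so a heuristic (here: the first
# four-digit number) is needed, and it may pick up the wrong value.
match = re.search(r"\b(\d{4})\b", free_text)
guessed_year = match.group(1) if match else None
print(guessed_year)
```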