Summary: Answers to Exam Questions

Pages: 48
Uploaded on: 03-06-2021
Written in: 2020/2021

This document contains written-out answers to more than 100 questions for the course Data Engineering, taught by Len Feremans.


Exam Questions Data Engineering

Week 1: Introduction, file formats, python for data engineering
● What is a data pipeline? When is a data pipeline expected to finish?
Which other technical requirements are ensured by a data
engineer?

○ What is a data pipeline:
■ A data pipeline is a series of data processing steps. It consists
of three key elements:
● Data source(s).
● Processing step(s).
● Destination: data warehouse.
■ An organization has many different data sources.
■ Data is extracted and put in a central repository in a structured
format via ETL (extract/transform/load).
■ A data pipeline can contain machine learning models.
■ Data processing is either
● real-time: online/streaming
● once per day: offline/batch
○ When is a data pipeline expected to finish? (This answer is our own
reasoning; as far as we recall, it was not covered in class.)
■ A data pipeline needs to be updated constantly and must be
available at all times to support the business processes of the
organization. Therefore, a data pipeline is only expected to
finish if a better data pipeline replaces it or if the
business processes it supports cease to exist.
■ Real-time := online/streaming processing (link week 8)
● E.g. a user visits the Dreamland website: the products shown
on the page are retrieved in real time; a query goes to the
database and the result comes back immediately.
■ Once per hour/day := offline/batch processing (link week 8)
○ Data engineer:
■ A data engineer is responsible for implementing the necessary
components for managing the data flow, enabling data
scientists to do analyses and gain insights.
■ A data engineer ensures that processing is:
● scalable: supports a huge number of users (link with
distributed processing)
● reliable/available: minimal downtime and operationally robust
(back-ups, and online applications available 24/7)
● maintainable: supports continuous change (software
and hardware updates)
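
The three elements of a pipeline (source, processing steps, destination) can be sketched as a tiny batch ETL job. This is only an illustration under stated assumptions: the CSV content, the `sales` table and all function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would be a file or an export.
RAW_SALES_CSV = """order_id,product,amount
1,lego set,49.99
2,puzzle,12.50
3,lego set,49.99
"""


def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and normalise product names."""
    return [
        (int(r["order_id"]), r["product"].strip().title(), float(r["amount"]))
        for r in rows
    ]


def load(records, conn):
    """Load: write the records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


# In-memory SQLite is an assumption standing in for the data warehouse.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_SALES_CSV)), conn)

totals = conn.execute(
    "SELECT product, COUNT(*), ROUND(SUM(amount), 2) "
    "FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(totals)
```

A batch (offline) pipeline would run such a job on a schedule, for example once per day; a streaming pipeline would apply the same steps to each record as it arrives.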

● We saw three different data models for representing data. Name and
provide a short summary of each data model.
○ The relational model:
■ Consists of tables and rows (or tuples/records)
■ Each column contains a primitive value such as a string, integer,
float or date
■ Two types of tables:
● Entities, e.g. persons, groups, objects
● Relations between entities, e.g. part-of, has-a, has-many,
linked-to
■ Each table can be saved as a Comma-Separated Values (CSV)
file

Strengths                            Weaknesses
structured                           static and less flexible schema
schema checking                      joins = necessary evil (they are complex)
natural model for batch processing
flexible queries

○ The document-oriented model:
■ Consists of keys and documents, that is, each key is associated
with one document
■ A document is a tree containing:
● Primitive values
● Nested entities
● One-to-many relations
■ Each document can be stored (and transferred) in JSON or XML

Strengths                                  Weaknesses
structured                                 no static schema checking
flexible: dynamic schema checking          less flexible queries
natural model for tree-structured data     many intra-document relations
performance

○ The graph-oriented model:
■ Consists of nodes and edges
■ A node is an instance of an entity and has a unique ID
■ An edge is a relation between two nodes and has a unique ID
■ Nodes and edges have named properties with primitive values

Strengths                                  Weaknesses
structured                                 no static schema checking
flexible: schema can be easily changed     used less in industry (academic model)
natural model when everything is           used in domains where everything is
connected with everything else,            connected through everything (not
e.g. social networks                       really a weakness, according to Len)
variable number of joins
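
To make the three summaries concrete, the same small data set can be written down in each model using plain Python structures. This is a sketch for illustration only: the students, the course, the grades and all names are hypothetical, and no real database is involved.

```python
# 1. Relational model: tables of rows that hold primitive values.
students = [(1, "Ann"), (2, "Bob")]          # entity table: (student_id, name)
courses = [(10, "Data Engineering")]         # entity table: (course_id, title)
enrolments = [(1, 10, 16), (2, 10, 12)]      # relation: (student_id, course_id, grade)


def grades_relational(course_title):
    """Join the three tables by hand to get (name, grade) pairs."""
    course_ids = {cid for cid, title in courses if title == course_title}
    names = dict(students)
    return sorted(
        (names[sid], grade) for sid, cid, grade in enrolments if cid in course_ids
    )


# 2. Document-oriented model: each key maps to one tree-shaped document.
documents = {
    "student:1": {"name": "Ann", "courses": [{"title": "Data Engineering", "grade": 16}]},
    "student:2": {"name": "Bob", "courses": [{"title": "Data Engineering", "grade": 12}]},
}


def grades_document(course_title):
    """No join needed: the one-to-many relation is nested in the document."""
    return sorted(
        (doc["name"], course["grade"])
        for doc in documents.values()
        for course in doc["courses"]
        if course["title"] == course_title
    )


# 3. Graph-oriented model: nodes and edges, each with an ID and properties.
nodes = {
    "n1": {"label": "Student", "name": "Ann"},
    "n2": {"label": "Student", "name": "Bob"},
    "n3": {"label": "Course", "title": "Data Engineering"},
}
edges = {
    "e1": {"from": "n1", "to": "n3", "type": "ENROLLED_IN", "grade": 16},
    "e2": {"from": "n2", "to": "n3", "type": "ENROLLED_IN", "grade": 12},
}


def grades_graph(course_title):
    """Follow ENROLLED_IN edges that end at the requested course node."""
    return sorted(
        (nodes[edge["from"]]["name"], edge["grade"])
        for edge in edges.values()
        if edge["type"] == "ENROLLED_IN"
        and nodes[edge["to"]].get("title") == course_title
    )


print(grades_relational("Data Engineering"))
print(grades_document("Data Engineering"))
print(grades_graph("Data Engineering"))
```

All three functions return the same answer; what differs is where the relation lives: in a separate table, nested inside the document, or on an edge.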

● What are the strengths and weaknesses of the relational model versus the
document-oriented model? Which model would you prefer?

Relational model
Strengths                            Weaknesses
structured                           static and less flexible schema
schema checking                      joins = necessary evil
natural model for batch processing
flexible queries

Document-oriented model
Strengths                                  Weaknesses
structured                                 no static schema checking
flexible                                   less flexible queries
natural model (when data is                many intra-document relations
tree-structured with few intra-document
(or many-to-many) relations)
performance


○ Which model would you prefer?
■ Each model is widely used for different purposes; there is no
one-size-fits-all solution.
■ The decision depends on the domain, that is, the structure of the
data and the type of application.
■ Mixed systems are available: for instance, JSON columns are
supported in most relational databases these days.
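
The idea of a mixed system can be sketched with a relational table that keeps part of each row as a JSON document. This is a simplified illustration, not a real JSON column: the table, the column names and the data are hypothetical, the JSON is stored as plain TEXT in SQLite, and it is decoded in Python rather than queried by the database. Databases such as PostgreSQL offer native JSON column types instead.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "student_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "   # fixed, schema-checked column
    "profile TEXT)"          # flexible part, stored here as JSON text
)
rows = [
    (1, "Ann", json.dumps({"grades": {"Data Engineering": 16}, "tags": ["erasmus"]})),
    (2, "Bob", json.dumps({"grades": {"Data Engineering": 12}})),
]
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)


def grade_for(name, course):
    """Look up the row relationally, then navigate the JSON document."""
    row = conn.execute(
        "SELECT profile FROM students WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    return json.loads(row[0]).get("grades", {}).get(course)


print(grade_for("Ann", "Data Engineering"))
```

The fixed columns keep schema checking and flexible queries, while the JSON part can differ per row (Ann has `tags`, Bob does not) without a schema change.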

● Which file formats are used for storing and communicating data?
Provide two short examples in JSON and XML for storing student
grades.
○ CSV = Comma-Separated Values:
■ A plain text format
■ Represents a single table in the relational data model
■ Values can be surrounded by quotation marks (" ")
■ Very commonly used for batch processing and for
exporting/importing larger amounts of data
■ Easy to partition, e.g. 2020-10-01_sales.csv,
2020-10-02_sales.csv (= one file of sales data per day)
■ Can be easily compressed using zip

⇒ CSV is not really used for communicating data, so for this question
you probably only need to give JSON and XML.
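
The partitioning and compression points can be sketched with Python's standard `csv` and `zipfile` modules. The sales records and function names below are hypothetical; only the file naming pattern (`2020-10-01_sales.csv`) comes from the notes.

```python
import csv
import io
import zipfile

# Hypothetical daily sales records, keyed by date.
daily_sales = {
    "2020-10-01": [("lego set", 49.99), ("puzzle", 12.50)],
    "2020-10-02": [("board game", 29.95)],
}


def write_partitions(sales_by_day):
    """Write one CSV file per day into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for day, records in sales_by_day.items():
            text = io.StringIO()
            # QUOTE_NONNUMERIC surrounds text values with quotation marks.
            writer = csv.writer(text, quoting=csv.QUOTE_NONNUMERIC)
            writer.writerow(["product", "amount"])
            writer.writerows(records)
            archive.writestr(f"{day}_sales.csv", text.getvalue())
    return buffer.getvalue()


def read_partition(zip_bytes, day):
    """Read back a single day's partition without touching the others."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        text = archive.read(f"{day}_sales.csv").decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


zipped = write_partitions(daily_sales)
print(read_partition(zipped, "2020-10-02"))
```

Because each day lives in its own file, a batch job only needs to read the partitions it is interested in. Note that CSV carries no type information: `csv.DictReader` returns every value as a string.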

○ JSON = JavaScript Object Notation:
■ A plain text format
■ Syntax is very similar to data literals in Python and JavaScript
■ Represents a single tree of data in the document-oriented model
■ Makes use of arrays and dictionaries
■ Common format for sharing data between client (browser)
and server, or for communicating data between any two
applications/services
■ Also used for configuration of applications/services
■ Typically a single JSON document is small, but NoSQL
databases such as MongoDB store millions of documents
with a unique ID for each document
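
The question asks for short JSON and XML examples of student grades. A minimal sketch using Python's standard `json` and `xml.etree.ElementTree` modules follows; the student, the courses, the grades and the XML tag and attribute names are hypothetical choices, not taken from the course material.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record: one student with a list of graded courses.
grades = {
    "student": "Ann",
    "grades": [
        {"course": "Data Engineering", "grade": 16},
        {"course": "Databases", "grade": 14},
    ],
}

# JSON: dictionaries and arrays map directly onto the Python structure.
json_text = json.dumps(grades, indent=2)


def to_xml(record):
    """XML: the same tree expressed with tags and attributes."""
    root = ET.Element("student", attrib={"name": record["student"]})
    for entry in record["grades"]:
        grade = ET.SubElement(root, "grade", attrib={"course": entry["course"]})
        grade.text = str(entry["grade"])
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml(grades)

print(json_text)
print(xml_text)
```

Both texts describe the same tree. JSON keeps the grade as a number, whereas XML text content is always a string that the reader has to convert.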

○ XML = eXtensible Markup Language:
■ Represents a single tree of data in the document-oriented model
■ Common format for sharing data between client (browser)
and server, or for communicating data between any two
applications/services
■ Instead of arrays and dictionaries, it uses tags (<>) with
attributes
■ Used for communication and configuration of applications/
services
■ XHTML, used for formatting web pages, is a type of XML
■ (As the name suggests XML is not really a format, but a
Seller: arnoverlinden2014, Universiteit Antwerpen