
Summary DTZ2025

Pages
51
Uploaded on
16-06-2025
Written in
2024/2025

Summary of the tutorial groups


Summary DTZ: tutorials


Week 1

Tutorial 1

Literature

- Chapter 1 and chapter 2 of the Fundamentals of Clinical Data Science book
- Healthcare Big Data and the Promise of Value-Based Care: https://catalyst.nejm.org/doi/full/10.1056/CAT.18.0290
- Understanding the data science life-cycle: https://docs.google.com/document/d/1JAp-YREYLF6E9dWYgWCuoyQOEqaMFhoXCJImlw8m85k/edit?usp=sharing
- Understand the data science life-cycle with a specific case: Doing Data Science: A Framework and Case Study by Sallie Ann Keller, Stephanie S. Shipp, Aaron D. Schroeder, and Gizem Korkmaz: https://hdsr.mitpress.mit.edu/pub/hnptx6lq/release/8

Additional readings (non-mandatory):

- A brief history of Data Science: https://scientistcafe.com/ids/a-brief-history-of-data-science.html




- Terms related to big data
- Challenges and opportunities that big data brings
- Different data sources and data types
- Understanding the data science life-cycle
o What it is and what it is used for
- What is the difference between data sources and data at scale?
- What chapters 1 and 2 are about
o Data sources
- Why do we need data science?




Fundamentals book, chapters 1 and 2

Chapter 1: data sources
1. Data sources
- Electronic medical records
- Other medical information systems
- Mobile apps
- Internet of things and big data
- Social media

2. GDPR: General Data Protection Regulation

3. Data types:
- Tabular data
- Time series
- Natural language
- Images and videos

4. Data standards

- Definition of data elements: determination of the data content to be collected and exchanged.

- Data interchange formats: standard formats for electronically encoding the data elements (including sequencing and error handling). Interchange standards can also include document architectures for structuring data elements as they are exchanged and information models that define the relationships among data elements in a message.

- Terminologies: the medical terms and concepts used to describe, classify, and code the data elements, and the data expression languages and syntax that describe the relationships among the terms/concepts.

- Knowledge representation: standard methods for electronically representing medical literature, clinical guidelines, and the like for decision support.
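
The first three kinds of standard above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the element names, the made-up codes `DX001`/`DX002`, and the choice of JSON as interchange format do not come from the book or from any real clinical standard.

```python
import json

# Hypothetical data element definitions: element name -> expected Python type.
DATA_ELEMENTS = {
    "patient_id": str,
    "birth_date": str,      # ISO 8601 date string, e.g. "1980-05-17"
    "diagnosis_code": str,  # code taken from the shared terminology below
}

# Hypothetical terminology: made-up codes mapped to a concept label.
TERMINOLOGY = {
    "DX001": "type 2 diabetes",
    "DX002": "hypertension",
}

def validate(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, expected_type in DATA_ELEMENTS.items():
        if name not in record:
            problems.append(f"missing element: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for element: {name}")
    code = record.get("diagnosis_code")
    if isinstance(code, str) and code not in TERMINOLOGY:
        problems.append(f"unknown terminology code: {code}")
    return problems

def encode(record):
    """Serialize a conforming record to the interchange format (JSON here)."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(record, sort_keys=True)

record = {"patient_id": "p-001", "birth_date": "1980-05-17", "diagnosis_code": "DX001"}
message = encode(record)        # what the sending system transmits
received = json.loads(message)  # what the receiving system decodes
```

Because both sides agree on the elements, the encoding and the codes, the receiving system can decode the message back into the same record without any manual mapping.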

Chapter 2: data at scale

Data fragmentation occurs when a collection of data in memory is broken up into many pieces that are not close together. The problem becomes even more pronounced when performing multicenter studies.

New technologies, such as scanners that can acquire images of a patient in less than a second, have caused what has been called a ‘data explosion’ [3] for medical imaging data.

Missing values occur when no data value is stored for a variable in an observation [4]. Missing data is a common occurrence and can have a significant effect on the conclusions that can be drawn from the data. Statistical techniques such as data imputation (explained later in the book) can be used to replace missing values.
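
As a minimal sketch of the imputation idea, the example below replaces each missing value with the mean of the observed values. Mean imputation is only one simple technique, chosen here for illustration; it is not necessarily the method the book describes later. The function name `impute_mean` and the blood pressure values are made up.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical systolic blood pressure readings; None marks a missing value.
systolic_bp = [120, None, 130, 110, None]
imputed = impute_mean(systolic_bp)
```

Filling every gap with the same value keeps the mean unchanged but reduces the spread of the variable, which is one reason missing data can affect the conclusions drawn from it.
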
Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner [5]. A data model is an agreement between several institutions on the format and database structure used for storing data. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. It can also include audiovisual, location, and sensor data.

The term big (clinical) data refers not only to a large volume of data, but to a large volume of complex, unstructured and fragmented data coming from different sources.

Hospitals generate large volumes of clinical data, stored across different
departments and systems. These systems often use incompatible formats,
making it hard to share or combine data. This fragmentation is a major
challenge, especially for multi-center studies. At the same time, data production
is growing exponentially, especially with advances in imaging and digital
technology. However, our ability to process and analyze this data hasn’t
kept pace. Many datasets have missing values and lack a clear structure,
making them hard to use for machine learning or predictive models.
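
To make the incompatible-formats problem concrete, here is a minimal sketch that maps records from two systems onto one shared layout. The hospitals, field names, values and the target layout are all hypothetical and only serve the illustration.

```python
from datetime import datetime

# Hypothetical exports from two hospital systems with incompatible formats.
hospital_a = [{"PatientID": "A-17", "DOB": "17/05/1980", "Weight_kg": 82.0}]
hospital_b = [{"id": "B-03", "birth_date": "1975-11-02", "weight_lb": 150.0}]

LB_TO_KG = 0.45359237  # kilograms per pound

def from_a(row):
    """Map a hospital A row to the shared layout (ISO dates, weight in kg)."""
    return {
        "patient_id": row["PatientID"],
        "birth_date": datetime.strptime(row["DOB"], "%d/%m/%Y").date().isoformat(),
        "weight_kg": row["Weight_kg"],
    }

def from_b(row):
    """Map a hospital B row to the shared layout, converting pounds to kg."""
    return {
        "patient_id": row["id"],
        "birth_date": row["birth_date"],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 1),
    }

# Only after mapping can the rows be analyzed together.
combined = [from_a(r) for r in hospital_a] + [from_b(r) for r in hospital_b]
```

Each additional system needs its own mapping function, which is part of why multi-center studies become expensive when no common data model exists.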


2.2 Big clinical data: the four V’s

Fragmentation: data is collected in different formats and stored in various separate databases.

The community agrees that big data can be summarized by the four ‘V’ concepts: volume, variety, velocity, and veracity.

1. Volume: the volume of data increases exponentially every day, since not only humans but also, and especially, machines are producing new information faster and faster (see the earlier example of the ‘data explosion’ in medical imaging, but also the “Internet of Things”). In the community, data on the order of terabytes and larger is considered ‘big volume’. Volume contributes to the major issue that traditional storage systems, such as traditional databases, are no longer suitable for accommodating such a huge amount of data.
2. Variety: big data comes from different sources and is stored in different formats:
(a) Different types: in the past, the major sources of clinical data were databases or spreadsheets. This type of data is usually structured or, less often, semi-structured (e.g. databases with some missing values or inconsistencies). Now data can also come in the form of free text (electronic reports) or images (patients’ scans).
(b) Different sources: variety is also used to mean that data can come from different sources. These sources do not necessarily belong to the same institution.
Variety affects both data collection and storage. Two major challenges must be faced: (a) storing and retrieving this data in an efficient and cost-effective way, and (b) aligning data types from different sources, so that all the data is mined at the same time.
There is also an additional complexity due to the interaction between variety and volume. In fact, unstructured data is growing much faster than structured data. One estimate says that unstructured data doubles roughly every 3 months [1]. Therefore, the complexity and fragmentation of data is far from slowing down: we will have to deal with much more unstructured data than we expected.
3. Velocity: the production of big data (by machines or humans) is a continuous and massive flow.
(a) Data in motion and real-time big data analytics: big data are produced in ‘real time’ and most of the time need to be analyzed in ‘real time’. Therefore, an architecture for capturing and mining big data flows must support real-time turnaround.
(b) Lifetime of data utility: a second dimension of data velocity is for how long data will be valuable. Understanding this additional ‘temporal’ dimension of

Seller: LB2004, Maastricht University