Summary DTZ2025

Uploaded on: 16-06-2025

Written in: 2024/2025

Summary of the tutorial groups

Summary DTZ: tutorials

Week 1

Tutorial 1

Literature

- Chapter 1 and chapter 2 of the Fundamentals of Clinical Data Science book
- Healthcare Big Data and the Promise of Value-Based Care: https://catalyst.nejm.org/doi/full/10.1056/CAT.18.0290
- Understanding the data science life-cycle: https://docs.google.com/document/d/1JAp-YREYLF6E9dWYgWCuoyQOEqaMFhoXCJImlw8m85k/edit?usp=sharing
- Understand the data science life-cycle with a specific case: Doing Data Science: A Framework and Case Study by Sallie Ann Keller, Stephanie S. Shipp, Aaron D. Schroeder, and Gizem Korkmaz: https://hdsr.mitpress.mit.edu/pub/hnptx6lq/release/8

Additional readings (not mandatory):

- A brief history of Data Science: https://scientistcafe.com/ids/a-brief-history-of-data-science.html

- Terms relating to big data
- Challenges and opportunities that big data brings
- Different data sources and data types
- Understanding the data science life-cycle
o What it is and what it is used for
- What is the difference between data sources and data at scale?
- Chapters 1 and 2: what are they about?
o Data sources
- Why do we need data science?

Fundamentals of Clinical Data Science book, chapters 1 and 2

Chapter 1: data sources
1. Data sources:
- Electronic medical records
- Other medical information systems
- Mobile apps
- Internet of things and big data
- Social media

2. GDPR: General Data Protection Regulation

3. Data types:
- Tabular data
- Time series
- Natural language
- Images and videos

4. Data standards

- Definition of data elements: determination of the data content to be collected and exchanged.

- Data interchange formats: standard formats for electronically encoding the data elements (including sequencing and error handling). Interchange standards can also include document architectures for structuring data elements as they are exchanged and information models that define the relationships among data elements in a message.

- Terminologies: the medical terms and concepts used to describe, classify, and code the data elements, and the data expression languages and syntax that describe the relationships among the terms/concepts.

- Knowledge representation: standard methods for electronically representing medical literature, clinical guidelines, and the like for decision support.
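
As a rough illustration of how the first three kinds of standard fit together, the sketch below defines a few data elements, checks a code against a small terminology, and encodes the record in an interchange format. The element names, the codes (`DX001`, `DX002`) and the choice of JSON are all assumptions made for this example; they are not taken from the book or from any real clinical standard.

```python
import json

# Hypothetical data element definitions: which fields are collected, and their types.
DATA_ELEMENTS = {
    "patient_id": str,
    "diagnosis_code": str,
    "systolic_bp_mmhg": int,
}

# Hypothetical local terminology: codes mapped to the concepts they stand for.
TERMINOLOGY = {
    "DX001": "hypertension",
    "DX002": "type 2 diabetes",
}


def validate(record):
    """Check a record against the data element definitions and the terminology."""
    for name, expected_type in DATA_ELEMENTS.items():
        if name not in record:
            raise ValueError(f"missing data element: {name}")
        if not isinstance(record[name], expected_type):
            raise TypeError(f"{name} should be of type {expected_type.__name__}")
    if record["diagnosis_code"] not in TERMINOLOGY:
        raise ValueError(f"unknown diagnosis code: {record['diagnosis_code']}")
    return record


def to_message(record):
    """Encode a validated record in an interchange format (JSON in this sketch)."""
    return json.dumps(validate(record), sort_keys=True)


record = {"patient_id": "P-1", "diagnosis_code": "DX001", "systolic_bp_mmhg": 150}
message = to_message(record)
print(message)
```

Two institutions that share the same element definitions and terminology could exchange such messages and interpret them in the same way, which is the point of agreeing on standards.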

Chapter 2: data at scale

Data fragmentation occurs when a collection of data in memory is broken up into many pieces that are not close together. The problem becomes even more pronounced when performing multicenter studies.

New technologies, such as scanners that can acquire images of a patient in less than a second, have led to what has been called the ‘data explosion’ [3] for medical imaging data.

Missing values occur when no data value is stored for a variable in an observation [4]. Missing data is a common occurrence and can have a significant effect on the conclusions that can be drawn from the data. Statistical techniques such as data imputation (explained later in the book) can be used to replace missing values.

Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner [5]. A data model is an agreement between several institutions on the format and database structure for storing data. Unstructured information is typically text-heavy, but may also contain data such as dates, numbers, and facts, as well as audiovisual, location, and sensor data.
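
The data imputation mentioned above for missing values can be sketched with one of the simplest techniques, mean imputation. This is only one possible approach and not necessarily the one the book describes later; the blood pressure readings are made up for illustration.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical systolic blood pressure readings; None marks a missing value.
readings = [120, None, 130, 140, None]
imputed = impute_mean(readings)
print(imputed)
```

Mean imputation keeps the average of the variable unchanged but reduces its spread, which is one reason why missing data can affect the conclusions drawn from a dataset.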

The term big (clinical) data refers not only to a large volume of data, but to a large volume of complex, unstructured and fragmented data coming from different sources.

Hospitals generate large volumes of clinical data, stored across different
departments and systems. These systems often use incompatible formats,
making it hard to share or combine data. This fragmentation is a major
challenge, especially for multi-center studies. At the same time, data production
is growing exponentially, especially with advances in imaging and digital
technology. However, our ability to process and analyze this data hasn’t
kept pace. Many datasets have missing values and lack a clear structure,
making them hard to use for machine learning or predictive models.
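
A minimal sketch of what combining fragmented data involves: two departments export records about the same patient with different field names and date formats, and each export is mapped onto one shared schema before the data can be used together. The departments, field names and values below are hypothetical.

```python
from datetime import datetime

# Hypothetical exports from two departments describing the same patient,
# with different field names and date formats.
radiology = [{"PatientID": "P-1", "ScanDate": "16/06/2025", "TumorSizeMM": 23.0}]
laboratory = [{"pid": "P-1", "date": "2025-06-17", "glucose_mmol_l": 5.4}]


def from_radiology(row):
    """Map a radiology row onto the shared (assumed) schema."""
    return {
        "patient_id": row["PatientID"],
        # Convert day/month/year into ISO format so dates sort consistently.
        "date": datetime.strptime(row["ScanDate"], "%d/%m/%Y").date().isoformat(),
        "source": "radiology",
        "tumor_size_mm": row["TumorSizeMM"],
    }


def from_laboratory(row):
    """Map a laboratory row onto the shared (assumed) schema."""
    return {
        "patient_id": row["pid"],
        "date": row["date"],
        "source": "laboratory",
        "glucose_mmol_l": row["glucose_mmol_l"],
    }


def combine(radiology_rows, laboratory_rows):
    """Group the harmonized records per patient, ordered by date."""
    records = [from_radiology(r) for r in radiology_rows]
    records += [from_laboratory(r) for r in laboratory_rows]
    by_patient = {}
    for rec in sorted(records, key=lambda r: r["date"]):
        by_patient.setdefault(rec["patient_id"], []).append(rec)
    return by_patient


combined = combine(radiology, laboratory)
print(combined["P-1"])
```

Every additional system needs its own mapping function, which gives a sense of why fragmentation becomes harder to handle as more departments or centers are involved.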

2.2 Big clinical data: the four V’s

Fragmentation: data is collected in different formats and stored in various separate databases.

The community agrees that big data can be summarized by the four ‘V’ concepts: volume, variety, velocity, and veracity.

1. Volume: the volume of data increases exponentially every day, since not only humans but also, and especially, machines are producing new information faster and faster (refer to the previous example of the ‘data explosion’ in medical imaging, but also the “Internet of Things”). In the community, data on the order of terabytes and larger is considered ‘big volume’. Volume contributes to the major issue that traditional storage systems, such as traditional databases, are no longer suitable for holding such a huge amount of data.
2. Variety: big data comes from different sources and is stored in different formats:
(a) Different types: in the past, the major sources of clinical data were databases or spreadsheets. This type of data is usually structured or, less often, semi-structured (e.g. databases with some missing values or inconsistencies). Now data can also come in the form of free text (electronic reports) or images (patients’ scans).
(b) Different sources: variety is also used to mean that data can come from different sources. These sources do not necessarily belong to the same institution.
Variety affects both data collection and storage. Two major challenges must be faced: (a) storing and retrieving this data in an efficient and cost-effective way, and (b) aligning data types from different sources, so that all the data can be mined at the same time.
There is also an additional complexity due to the interaction between variety and volume. In fact, unstructured data is growing much faster than structured data. One estimate says that unstructured data doubles around every 3 months [1]. Therefore, the complexity and fragmentation of data are far from slowing down: we will have to deal with much more unstructured data than we expected.
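
To get a feel for what doubling every 3 months would mean, the sketch below computes the growth factor after a given number of months. It takes the 3-month estimate [1] at face value and assumes the doubling time stays constant, which is a simplification.

```python
def growth_factor(months, doubling_time_months=3.0):
    """Factor by which a quantity grows if it doubles every `doubling_time_months`."""
    return 2 ** (months / doubling_time_months)


# Four doublings fit into one year when the doubling time is 3 months.
one_year = growth_factor(12)
print(one_year)
```

Under this assumption the amount of unstructured data would be 16 times larger after a single year.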
3. Velocity: the production of big data (by machines or humans) is a continuous and massive flow.
(a) Data in motion and real-time big data analytics: big data is produced in ‘real time’ and most of the time needs to be analyzed in ‘real time’. Therefore, an architecture for capturing and mining big data flows must support real-time turnaround.
(b) Lifetime of data utility: a second dimension of data velocity is for how long data will be valuable. Understanding this additional ‘temporal’ dimension of

Written for

Institution: Maastricht University (UM)
Study: Health Sciences
Course: Datascience in Healthcare (DTZ2025)

Document information

Uploaded on: 16 June 2025
Number of pages: 51
Written in: 2024/2025
Type: Summary

Topics

datascience
healthcare
2025
python
programming

Seller: LB2004, Maastricht University
