Summary - Fundamentals of data science (5294FUDS6Y)

Author: samirahbakker1107

Uploaded on: 06-09-2025

Written in: 2024/2025

Summary of Fundamentals of Data Science, fully in English.

Preview of the content

Fundamentals of Data Science Summary Exam
Samirah Bakker

Introduction:
Data Science focuses on exploiting the modern deluge of data for prediction, exploration,
understanding, and intervention.
“(...) the practice of data science is not just a single step of analyzing a dataset. Rather, it
cycles between data preprocessing, exploration, selection, transformation, analysis, interpretation, and
communication. One of the main priorities for data science is to develop the tools and methods that
facilitate this cycle.”

Python:
- List: ordered, mutable collection of objects, e.g. [a, b, c]
  - Can store any type of object
  - Flexible but inefficient for numerical work, which is why NumPy exists
- Tuple: ordered, immutable collection of objects, e.g. (a, b, c)
- Set: unordered collection of unique values, e.g. {a, b, c}
- Dictionary: collection of key: value pairs, e.g. {a: 1, c: 2}
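
The four built-in collections listed above can be tried out in a few lines. This is only a small sketch; the variable names and values are made up for illustration and do not come from the course material.

```python
# Illustrative values only; none of these names appear in the notes.
items = ["a", 1, 2.5]          # list: ordered, mutable, any object type
items.append("b")              # lists can grow in place
items[0] = "z"                 # and elements can be reassigned

point = ("a", "b", "c")        # tuple: ordered but immutable
try:
    point[0] = "z"             # assignment to a tuple element fails
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

unique = {"a", "b", "a", "c"}  # set: the duplicate "a" is stored once

counts = {"a": 1, "c": 2}      # dictionary: key -> value
counts["b"] = 3                # new keys can be added after creation
```

The mixed-type list is what makes lists flexible, and it is also why they are slower than NumPy arrays for numerical work: every element is a separate Python object.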

NumPy:
- NumPy arrays are at the core of most data science tools in Python
- Efficient interface to store and operate on numerical data
  - Efficient storage of numerical data
  - Efficient manipulation of numerical data
- Implements efficient operations (e.g., matrix multiplication)
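
As a rough illustration of the points above, the sketch below compares a plain Python loop with the equivalent vectorized NumPy expression and shows matrix multiplication with the `@` operator. The arrays are arbitrary examples chosen for this sketch.

```python
import numpy as np

# Squaring 1000 numbers with a plain Python list comprehension.
values = list(range(1_000))
squared_list = [v * v for v in values]

# The same computation as one vectorized NumPy expression.
arr = np.arange(1_000)
squared_arr = arr * arr

# Matrix multiplication of two small example matrices.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
product = A @ B                # B swaps the columns of A
```

Both approaches give the same numbers; the NumPy version runs its loop in compiled code over a contiguous block of memory, which is where the efficiency comes from.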

NumPy slicing:
- a[start:stop:step, start:stop:step, …]
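- Some values can be omitted; by default (for a positive step): start=0, stop=end, step=1
- Values can be negative (a negative start or stop counts from the end; a negative step reverses the direction)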
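
A short sketch of the slicing rules above on a small 3×4 array. The array `a` and the result names are chosen here purely for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# a is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_two_rows = a[:2]        # start omitted -> 0; second axis omitted -> all columns
every_other_col = a[:, ::2]   # step=2 along the second axis
last_row = a[-1]              # negative index counts from the end
reversed_rows = a[::-1, :]    # negative step reverses the row order
block = a[1:3, 1:3]           # rows 1-2 and columns 1-2
```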

NumPy array aggregation / reduction:
- In aggregation operations, the axis argument specifies which dimension is collapsed!
- Example with a of shape (4, 3), e.g. a = np.ones((4, 3)):
  - a.sum(axis=0) → array([4., 4., 4.])
  - a.sum(axis=1) → array([3., 3., 3., 3.])
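
The two results above can be reproduced with a 4×3 array of ones. That choice of `a` is an assumption made for this sketch: the notes only show the outputs, and a 4×3 array of ones is one array that produces them.

```python
import numpy as np

a = np.ones((4, 3))          # assumed input: 4 rows, 3 columns, all ones

col_sums = a.sum(axis=0)     # collapses axis 0 (the rows): one sum per column
row_sums = a.sum(axis=1)     # collapses axis 1 (the columns): one sum per row
total = a.sum()              # no axis given: collapses everything to a scalar
```

Here `col_sums` has shape (3,) and `row_sums` has shape (4,): the axis named in the call is the one that disappears from the result.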

NumPy Broadcasting:
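
The notes give no detail under this heading, so the following is only a minimal sketch of standard NumPy broadcasting, with arrays chosen for illustration: when shapes differ, dimensions of size 1 (or missing leading dimensions) are stretched to match the other operand.

```python
import numpy as np

a = np.arange(3)                  # shape (3,): [0, 1, 2]
b = np.arange(3).reshape(3, 1)    # shape (3, 1): a column vector

scalar_sum = a + 5                # the scalar is broadcast to every element
grid = a + b                      # (3,) with (3, 1) broadcasts to (3, 3)

m = np.ones((2, 3))
row_added = m + a                 # a is added to each row of m
```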

Pandas:
- Pandas is built on top of NumPy and provides easy manipulation of labeled arrays (with one or
more dimensions) holding heterogeneous data.

Data structures:
- Series → One-dimensional array of indexed data. The index need not be a sequence of
integers (it can consist of strings, for example).
- DataFrame → Two-dimensional array with flexible row indices and column names.
  - DataFrame = dictionary of Series with different labels (keys) and a common index
  - Can be seen as a collection of Series that all share the same index.
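
A small sketch of both structures, building a DataFrame as a dictionary of Series that share one index. The labels and numbers are placeholders invented for this example, not real data.

```python
import pandas as pd

# Two Series with the same string index (placeholder values).
area = pd.Series({"A": 100, "B": 250, "C": 75})
population = pd.Series({"A": 1000, "B": 500, "C": 1500})

# A DataFrame as a dictionary of Series: keys become column names,
# and the shared index becomes the row index.
states = pd.DataFrame({"area": area, "population": population})

value_b = area["B"]             # a Series can be indexed by label
columns = list(states.columns)  # ["area", "population"]
index = list(states.index)      # ["A", "B", "C"]
```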

Indexing and selection:
- NumPy ndarray: array[0] selects row 0
- Pandas DataFrame: states['area'] selects column area
- For dictionary-style indexing use df['column_name']['index']
- For NumPy array-style indexing use .loc or .iloc: df.loc['index', 'column_name'], df.iloc[i, j]
  - .loc → array-style indexing, explicit indexing using labels
  - .iloc → array-style indexing, implicit indexing using positions
- .iloc and .loc → first access rows, then columns!
- Dictionary-style indexing → first access columns, then rows!

Slicing and masking:

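Handling missing data:
- df.notnull() → Boolean mask, True where a value is present
- df.isnull() → Boolean mask, True where a value is missing
- df.dropna() → drops rows that contain missing values
- df.dropna(axis='columns') → drops columns that contain missing values
- df.fillna(0) → replaces missing values with 0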
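
The calls listed above can be tried on a tiny DataFrame with a single missing value. The frame and its column names are hypothetical and exist only for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "x" has one missing value in row 1.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

mask_missing = df.isnull()                 # True where a value is missing
mask_present = df.notnull()                # the opposite mask
rows_dropped = df.dropna()                 # removes row 1
cols_dropped = df.dropna(axis="columns")   # removes column "x" instead
filled = df.fillna(0)                      # replaces the NaN with 0
```

None of these calls modify `df` itself; each returns a new object.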

Data science life-cycle:
- Does not consist of a single step
- Statistics and plotting are not everything, but simply a part of the cycle
- Problem driven: start by posing and understanding the question
- It is a cycle

The most frequent failure in data analysis is mistaking the type of question being considered.
- Any type of question can be interesting, but we need to define it upfront and be aware of and
clear about its type
- Types of questions:
  - Descriptive: what is out there? (e.g., national census; no interpretations are made)
  - Exploratory: are there (apparently) trends, correlations, or relationships between the
  measurements to generate ideas or hypotheses? Should we study further?
  - Inferential: will an observed pattern likely hold beyond the data set we have? Any
  significant correlation? Can we infer a population state from our small sample?
  - Predictive: can we use features to predict an outcome?
  - Causal: what happens to one measurement (statistically, on average) if we change
  another?
  - Mechanistic: what happens (deterministically) to one measurement if we change
  another? How does a variable change another?

Exploratory data analysis (EDA):

Exploratory data analysis (informal definition): the process of transforming, describing, and visualizing a
data set to better understand it, identify problems, and inform subsequent hypotheses and analyses.
EDA steps:
- Formulate initial question
- Collect raw data and understand the format
- Clean and pre-process the data
- Describe the dataset
- Make plots to visualize data distribution and relationship between some variables
- Is there any interesting trend that suggests further analysis? Do we have the right question and
data?
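
The cleaning and describing steps above might look roughly like this in pandas. The data set, its column names, and the choice to drop incomplete rows are all assumptions made for this sketch; plotting is left out to keep the example dependency-free.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing measurement.
raw = pd.DataFrame({
    "height_cm": [170.0, 182.0, np.nan, 165.0, 176.0],
    "weight_kg": [65.0, 80.0, 72.0, 58.0, 74.0],
    "group": ["a", "b", "b", "a", "b"],
})

# Understand the format: column types and number of missing values.
dtypes = raw.dtypes
n_missing = raw.isnull().sum()

# Clean: here we simply drop incomplete rows (one possible choice).
clean = raw.dropna()

# Describe the dataset: summary statistics overall and per group.
summary = clean.describe()
mean_weight_by_group = clean.groupby("group")["weight_kg"].mean()

# Look for a relationship that might deserve further analysis.
corr = clean["height_cm"].corr(clean["weight_kg"])
```

A strong correlation in such a quick check is a prompt for a follow-up question, not a conclusion in itself.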

Principles of Data Visualization:
Rule 1: Know the audience
Rule 2: Identify your message beforehand
Rule 3: Adapt figure to medium
Rule 4: Caption is important
Rule 5: Do not trust the defaults
Rule 6: Use color effectively
- Use diverging shades if there is a meaningful middle point
- Use a sequential color scale for a more intuitive reading
Rule 7: Do not mislead the audience
- Scale and visual perception are important
Rule 8: Avoid “chartjunk” (unnecessary visual elements)
Rule 9: Choose message over beauty
Rule 10: Know and use the right tool

(t-) Stochastic neighbor embedding (t-SNE):

Data visualization of high-dimensional data: t-SNE:
Goal: visualize the data in a reduced number of dimensions while preserving its structure (e.g., being able to tell
apart clusters).
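
As a minimal sketch of this goal, the example below embeds two synthetic 10-dimensional clusters into 2 dimensions. It assumes scikit-learn is available and uses its `TSNE` class; the data, the perplexity value, and the random seeds are arbitrary choices for illustration and are not taken from the course.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Two well-separated synthetic clusters in 10 dimensions (made-up data).
cluster_a = rng.normal(loc=0.0, scale=1.0, size=(30, 10))
cluster_b = rng.normal(loc=8.0, scale=1.0, size=(30, 10))
X = np.vstack([cluster_a, cluster_b])

# Reduce to 2 dimensions; perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
embedding = tsne.fit_transform(X)   # one 2-D point per input row
```

Plotting the two columns of `embedding` as a scatter plot is the usual next step for checking whether the clusters remain visually separable.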

Written for

Institution: Universiteit van Amsterdam (UvA)
Study: Information Studies: Data Science
Course: Fundamentals of data science (5294FUDS6Y)

Document information

Number of pages: 70
Type: Summary
