Summary

Summary Advanced Data Analysis class - For open book exam (content table with links to pages)

Rating: -
Sold: -
Pages: 34
Uploaded on: 4 December 2022
Written in: 2021/2022

Summary of the course Advanced Data Analysis, made for the open-book exam. It contains a table of contents with clickable links that take you to the exact page, and it covers all theory classes plus the notes taken during class.


Preview of the content

Summary Advanced Data Analysis
Content table
1. Introduction (p. 5)
Big data (p. 5)
Data volume (p. 5)
Data velocity (p. 5)
Data variety (p. 5)
Data veracity (p. 5)
Data (p. 5)
Attribute values (p. 5)
Attribute types (p. 5)
Properties of attributes (p. 5)
Discrete vs. Continuous (p. 5)
Dataset types (p. 6)
Record data (p. 6)
Graph (p. 6)
Ordered data (p. 6)
Data mining (p. 6)
Definitions (p. 7)
Statistics (p. 7)
Data mining & Statistics (p. 7)
Challenges in Data mining (p. 7)
Tasks (p. 7)
Supervised classification (p. 7)
Applications (p. 8)
Unsupervised classification (p. 8)
Overview (p. 8)

2. Processing principles (p. 9)
Common steps (p. 9)
Feature extraction (p. 9)
Attribute transformation (p. 9)
Discretization (p. 9)
Aggregation (p. 9)
Noise removal (p. 9)
Outlier removal (p. 9)
Sampling (p. 9)
Simple Random Sampling (p. 9)
Stratified Sampling (p. 10)
Handling duplicate data (p. 10)
Handling missing values (p. 10)
Dimensionality reduction (p. 10)
PCA (p. 10)
Feature subset selection (p. 10)
Feature creation (p. 11)
Processing steps for specific data types (p. 11)
Image data (p. 11)
Survey data (p. 11)
Sequence data (p. 11)
Text (p. 12)
Category/Ontologies (p. 12)
Bag of words (p. 12)
Omics (p. 12)
Genomics (p. 12)
Transcriptomics (p. 12)
Meta-genomics (p. 13)
Proteomics (p. 13)
Metabolomics (p. 14)
Conclusion (p. 14)

3. Unsupervised clustering (p. 15)
Definitions (p. 15)
Introduction (p. 15)
Clustering (p. 15)
Similarities (p. 15)
Distance measures (p. 15)
Measure similarity (p. 15)
Dendrogram (p. 16)
Hierarchical clustering (p. 16)
Determination of distance (p. 16)
Partitional clustering (p. 17)

4. Principal component analysis (p. 18)
Data & basic variable statistics (p. 18)
Multivariate data (p. 18)
Basic variable statistics (p. 18)
Data transformation (p. 18)
Normalization (p. 18)
Comparison between variables (p. 18)
Covariance (p. 18)
Correlation (p. 18)
Data projection (p. 19)
Principal component analysis (PCA) (p. 19)
t-SNE (p. 20)



5. Supervised learning (p. 22)
Linear classifier (p. 22)
Binary classification (p. 22)
Support vector machines (SVMs) (p. 23)
Classification overview (p. 23)
Predictive accuracy (p. 23)
Class labels (p. 23)
Thresholds and accuracy (p. 24)
Linear threshold (p. 24)
ROC curve (p. 24)
PR curve (p. 24)
ROC vs PR curves (p. 24)
Nearest neighbour classifier (p. 25)
K-nearest neighbour (KNN) algorithm (p. 25)

6. Regression (p. 26)
Simple linear regression (p. 26)
Multiple linear regression (p. 26)
Best fit & objective function (p. 26)
Non-linear regression (p. 27)
Problems (p. 27)
Overfitting (p. 27)
Speed & scalability (p. 28)
Interpretability (p. 28)
Robustness (p. 28)
Regularized regression (p. 28)
Elastic net (p. 28)
Common approach (p. 29)

7. Machine learning methods (p. 30)
Classification (p. 30)
Algorithms (p. 30)
Decision tree (p. 30)
Choosing features (p. 30)
Gini impurity (p. 30)
Advantages (p. 31)
Disadvantages (p. 31)
Example Decision Tree (p. 31)
Random forest (p. 31)
Bootstrapping (p. 31)
Bagging (p. 32)
Out-of-bag performance (p. 32)
Gini importance (p. 32)
Example Random Forest (p. 32)
Neural networks & deep learning (p. 32)
Neurons (p. 32)
Neural network (p. 33)
Perceptron (p. 33)
Artificial Neural Networks (p. 33)
Deep learning (p. 34)
Performance (p. 34)
Google DeepMind (p. 34)





Seller: e18, Universiteit Antwerpen