Summary

FULL data mining summary

Uploaded on

19-03-2025

Written in

2023/2024

This is a full summary of the course Data mining given by Prof. Fransen, Prof. Laukens and Prof. Meysman. It includes all the topics of the theory classes. Using this together with the notes from the practical classes gave me a 16/20.

Introduction
Confidence Confident

Moment of lecture @February 14, 2024

Review @February 17, 2024

Materials advdatanalysis-01-introduction.pdf

Last Edited @March 12, 2025 10:32 AM

People have always studied the human body from within separate disciplines:
each discipline has its own perspective, but the perspectives complement each
other. This has not changed, but a new perspective is emerging: large-scale
data, made possible by newly available technologies. This new perspective
requires new techniques, which all fall under ‘big data’ (nowadays: deep
learning, AI, etc.).
Big data refers to data for which conventional computing techniques are no
longer sufficient (for example, because of its size). Present-day tools will
need to solve more complex problems, which means people need to use these
tools more cleverly. Big data is also considered a disruptive trend in
computer science.
Big data is characterized by 4 main aspects:

Volume: the amount of data you’re dealing with. Example: a genome printed on
paper, which gives stacks and stacks of data.

Moore's law is a prediction that the number of transistors on a
chip doubles every two years, making computers faster and cheaper.

Velocity: the speed at which data is produced and collected, and the fact
that machines produce it continuously. Data is everywhere and it changes our
world.

For example: a smartphone holds a massive amount of data at all times.

There is a need for new, effective, high-tech data transfer approaches.

The speed increases faster than the staff does.

Variety: in the life sciences there are many different data types and data
sets. A distinction is made between structured and unstructured data → 80% of
the data is unstructured. The life sciences have much more variability in the
data that is collected.

Examples: DNA sequencing, morphology, metabolic data, protein
structures, etc

Transcriptome is more variable than the genome

Veracity: the data is never perfect (for example: noise, biases, missing
data points). This is especially problematic in the life sciences because
imperfections are present almost everywhere (they also occur in other fields,
but in the life sciences they are always present).
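
The doubling in Moore's law, mentioned under Volume, can be made concrete with
a small calculation. This is only a sketch: the function name and the starting
count of one million transistors are hypothetical, chosen for illustration.

```python
def transistors_after(years, initial_count, doubling_period=2.0):
    """Projected transistor count if the count doubles every `doubling_period` years."""
    return initial_count * 2 ** (years / doubling_period)


# Hypothetical starting point: a chip with one million transistors.
projected = transistors_after(years=10, initial_count=1_000_000)
print(projected)  # 32000000.0 (five doublings in ten years)
```

Ten years at one doubling per two years gives five doublings, so a factor of
2^5 = 32.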

⇒ Large-scale data and AI have brought a new data-intensive research
paradigm. A lot of science nowadays starts from data, from which predictions
and hypotheses are made. The paradigm often shifts during the research.

Terminology

Data = collection of objects (also known as records, points, cases, etc.) and
their attributes. The objects could be the samples and the attributes could be
the measurements performed on them; an attribute is also called a feature or a
variable.

Attribute = property or characteristic of an object. A collection of
attributes describes the object → more attributes means more knowledge about
the object.
Example: student = object; attributes of the student are grades, student
number, etc.
It is typical to have the objects in rows and the attributes in columns.
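
The rows-and-columns layout can be sketched with the student example. The
student numbers, names and grades below are made up, and the helper names
`attribute` and `describe` are hypothetical.

```python
# Each row is one object (a student); each column is one attribute.
columns = ["student_number", "name", "grade"]
rows = [
    [20230001, "An", 16],
    [20230002, "Bram", 12],
    [20230003, "Chloe", 14],
]


def attribute(name):
    """Return one column: the values of a single attribute for all objects."""
    index = columns.index(name)
    return [row[index] for row in rows]


def describe(row):
    """Return one row as attribute -> value pairs describing a single object."""
    return dict(zip(columns, row))


grades = attribute("grade")         # reading down a column
first_student = describe(rows[0])   # reading across a row
```

Reading down a column gives every object's value for one attribute; reading
across a row gives everything known about one object.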

An attribute value = a number or symbol assigned to an attribute. Examples of
the difference between attributes and attribute values:

Same attribute can be mapped to different attribute values: height can be
measured in feet or meters

Different attributes can be mapped to the same set of values: the attribute
values for ID and age are both integers (= whole numbers)

→ Properties of attribute values can still be different (ID number has no limit but
age does).
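
Both points can be sketched in a few lines. The values are made up, and the
upper bound `MAX_PLAUSIBLE_AGE` is an assumption for illustration only.

```python
FEET_PER_METER = 3.28084

# Same attribute (height), two different attribute values depending on the unit.
height_m = 1.80
height_ft = height_m * FEET_PER_METER

# Different attributes (ID and age) that share the same value type: integers.
ids = [1001, 1002, 1003]
ages = [21, 22, 23]

mean_age = sum(ages) / len(ages)  # meaningful: an average age
mean_id = sum(ids) / len(ids)     # computable, but says nothing about the objects

MAX_PLAUSIBLE_AGE = 130  # assumed upper limit; an ID has no such limit


def is_plausible_age(value):
    """Age values are bounded, unlike ID values."""
    return 0 <= value <= MAX_PLAUSIBLE_AGE
```

The integer type alone does not say which operations make sense; that depends
on the attribute behind the value.
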
There are different types of attributes:

Nominal → has only the distinction mathematical property

Examples: ID numbers, eye color, zip codes

Ordinal → has both the distinction and order mathematical properties

Examples: rankings (e.g., taste of potato chips on a scale from 1-10),
grades, height in {tall, medium, short}

Interval → has the distinction, order and addition mathematical properties

Examples: calendar dates, temperatures in Celsius or Fahrenheit.

Ratio → has all 4 mathematical properties

Examples: temperature in Kelvin, length, time, counts

⇒ The distinction between the types is based on the mathematical properties
they have: distinction, order, addition and/or multiplication.
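
The cumulative nature of the four types can be written down as a small
sketch. The names `AttributeType` and `properties_of` are hypothetical; the
temperature values illustrate why Celsius is interval and Kelvin is ratio.

```python
from enum import IntEnum


class AttributeType(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Each type has all properties of the previous type plus one more.
PROPERTIES = ["distinction", "order", "addition", "multiplication"]


def properties_of(attribute_type):
    """Return the mathematical properties an attribute type supports."""
    return PROPERTIES[:attribute_type]


# Interval vs ratio: differences are meaningful in both scales,
# but ratios are only meaningful on a scale with a true zero (Kelvin).
c_low, c_high = 10.0, 20.0
k_low, k_high = c_low + 273.15, c_high + 273.15

celsius_ratio = c_high / c_low  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
kelvin_ratio = k_high / k_low   # about 1.035, the physically meaningful ratio
```

The difference of 10 degrees is the same in both scales, which is the
addition property; only the Kelvin ratio has a physical meaning, which is the
multiplication property.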

You can also make a distinction between discrete and continuous attributes (a
discrete value is typically an integer; a continuous value is a real number,
which means it can have a decimal part):

Discrete Attribute: has only a finite or countably infinite set of values.
Often represented as integer variables.

Examples: zip codes, counts, or the set of words in a collection of
documents

Continuous Attribute: has real numbers as attribute values. Practically, real
values can only be measured and represented using a finite number of
digits. Continuous attributes are typically represented as floating-point
variables.

Examples: temperature, height, or weight.
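
A minimal sketch of both kinds of attribute, using made-up documents and a
made-up measurement. It also shows the practical limit mentioned above: a
floating-point variable stores only a finite number of digits.

```python
import math
from collections import Counter

# Discrete: word counts over a (hypothetical) collection of documents.
documents = ["data mining finds patterns", "big data needs new tools"]
word_counts = Counter(word for doc in documents for word in doc.split())
vocabulary = set(word_counts)  # the set of words is itself a discrete attribute

# Continuous: real-valued measurements are stored as floating-point numbers,
# so sums can carry a tiny representation error.
weight_kg = 0.1 + 0.2

exactly_equal = weight_kg == 0.3                  # False
approximately_equal = math.isclose(weight_kg, 0.3)  # True
```

For continuous attributes it is therefore safer to compare values with a
tolerance than with exact equality.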

Dataset types
There are 3 main types of datasets:

Record data

Graph data

Ordered data

Record data

= data that consists of a collection of records, each of which consists of a fixed
set of attributes.
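
One way to sketch "a fixed set of attributes" is a record type whose fields
are declared once. The class `SampleRecord` and its fields are hypothetical
and the values are made up.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SampleRecord:
    """One record: every record has exactly these attributes."""
    sample_id: int
    age: int
    weight_kg: float


records = [
    SampleRecord(sample_id=1, age=34, weight_kg=70.5),
    SampleRecord(sample_id=2, age=51, weight_kg=82.0),
]

# The attribute set is fixed by the type, not by the individual record.
attribute_names = [field.name for field in fields(SampleRecord)]
```

Creating a record with a missing attribute, such as `SampleRecord(3, 40)`,
raises a `TypeError`, which reflects that the attribute set is fixed.
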
Data Matrix


Written for

Institution: Universiteit Antwerpen (UA)
Study: Biomedische Wetenschappen
Course: Data mining

Document information

Uploaded on: March 19, 2025
Number of pages: 84
Written in: 2023/2024
Type: SUMMARY

Seller: Studentje2001, Universiteit Antwerpen
