Summary

FULL data mining summary

Pages
84
Uploaded on
19-03-2025
Written in
2023/2024

This is a full summary of the course Data mining given by Prof. Fransen, Prof. Laukens and Prof. Meysman. It includes all the topics of the theory classes. Using this together with the notes from the practical classes gave me a 16/20.


Introduction
Confidence Confident

Moment of lecture @February 14, 2024

Review @February 17, 2024

Materials advdatanalysis-01-introduction.pdf

Last Edited @March 12, 2025 10:32 AM

The human body has traditionally been studied discipline by discipline: each
discipline has its own perspective, and the perspectives complement each other.
This has not changed, but a new perspective is emerging: large-scale data. New
technologies have become available that make this perspective possible. It
requires new techniques, all of which fit under ‘big data’ (nowadays: deep
learning, AI, etc.).
Big data refers to data for which conventional computing techniques are no
longer sufficient (for example, because of its size). The present tools will
need to solve more complex problems, which means people need to use the tools
more intelligently. Big data is also considered a disruptive trend in computer
science.
Big data is characterized by 4 main aspects:

Volume: the amount of data you’re dealing with. Example: a genome printed on
paper would fill stacks and stacks of pages.

Moore's law is a prediction that the number of transistors on a
chip doubles every two years, making computers faster and cheaper.

Velocity: the speed at which data is produced/collected, and the fact that it
is produced continuously: machines generate data all the time. Data is
everywhere and it changes our world.

For example: smartphones hold a massive amount of data at all times

There is a need for new, effective, high-tech data transfer approaches

The speed of data production increases faster than the available staff




Variety: in the life sciences there are many different data types and data
sets. A distinction is made between structured and unstructured data → 80% of
the data is unstructured. The life sciences have much more variability in the
data that is collected.

Examples: DNA sequencing, morphology, metabolic data, protein
structures, etc.

Transcriptome is more variable than the genome

Veracity: the data is never perfect (for example: noise, biases, missing
points). This is problematic in the life sciences because imperfect data is
present almost everywhere (it also occurs in other fields, but in the life
sciences it is always present).

⇒ Large-scale data and AI brought a new, data-intensive research paradigm. A
lot of science nowadays starts from data, from which predictions and
hypotheses are made. In most cases, the paradigm shifts during the research
itself.

Terminology

Data = collection of objects (also known as records, points, cases, etc.) and
their attributes. The objects could be the samples, and the attributes could
be the measurements performed on the objects; an attribute is also called a
feature or a variable.

Attributes = property or characteristic of an object. A collection of attributes
describes the object → more attributes means more knowledge about the
object.
Example: student = object; attributes of a student are grades, student number,
etc.
It is typical to have the objects in rows and attributes in columns.
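The row/column layout can be sketched in a few lines of Python. This is only an illustration: the student records, the column names and the two helper functions are made up for the example and do not come from the course material.

```python
# Hypothetical data: each row is one object (a student),
# each column is one attribute of that object.
columns = ["student_number", "name", "grade"]
rows = [
    [1001, "An", 16],
    [1002, "Bram", 12],
    [1003, "Chloe", 14],
]


def attribute_values(rows, columns, attribute):
    """Return the values of one attribute (one column) across all objects."""
    index = columns.index(attribute)
    return [row[index] for row in rows]


def as_object(row, columns):
    """Return one object (one row) as an attribute -> value mapping."""
    return dict(zip(columns, row))


grades = attribute_values(rows, columns, "grade")
first_student = as_object(rows[0], columns)
print(grades)
print(first_student)
```

Reading down a column gives all values of one attribute; reading across a row gives everything known about one object.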




An attribute value = number or symbol assigned to an attribute. Examples of how
attributes differ from attribute values:

The same attribute can be mapped to different attribute values: height can be
measured in feet or meters

Different attributes can be mapped to the same set of values: the attribute
values for ID and age are both integers

→ The properties of the attribute values can still be different (an ID number
has no limit, but age does).
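A small Python sketch of both points, under stated assumptions: the conversion helper, the example values and the upper bound of 130 years are illustrative choices, not part of the notes.

```python
FEET_PER_METER = 3.28084  # approximate conversion factor


def meters_to_feet(meters):
    """Same attribute (height), different attribute value."""
    return meters * FEET_PER_METER


height_m = 1.80
height_ft = meters_to_feet(height_m)  # same height, other unit

# Different attributes, same kind of value: both are integers.
student_id = 20231234  # hypothetical ID number
age = 21


def is_plausible_age(value, maximum=130):
    """Age has a natural upper limit; 130 is an assumed bound."""
    return isinstance(value, int) and 0 <= value <= maximum


print(height_ft)
print(is_plausible_age(age), is_plausible_age(student_id))
```

Both `student_id` and `age` are stored as integers, but only one of them passes a range check, which is the sense in which the properties of the values differ.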
There are different types of attributes:

Nominal → only has the distinction mathematical property

Examples: ID numbers, eye color, zip codes

Ordinal → has both the distinction and order mathematical property

Examples: rankings (e.g., taste of potato chips on a scale from 1-10),
grades, height in {tall, medium, short}

Interval → has the distinction, order and addition mathematical properties

Examples: calendar dates, temperatures in Celsius or Fahrenheit.

Ratio → has all 4 mathematical properties

Examples: temperature in Kelvin, length, time, counts

⇒ The attribute types are distinguished by the mathematical properties they
have: distinction, order, addition and/or multiplication.
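The four types can be written down as a lookup table of which properties each one has. The sketch below is one possible encoding (the names `PROPERTIES` and `supports` are made up here). It also shows why temperature in Celsius is interval but temperature in Kelvin is ratio: a ratio of Celsius values is not physically meaningful, while a ratio of Kelvin values is.

```python
# Mathematical properties of each attribute type (each type adds one).
PROPERTIES = {
    "nominal": {"distinction"},
    "ordinal": {"distinction", "order"},
    "interval": {"distinction", "order", "addition"},
    "ratio": {"distinction", "order", "addition", "multiplication"},
}


def supports(attribute_type, prop):
    """Check whether an attribute type has a given mathematical property."""
    return prop in PROPERTIES[attribute_type]


def celsius_to_kelvin(celsius):
    return celsius + 273.15


# Differences are meaningful on an interval scale:
difference = 20 - 10  # 10 degrees warmer

# Ratios are not: 20 °C is not "twice as hot" as 10 °C.
ratio_celsius = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(supports("interval", "multiplication"))
print(ratio_celsius, ratio_kelvin)
```

On the Kelvin scale the same two temperatures differ by only a few percent, which is the physically meaningful ratio.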




You can also make a distinction between discrete and continuous attributes
(a discrete value is typically an integer; a continuous value is a real number,
which means it can have a decimal part):

Discrete Attribute: has only a finite or countably infinite set of values. Often
represented as integer variables.

Examples: zip codes, counts, or the set of words in a collection of
documents

Continuous Attribute: has real numbers as attribute values. Practically, real
values can only be measured and represented using a finite number of
digits. Continuous attributes are typically represented as floating-point
variables.

Examples: temperature, height, or weight.
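The remark that real values are stored with a finite number of digits has a practical consequence, sketched below in Python. The word list is a made-up example; `Counter` and `math.isclose` are from the standard library.

```python
import math
from collections import Counter

# Discrete attribute: word counts are integers.
words = "the cat saw the other cat".split()
word_counts = Counter(words)

# Continuous attribute: stored as floating-point numbers with finite
# precision, so exact equality tests can fail.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)
close_enough = math.isclose(total, 0.3)

print(word_counts["the"])
print(exactly_equal, close_enough)
```

For continuous attributes it is therefore safer to compare values with a tolerance than with `==`.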


Dataset types
There are 3 main types of datasets:

Record data

Graph data

Ordered data

Record data

= data that consists of a collection of records, each of which consists of a fixed
set of attributes.
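As a rough illustration of "a fixed set of attributes", the check below verifies that every record in a collection has exactly the same attribute names. The sample records and the function name are hypothetical.

```python
# Hypothetical record data: every record has the same attributes.
records = [
    {"sample": "S1", "tissue": "liver", "expression": 2.4},
    {"sample": "S2", "tissue": "brain", "expression": 0.7},
    {"sample": "S3", "tissue": "liver", "expression": 1.9},
]


def has_fixed_attributes(records):
    """True if all records share exactly the same set of attribute names."""
    if not records:
        return True
    expected = set(records[0])
    return all(set(record) == expected for record in records)


print(has_fixed_attributes(records))
print(has_fixed_attributes(records + [{"sample": "S4"}]))
```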
Data Matrix



