Summary Data Mining for Business & Governance

Author: xtessaroes

Uploaded on: 06-10-2021
Written in: 2020/2021

Summary Data Mining for Business & Governance, written in the spring semester of 2021 for Data Science & Society, Tilburg University.


Recap before midterm
What is data mining?
(slides) Data mining is the computational process of discovering patterns
in large data sets involving methods at the intersection of artificial
intelligence, machine learning, statistics and database systems.
(google) Data mining is searching for patterns in data. More precisely,
data mining is the actual extraction of knowledge from data via technologies
that incorporate these principles.
(slides Chris) Data mining is a concept to unify statistics, data analysis
and their related methods in order to understand and analyze actual
phenomena with data.
With data mining, we want to show that something can be predicted
better than a baseline, or that a certain method works better than a
method that has been explored before.
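To make "better than the baseline" concrete, here is a minimal sketch. It assumes a hypothetical toy dataset with one numeric feature and a binary label. The baseline is a majority-class predictor, and it is compared with a simple threshold rule placed halfway between the two class means. The function names and data are illustrative and are not taken from the course material.

```python
from collections import Counter

# Hypothetical toy data: one numeric feature x and a binary label y.
train_x = [1.0, 2.0, 2.5, 3.0, 3.5, 6.0, 7.0, 8.0]
train_y = [0, 0, 0, 0, 0, 1, 1, 1]
test_x = [1.5, 2.8, 4.0, 6.5, 8.5]
test_y = [0, 0, 0, 1, 1]


def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


def majority_baseline(labels, n):
    """Baseline: always predict the most frequent training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return [majority] * n


def threshold_model(xs, ys, new_xs):
    """Predict 1 when x lies above the midpoint between the two class means.

    Assumes (as in this toy data) that class 1 has the larger mean.
    """
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    cut = (mean0 + mean1) / 2
    return [int(x > cut) for x in new_xs]


baseline_acc = accuracy(majority_baseline(train_y, len(test_x)), test_y)
model_acc = accuracy(threshold_model(train_x, train_y, test_x), test_y)
print(f"baseline: {baseline_acc:.2f}, model: {model_acc:.2f}")
```

On this toy data the baseline reaches 0.60 accuracy and the threshold rule reaches 1.00. The comparison is only meaningful when both are scored on the same held-out examples.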

What are the related disciplines?
The related disciplines that overlap with data mining are:
1. Artificial Intelligence (AI): interdisciplinary field aiming to develop
intelligent machines
2. Machine Learning (ML): branch of computer science studying
learning from data
3. Statistics: branch of mathematics focused on data
4. Information retrieval/knowledge discovery in databases
Others are:

What are the applications?
In companies, data mining is applied as business intelligence (market
analysis and management).
In science, data mining is applied as knowledge discovery (scientific
discovery in large data). Text mining (natural language processing) is
also used in science; it goes from unstructured text to structured
knowledge.
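A common first step from unstructured text toward structured data is a bag-of-words representation, in which each document becomes a row of term counts. The sketch below assumes a very simple tokenizer (lowercase, letters only, no stemming or stop-word removal). The two example sentences are made up.

```python
import re
from collections import Counter


def bag_of_words(document):
    """Lowercase the text, keep runs of letters as tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", document.lower()))


# Two made-up example documents (unstructured text).
docs = [
    "Data mining finds patterns in data.",
    "Text mining turns text into structured data.",
]
counts = [bag_of_words(d) for d in docs]

# Structured result: a shared vocabulary and one row of term counts per document.
vocabulary = sorted(set().union(*counts))
matrix = [[c[term] for term in vocabulary] for c in counts]

for doc, row in zip(docs, matrix):
    print(row, "<-", doc)
```

The resulting document-term matrix is an ordinary table of numbers, so the usual data mining methods can be applied to it.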

What is big data?
(slides) Big data consists of three parts (the three Vs):
1. Volume: data that is too big for manual analysis, too big to fit in
RAM and too big to store on disk.
2. Variety: big data has high ranges of values (variance), has outliers,
confounders and noise, and consists of different data types.
3. Velocity: big data changes quickly (results are required before the data
changes) and is often streaming data (it is not stored).
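Velocity means data may arrive as a stream that is never stored in full. Online (one-pass) algorithms can still compute summary statistics in that setting. The sketch below uses Welford's method for a running mean and sample variance with constant memory. The `sensor_stream` generator is a hypothetical stand-in for a real data stream.

```python
from typing import Iterator


class RunningStats:
    """Welford's online algorithm: update mean and variance one value at a time."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def sensor_stream() -> Iterator[float]:
    """Hypothetical stream: values arrive one by one and are never stored."""
    for value in [4.0, 7.0, 13.0, 16.0]:
        yield value


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)  # only n, mean and _m2 are kept in memory
print(stats.n, stats.mean, stats.variance)
```

For the four example values the running mean ends at 10.0 and the sample variance at 30.0, without the values ever being held in a list.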
(readings) Big data refers to datasets that are too large for traditional
data-processing systems and that therefore require new technology. In big
data 1.0, businesses got the basic internet technologies in place so that
they could establish a web presence, build electronic commerce capability
and improve operating efficiency. With big data 2.0, new systems and
companies started to exploit the interactive nature of the web. The
changes brought on by this shift in thinking are extensive and pervasive;
the most obvious are the incorporation of social-networking components
and the rise of the ‘voice’ of the individual consumer and citizen.

Different types of learning: supervised and unsupervised
Supervised learning (classification, regression) is done using a ground
truth; we have prior knowledge of what the output values of our samples
should be. The goal of supervised learning is to learn a function that,
given a sample of data and desired outputs, best approximates the
relationship between input and output observable in the data. Supervised
learning means that the data is labeled. In supervised learning, you know
x and y.
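Because both x and y are known, a model can be fitted directly to input/output pairs. As a minimal illustration of supervised regression, the sketch below fits a straight line y ≈ a * x + b by ordinary least squares on hypothetical labeled data. This is one simple supervised method among many and not necessarily the one used in the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y ≈ a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x          # slope
    b = mean_y - a * mean_x     # intercept
    return a, b


# Hypothetical labeled data: both x (input) and y (output) are known.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # happens to follow y = 2x + 1 exactly

a, b = fit_line(xs, ys)
prediction = a * 5.0 + b    # predict the output for an unseen input x = 5
print(a, b, prediction)
```

Here the learned function recovers slope 2.0 and intercept 1.0, and predicts 11.0 for x = 5.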
Unsupervised learning (clustering, dimensionality reduction) does not
have labeled outputs, so its goal is to infer the natural structure present
within a set of data points. Unsupervised learning means that the data is
not labeled; we want to find patterns within the data. In unsupervised
learning, you know only x (you do not yet know what you are looking for).
In short, unsupervised learning can be defined as data mining algorithms
that infer patterns from a dataset without reference to outcomes or
decisions.
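Even when only x is available, clustering can reveal structure in the data. The sketch below implements plain k-means on one-dimensional, hypothetical data. k-means is one common clustering algorithm, chosen here for brevity. The `iterations` and `seed` parameters are illustrative defaults.

```python
import random


def kmeans_1d(points, k, iterations=20, seed=0):
    """Plain k-means on one-dimensional data; no labels are used anywhere."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)


# Hypothetical unlabeled data: we only know x.
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids = kmeans_1d(points, k=2)
print(centroids)
```

For these points the centroids settle at 1.5 and 10.5. These correspond to the two natural groups, found without using any labels.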
Semi-supervised classification is a combination of both. Only some
instances are assigned to the decision classes: we have a small amount of
labeled data together with a large amount of unlabeled data.
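Self-training is one simple way to combine a small labeled set with a larger unlabeled set. The idea is to fit a model on the labeled data, pseudo-label the unlabeled points, and refit on everything. The sketch below is a simplified variant that pseudo-labels every unlabeled point in each round rather than only confident ones, and it uses a nearest-centroid classifier. The data and function names are hypothetical.

```python
def nearest_centroid_fit(xs, ys):
    """Return the mean feature value of each class label."""
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def self_train(labeled_x, labeled_y, unlabeled_x, rounds=3):
    """Simplified self-training: pseudo-label all unlabeled points, then refit."""
    centroids = nearest_centroid_fit(labeled_x, labeled_y)
    for _ in range(rounds):
        pseudo_y = [predict(centroids, x) for x in unlabeled_x]
        centroids = nearest_centroid_fit(list(labeled_x) + list(unlabeled_x),
                                         list(labeled_y) + pseudo_y)
    return centroids


# Hypothetical data: a small labeled set and a larger unlabeled set.
labeled_x, labeled_y = [1.0, 9.0], ["low", "high"]
unlabeled_x = [2.0, 3.0, 7.0, 8.0, 10.0]

supervised_only = nearest_centroid_fit(labeled_x, labeled_y)
semi_supervised = self_train(labeled_x, labeled_y, unlabeled_x)
print(predict(supervised_only, 5.2), predict(semi_supervised, 5.2))
```

In this example the unlabeled points move the class centroids from 1.0 and 9.0 to 2.0 and 8.5. As a result, a point at 5.2 is classified as "high" by the labeled-only model but as "low" after self-training.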

Examples of supervised and unsupervised learning (regression,
classification, clustering, dimensionality reduction)
Supervised: regression, classification (3 parts: input, output and function)
Unsupervised: clustering, dimensionality reduction
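The sketch below gives a small example of dimensionality reduction. It projects two-dimensional points onto their first principal component (principal component analysis, PCA). To avoid any external library, it uses the closed-form eigen-decomposition of a 2 × 2 covariance matrix. The input points are hypothetical and lie exactly on a line, so one component keeps all of the variance.

```python
import math


def pca_first_component(points):
    """Project 2-D points onto their first principal component (closed form for 2x2)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Sample covariance matrix [[a, b], [b, d]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    d = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix.
    lam = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    # Matching eigenvector; fall back to an axis when the features are uncorrelated.
    if b != 0:
        vx, vy = b, lam - a
    elif a >= d:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    scores = [x * vx + y * vy for x, y in centered]  # one number per point
    explained = lam / (a + d) if a + d > 0 else 0.0  # share of total variance kept
    return scores, explained


points = [(2.0, 1.0), (4.0, 2.0), (6.0, 3.0)]  # hypothetical 2-D data
scores, explained = pca_first_component(points)
print([round(s, 3) for s in scores], explained)
```

The three points are reduced to scores of about -2.236, 0 and 2.236 (that is, -√5, 0 and √5). The first component explains all of the variance.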

Workflow of supervised learning
1. Collect data
2. Label examples
3. Choose representation (features are numerical or categorical;
possibly convert them to a feature vector)
4. Train models (use a training set for learning, and a validation
set for tuning. Hyperparameters are settings of learning
algorithms. For each value of the hyperparameters, apply the
algorithm to the training set to learn, check performance on the
validation set and find the best-performing setting)
5. Evaluate (check performance of the tuned model on the test set. You
want to estimate how well your model will do in the real
world).
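The workflow steps can be sketched end to end. This is a minimal illustration that rests on several assumptions. The "collected and labeled" data are synthetic, with one numeric feature and label 1 when the feature exceeds 5. The model is a hand-written k-nearest-neighbours classifier, and the hyperparameter being tuned is k. The 60/20/20 split and the candidate values of k are arbitrary choices.

```python
import random
from collections import Counter


def knn_predict(train_set, x, k):
    """Majority label among the k training examples closest to x."""
    nearest = sorted(train_set, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_set, examples, k):
    """Share of (x, y) examples that k-NN, fitted on train_set, labels correctly."""
    hits = sum(knn_predict(train_set, x, k) == y for x, y in examples)
    return hits / len(examples)


# Steps 1-2 (hypothetical): collect 60 values and label them (1 if x > 5, else 0).
rng = random.Random(42)
data = [(x, int(x > 5.0)) for x in (rng.uniform(0, 10) for _ in range(60))]

# Step 3: the representation is a single numeric feature, so no conversion is needed.
rng.shuffle(data)
train, validation, test = data[:36], data[36:48], data[48:]

# Step 4: for each hyperparameter value, learn on the training set and
# score on the validation set; keep the best-performing k.
candidate_ks = [1, 3, 5, 7]   # odd values avoid ties between the two labels
best_k = max(candidate_ks, key=lambda k: accuracy(train, validation, k))

# Step 5: estimate real-world performance once, on the untouched test set.
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

The test set is used only once, after k has been chosen on the validation set. Using it for tuning as well would make the test estimate overly optimistic.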

Written for

Institution: Tilburg University (UVT)
Programme: Data Science & Society
Course: Data Mining For Business & Governance (880022M6)

Number of pages: 65
