
Summary Intro to Data Science - Lecture Notes

Pages: 71
Uploaded on: 22-10-2022
Written in: 2022/2023

This is a summary of all lectures for the Minor course, "Intro to Data Science"





Vrije Universiteit Amsterdam

Minor: Data Science (2022-23)

Course: Intro to Data Science

Course Code: XB_0018







Week 1 Lecture 1
Data Science History
Statistical approach (1900s): the classical method, although today a computer simulation often gives a quicker answer
- More modern approach: Data science approach
- Handles large amounts of data
- Focuses on visualising trends and variance

Dataset → make decisions based on data
- Data science does not replace experts, but augments expertise with knowledge derived
from data
- Data scientists often help other experts handle data through data hacking skills
- If you are a data scientist, you should consider building up domain expertise

[Figure: Venn diagram describing the field of data science]




Data visualisation can help solve problems (see historical examples from medicine)




Using Data to make decisions: MBA fees
Target group: 12 students
Make profit or make a loss?




Option 1: one fee for everybody, set at the marginal cost level
Option 2: each student pays what they can afford, estimated to within 1000 euros from their
company, work experience, current job, country and previous study
→ individual fee for each student
- Let go of the idea that we can only pick one solution that works for everybody
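
The trade-off between one fee for everybody and an individual fee per student can be sketched with some arithmetic. Every number below is hypothetical (the total cost and the twelve budgets are invented for illustration and do not come from the lecture), and the sketch assumes each student's budget is known exactly rather than estimated to within 1000 euros.

```python
# All figures are hypothetical and only illustrate the reasoning.
TOTAL_COST = 250_000  # assumed cost of running the programme, in euros

# Assumed maximum amount each of the 12 students can pay, in euros.
willingness_to_pay = [
    10_000, 12_000, 15_000, 18_000, 20_000, 22_000,
    25_000, 28_000, 30_000, 35_000, 40_000, 45_000,
]


def uniform_revenue(fee, budgets):
    """Revenue when everyone is charged the same fee.

    Students whose budget is below the fee are assumed to drop out.
    """
    return fee * sum(1 for budget in budgets if budget >= fee)


# Only the students' own budgets need to be tried as a uniform fee:
# any fee in between two budgets earns less than the next budget up.
best_fee = max(willingness_to_pay,
               key=lambda fee: uniform_revenue(fee, willingness_to_pay))
best_revenue = uniform_revenue(best_fee, willingness_to_pay)

# With individual fees, every student pays exactly what they can.
individual_revenue = sum(willingness_to_pay)

print("best uniform fee:", best_fee)
print("uniform result:", best_revenue - TOTAL_COST)
print("individual result:", individual_revenue - TOTAL_COST)
```

With these invented numbers the best single fee still leaves a loss, while individual fees turn a profit. The conclusion depends entirely on the assumed budgets and cost.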

Data Science → look for the patterns
Check for correlations in the data

Correlation ranges from -1 to +1.
Correlation expresses whether the values of two variables are related. If variables are
correlated, one of the values can be used to predict the other.

Correlation can be zero, positive or negative:
- Negative correlation means they move in opposite directions, e.g. house price and
distance to city centre
- Positive correlation means they move in the same direction, e.g. house size and tax
value
- Near-zero correlation means that there is no linear relation, although there can be other (non-linear) relations
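
The three cases above can be checked with a minimal sketch of the Pearson correlation coefficient, written in plain Python. The helper name `pearson` and all data values are made up for illustration; they are not taken from the lecture or from a real dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data.
distance_km = [1, 2, 4, 6, 8, 10]               # distance to city centre
price_keur = [520, 480, 430, 390, 340, 310]     # house price (x1000 euros)
house_size_m2 = [50, 70, 90, 110, 130, 150]
tax_value_keur = [180, 240, 310, 370, 440, 500]

# A perfect but non-linear relation: y = x squared on a symmetric range.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

print("negative:", pearson(distance_km, price_keur))
print("positive:", pearson(house_size_m2, tax_value_keur))
print("near zero:", pearson(xs, ys))
```

The last case shows why near-zero correlation does not mean "unrelated": `ys` is fully determined by `xs`, yet the linear correlation is zero.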

Correlation is a small first step towards data science. You could manually transform your
data and construct features until your inputs correlate well with your output. Or
you could use actual data science: algorithms that can handle complex relations directly.

We will compute correlations in our practical examples to get experience.




Week 1 Lecture 2
Data Science Basics
Programming Languages:
- The programming language a company chooses depends on its needs, on what is
available, and on how mature the language is
- Common roles: big data engineer, BI (business intelligence) developer, data analyst,
data scientist, machine learning engineer








- Python is the most popular language
- There is a technology for each phase of working with data

Data Science Role Descriptions
- Big Data Engineer
- BI Developer / Data Analyst
- Data Scientist
- Machine Learning Engineer

Many different programming languages have been invented, each with different technical
strengths and weaknesses:
- Performance / resource efficiency: C is most efficient, Python probably least efficient
- Interactiveness: R, Matlab and SPSS are designed for interactive use, but do not work well
as a microservice to support a website
- Licensing costs: many companies and universities prefer open source since it avoids
legal issues and vendor lock-in
- Cloud / on-premises: not all data can be shared with the cloud
- Integration: some companies want compatible technologies, e.g. all
AWS-based, web-friendly or Microsoft-based

Fortran: 1957, invented almost before computer screens existed (fastest code)
Python: most popular, with the best support for image processing and computer vision
C / C++: high performance, structured languages that allow detailed memory control and
inline assembly (hard to learn, easy to crash, not for beginners)
Java: regarded as the best programming language in the world from about 1994 to 2005

