
Summary Statistics I - Exam and Class Notes - GRADE 8.5

Pages: 47
Uploaded on: 18-04-2024
Written in: 2022/2023

Extensive notes and exam revision for Statistics I, IRO 1st year, Block 4. This is just for the exam, not for the seminars. I have a weekly overview of the content, with some examples of exercises which are useful for the exam. It is the same statistics professor, with the same exam format. My grade was an 8.5 for the exam only.


Statistics I
Exam Revision




Week 1
Variables – anything that varies (across entities or across time) and can be measured.




(Categorical)
- Nominal: two or more mutually exclusive categories. The categories have no order or
ranking (eye color, marital status, hair color, political party affiliation).
- Ordinal: categories have a real ordering/ranking. Often used for subjective data
(opinions, attitudes, education levels, political interest, performance ratings, agreement
with a statement). The spacing between categories is not necessarily the same.
(Numerical) – real numbers
- Continuous: can take on any value within a range, including decimals and fractions, so there is
an infinite number of possible values (height, weight, temperature, time). Some can be recorded
as discrete by rounding them.
- Discrete: can only take countable values, usually whole numbers (number of international
conflicts, number of pets owned, number of car accidents).


Alternative levels of measurement (Stevens):

Interval: the zero is arbitrary/meaningless (temperature in °C, where 0 °C does not mean an
absence of temperature; pH, where pH = 0 does not mean an absence of anything; IQ scores).
Ratio: the zero is meaningful (salary, temperature in kelvin, where 0 K is a true zero, number of international conflicts).




Independent variable (IV): the cause, x; it has an effect on the dependent variable.
Dependent variable (DV): the outcome, y.




Measures of central tendency
When we collect data, we can show how the values are distributed relative to one another.
This is a frequency distribution: it shows all the values (or intervals) and how often each occurs.
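As a rough illustration, a frequency distribution can be tabulated by counting how often each value occurs. The sketch below uses made-up data (the `pets` list is hypothetical, not from the course) and Python's standard `collections.Counter`.

```python
from collections import Counter

# Hypothetical data: number of pets owned by ten respondents.
pets = [0, 1, 1, 2, 0, 1, 3, 1, 0, 2]

# Count how often each value occurs.
frequencies = Counter(pets)

# Print the distribution in order of value, with a simple text bar.
for value in sorted(frequencies):
    print(value, frequencies[value], "#" * frequencies[value])
```

The printed bars give a crude histogram, which is enough to see whether the values pile up on one side or are spread evenly.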




Uniform – every outcome has a roughly equal chance of happening
Multimodal – more than 2 likely values


Skewness: a distribution can be skewed to the right or to the left (positive or negative skew, respectively).
This depends on which side the "tail" is longest. Long tail on the right = skewed right.


Measure of central tendency: single value that attempts to describe a set of data by identifying
the central position within that set. For example, mean, median and mode.
Measures of dispersion: give an indication of how stretched the data set is.

- Mode: the most frequent score in a data set. There can be several modes, when two or more
categories share the highest frequency.
- Median: the middle score of a data set once it is arranged in order of magnitude. With an
even number of scores there is no single middle value, so we add the two middle scores and
divide by two, constructing a new middle point.
- Mean: the mean is calculated by adding up every value of a variable and dividing by the
number of observations (n). When there are extreme values, the median may be more
useful, because the mean is sensitive to extreme values and the median isn't.
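A small sketch of the three measures, using Python's standard `statistics` module. The `scores` list is invented for illustration and deliberately contains one extreme value (40) to show how the mean reacts while the median does not.

```python
import statistics

# Hypothetical scores; 40 is an extreme value.
scores = [2, 3, 3, 5, 7, 8, 40]

mean = statistics.mean(scores)        # sum of values / n, about 9.71 here
median = statistics.median(scores)    # middle value of the sorted data
modes = statistics.multimode(scores)  # list of all most-frequent values

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)

# Even number of scores: the median is the mean of the two middle values.
print(statistics.median([2, 3, 5, 7]))  # (3 + 5) / 2

# Two categories with the same highest frequency give two modes.
print(statistics.multimode([1, 1, 2, 2, 3]))
```

The extreme score pulls the mean well above the median, which is why the median is often preferred for skewed data.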


How to calculate the standard deviation given the sum of all squared errors?
First, we calculate the sum of all squared errors by taking each individual observation and
subtracting the mean from it. Then we square each of the differences and add them all up.
(mean = 11.44, X1 = 3
3 - 11.44 = -8.44
(-8.44)^2 = 71.2336
Do this for each X, and then add everything up.)
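The same steps can be sketched in a few lines of Python. The `observations` list is hypothetical (the original data set behind the 11.44 mean is not given in these notes), chosen so that the mean is a round number.

```python
# Hypothetical observations; not the data set from the notes.
observations = [3, 8, 10, 12, 17]

# Step 1: the mean.
mean = sum(observations) / len(observations)

# Step 2: subtract the mean from each observation.
errors = [x - mean for x in observations]

# Step 3: square each difference.
squared_errors = [e ** 2 for e in errors]

# Step 4: add everything up to get the sum of squared errors.
ss = sum(squared_errors)

print("mean:", mean)
print("errors:", errors)
print("squared errors:", squared_errors)
print("sum of squared errors:", ss)
```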


Once we have the sum of squared errors (SS), we calculate the standard deviation by dividing it
by n - 1 (sample formula) and taking the square root: s = sqrt(SS / (n - 1)).

This is similar to calculating the variance: the variance, s^2, is the same calculation without the
square root, s^2 = SS / (n - 1). (The formula for the standard deviation in the formula sheet is
just s = sqrt(s^2), which can be confusing.)
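A minimal sketch of going from the sum of squared errors to the variance and standard deviation. It assumes the sample version, which divides by n - 1; if the formula sheet divides by n instead, replace `n - 1` with `n`. The function names and the data are made up for illustration.

```python
import math
import statistics


def variance_from_ss(ss, n):
    """Sample variance s^2: sum of squared errors divided by n - 1 (assumed divisor)."""
    return ss / (n - 1)


def sd_from_ss(ss, n):
    """Sample standard deviation s: square root of the sample variance."""
    return math.sqrt(variance_from_ss(ss, n))


# Hypothetical data with mean 10 and sum of squared errors 106.
data = [3, 8, 10, 12, 17]
data_mean = sum(data) / len(data)
data_ss = sum((x - data_mean) ** 2 for x in data)

print("variance:", variance_from_ss(data_ss, len(data)))
print("standard deviation:", sd_from_ss(data_ss, len(data)))

# Cross-check against the standard library's sample versions.
print(statistics.variance(data), statistics.stdev(data))
```

`statistics.variance` and `statistics.stdev` also use the n - 1 divisor, so they should agree with the hand calculation; `statistics.pvariance` and `statistics.pstdev` are the divide-by-n counterparts.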


Measures of dispersion

An indicator of the extent to which a distribution is stretched or squeezed.




The range is the difference between the highest and the lowest values: highest - lowest.
We can divide the data into "chunks" called quantiles. The more common quantiles are percentiles,
deciles, quintiles and quartiles. The range most often used here is the interquartile range (IQR):
the range of the middle 50% of the data.
How to calculate the IQR? Calculate the median, then calculate the median of the lower half
(when there is no single middle value, take the sum of the two middle values / 2), and do the same
for the upper half. These are the lower and upper quartiles; the IQR is the difference between the
upper quartile and the lower quartile. The same procedure applies to an even number of observations;
averaging is only needed when a half has no single middle value.
Because the IQR uses only a selection of the data, it is resistant to outliers: it is
a "robust" statistic.
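The median-of-halves procedure can be sketched as below. One detail is an assumption on my part: with an odd number of observations the overall median is left out of both halves (textbooks differ on this, and library functions such as `statistics.quantiles` may use other conventions and give slightly different quartiles). The function name and the data are hypothetical.

```python
import statistics


def quartiles_by_halves(values):
    """Return (lower quartile, upper quartile) as the medians of the two halves.

    Assumption: for an odd number of values, the overall median is excluded
    from both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half + n % 2:]  # skip the median itself when n is odd
    return statistics.median(lower_half), statistics.median(upper_half)


# Hypothetical data with one outlier (40).
with_outlier = [1, 3, 4, 6, 7, 9, 12, 15, 40]
# Same data with the outlier replaced by a more typical value.
without_outlier = [1, 3, 4, 6, 7, 9, 12, 15, 16]

for data in (with_outlier, without_outlier):
    q1, q3 = quartiles_by_halves(data)
    data_range = max(data) - min(data)
    print("range:", data_range, "Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
```

In this example the range changes a lot when the outlier is swapped out, while the IQR stays the same, which is what "robust" means here.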
- The deviance is used to calculate how much each value deviates from the mean
- To calculate it, we find out how much each observation deviates from the mean
- So, we need the mean
- Then, we do this for each observation: subtract the mean from the observation
- Then we add up all of these deviances = total deviance
The total deviance is not a useful measure of spread: positive and negative deviances cancel out,
so it totals zero. We fix this by squaring the differences.
So, we square the deviances and add these up. This makes every value positive, which solves the
earlier problem of positive and negative deviances cancelling out.
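A quick check of why the total deviance is not informative, using invented values (the `values` list is hypothetical). Whole numbers with a whole-number mean are used so the sum comes out as exactly zero; with decimals, rounding can leave a tiny non-zero remainder.

```python
# Hypothetical values with a mean of exactly 5.
values = [2, 4, 4, 6, 9]
values_mean = sum(values) / len(values)

# Deviance of each observation: observation minus the mean.
deviances = [v - values_mean for v in values]

# Positive and negative deviances cancel each other out.
total_deviance = sum(deviances)

# Squaring removes the signs, so the total no longer cancels.
sum_of_squares = sum(d ** 2 for d in deviances)

print("deviances:", deviances)
print("total deviance:", total_deviance)
print("sum of squared deviances:", sum_of_squares)
```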



Week 2
Introduction Graphs and Visualizations
The goal of data visualization is to make it easier to identify patterns in the data and to find
relationships. A good visualization shows the important features of the data.

Notes by lauragfsilva, Universiteit Leiden