Summary How to do Linguistics with R - Natalia Levshina, Chap. 6, 7, 8, 12, 13

Uploaded on: 05-06-2021

Written in: 2020/2021

Summary of the book How to do Linguistics with R by Natalia Levshina. The summary covers chapters 6, 7, 8, 12 and 13, plus a short summary of Baayen, Chapter 7.1. Note: this summary was written for the course Statistics II of the bachelor programmes Communication and Information Science and Informatiekunde. I only included information that was covered in the lectures, so sections that were not relevant to the course are left out. The document is written in English.

CHAPTER 6
MEASURING RELATIONSHIPS BETWEEN TWO QUANTITATIVE VARIABLES

6.1 WHAT IS CORRELATION?
Positive correlation = when the values of variable X and variable Y decrease or increase
together → X increases, Y increases
Negative correlation = when the values of variable X and variable Y change in opposite
directions → X increases, Y decreases

The strength of this relationship is measured by means of a correlation coefficient → ranges
from -1 (perfect negative correlation) to 1 (perfect positive correlation)

Interval- and ratio-scaled variables: Pearson’s product-moment coefficient r
Ordinal data, interval- and ratio-scaled data transformed into ranks: Spearman’s ρ
and Kendall’s τ

6.2 THE PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT
You can create a scatterplot to visualize the relationship, including a regression line (= line
that shows the general trend in the data)

> plot(variable1 ~ variable2, main = "name of scatterplot")
> m <- lm(variable1 ~ variable2)
> abline(m)

Pearson’s product-moment coefficient r is the most commonly used correlation coefficient
→ it is used for interval- and ratio-scaled data (requirement: normally distributed data)
> cor.test(variable1, variable2)
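
As an illustration of what the test returns, here is a minimal sketch on made-up data. The vectors word_length and reaction_time and their values are hypothetical and do not come from the book.

```r
# Hypothetical data: word length (letters) and reaction time (ms)
word_length   <- c(3, 4, 5, 6, 7, 8, 9, 10)
reaction_time <- c(510, 530, 560, 555, 600, 640, 630, 690)

# Pearson is the default method of cor.test()
result <- cor.test(word_length, reaction_time)

r_value <- unname(result$estimate)  # the correlation coefficient r
p_value <- result$p.value           # p-value for H0: the true correlation is 0
conf    <- result$conf.int          # 95% confidence interval for r

print(r_value)
print(p_value)
```

The same r can be obtained without the significance test via cor(word_length, reaction_time).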

The strength of the r-value is determined as follows:
- Equal to or greater than 0.7, or equal to or smaller than -0.7 = strong
- Between 0.3 and 0.7 or between -0.3 and -0.7 = moderate
- Between 0 and 0.3 or between 0 and -0.3 = weak
- 0 = no correlation
→ the closer r is to 0, the more the points deviate from the regression line in the plot and the
weaker the correlation
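
The rule of thumb above can be written as a small helper. This is only a sketch: the function name correlation_strength is hypothetical, and treating a value of exactly 0.3 as "moderate" is an assumption, since the list does not say which side the boundary belongs to.

```r
# Hypothetical helper that applies the rule-of-thumb thresholds listed above
correlation_strength <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1))
  size <- abs(r)  # the sign gives the direction, not the strength
  ifelse(size >= 0.7, "strong",
    ifelse(size >= 0.3, "moderate",          # assumption: 0.3 itself counts as moderate
      ifelse(size > 0, "weak", "no correlation")))
}

labels <- correlation_strength(c(-0.85, 0.45, 0.1, 0))
print(labels)
# "strong" "moderate" "weak" "no correlation"
```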

Note: a steep slope does not mean that the correlation is strong; it only shows the number of
units by which y will change if x changes by one unit.
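
The note can be checked with simulated data. The sketch below uses invented numbers (the slopes 10 and 0.1 and the noise levels are assumptions chosen for illustration): one variable has a steep slope but a lot of scatter, the other a shallow slope and very little scatter.

```r
set.seed(1)  # make the simulated noise reproducible
x <- 1:50

steep_noisy   <- 10  * x + rnorm(50, sd = 200)  # steep slope, large scatter
shallow_clean <- 0.1 * x + rnorm(50, sd = 0.1)  # shallow slope, small scatter

# Slope of each regression line
slope_steep   <- coef(lm(steep_noisy ~ x))[["x"]]
slope_shallow <- coef(lm(shallow_clean ~ x))[["x"]]

# Correlation of each variable with x
r_steep   <- cor(x, steep_noisy)
r_shallow <- cor(x, shallow_clean)

print(c(slope_steep = slope_steep, r_steep = r_steep))
print(c(slope_shallow = slope_shallow, r_shallow = r_shallow))
```

The steeper line has the weaker correlation here, because r depends on how closely the points follow the line, not on how steep the line is.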

Fitted value = the value of y that the regression line predicts for a given x-value
Observed value = the value of y that was actually measured for a given x-value; it usually
does not lie exactly on the regression line
Residuals = differences between the observed values and the fitted values → the smaller the
residuals, the stronger the correlation
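
In R, the three quantities can be extracted from a fitted model. A minimal sketch with made-up values for x and y:

```r
# Hypothetical observed data
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

m <- lm(y ~ x)

fitted_values   <- fitted(m)     # values on the regression line
residual_values <- residuals(m)  # observed minus fitted

# Observed = fitted + residual for every data point
print(all.equal(unname(fitted_values + residual_values), y))
```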

REMARKS ON THE PEARSON CORRELATION TEST
1. The relationship between variables should be monotonic and linear
→ a relationship between variables is monotonic when an increase of X always goes together with
a change of Y in the same direction (Y only increases or only decreases)
→ a relationship between variables is linear when Y changes by the same amount for every
one-unit change of X
A linear relationship is always monotonic, but a monotonic relationship is not always linear!
2. It is very sensitive to outliers
→ outliers may result in a false correlation because of one or more extreme values
→ these are called leverage points, as they pull the regression line in a particular
direction
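
The difference between monotonic and linear in remark 1 can be seen numerically. In this sketch the two y variables are invented: one is an exact linear function of x, the other an exponential function, which always increases but not at a constant rate.

```r
x <- 1:10
y_linear    <- 2 * x + 1  # linear, and therefore also monotonic
y_monotonic <- exp(x)     # monotonic, but not linear

pearson_linear     <- cor(x, y_linear)
pearson_monotonic  <- cor(x, y_monotonic)
spearman_monotonic <- cor(x, y_monotonic, method = "spearman")

print(pearson_linear)      # perfect linear relationship
print(pearson_monotonic)   # below 1: the relationship is not linear
print(spearman_monotonic)  # rank-based, so a perfect monotonic relationship
```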

Outliers can be excluded from the data (critical_point = the chosen cutoff value):
> variable1_1 <- variable1[variable1 < critical_point]
> length(variable1_1)

> variable2_1 <- variable2[variable1 < critical_point]
> length(variable2_1)

Create new regression line:
> m1 <- lm(variable1_1 ~ variable2_1)
> abline(m1, lty = 2)
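
To see how much a single leverage point can matter, consider this sketch. The data and the cutoff of 20 are made up for illustration.

```r
# Hypothetical data without any clear trend
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(5, 3, 6, 2, 7, 4, 6, 3, 5, 4)

# The same data with one extreme observation added
x_out <- c(x, 40)
y_out <- c(y, 30)

r_without <- cor(x, y)
r_with    <- cor(x_out, y_out)

# Exclude the outlier again with a logical index (hypothetical cutoff: 20)
keep       <- x_out < 20
r_filtered <- cor(x_out[keep], y_out[keep])

print(c(r_without = r_without, r_with = r_with, r_filtered = r_filtered))
```

Note that the same logical index is applied to both vectors, so the pairs of observations stay aligned.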

ASSUMPTIONS OF PEARSON CORRELATION
1. The sample is randomly selected from the population it represents
2. Both variables are at least interval-scaled
3. Both variables come from a bivariate normal distribution (= for any given value of X, the
scores on Y are normally distributed) and/or the sample size is large (>30)
> library(energy)
> mvnorm.etest(cbind(variable1_1, variable2_1), R = 999)
(H0 = normality)
4. The residual (error) variance is homoscedastic (= the spread of the residuals is roughly
constant across the entire range of the explanatory variable)
> library(car)
> ncvTest(lm(variable1_1 ~ variable2_1))
(H0 = error variance is homoscedastic)
5. The residuals are independent, i.e. there is no autocorrelation (= when the value of a variable
depends on its previous or next value)
> durbinWatsonTest(lm(variable1_1 ~ variable2_1))
(H0 = no autocorrelation)
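
For assumption 5, the statistic behind the Durbin-Watson test can be computed by hand in base R. This is a sketch on simulated data, not the procedure from the book: it gives only the statistic (the sum of squared differences between successive residuals divided by the sum of squared residuals), not the p-value that durbinWatsonTest() reports.

```r
set.seed(42)  # reproducible simulated data
x <- 1:40
y <- 3 + 0.5 * x + rnorm(40)  # independent errors by construction

e <- residuals(lm(y ~ x))

# Durbin-Watson statistic: ranges from 0 to 4,
# values near 2 suggest no autocorrelation
dw <- sum(diff(e)^2) / sum(e^2)
print(dw)
```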

SPEARMAN AND KENDALL
When the relationship is not linear but monotonic, one should use non-parametric
correlation statistics, such as Spearman’s ρ and Kendall’s τ

Spearman’s ρ is Pearson’s r computed on ranked scores
> cor.test(variable1, variable2, method = "spearman")
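
That Spearman’s ρ is Pearson’s r on ranks can be verified directly. A minimal sketch with made-up values:

```r
# Hypothetical data
x <- c(10, 20, 30, 40, 50, 60)
y <- c(4, 1, 9, 30, 15, 80)

spearman         <- cor(x, y, method = "spearman")
pearson_on_ranks <- cor(rank(x), rank(y))  # Pearson's r after ranking both variables

print(spearman)
print(pearson_on_ranks)
```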

Kendall’s τ compares every pair of observations (x1, y1) and (x2, y2). A pair is concordant
if x2 and y2 are both higher or both lower than x1 and y1. A pair is discordant if one of the
two coordinates x2, y2 is higher while the other is lower than the corresponding coordinate
of (x1, y1).

This method is preferred when the dataset is small and has tied ranks (when two or more
observations have identical scores and therefore identical ranks)
> cor.test(variable1, variable2, method = "kendall")
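
Counting concordant and discordant pairs by hand shows where τ comes from. The sketch below uses made-up data without ties; in that case τ is simply (concordant − discordant) divided by the number of pairs. With tied ranks R applies a correction, so this hand count would then no longer match exactly.

```r
# Hypothetical data without tied values
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

pairs <- combn(length(x), 2)  # every pair of observations (i, j)
dx <- x[pairs[1, ]] - x[pairs[2, ]]
dy <- y[pairs[1, ]] - y[pairs[2, ]]

concordant <- sum(dx * dy > 0)  # both differences have the same sign
discordant <- sum(dx * dy < 0)  # the differences have opposite signs

tau_by_hand <- (concordant - discordant) / ncol(pairs)
tau_from_r  <- cor(x, y, method = "kendall")

print(c(concordant = concordant, discordant = discordant))
print(c(tau_by_hand = tau_by_hand, tau_from_r = tau_from_r))
```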

Two assumptions:
1. The sample is randomly drawn from the population
2. Both variables are on the ordinal scale of measurement (they will be transformed to ranks
by R automatically)

CHAPTER 7
MORE ON FREQUENCIES AND REACTION TIMES: LINEAR REGRESSION

7.1 THE BASIC PRINCIPLES OF LINEAR REGRESSION ANALYSIS
Regression explains and models the relationship between the response (dependent) variable
and one or more explanatory (independent) variables
- one explanatory variable: simple linear regression
- more than one explanatory variable: multiple linear regression
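
In R both kinds of model are fitted with lm(); only the formula differs. A minimal sketch, in which the data frame d and its columns rt, word_length and frequency are hypothetical:

```r
# Hypothetical data: reaction time (ms), word length and log frequency
d <- data.frame(
  rt          = c(520, 540, 610, 650, 700, 590, 630, 720),
  word_length = c(3, 4, 6, 7, 9, 5, 6, 10),
  frequency   = c(9.1, 8.4, 6.0, 5.2, 3.9, 7.0, 6.5, 3.0)
)

# Simple linear regression: one explanatory variable
simple <- lm(rt ~ word_length, data = d)

# Multiple linear regression: more than one explanatory variable
multiple <- lm(rt ~ word_length + frequency, data = d)

print(coef(simple))    # intercept + 1 slope
print(coef(multiple))  # intercept + 2 slopes
```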

Explanatory variables can be anything from categorical to ratio-scaled, but the response
variable should be on an interval or ratio scale

Regression resembles correlation, but adds directionality:
- correlation: the degree to which x and y are related
- regression: how variable x is related to variable y by means of a formula

REGRESSION LINE
A regression line visualizes the relationship between x and y. Its position and orientation can
be described by a formula:

ŷ = b0 + bx

ŷ = the fitted (expected) values of the response variable y
b0 = the intercept, i.e. the predicted value of y when x is equal to zero
b = the coefficient that determines the slope of the regression line → when x increases by
one unit, ŷ changes by b
x = the explanatory variable
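
For simple linear regression, the two coefficients can also be computed by hand and compared with lm(). This sketch uses made-up data and the standard least-squares formulas b = cov(x, y) / var(x) and b0 = mean(y) − b · mean(x), which the summary itself does not spell out.

```r
# Hypothetical data
x <- c(1, 2, 3, 4, 5)
y <- c(3.2, 4.9, 7.1, 8.8, 11.0)

# Least-squares estimates by hand
b  <- cov(x, y) / var(x)      # slope
b0 <- mean(y) - b * mean(x)   # intercept

# The same estimates from lm()
m <- lm(y ~ x)
print(coef(m))
print(c(b0 = b0, b = b))

# Fitted value for a new x, using the formula y-hat = b0 + b * x
y_hat_at_6 <- b0 + b * 6
print(y_hat_at_6)
```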

The differences between the actual values of y and ŷ are the residuals (ε = y - ŷ).

The actual values of y can be described by the following formula:

y = ŷ + ε

So, the observed value of y for a given observation is the sum of its fitted value and the
residual.

Linked book

Natalia Levshina, How to Do Linguistics with R

Published: November 2015
ISBN: 9789027212252
Edition: unknown

Written for

Institution: Rijksuniversiteit Groningen (RuG)
Programme: Communicatie- en Informatiewetenschappen
Course: Statistics 2 (LIX002X05)

Document information

Whole book summarized?: No
Parts of the book summarized: Chapters 6, 7, 8, 12 & 13
Number of pages: 16
Type: Summary
