Lecture notes: Natural Language Processing Technology (Speech and Language Processing, ISBN: 9780131227989)

Pages: 42
Uploaded on: 08-06-2021
Written in: 2020/2021

This summary covers all of the described learning goals and provides answers to the reading comprehension questions.


Preview of the contents

Natural Language Processing Technology
Learning Goals & Summary


May, 2021


Contents
1 Lecture 1 - Analyzing Language
1.1 Term definition
1.2 Analysis steps
1.3 Word classes
1.4 Part-of-speech tag
1.5 Concept of ambiguity
1.6 Named entity classes
1.7 BIO tagging
1.8 NLP shared task
1.9 NLP pipeline spaCy
1.10 Bender rule
1.11 Linguistic property
1.12 Reading Comprehension Questions

2 Lecture 2 - Classifying Language
2.1 Supervised machine learning
2.2 Output of a classifier
2.3 Macro-average and weighted F1
2.4 Feed-forward network
2.5 Matrix notation
2.6 Single-label classification and sequence-to-sequence generation
2.7 Sequence processing
2.8 Feed-forward and recurrent neural networks
2.9 Hidden states
2.10 Parameter sharing
2.11 Reading Comprehension Questions

3 Lecture 3 - Representing Language
3.1 Distributional hypothesis
3.2 Recurrent neural network as a language model
3.3 Training objectives
3.4 Transformer models
3.5 Feature extraction and fine-tuning
3.6 Fine-tune BERT
3.7 BERT representations
3.8 Reading Comprehension Questions
3.9 Advantages of fine tuning BERT

4 Lecture 4 - Language Structure
4.1 Syntactic tree represent
4.2 Grammar
4.3 Types of grammars
4.4 Benchmark
4.5 Universal Dependencies
4.6 Grammar vs. parser
4.7 Parsing strategies
4.8 Dependency parsing
4.9 Transition based
4.10 Parser evaluation
4.11 Error analysis
4.12 Syntactic tree
4.13 Syntactic features
4.14 Run a parser

5 Lecture 5 - Error Analysis
5.1 Why error analysis is important
5.2 Perform error analysis
5.3 Modify the input
5.4 Trends for error analysis
5.5 Ambiguity

6 Lecture 6 - Spoken Language
6.1 Speech signal
6.2 Representing speech digitally
6.3 HMM-based and neural end-to-end ASR system
6.4 Word error rate
6.5 ASR error
6.6 What are phonetic errors and lexical errors?
6.7 What kinds of errors can be attributed to the acoustic model, pronunciation lexicon, & language model?




1 Lecture 1 - Analyzing Language
In the first week, we will learn about linguistic pre-processing and natural language processing
pipelines. We will acquire the most relevant terminology for analyzing language and the practical
skills to analyze a dataset using the spaCy package.

1.1 Term definition
Provide a definition and an example for the following terms: token, word, sub-word,
lemma, morpheme, POS-tag, named entity, content word, function word.


1.1.1 Token
A token is a string of contiguous characters between two spaces, or between a space and a punctuation
mark. Integers, real numbers, and numbers containing a colon (such as the time 2:00) also count as
single tokens. All other symbols are tokens by themselves, except apostrophes and quotation marks
inside a word (with no space), which in many cases mark acronyms or citations. A token can represent
a single word or a group of words (in morphologically rich languages such as Hebrew).
Example: “They picnicked by the pool, then lay back on the grass and looked at the stars” has
16 tokens if punctuation is ignored, or 17 if the comma is counted as a token.
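The counting rule above can be checked with a small regular-expression tokenizer. This is a minimal sketch assuming English input; the pattern is a hypothetical one of our own that keeps word-internal apostrophes, times such as 2:00, and decimal numbers together, and treats every other symbol as its own token. It does not implement every rule above (for example, quotation marks inside Hebrew acronyms).

```python
import re

# Hypothetical pattern: times (2:00), numbers (3.14), words with inner apostrophes (don't),
# or any single non-space symbol that is not a word character (punctuation).
TOKEN_PATTERN = re.compile(r"\d+:\d+|\d+(?:\.\d+)?|\w+(?:'\w+)*|[^\w\s]")

def tokenize(text):
    """Return the list of tokens matched left to right; whitespace is skipped."""
    return TOKEN_PATTERN.findall(text)

sentence = "They picnicked by the pool, then lay back on the grass and looked at the stars"
tokens = tokenize(sentence)
words = [t for t in tokens if t[0].isalnum()]  # drop punctuation-only tokens

print(len(tokens), len(words))  # 17 16
```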


1.1.2 Word
A word is a single distinct meaningful element of speech or writing, used with others (or sometimes
alone) to form a sentence and typically shown with a space on either side when written or printed.
Some multi-word expressions are nevertheless treated as a single word even though they contain spaces.
Example: New York, rock ’n’ roll

1.1.3 Sub-word
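Subword tokenization keeps frequent words as single, unique tokens and decomposes less frequent words
into smaller units called subwords. The resulting vocabulary therefore also contains tokens that are
smaller than words; in the example below, the prefix ## marks a piece that continues the previous
token instead of starting a new word.
Example: “I was supernervous and started stuttering” –> ['I', 'was', 'super', '##ner', '##vous',
'and', 'started', 's', '##tu', '##ttering']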
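The decomposition in the example can be reproduced with greedy longest-match-first splitting in the style of WordPiece, the scheme whose ## notation the example uses. This is a sketch, not the exact tokenizer behind the example: the vocabulary below is a hypothetical toy set chosen to reproduce it, whereas a real subword vocabulary is learned from corpus frequencies and is far larger.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word greedily into the longest subword pieces found in vocab."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first and shrink it until it is in vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # pieces inside a word carry the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # nothing matched: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces

# Hypothetical toy vocabulary chosen to reproduce the example above.
VOCAB = {"I", "was", "super", "##ner", "##vous", "and", "started", "s", "##tu", "##ttering"}

def subword_tokenize(sentence):
    """Whitespace-split the sentence, then split each word into subword pieces."""
    return [p for w in sentence.split() for p in wordpiece_tokenize(w, VOCAB)]

print(subword_tokenize("I was supernervous and started stuttering"))
# ['I', 'was', 'super', '##ner', '##vous', 'and', 'started', 's', '##tu', '##ttering']
```

Frequent words such as "was" match the vocabulary in one piece, while rarer words fall apart into several pieces, which is exactly the frequency trade-off described above.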

1.1.4 Lemma
A lemma is a set of lexical forms having the same stem, the same major part-of-speech, and the
same word sense.
Example: happier, happiest –> happy

1.1.5 Morpheme
A morpheme is a single unit of meaning that cannot be further divided.
Example: “un-”, “break”, “-able” in the word “unbreakable”.

1.1.6 POS-tag
A POS-tag is the label for a word's part of speech (word class), such as NOUN or VERB. Part-of-speech
tagging is the process of assigning a part-of-speech tag to each word in a text.
Example: “Janet(PROPN) will(AUX) back(VERB) the(DET) bill(NOUN)”
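To make the tagging process concrete, here is a most-frequent-tag baseline: a deliberately simple tagger, not the statistical or neural taggers used in practice, that gives each word the tag it received most often in training data. The tiny training set is hypothetical and invented for illustration, and unknown words default to NOUN, an arbitrary choice.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Return a word -> tag lexicon holding each word's most frequent training tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_words(words, lexicon, default="NOUN"):
    # Each word is looked up independently; the surrounding context plays no role.
    return [(w, lexicon.get(w.lower(), default)) for w in words]

# Hypothetical toy training data, invented for illustration.
train = [
    [("Janet", "PROPN"), ("will", "AUX"), ("back", "VERB"), ("the", "DET"), ("bill", "NOUN")],
    [("Come", "VERB"), ("back", "ADV"), ("tomorrow", "ADV")],
    [("He", "PRON"), ("came", "VERB"), ("back", "ADV")],
]
lexicon = train_most_frequent_tag(train)
tagged = tag_words("Janet will back the bill".split(), lexicon)
print(tagged)
# [('Janet', 'PROPN'), ('will', 'AUX'), ('back', 'ADV'), ('the', 'DET'), ('bill', 'NOUN')]
```

Because the baseline ignores context, it tags back as ADV, its most frequent tag in the toy data, although back is a verb in this sentence. Resolving this kind of ambiguity requires a tagger that looks at the neighbouring words.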


1.1.7 Named entity
A named entity is anything that can be referred to with a proper name. Common types are:

Type | Tag | Sample categories | Example
People | PER | people, characters | Turing is a giant of computer science
Organization | ORG | companies, sports teams | The IPCC warned about the cyclone
Location | LOC | regions, mountains, seas | Mt. Sanitas is in Sunshine Canyon
Geo-political entity | GPE | countries, states | Palo Alto is raising the fees for parking
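One simple way to see what recognizing these entities involves is a gazetteer (list) lookup, sketched below. This is not how statistical NER models work; the entity list is a hypothetical one built from the examples in the table, and matching is a greedy longest match over whitespace-separated tokens.

```python
# Hypothetical gazetteer built from the table's examples: token sequence -> entity tag.
GAZETTEER = {
    ("Turing",): "PER",
    ("IPCC",): "ORG",
    ("Mt.", "Sanitas"): "LOC",
    ("Palo", "Alto"): "GPE",
}
MAX_LEN = max(len(name) for name in GAZETTEER)

def find_entities(tokens):
    """Return (text, tag, start, end) for each longest gazetteer match, left to right."""
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span starting at i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + n]))
            if label:
                entities.append((" ".join(tokens[i:i + n]), label, i, i + n))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities

print(find_entities("Mt. Sanitas is in Sunshine Canyon".split()))
# [('Mt. Sanitas', 'LOC', 0, 2)]
```

Sunshine Canyon is also a location but is missed because it is not in the list, which illustrates why a fixed list alone rarely suffices for named entity recognition.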

1.1.8 Content word
Content words (or open class words) are words that possess semantic content and contribute
to the meaning of the sentence in which they occur. They include:
Open class words | Examples
ADJ | big, old, green
ADV | up, down, tomorrow, very
INTJ | ouch, bravo
NOUN | girl, cat, tree
PROPN | Mary, John, London
VERB | run, eat, running

1.1.9 Function word
Function words (or closed class words) are words whose purpose is to contribute to the syntax rather
than the meaning of a sentence. They therefore form important elements in the structure of sentences.
They include:
Closed class words | Examples
ADP | in, to, during
AUX | should, must
CCONJ | and, or, but
DET | a, the
NUM | 0, 1, one, seventy
PART | [en] not, [de] nicht, [en] ’s
PRON | mine, yours, myself
SCONJ | if, while
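The two tables can be turned into a lookup that sorts tagged words into content and function words. This is a minimal sketch assuming Universal Dependencies-style tags as in the tables; the hand-tagged sentence reuses the POS-tag example, and tags that appear in neither table (such as PUNCT, SYM, X) are reported as "other".

```python
# Tag sets copied from the open class and closed class tables.
OPEN_CLASS = {"ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"}
CLOSED_CLASS = {"ADP", "AUX", "CCONJ", "DET", "NUM", "PART", "PRON", "SCONJ"}

def word_class(upos):
    """Map a POS tag to 'content', 'function', or 'other'."""
    if upos in OPEN_CLASS:
        return "content"
    if upos in CLOSED_CLASS:
        return "function"
    return "other"  # e.g. PUNCT, SYM, X are in neither table

tagged = [("Janet", "PROPN"), ("will", "AUX"), ("back", "VERB"), ("the", "DET"), ("bill", "NOUN")]
content_words = [w for w, t in tagged if word_class(t) == "content"]
function_words = [w for w, t in tagged if word_class(t) == "function"]
print(content_words, function_words)  # ['Janet', 'back', 'bill'] ['will', 'the']
```

The content words alone (Janet, back, bill) still convey most of the sentence's meaning, while the function words (will, the) mainly supply grammatical structure.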

Explain the difference between two related terms (of the list above).


1.2 Analysis steps
Explain the functionality of the following analysis steps: text normalization, sentence
segmentation, tokenization, byte-pair encoding, lemmatization, POS-tagging, named
entity recognition.


1.2.1 Text normalization
Normalizing text means converting it to a more convenient, standard form.
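What counts as a standard form depends on the task. The function below is one hypothetical combination of common steps (Unicode NFKC normalization, straightening curly quotes, lowercasing, collapsing whitespace), not a fixed recipe. Lowercasing, for instance, helps when counting words but discards capitalization cues that are useful for named entity recognition.

```python
import re
import unicodedata

def normalize(text):
    """One possible normalization: NFKC, straight quotes, lowercase, single spaces."""
    # NFKC folds compatibility characters, e.g. full-width letters and the 'fi' ligature.
    text = unicodedata.normalize("NFKC", text)
    # Curly quotes are not changed by NFKC, so map them explicitly.
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    text = text.lower()  # case folding
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace

# "\uff2e\uff25\uff37" is the full-width string NEW.
print(normalize("  Rock \u2019n\u2019 Roll   in  \uff2e\uff25\uff37 York "))
# rock 'n' roll in new york
```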




Document information
Type: Lecture notes
Lecturer(s): Lisa Beinborn
Contains: All lectures

Seller: cdh (Vrije Universiteit Amsterdam)
