Summary

Complete NLP Exam Summary | From Naive Bayes to Transformers, Speech & LLM Fine-Tuning

Pages: 236
Uploaded on: 22-03-2026
Written in: 2025/2026

This complete summary for Natural Language Processing (Master Data Science) helps you review the entire course quickly, in a structured and in-depth way. The material is not just listed but explained from the basics, so that you can see the connections between all parts of the course and understand how the topics may come up on the exam.

The summary covers the complete structure of NLP: from the introduction to corpora, classification, and Naive Bayes, through preprocessing such as normalization, tokenization, lemmatization, edit distance, and regular expressions, to language modeling with n-grams, Markov models, perplexity, OOV words, and smoothing. This is followed by sequence labeling and syntax, including part-of-speech tagging, Hidden Markov Models, decoding, the Viterbi algorithm, parsing, CFGs, PCFGs, and dependency parsing. Semantics is also covered extensively, with lexica, WordNet, word vectors, PMI, dimensionality reduction, and embeddings.

In addition, this summary includes the more modern neural NLP components, such as logistic regression, feed-forward neural networks, word2vec, evaluation of embeddings, transformers, self-attention, BERT, generative language models, prompting, in-context learning, speech modeling, spoken dialogue modeling, and pretraining/fine-tuning of large language models. PEFT, LoRA, cloze-style fine-tuning, and instruction tuning are also explained in an accessible way.

What to expect:
- clear explanations of key concepts
- a logical structure covering the entire course
- links between older statistical models and modern transformer-based models
- important definitions, exam-relevant concepts, and comparisons
- ideal for exam preparation, a final review, or as a supplement to the lectures

These notes are especially suitable for students who could not attend all the lectures, who struggle with the bigger picture of NLP, or who are looking for a complete and understandable exam summary that focuses on both theory and intuition.
Because of its broad coverage, this document is also useful as open-book support during the exam: you can search for keywords and quickly find the right explanation.


Content preview

Table of Contents
MODULE 1: INTRODUCTION TO NLP, CORPORA, CLASSIFICATION, EVALUATION, AND NAIVE BAYES ............ 7
1. WHAT NLP IS AND WHY IT IS HARD..................................................................................................................................7
2. CORPORA: WHERE NLP GETS ITS EVIDENCE.....................................................................................................................8
3. CLASSIFICATION: TURNING TEXT INTO DECISIONS .......................................................................................................... 10
4. EVALUATING CLASSIFIERS: HOW DO YOU KNOW WHAT WORKS? ...................................................................................... 12
5. NAIVE BAYES: THE FIRST FULL PROBABILISTIC CLASSIFIER IN YOUR COURSE ...................................................................... 15
6. HOW ALL PARTS OF THIS MODULE CONNECT ................................................................................................................. 18
7. KEY TERMS AND SEARCHABLE DEFINITIONS ................................................................................................................... 19
8. FINAL MENTAL MAP FOR THIS MODULE .......................................................................................................................... 22
MODULE 2: NORMALISATION, TOKENIZATION, LEMMATIZATION, EDIT DISTANCE, AND REGULAR EXPRESSIONS ............ 24
1. NORMALISATION: REDUCING VARIATION ON PURPOSE .................................................................................................... 24
2. WHY THIS MODULE MATTERS IN THE FULL NLP PIPELINE ................................................................................................. 25
3. TOKENIZATION: WHAT IS A WORD? ............................................................................................................................... 26
4. LEMMATIZATION AND STEMMING: REDUCING WORD FORMS ............................................................................................ 28
5. SPELLING CORRECTION AND EDIT DISTANCE: MEASURING STRING SIMILARITY .................................................................... 30
6. REGULAR EXPRESSIONS: FINDING STRUCTURED PATTERNS IN TEXT .................................................................................. 33
7. HOW THE CONCEPTS CONNECT TO EACH OTHER ........................................................................................................... 35
8. WHAT THIS MODULE SOLVES CHRONOLOGICALLY IN THE COURSE ................................................................................... 36
9. KEY TERMS AND SEARCHABLE DEFINITIONS ................................................................................................................... 37
10. FINAL MENTAL MAP FOR THIS MODULE ........................................................................................................................ 39
MODULE 3: LANGUAGE MODELING, N-GRAMS, MARKOV MODELS, PERPLEXITY, OOV WORDS, UNOBSERVED TRANSITIONS, AND SMOOTHING ............ 40
1. LANGUAGE MODELING: WHAT PROBLEM IS IT SOLVING? .................................................................................................. 40
2. N-GRAMS: THE BASIC BUILDING BLOCKS ....................................................................................................................... 42
3. THE CHAIN RULE: HOW TO GET THE PROBABILITY OF A WHOLE SENTENCE .......................................................................... 43
4. MARKOV MODELS: APPROXIMATING THE HISTORY .......................................................................................................... 44
5. MAXIMUM LIKELIHOOD ESTIMATION IN LANGUAGE MODELS ............................................................................................. 45
6. CHOICE OF N: WHY NOT JUST MAKE N VERY LARGE? ....................................................................................................... 46
7. MARKOV CHAINS: STATES, TRANSITIONS, AND THE TRANSITION MATRIX............................................................................. 46
8. EVALUATING LANGUAGE MODELS: INTRINSIC AND EXTRINSIC .......................................................................................... 47
9. PERPLEXITY: THE MAIN INTRINSIC EVALUATION METRIC ................................................................................................... 48
10. THE VOCABULARY CONSTRAINT: WHY PERPLEXITY COMPARISONS CAN BE MISLEADING .................................................... 49
11. OOV WORDS: OUT-OF-VOCABULARY WORDS ............................................................................................................. 49
12. UNOBSERVED TRANSITIONS: THE DEEPER SPARSITY PROBLEM ....................................................................................... 50
13. SMOOTHING: MOVING PROBABILITY MASS TO UNSEEN EVENTS ...................................................................................... 51
14. HOW ALL THE CONCEPTS CONNECT ........................................................................................................................... 53
15. BIGGER-PICTURE COURSE MAP ................................................................................................................................. 53
16. KEY TERMS AND SEARCHABLE DEFINITIONS ................................................................................................................. 54
17. FINAL MENTAL MAP FOR THIS MODULE ........................................................................................................................ 57
MODULE 4: PART-OF-SPEECH TAGGING, HIDDEN MARKOV MODELS, DECODING, AND THE VITERBI ALGORITHM ............ 58
1. WHAT IS PART-OF-SPEECH TAGGING AND WHY DO WE NEED IT? ..................................................................................... 58
2. WORD CLASSES, TAGSETS, AND ANNOTATED CORPORA.................................................................................................. 59
3. TYPES, TOKENS, AND AMBIGUITY: WHY TAGGING IS WORTH DOING ................................................................................... 60
4. BASELINES: THE SIMPLEST POSSIBLE TAGGING SYSTEMS ................................................................................................. 60
5. CONTEXT AND THE CONNECTION TO LANGUAGE MODELING ............................................................................................ 61
6. UNKNOWN WORDS IN POS TAGGING ........................................................................................................................... 62
7. HIDDEN MARKOV MODELS: THE STATISTICAL MODEL BEHIND SEQUENCE TAGGING ............................................................ 62
8. TRANSITION PROBABILITIES AND EMISSION PROBABILITIES............................................................................................... 64
9. THE BAYESIAN LOGIC INSIDE HMM TAGGING ................................................................................................................ 65
10. THE TWO KEY ASSUMPTIONS OF HMMS...................................................................................................................... 65
11. HOW HMM PARAMETERS ARE ESTIMATED .................................................................................................................. 66
12. DECODING: FINDING THE BEST HIDDEN TAG SEQUENCE ................................................................................................ 67
13. VITERBI ALGORITHM: DYNAMIC PROGRAMMING FOR HMM DECODING ........................................................................... 67
14. STEP-BY-STEP INTUITION WITH A TINY EXAMPLE ........................................................................................................... 69
15. THE TRELLIS: WHAT IT REALLY MEANS ......................................................................................................................... 69
16. HOW THIS MODULE CONNECTS TO EARLIER AND LATER COURSE TOPICS ......................................................................... 70
17. WHAT PROBLEM DOES HMM TAGGING SOLVE, AND WHAT LIMITATIONS REMAIN? ............................................................ 71
18. KEY TERMS AND SEARCHABLE DEFINITIONS ................................................................................................................. 71
19. FINAL MENTAL MAP FOR THIS MODULE ........................................................................................................................ 74
MODULE 5: PARSING, FORMAL GRAMMARS, PROBABILISTIC CONTEXT-FREE GRAMMARS, PROBLEMS OF PCFGS, AND DEPENDENCY PARSING ............ 76
1. WHAT IS PARSING AND WHAT PROBLEM DOES IT SOLVE? ................................................................................................ 76
2. CONSTITUENTS AND PARSE TREES: HOW SENTENCE STRUCTURE CAN BE REPRESENTED ..................................................... 77
3. TREEBANKS: WHERE PARSERS GET SUPERVISION ........................................................................................................... 78
4. PARSING AS RECOGNITION VS FULL PARSING ................................................................................................................. 78
5. FORMAL GRAMMARS: THE SYSTEM BEHIND PARSING ....................................................................................................... 79
6. CONTEXT-FREE GRAMMARS (CFGS): THE MAIN PHRASE-STRUCTURE FORMALISM ............................................................ 80
7. CHOMSKY NORMAL FORM (CNF): A USEFUL RESTRICTED GRAMMAR FORMAT................................................................... 81
8. TREEBANKS CAN DEFINE GRAMMARS ............................................................................................................................ 81
9. STRUCTURAL AMBIGUITY: WHY ONE SENTENCE CAN HAVE MULTIPLE PARSES ..................................................................... 82
10. PROBABILISTIC CONTEXT-FREE GRAMMARS (PCFGS): ADDING PROBABILITIES TO CFGS ................................................ 82
11. WHY PCFGS ARE USEFUL, AND WHAT THEY STILL MISS ................................................................................................ 83
12. HOW TO IMPROVE PCFGS A BIT: SPLITTING NON-TERMINALS AND PARENT ANNOTATION .................................................. 84
13. DEPENDENCY PARSING: A DIFFERENT VIEW OF SYNTAX ................................................................................................. 85
14. WHY DEPENDENCY PARSING IS USEFUL ...................................................................................................................... 86
15. SYNTACTIC ROLES IN DEPENDENCY PARSING .............................................................................................................. 86
16. DEPENDENCY TREES AS DIRECTED GRAPHS................................................................................................................. 87
17. TRAINING DATA FOR DEPENDENCY PARSING ................................................................................................................ 87
18. CONSTITUENT PARSING VS DEPENDENCY PARSING ...................................................................................................... 87
19. HOW THIS MODULE FITS INTO THE FULL COURSE TIMELINE ............................................................................................ 88
20. WHAT PROBLEM DOES THIS MODULE SOLVE, AND WHAT PROBLEMS REMAIN? ................................................................. 89
21. KEY TERMS AND SEARCHABLE DEFINITIONS ................................................................................................................. 89
22. FINAL MENTAL MAP FOR THIS MODULE ........................................................................................................................ 93
MODULE 6: LEXICA, WORDNET, WORD VECTORS, PMI, AND DIMENSIONALITY REDUCTION ........................ 94
1. THE BIG PROBLEM: WORDS ARE SYMBOLS, BUT MEANING IS GRADED ................................................................................ 94
2. LEXICA: MANUALLY COLLECTED SEMANTIC KNOWLEDGE ................................................................................................ 95
3. THESAURI AND WORDNET: EXPLICIT NETWORKS OF MEANING RELATIONS ........................................................................ 96
4. SIMILARITY IN WORDNET: PATH LENGTH, INFORMATION CONTENT, AND GLOSS OVERLAP ................................................... 97
5. DISTRIBUTIONAL SEMANTICS: MEANING FROM LANGUAGE USE ...................................................................................... 100
6. WORD VECTORS: TURNING WORDS INTO POINTS IN SPACE ............................................................................................ 100
7. SIMILARITY IN VECTOR SPACES: COSINE SIMILARITY ...................................................................................................... 102
8. WHY RAW FREQUENCY COUNTS ARE NOT ENOUGH ...................................................................................................... 103
9. PMI: WEIGHTING INFORMATIVE CO-OCCURRENCES ..................................................................................................... 103
10. DIMENSIONALITY REDUCTION: COMPRESSING THE SEMANTIC SPACE ............................................................................ 105
11. DISTRIBUTIONAL SEMANTIC MODELS: THE FULL PIPELINE ........................................................................................... 107
12. HOW THE WHOLE MODULE FITS INTO THE COURSE TIMELINE ....................................................................................... 107
13. KNOWLEDGE-BASED SEMANTICS VS DISTRIBUTIONAL SEMANTICS ................................................................................ 108
14. WHAT THIS MODULE SOLVES, AND WHAT LIMITATIONS REMAIN .................................................................................... 109
15. KEY TERMS AND SEARCHABLE DEFINITIONS ............................................................................................................... 109
16. FINAL MENTAL MAP FOR THIS MODULE ...................................................................................................................... 113
MODULE 7: LOGISTIC REGRESSION, FEED-FORWARD NEURAL NETWORKS, WORD2VEC, AND THE EVALUATION OF DISTRIBUTIONAL SEMANTIC MODELS ............ 114
1. THE BIG SHIFT: FROM HAND-CRAFTED EVIDENCE TO LEARNED REPRESENTATIONS............................................................ 114
2. LOGISTIC REGRESSION: THE BASIC DISCRIMINATIVE CLASSIFIER ..................................................................................... 115
3. THE CORE EQUATION OF LOGISTIC REGRESSION........................................................................................................... 116
4. THE SIGMOID FUNCTION: TURNING A SCORE INTO A PROBABILITY ................................................................................... 117
5. THE FOUR ESSENTIAL COMPONENTS OF A CLASSIFIER ................................................................................................... 118
6. LOSS FUNCTIONS: HOW THE MODEL KNOWS IT IS WRONG ............................................................................................. 118
7. WHY LOGISTIC REGRESSION IS USEFUL, AND WHAT ITS MAIN LIMITATION IS...................................................................... 119
8. FEED-FORWARD NEURAL NETWORKS: STACKING NONLINEAR TRANSFORMATIONS ........................................................... 120
9. HIDDEN LAYERS: WHAT THEY ACTUALLY DO ................................................................................................................. 121
10. WHY STACKING LAYERS HELPS ................................................................................................................................ 121
11. FULLY CONNECTED NETWORKS ............................................................................................................................... 122
12. THE OUTPUT LAYER OF A NEURAL NETWORK .............................................................................................................. 122
13. PARAMETERS VS HYPERPARAMETERS ....................................................................................................................... 122
14. LOGISTIC REGRESSION VS FEED-FORWARD NEURAL NETWORKS .................................................................................. 123
15. FROM COUNT-BASED WORD VECTORS TO NEURAL EMBEDDINGS ................................................................................. 124
16. WORD2VEC: DENSE EMBEDDINGS FROM THE START ................................................................................................... 124
17. THE CENTRAL INTUITION OF WORD2VEC ................................................................................................................... 125
18. SKIP-GRAM WITH NEGATIVE SAMPLING (SGNS) ....................................................................................................... 125
19. NEGATIVE SAMPLING: WHY IT IS NEEDED ................................................................................................................... 126
20. WHY SGNS LEARNS SEMANTIC SIMILARITY ............................................................................................................... 127
21. REPRESENTATION LEARNING IN WORD2VEC ............................................................................................................. 127
22. TWO EMBEDDINGS PER WORD ................................................................................................................................. 127
23. CBOW: THE OTHER MAJOR WORD2VEC ARCHITECTURE............................................................................................. 128
24. CBOW VS SGNS ................................................................................................................................................. 128
25. HOW WORD2VEC DIFFERS FROM CLASSICAL COUNT-BASED DSMS ............................................................................ 129
26. EVALUATING EMBEDDINGS: HOW DO WE KNOW THEY ARE GOOD? ............................................................................... 129
27. THE INFLUENCE OF CONTEXT WINDOW SIZE .............................................................................................................. 130
28. SEMANTIC DRIFT: MEANING CHANGE OVER TIME ........................................................................................................ 131
29. INTRINSIC EVALUATION OF EMBEDDINGS .................................................................................................................. 132
30. EXTRINSIC EVALUATION OF EMBEDDINGS .................................................................................................................. 132
31. BIAS IN EMBEDDINGS ............................................................................................................................................. 133
32. HOW THIS MODULE FITS INTO THE FULL COURSE TIMELINE .......................................................................................... 133
33. WHAT THIS MODULE SOLVES, AND WHAT LIMITATIONS REMAIN .................................................................................... 134
34. KEY TERMS AND SEARCHABLE DEFINITIONS ............................................................................................................... 135
35. FINAL MENTAL MAP FOR THIS MODULE ...................................................................................................................... 138
MODULE 8: TRANSFORMERS, ATTENTION, BERT-STYLE ENCODERS, AND GENERATIVE LANGUAGE MODELS ............ 140
1. WHY MLPS WERE NOT ENOUGH ............................................................................................................................... 140
2. WHY SEQUENCE MODELS WERE NEEDED: LANGUAGE HAS LONG-DISTANCE STRUCTURE .................................................. 141
3. RECURRENT NEURAL NETWORKS: THE FIRST NEURAL ANSWER TO SEQUENCE MODELING .................................................. 141
4. PREDICTIVE TRAINING IN RECURRENT MODELS ............................................................................................................ 142
5. THE MAIN WEAKNESS OF RECURRENT NETWORKS ........................................................................................................ 143
6. LSTM: IMPROVING THE RECURRENT APPROACH .......................................................................................................... 143
7. ATTENTION: THE IDEA THAT BROKE THE BOTTLENECK .................................................................................................... 144
8. TRANSFORMER: ATTENTION BECOMES THE ARCHITECTURE ........................................................................................... 144
9. ENCODER, DECODER, AND ENCODER–DECODER TRANSFORMERS ................................................................................. 145
10. SELF-ATTENTION: THE CORE COMPUTATION .............................................................................................................. 145
11. QUERY, KEY, AND VALUE: HOW SELF-ATTENTION WORKS STEP BY STEP......................................................................... 146
12. MULTI-HEAD ATTENTION: DIFFERENT RELATION TYPES AT ONCE ................................................................................... 147
13. RESIDUAL CONNECTIONS AND THE RESIDUAL STREAM ................................................................................................ 148
14. POSITIONAL ENCODING: HOW TRANSFORMERS KNOW ORDER ..................................................................................... 148
15. MASKED ATTENTION VS FULL ATTENTION .................................................................................................................. 149
16. CROSS-ATTENTION IN ENCODER–DECODER MODELS ................................................................................................. 149
17. BERT: ENCODER-STYLE SELF-SUPERVISED PRETRAINING ........................................................................................... 150
18. USING BERT FOR DOWNSTREAM TASKS ................................................................................................................... 150
19. GENERATIVE LANGUAGE MODELS: DECODER-STYLE NEXT-TOKEN PREDICTION .............................................................. 151
20. FROM HIDDEN STATES TO VOCABULARY PROBABILITIES .............................................................................................. 151
21. GREEDY DECODING VS SAMPLING ............................................................................................................................ 152
22. TEMPERATURE: CONTROLLING PREDICTABILITY VS DIVERSITY ...................................................................................... 152
23. FINE-TUNING GENERATIVE MODELS FOR TASKS ......................................................................................................... 153
24. ZERO-SHOT PROMPTING: USING THE MODEL WITHOUT PARAMETER UPDATES ................................................................ 153
25. ZERO-SHOT VS IN-CONTEXT LEARNING ..................................................................................................................... 153
26. BERT VS GENERATIVE LMS..................................................................................................................................... 154
27. HOW THIS MODULE CONNECTS TO EARLIER COURSE TOPICS ....................................................................................... 155
28. WHAT THIS MODULE SOLVES, AND WHAT LIMITATIONS REMAIN .................................................................................... 155
29. KEY TERMS AND SEARCHABLE DEFINITIONS ............................................................................................................... 156
30. FINAL MENTAL MAP FOR THIS MODULE ...................................................................................................................... 159
MODULE 9: SPOKEN LANGUAGE, SELF-SUPERVISED SPEECH REPRESENTATION LEARNING, HUBERT, WAV2VEC 2.0, AND ASR WITH CTC ............ 161
1. WHY SPEECH IS FUNDAMENTALLY DIFFERENT FROM WRITING ........................................................................................ 161
2. WHY SPOKEN LANGUAGE IS CONSIDERED PRIMARY ...................................................................................................... 162
3. MAIN SPEECH APPLICATIONS..................................................................................................................................... 163
4. WHY SPEECH AND TEXT NLP HISTORICALLY DEVELOPED SEPARATELY ............................................................................ 163
5. HOW SPEECH IS REPRESENTED IN A COMPUTER ........................................................................................................... 164
6. WHY BERT-LIKE OR GPT-LIKE MODELING IS HARDER FOR SPEECH ................................................................................ 165
7. SELF-SUPERVISED SPEECH REPRESENTATION LEARNING: THE GENERAL RECIPE ............................................................... 166
8. WHY SPOKEN LANGUAGE MODELING PROVED DIFFICULT .............................................................................................. 166
9. HUBERT: THE BERT-LIKE IDEA FOR SPEECH .............................................................................................................. 167
10. K-MEANS CLUSTERING IN HUBERT ......................................................................................................................... 167
11. WAV2VEC 2.0: SIMILAR GOAL, DIFFERENT STRATEGY .................................................................................................. 168
12. THREE FAMILIES OF SELF-SUPERVISED SPEECH OBJECTIVES ........................................................................................ 169
13. OFFLINE VS INTERNAL DISCRETIZATION..................................................................................................................... 170
14. HOW DO WE EVALUATE PRETRAINED SPEECH REPRESENTATIONS?............................................................................... 170
15. FINE-TUNING FOR ASR .......................................................................................................................................... 171
16. TWO WAYS TO BUILD AN ASR SYSTEM ON TOP OF A PRETRAINED SPEECH MODEL .......................................................... 171
17. CONNECTIONIST TEMPORAL CLASSIFICATION (CTC)................................................................................................. 172
18. WHY DYNAMIC PROGRAMMING IS NEEDED IN CTC .................................................................................................... 172
19. DRAWBACK OF CTC COMPARED WITH A FULL DECODER ............................................................................................ 173
20. HOW ASR OUTPUTS ARE EVALUATED ....................................................................................................................... 174
21. HOW THIS MODULE FITS INTO THE FULL COURSE TIMELINE .......................................................................................... 174
22. WHAT THIS MODULE SOLVES, AND WHAT LIMITATIONS REMAIN .................................................................................... 175
23. KEY TERMS AND SEARCHABLE DEFINITIONS ............................................................................................................... 176
24. FINAL MENTAL MAP FOR THIS MODULE ...................................................................................................................... 179
MODULE 10: GENERATIVE SPOKEN DIALOGUE LANGUAGE MODELING, TURN-TAKING, DIALOG STRUCTURE, DGSLM, AND EVALUATION OF SPOKEN CONVERSATION MODELS ............ 181


Price: $8.29

Seller: aukehilbrands (Tilburg University)
