Natural Language Generation types
1) Data-to-text: the input may look like English, but it is structured data; the output is natural language
o Summary of data: input is data, output is text
2) Input is a query, output is text
o The same architecture can be used for both scenarios
• Questions for these examples
o What is the model’s output based on? What is it grounded in?
o Can we handle these cases with the same architecture?
o What questions should we ask when evaluating it?
• Natural language generation: tasks of generating text from some input in any natural language. Settings:
o Text-to-text, e.g. classical tasks: summarization, machine translation
o Data-to-text, e.g. summarizing tables (sports/weather data), summarizing patient data
o Media-to-text, e.g. captioning images, describing videos
o Open-ended (creative) generation, e.g. generating (fictional) stories, poetry based on prompts
▪ Deep neural networks (transformers) offer a unified framework that handles all of
these kinds of language generation, so a single model can perform all these tasks
• What to say and how to say it - Thompson
o Strategic choices: what the system/human chooses to say
▪ Based on: input, additional knowledge and target language
▪ E.g. in the street organ example: street, organ, people
o Tactical choices: how to say it
▪ Highly dependent on language
▪ E.g. in the street organ example: a street organ on a city street.
• History
o Difficult to stop hallucinations from happening
o Extrinsic evaluation of generated text, e.g. smoking-cessation letters
o Learning without parameter updates: learn by showing examples
• Dimensions when generating text
o Language: fluency, variation, style & coherence
o World: accuracy with relation to input, faithfulness to input &
truthfulness
o Interpersonal (pragmatics, sociolinguistics): alignment to
communicative intent, avoidance of harm
L2 Subtasks Involved in Generating Text
Modular vs end-to-end
• Modular architecture: breaks down the main task into subtasks, modelling each one separately. Dominant
approach in ‘classical’ (pre-neural) NLG systems
• End-to-end models: no or fewer explicit subtasks. Less attention is paid to designing the steps in between,
but attention is paid to designing a learning framework. Contemporary models are trained end-to-end.
o Start from pairings of inputs and outputs; the system needs to find the pathway itself
o Harder to figure out where the choices are being made
• Various tasks can be grouped into a three-stage pipeline: starting from the input, text is generated via a
series of intermediate steps
o The architecture represents a ‘consensus’ view
Reiter’s pipeline architecture, highly modular
• Document planner: picks what information to convey and how it is organised. About the what
o Domain and task related things
• Microplanner: about the how, i.e. how the information is conveyed
o More language specific (words we use depend on which language is generated)
• Surface realiser: turns it into actual text
• The diagram doesn’t show knowledge sources! Such as domain knowledge (e.g. information about the
weather), lexical/grammatical knowledge, and a model of the user
• Strategic tasks
o Selecting the messages to be included
o Rhetorical structuring: relating messages with rhetorical relations, e.g. a contrast relation
o Ordering
o Segmentation
• Tactical tasks (microplanning)
o Lexicalisation: the words that we choose (use warm or hot?)
o Referring Expression Generation: how do we refer to things in the domain?
o Aggregation: how do we merge to get more fluid sentences?
• Tactical tasks (realisation)
o Choosing syntactic structures
o Applying morphological rules
o Rendering the text as a string
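The three-stage pipeline above can be sketched in code. This is a toy illustration, not any real system: the message format, function names, and choice rules are all invented for the example.

```python
# Toy sketch of Reiter's pipeline: document planner -> microplanner -> realiser.
# Messages are dicts with invented fields (subject, value, time, important).

def document_planner(data):
    """Strategic choices: select the messages to convey and order them."""
    msgs = [m for m in data if m["important"]]          # content selection
    return sorted(msgs, key=lambda m: m["time"])        # information ordering

def microplanner(messages):
    """Tactical choices: lexicalise each message as a simple clause."""
    return [f"{m['subject']} was {m['value']}" for m in messages]

def surface_realiser(clauses):
    """Render the clauses as an actual text string."""
    return ". ".join(clauses) + "."

def generate(data):
    return surface_realiser(microplanner(document_planner(data)))

data = [
    {"subject": "heart rate", "value": "stable", "time": 1, "important": True},
    {"subject": "sensor 3", "value": "noisy", "time": 0, "important": False},
]
print(generate(data))  # heart rate was stable.
```

Each stage is a separate function, which is exactly the modularity the notes describe: every choice point is explicit and inspectable, unlike in an end-to-end trained model.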
• When dealing with raw, unstructured data, steps have to be taken before generating text. Data has to be
analysed in order to:
o 1) Identify the important things and filter out noise
o 2) Map the data to appropriate input representations
o 3) Perform some reasoning on these representations
• Extension of original architecture pipeline to handle data pre-processing – Reiter (2007)
o Signal analysis: to extract patterns and trends from unstructured input data
o Data interpretation: to perform reasoning on the results
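These two pre-processing steps can be sketched on a toy heart-rate series. The trend rules and the message format below are invented for illustration; they are not from Reiter (2007).

```python
# Toy sketch of the two pre-processing stages added by Reiter (2007):
# signal analysis (find patterns in raw data) and data interpretation
# (reason about those patterns before document planning).

def signal_analysis(readings):
    """Detect a simple trend in a numeric time series."""
    diffs = [b - a for a, b in zip(readings, readings[1:])]
    if all(d > 0 for d in diffs):
        return "rising"
    if all(d < 0 for d in diffs):
        return "falling"
    return "stable"

def data_interpretation(trend, threshold_crossed):
    """Turn the detected pattern into a message for the document planner."""
    if trend == "rising" and threshold_crossed:
        return {"event": "alarm", "cause": "upward trend past threshold"}
    return {"event": "normal", "cause": trend}

hr = [92, 97, 104, 113]
trend = signal_analysis(hr)
print(data_interpretation(trend, max(hr) > 110))
```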
• Example: BabyTalk (a data-to-text system)
• Document planning/content selection
o Main tasks: content selection & information ordering
o Typical output is document plan
▪ Tree whose leaves are messages
▪ Nonterminals indicate rhetorical relations between messages (e.g. justify, part-
of/includes, cause, sequence)
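One way to encode such a document plan as a data structure, following the description above (the class names and example messages are assumptions for illustration):

```python
# A document plan as a tree: leaves are messages, internal nodes carry a
# rhetorical relation between their children (e.g. "cause", "sequence").
from dataclasses import dataclass, field

@dataclass
class Message:
    content: str

@dataclass
class RhetoricalNode:
    relation: str                       # e.g. "justify", "cause", "sequence"
    children: list = field(default_factory=list)

def leaves(node):
    """Collect the plan's messages in left-to-right order."""
    if isinstance(node, Message):
        return [node.content]
    return [c for child in node.children for c in leaves(child)]

plan = RhetoricalNode("cause", [
    Message("HR dropped to 50"),
    RhetoricalNode("sequence", [
        Message("alarm sounded"),
        Message("nurse intervened"),
    ]),
])
print(leaves(plan))  # ['HR dropped to 50', 'alarm sounded', 'nurse intervened']
```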
• Lexicalisation: events in a sequence can be described in many ways to express the same thing
o SEQUENCE(x,y,z)
▪ x happened, then y, then z
▪ x happened, followed by y and z
▪ x,y,z happened
▪ there was a sequence of x,y,z
o With enough data, this variation can be learned
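Template-based lexicalisation of SEQUENCE(x, y, z) can be sketched as choosing among the verbalisations listed above:

```python
# Sketch of template-based lexicalisation: one message, several surface forms.
import random

TEMPLATES = [
    "{0} happened, then {1}, then {2}",
    "{0} happened, followed by {1} and {2}",
    "{0}, {1}, {2} happened",
    "there was a sequence of {0}, {1}, {2}",
]

def lexicalise_sequence(x, y, z, rng=random):
    """Pick one verbalisation of SEQUENCE(x, y, z)."""
    return rng.choice(TEMPLATES).format(x, y, z)

print(lexicalise_sequence("the alarm", "a bradycardia", "an intervention"))
```

Here the variation is hand-listed; the point in the notes is that, with enough data, a statistical model can learn this choice instead.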
• Aggregation: given 2 or more messages, identify ways in which they could be merged into one, more concise
message
o e.g. be(HR, stable) + be(HR, normal)
▪ (No aggregation) HR is currently stable. HR is within the normal range
▪ (conjunction) HR is currently stable and HR is within the normal range
▪ (adjunction) HR is currently stable within the normal range
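A single toy aggregation rule (adjunction), assuming messages of the form be(entity, property) represented as tuples:

```python
# Sketch of one aggregation rule: merge two be(entity, property) messages
# about the same entity into one more concise clause (adjunction).

def aggregate(m1, m2):
    (e1, p1), (e2, p2) = m1, m2
    if e1 == e2:
        return f"{e1} is currently {p1} {p2}"   # adjunction
    return f"{e1} is {p1}. {e2} is {p2}"        # no aggregation possible

print(aggregate(("HR", "stable"), ("HR", "within the normal range")))
# HR is currently stable within the normal range
```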
• Referring expressions: given an entity, identify the best way to refer to it unambiguously, e.g. bradycardia: a
bradycardia, the bradycardia, it, the previous one.
o Depends on discourse context: pronouns only make sense if entity has been referred to before
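That discourse-context rule can be sketched as a toy heuristic (real referring-expression generation algorithms are far more involved; this only captures the pronoun-after-first-mention idea):

```python
# Toy referring-expression generation: use a pronoun only if the entity
# has already been mentioned in the discourse.

def refer(entity, mentioned):
    if entity in mentioned:
        return "it"                 # previously mentioned -> pronoun is safe
    mentioned.add(entity)
    return f"a {entity}"            # first mention -> indefinite description

mentioned = set()
print(refer("bradycardia", mentioned))  # a bradycardia
print(refer("bradycardia", mentioned))  # it
```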
• Syntactic planning: sentence form can vary and still express the same thing
o Realisation subtasks
▪ 1) Map the output of microplanning to a syntactic structure
▪ 2) Identify the best form, given the input representation (Which is the best alternative?
Very hard to model in a rule-based fashion. Statistical approaches provide a solution.)
▪ 3) Apply inflectional morphology (plural, past tense etc) and then linearise as text string
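Steps 1–3 can be sketched with toy English morphology. The inflection rules below are deliberately simplistic (they ignore consonant doubling and irregular verbs), to show only the shape of the realisation step:

```python
# Sketch of surface realisation: apply inflectional morphology to a simple
# subject-verb-object structure, then linearise it as a text string.

def inflect(verb, tense):
    """Toy inflectional morphology for regular English verbs."""
    if tense == "past":
        return verb + "d" if verb.endswith("e") else verb + "ed"
    return verb + "s"   # 3rd person singular present

def realise(subject, verb, obj, tense="present"):
    """Linearise the syntactic structure as a sentence string."""
    return f"{subject.capitalize()} {inflect(verb, tense)} {obj}."

print(realise("the nurse", "increase", "the oxygen", tense="past"))
# The nurse increased the oxygen.
```

Writing such rules by hand quickly becomes unmanageable, which is why the notes point to statistical approaches for choosing the best form.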
• Key takeaways
o Text generation involves a series of choices
o Strategic choices (what) → content selection and microplanning
o Tactical choices (how) → microplanning and realisation
o Classic systems
▪ Heavily engineered
▪ Often modular
▪ Full control of choice behaviour
▪ Limited fluency and variation
o Contemporary models
▪ Trained (neural)
▪ Choice behaviour is stochastic, and learned from data
▪ Harder to control
▪ Much more fluent, broader variation
o Generating meaningful text is really hard
Image captioning - Modular and Data-Driven approaches
• The general setup is similar to the data-to-text scenario, only the input is now a picture
• Kulkarni et al (2011)
o Key contribution: map from object/attribute detections to generated sentences
▪ Blue = objects
▪ Orange = spatial relations
▪ Green = other attributes
o “This is a photograph of one person and one brown sofa and one dog. The person is against the
brown sofa. And the dog is near the person and beside the brown sofa.”
o Modular, step-by-step pipeline (illustrated with the dog example above)
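A Kulkarni-style template filler can be sketched as follows. The sentence frame and the detection format are illustrative approximations of the example above, not the paper's actual method:

```python
# Sketch of template-based captioning from detections: objects and spatial
# relations are slotted into a fixed sentence frame.

def caption(objects, relations):
    objs = " and ".join(f"one {o}" for o in objects)
    rels = " ".join(f"The {a} is {rel} the {b}." for a, rel, b in relations)
    return f"This is a photograph of {objs}." + (" " + rels if rels else "")

print(caption(
    ["person", "brown sofa", "dog"],
    [("person", "against", "brown sofa"), ("dog", "near", "person")],
))
```

The stilted output ("one person and one brown sofa and one dog") shows the limited fluency of such modular, template-driven systems, in line with the key takeaways above.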
• Mitchell et al (2012)
o Key contribution: exploit corpus-based knowledge for generation
o Finds the most likely way to relate the words that describe the image, though this can be wrong