Summary

Natural Language Generation (INFOMNLG) Summary All Lectures + Course Notes

Uploaded on

02-04-2026

Written in

2025/2026

This document includes a summary of all the lectures, enriched with lecture notes, screenshots of important lecture slides and extra notes with explanations to help understand the content and concepts better.

L1 Introduction
Natural Language Generation types

1) Data-to-text: the input may look like English but is just data; the output is text in a natural language
o E.g. a summary of data: input data – output text
2) Query-to-text: input query – output text
o The same architecture can be used for both scenarios
• Questions for these examples
o What is the model’s output based on? What is it grounded in?
o Can we handle these cases with the same architecture?
o What questions should we ask when evaluating it?
• Natural language generation: tasks of generating text from some input in any natural language. Settings:
o Text-to-text, e.g. classical tasks: summarization, machine translation
o Data-to-text, e.g. summarizing tables (sports/weather data), summarizing patient data
o Media-to-text, e.g. captioning images, describing videos
o Open-ended (creative) generation, e.g. generating (fictional) stories, poetry based on prompts
▪ Deep neural networks (transformers) offer a unified framework that handles all of these kinds of language generation, so they can perform all these tasks
• What to say and how to say it - Thompson
o Strategic choices: what the system/human chooses to say
▪ Based on: input, additional knowledge and target language
▪ E.g. in the street organ example: street, organ, people
o Tactical choices: how to say it
▪ Highly dependent on language
▪ E.g. in the street organ example: a street organ on a city street.
• History
o Difficult to stop hallucinations from happening
o Extrinsic evaluation of smoking text
o Learning without parameter updates: learn by showing examples
• Dimensions when generating text
o Language: fluency, variation, style & coherence
o World: accuracy relative to the input, faithfulness to the input & truthfulness
o Interpersonal (pragmatics, sociolinguistics): alignment to communicative intent, avoidance of harm


L2 Subtasks Involved in Generating Text
Modular vs end-to-end
• Modular architecture: breaks down the main task into subtasks, modelling each one separately. Dominant approach in ‘classical’ (pre-neural) NLG systems
• End-to-end models: no or fewer explicit subtasks. Less attention is paid to designing the intermediate steps; more attention is paid to designing the learning framework. Contemporary models are trained end-to-end.
o Start from pairings of input and output; the system needs to find the pathway itself
o Harder to figure out where the choices are being made
• Various tasks can be grouped in a three-stage pipeline: starting from the input, text is generated in several steps
o This architecture represented a ‘consensus’ view
Reiter’s pipeline architecture, highly modular

• Document planner: picking what info it will convey and how it is organized. About the what
o Domain and task related things
• Microplanner: about the how, how you bring the information
o More language specific (words we use depend on which language is generated)
• Surface realiser: turns it into actual text
• Diagram doesn’t show knowledge sources! Like domain knowledge (e.g. information about the weather),
lexical/grammatical knowledge, model of the user
• Strategic tasks (document planning)
o Selecting the messages to be included
o Rhetorical structuring: relating messages through rhetorical relations, e.g. a contrast relation
o Ordering
o Segmentation
• Tactical tasks (microplanning)
o Lexicalisation: the words that we choose (use warm or hot?)
o Referring Expression Generation: how do we refer to things in the domain?
o Aggregation: how do we merge messages to get more fluid sentences?
• Tactical tasks (surface realisation)
o Choosing syntactic structures
o Applying morphological rules
o Rendering the text as a string
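The three stages can be sketched as three functions chained together. This is only a minimal illustration, not Reiter's actual system: the weather fields (temp_max, rain_mm), the 25-degree threshold and the tiny lexicon are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    subject: str
    value: str


def document_planner(data: dict[str, float]) -> list[Message]:
    """Strategic step: decide WHAT to say and in which order."""
    messages = []
    if data["temp_max"] >= 25:  # hypothetical threshold for "warm"
        messages.append(Message("temperature", "warm"))
    if data["rain_mm"] > 0:
        messages.append(Message("rain", "expected"))
    return messages


def microplanner(messages: list[Message]) -> list[tuple[str, str, str]]:
    """Tactical step: decide HOW to say it (word choice per message)."""
    lexicon = {"temperature": "the temperature", "rain": "rain"}  # hypothetical
    return [(lexicon[m.subject], "is", m.value) for m in messages]


def surface_realiser(specs: list[tuple[str, str, str]]) -> str:
    """Tactical step: turn clause specifications into an actual string."""
    clauses = [" ".join(spec) for spec in specs]
    return " ".join(c[0].upper() + c[1:] + "." for c in clauses)


def generate(data: dict[str, float]) -> str:
    return surface_realiser(microplanner(document_planner(data)))


print(generate({"temp_max": 27.0, "rain_mm": 3.2}))
```

Because each stage has its own function, every choice can be traced back to one module, which is the "full control" property of classic modular systems.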
• When dealing with raw, unstructured data, steps have to be taken before generating text. Data has to be analysed in order to:
o 1) Identify the important things and filter out noise
o 2) Map the data to appropriate input representations
o 3) Perform some reasoning on these representations
• Extension of original architecture pipeline to handle data pre-processing – Reiter (2007)
o Signal analysis: to extract patterns and trends from unstructured input data
o Data interpretation: to perform reasoning on the results
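A rough sketch of the two pre-processing stages, assuming a heart-rate series as raw input. The function names, the trend margin of 5 and the threshold of 100 are illustrative assumptions only (not clinical values and not taken from the lecture).

```python
def signal_analysis(samples: list[float], margin: float = 5.0) -> dict:
    """Extract simple patterns and trends from a raw numeric series."""
    change = samples[-1] - samples[0]
    if change > margin:
        trend = "rising"
    elif change < -margin:
        trend = "falling"
    else:
        trend = "stable"
    return {"trend": trend, "minimum": min(samples), "maximum": max(samples)}


def data_interpretation(pattern: dict, low: float = 100.0) -> list[str]:
    """Reason over the extracted patterns to obtain domain-level events."""
    events = ["HR_" + pattern["trend"].upper()]
    if pattern["minimum"] < low:  # hypothetical threshold
        events.append("BRADYCARDIA")
    return events


pattern = signal_analysis([140, 138, 95, 120])
print(data_interpretation(pattern))
```

The events produced here would then be the input to the document planner.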


• Example: BabyTalk (data-to-text system)

• Document planning/content selection
o Main tasks: content selection & information ordering
o Typical output is document plan
▪ Tree whose leaves are messages
▪ Nonterminals indicate rhetorical relations between messages (e.g. justify, part-of/includes, cause, sequence)
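A document plan of this kind can be represented as a small tree data structure. The sketch below is an assumption about one possible representation; the class names and the example messages are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    message: str  # a message to be expressed in the text


@dataclass
class Relation:
    name: str  # rhetorical relation, e.g. SEQUENCE or CAUSE
    children: list = field(default_factory=list)


def leaves(node) -> list[str]:
    """Collect the messages in left-to-right order (the planned ordering)."""
    if isinstance(node, Leaf):
        return [node.message]
    collected = []
    for child in node.children:
        collected.extend(leaves(child))
    return collected


# Hypothetical plan: two events in sequence, the second with a cause.
plan = Relation("SEQUENCE", [
    Leaf("event_a"),
    Relation("CAUSE", [Leaf("event_b"), Leaf("event_c")]),
])
print(leaves(plan))
```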

• Lexicalisation: events in a sequence can be described in many ways to express the same thing
o SEQUENCE(x,y,z)
▪ x happened, then y, then z
▪ x happened, followed by y and z
▪ x,y,z happened
▪ there was a sequence of x,y,z
o With enough data, this variation can be learned
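In a rule-based system, the variants listed above could simply be written as templates. A minimal sketch (the function name is made up, and it only handles exactly three events):

```python
def lexicalise_sequence(x: str, y: str, z: str) -> list[str]:
    """Return the alternative wordings of SEQUENCE(x, y, z)."""
    return [
        f"{x} happened, then {y}, then {z}",
        f"{x} happened, followed by {y} and {z}",
        f"{x}, {y}, {z} happened",
        f"there was a sequence of {x}, {y}, {z}",
    ]


for option in lexicalise_sequence("A", "B", "C"):
    print(option)
```

A trained model would instead learn which wording is likely in which context, rather than choosing from a fixed list.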
• Aggregation: given 2 or more messages, identify ways in which they could be merged into one, more concise message
o e.g. be(HR, stable) + be(HR, normal)
▪ (No aggregation) HR is currently stable. HR is within the normal range
▪ (conjunction) HR is currently stable and HR is within the normal range
▪ (adjunction) HR is currently stable within the normal range
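The three options can be sketched as one function with a strategy argument. This is a toy illustration, assuming each message is a (subject, property phrase) pair; the function name and the message format are not from the lecture.

```python
def aggregate(m1: tuple[str, str], m2: tuple[str, str], strategy: str) -> str:
    """Merge two be(subject, property) messages into text."""
    (s1, p1), (s2, p2) = m1, m2
    if strategy == "none":
        return f"{s1} is {p1}. {s2} is {p2}."
    if strategy == "conjunction":
        return f"{s1} is {p1} and {s2} is {p2}."
    if strategy == "adjunction":
        if s1 != s2:
            raise ValueError("adjunction needs a shared subject")
        return f"{s1} is {p1} {p2}."
    raise ValueError(f"unknown strategy: {strategy}")


stable = ("HR", "currently stable")
normal = ("HR", "within the normal range")
for strategy in ("none", "conjunction", "adjunction"):
    print(aggregate(stable, normal, strategy))
```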
• Referring expressions: given an entity, identify the best way to refer to it unambiguously, e.g. bradycardia: a bradycardia, the bradycardia, it, the previous one.
o Depends on discourse context: pronouns only make sense if entity has been referred to before
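A very simple rule set shows how the discourse context drives the choice. The three rules below are an invented simplification (real referring expression generation also has to rule out distractors, and the a/an distinction is ignored here).

```python
def refer(entity: str, mentioned: list[str]) -> str:
    """Choose a referring expression given the entities mentioned so far."""
    if entity not in mentioned:
        return f"a {entity}"      # first mention: indefinite description
    if mentioned[-1] == entity:
        return "it"               # most recent mention: pronoun is safe
    return f"the {entity}"        # mentioned earlier: definite description


print(refer("bradycardia", []))
print(refer("bradycardia", ["bradycardia"]))
print(refer("bradycardia", ["bradycardia", "desaturation"]))
```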
• Syntactic planning: sentence form can vary and still express the same thing


o Realisation, subtasks
▪ 1) Map the output of microplanning to a syntactic structure
▪ 2) Identify the best form, given the input representation (Which is the best alternative? Very hard to model in a rule-based fashion. Statistical approaches provide a solution.)
▪ 3) Apply inflectional morphology (plural, past tense etc) and then linearise as text string
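Steps 1 and 3 can be sketched with a toy realiser. It is a heavily simplified illustration: the spec format is invented, only regular English morphology is covered, and step 2 (choosing the best form) is skipped.

```python
def past_tense(verb: str) -> str:
    """Regular past tense only (no irregular verbs)."""
    return verb + "d" if verb.endswith("e") else verb + "ed"


def plural(noun: str) -> str:
    """Regular plural only."""
    return noun + "es" if noun.endswith(("s", "x", "ch", "sh")) else noun + "s"


def realise(spec: dict) -> str:
    """Map a clause spec to subject-verb-object order, inflect, linearise."""
    is_plural = spec.get("plural", False)
    subject = plural(spec["subject"]) if is_plural else spec["subject"]
    if spec.get("tense") == "past":
        verb = past_tense(spec["verb"])
    elif is_plural:
        verb = spec["verb"]
    else:
        verb = spec["verb"] + "s"  # third person singular present
    sentence = " ".join(["the", subject, verb, "the", spec["object"]])
    return sentence[0].upper() + sentence[1:] + "."


print(realise({"subject": "nurse", "verb": "adjust", "object": "ventilator",
               "tense": "past", "plural": True}))
```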
• Key takeaways
o Text generation involves a series of choices
o Strategic choices (what) → content selection and microplanning
o Tactical choices (how) → microplanning and realisation
o Classic systems
▪ Heavily engineered
▪ Often modular
▪ Full control of choice behaviour
▪ Limited fluency and variation
o Contemporary models
▪ Trained (neural)
▪ Choice behaviour is stochastic, and learned from data
▪ Harder to control
▪ Much more fluent, broader variation
o Generating meaningful text is really hard
Image captioning - Modular and Data-Driven approaches
• General setup is similar to data-to-text scenario, only input is now a picture
• Kulkarni et al (2011)
o Key contribution: map from object/attribute detections to generated sentences
▪ Blue = objects
▪ Orange = spatial relations
▪ Green = other attributes

o “This is a photograph of one person and one brown sofa and one dog. The person is against the
brown sofa. And the dog is near the person and beside the brown sofa.”
o Modular, step-by-step pipeline, illustrated with the dog example
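The last step of such a pipeline, going from detections to sentences, can be approximated with templates. This is a simplified sketch and not the authors' actual method; the input format (attribute/noun pairs plus spatial relation triples) is an assumption.

```python
def describe(objects: list[tuple[str, str]],
             relations: list[tuple[int, str, int]]) -> str:
    """Turn detected objects, attributes and spatial relations into text."""

    def noun_phrase(i: int) -> str:
        attribute, noun = objects[i]
        return f"{attribute} {noun}" if attribute else noun

    listing = " and ".join(f"one {noun_phrase(i)}" for i in range(len(objects)))
    sentences = [f"This is a photograph of {listing}."]

    # Group the spatial relations by their subject, keeping detection order.
    grouped: dict[int, list[str]] = {}
    for subj, preposition, obj in relations:
        grouped.setdefault(subj, []).append(f"{preposition} the {noun_phrase(obj)}")
    for subj, parts in grouped.items():
        sentences.append(f"The {noun_phrase(subj)} is {' and '.join(parts)}.")
    return " ".join(sentences)


objects = [("", "person"), ("brown", "sofa"), ("", "dog")]
relations = [(0, "against", 1), (2, "near", 0), (2, "beside", 1)]
print(describe(objects, relations))
```

The output is accurate but stiff and repetitive, which matches the limited fluency of template-style generation.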
• Mitchell et al (2012)
o Key contribution: exploit corpus-based knowledge for generation

o Finding the most likely way to relate the words that described the image, but this could be wrong
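The idea of using corpus statistics can be illustrated by counting which preposition most often links two nouns. This is only a toy sketch with an invented mini-corpus, not the model from the paper.

```python
from collections import Counter
from typing import Optional


def most_likely_relation(a: str, b: str,
                         corpus: list[tuple[str, str, str]]) -> Optional[str]:
    """Pick the preposition most frequently seen between nouns a and b."""
    counts = Counter(prep for (x, prep, y) in corpus if (x, y) == (a, b))
    if not counts:
        return None  # noun pair never seen in the corpus
    return counts.most_common(1)[0][0]


# Hypothetical (noun, preposition, noun) triples extracted from captions.
corpus = [
    ("dog", "on", "sofa"),
    ("dog", "on", "sofa"),
    ("dog", "beside", "sofa"),
    ("person", "on", "sofa"),
]
print(most_likely_relation("dog", "sofa", corpus))
```

Here the system would say the dog is on the sofa because that is most frequent in the corpus, even if in this particular image the dog is beside it. That is the sense in which the most likely relation can be wrong.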



Written for

Institution: Universiteit Utrecht (UU)
Study: Business Informatics
Course: Natural Language Generation (INFOMNLG)


Document information

Uploaded on: April 2, 2026
Number of pages: 53
Written in: 2025/2026
Type: SUMMARY

Subjects

encoder
decoder
llms
language model
computational models
pre training
post training
evaluation methods
natural language generation
