Exam (elaborations)

Natural Language Processing Exam Guide Questions and Answers Graded A+

Rating
-
Sold
-
Pages
48
Grade
A+
Uploaded on
31-10-2024
Written in
2024/2025


What is the specific research problem used to measure the effectiveness of sentiment
analysis? - answer- It uses a corpus of movie reviews where the rating associated with
each review is known. Hence, there is an objective measure of whether a review was
positive or negative.
- Pang et al. balanced the corpus so that it had 50% positive and 50% negative reviews.
- The research problem is to assign sentiment automatically to each document in the entire
corpus so that it agrees with the known ratings.
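
As a rough illustration of this evaluation setup, the sketch below scores a classifier by the fraction of documents on which its label agrees with the known rating. The four reviews, their labels and the `classify` placeholder are invented for the example and are not data from Pang et al.

```python
def accuracy(predicted, gold):
    """Fraction of documents whose predicted label matches the known rating."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical balanced toy corpus: 50% positive, 50% negative reviews.
reviews = [
    ("a wonderful and moving film", "pos"),
    ("great cast and a great script", "pos"),
    ("dull and far too long", "neg"),
    ("a terrible waste of time", "neg"),
]


def classify(text):
    """Placeholder classifier: always answers 'pos' (a trivial baseline)."""
    return "pos"


gold = [label for _, label in reviews]
predicted = [classify(text) for text, _ in reviews]
print(accuracy(predicted, gold))  # 0.5: the baseline on a balanced corpus
```

Because the corpus is balanced, a classifier that always gives the same answer scores 50%, so any accuracy above that reflects real agreement with the ratings.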

How may a bag of words technique for sentiment analysis be improved? - answer- Using
a form of compositional semantics has had positive results.
- Neural network methods have shown high performance; it is hypothesized that they
induce some internal representation of syntactic and/or semantic structure.
- Many 'obvious' techniques, such as accounting for negating words, turn out not to be so
good, especially as deeper parsing is necessary to determine the scope of negation
correctly.
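
To make the bag of words baseline concrete, here is a minimal sketch of a word-counting classifier. The word lists are tiny invented examples, and ties are arbitrarily resolved as negative; real systems learn weights from the corpus instead.

```python
# Hypothetical mini sentiment lexicons (assumptions for illustration only).
POSITIVE = {"good", "great", "excellent", "enjoyable"}
NEGATIVE = {"bad", "boring", "terrible", "dull"}


def bow_sentiment(text):
    """Classify by counting positive and negative words, ignoring word order."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score > 0 else "neg"


print(bow_sentiment("a great film"))            # pos
print(bow_sentiment("not a good film at all"))  # pos, although the review is negative
```

The second call shows why word order matters: "not" is invisible to the bag of words. As the answer above notes, simply flipping the polarity of words near a negator is not a reliable fix, because the scope of the negation has to be determined by deeper parsing.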

Why is knowing trivial information both a difficult and essential problem for information
retrieval systems? - answer- Trivial information is critical for human reasoning but tends
not to be explicitly stated anywhere, since humans find it trivial.

Why does restricting natural language interface systems to limited domains make the
problem relatively easy? - answer- It removes a lot of ambiguity, e.g. LUNAR (the lunar
rock sample database querying system) only dealt with "rock" in the sense of the
material, never the music.

What is morphology? - answer- The study of the structure of words.

What is a morpheme? - answer- The minimal information-carrying unit of a word.

What is an affix? - answer- A morpheme which can only occur in conjunction with other
morphemes. Words are made up of a stem and zero or more affixes.

What are the different types of affixes? Which occur in English? - answer- Prefix, suffix,
infix, circumfix.

English has prefixes and suffixes.

What does it mean for a certain linguistic construct to be "productive"? - answer- It is
applied to new words.

Define the difference between inflectional and derivational morphology. -
answer- Inflectional morphology can be thought of as setting values of slots in some
paradigm (i.e. there is a fixed set of slots which can be thought of as being filled with
simple values). Inflectional morphology concerns properties such as tense, aspect,
number, person, gender and case.

Derivational morphology has a broader range of semantic possibilities, in that there
seems to be no principled limit on what derivational affixes can mean, and they don't fit
into neat paradigms.

What is derivational morphology? - answer- Derivational affixes, such as un-, re-, anti-
etc., have a broader range of semantic possibilities (there seems to be no principled limit on
what they can mean) and don't fit into neat paradigms. Inflectional affixes may be
combined (though not in English). However, there are always obvious limits to this,
since once all the possible slot values are 'set', nothing else can happen. Derivational
affixes, by contrast, have no such obvious limit on how many can be combined (e.g.
antidisestablishmentarianism).

What is inflectional morphology? - answer- Inflectional morphology can be thought of as
setting values of slots in some paradigm (i.e., there is a fixed set of slots which can be
thought of as being filled with simple values). Inflectional morphology concerns
properties such as tense, aspect, number, person, gender, and case, although not all
languages code all of these: English, for instance, has very little morphological marking
of case and gender.

What is a "full form lexicon"? - answer- A list of all inflected forms, treating derivational
morphology as non-productive. Since the vast majority of words in English have
regular morphology, a full-form lexicon can be regarded as a form of compilation: it is
redundant to have to specify the inflected form as well as the stem.
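
The 'compilation' view can be sketched as follows: a full-form lexicon is generated mechanically from stems plus regular affixes. The stems, the category names `NOUN` and `VERB`, and the triple layout are assumptions made for this example, and spelling rules are ignored.

```python
# Regular affixes per stem category (tags follow the affix lexicon style
# used elsewhere in this guide; the category names are assumptions).
REGULAR_AFFIXES = {
    "NOUN": [("s", "PLURAL_NOUN")],
    "VERB": [("ed", "PAST_VERB"), ("ed", "PSP_VERB")],
}


def compile_full_forms(stems):
    """Expand (stem, category) pairs into (form, info, stem) triples."""
    entries = []
    for stem, category in stems:
        entries.append((stem, category, stem))  # the bare stem itself
        for affix, info in REGULAR_AFFIXES[category]:
            entries.append((stem + affix, info, stem))
    return entries


lexicon = compile_full_forms([("cat", "NOUN"), ("walk", "VERB")])
for entry in lexicon:
    print(entry)
```

Two stems already expand to five entries, all of them predictable, which is the redundancy the answer refers to.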

What is "stemming"? - answer- A technique in traditional information retrieval systems.
Involves reducing all morphologically complex forms to a canonical form. The canonical
form may not be the linguistic stem, despite the name of the technique. The most
commonly used algorithm is the Porter stemmer, which uses a series of simple rules to
strip endings.
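
The idea of stripping endings with simple rules can be sketched as below. This is a deliberately crude toy and not the Porter algorithm, which applies several ordered phases of rules with extra conditions; the suffix list and the `min_stem_len` parameter are assumptions for illustration.

```python
SUFFIXES = ["ing", "ed", "es", "s"]  # tried in this order


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["walking", "walked", "ponies", "is"]:
    print(w, "->", toy_stem(w))
```

Note that "ponies" is reduced to "poni", which illustrates the point above that the canonical form need not be the linguistic stem: for retrieval it only matters that related forms end up with the same key.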

What is "lemmatization"? - answer- Another name for morphological analysis.

Describe English morphological structure - answer- Generally concatenative.

Describe the formation of spelling rules - answer- Spelling rules are also known as
orthographic rules.

In such rules, the mapping is always given from the 'underlying' form to the surface
form. The mapping is shown to the left of the slash and the context to the right, with the
_ indicating the position in question. Example:

$\epsilon \to e \;/\; \{ s \} \;\hat{}\; \_ \; s$

That is, an e is inserted after an s followed by an affix boundary (^) and before an s.
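
A minimal sketch of applying this e-insertion rule in the underlying-to-surface direction is given below, using a regular expression. The function name and the `context` parameter are hypothetical. The default context is just s, as in the rule above; fuller statements of English e-insertion usually also list x and z, which is an assumption beyond the text.

```python
import re


def e_insertion(underlying, context="s"):
    """Map an underlying form such as 'kiss^s' to its surface form.

    An 'e' is inserted at an affix boundary '^' that follows one of the
    context characters and precedes 's'; the boundary marker is then dropped.
    """
    pattern = rf"(?<=[{re.escape(context)}])\^(?=s)"
    with_e = re.sub(pattern, "^e", underlying)
    return with_e.replace("^", "")


print(e_insertion("kiss^s"))                # kisses
print(e_insertion("cat^s"))                 # cats
print(e_insertion("box^s", context="sxz"))  # boxes
```

The lookbehind and lookahead play the role of the left and right context of the rule, while the matched `^` is the position marked by the underscore.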

What sort of lexical information is needed for full, high-precision morphological
processing? - answer- Affixes, plus the associated information conveyed by the affix
- Irregular forms, with associated information similar to that for affixes
- Stems with syntactic categories (plus more information if derivational morphology is to
be treated as productive)

Give a simple way to encode affix lexicons in a - answer- Pair each affix with an encoding of
its syntactic/semantic effect. E.g.:

ed PAST_VERB
ed PSP_VERB
s PLURAL_NOUN

A lexicon of irregular forms is also necessary. One approach is just a triple consisting of
inflected form, 'affix information' and stem, where 'affix information' corresponds to
whatever encoding is used for the regular affix. E.g.:

began PAST_VERB begin
begun PSP_VERB begin

This approach can be used for generation as well as analysis.
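
The two lexicons above can be used in both directions, roughly as in the following sketch. The function names are hypothetical, and the code assumes purely concatenative affixation with no stem lexicon and no spelling rules.

```python
# Affix lexicon and irregular-form lexicon, copied from the entries above.
AFFIXES = [("ed", "PAST_VERB"), ("ed", "PSP_VERB"), ("s", "PLURAL_NOUN")]
IRREGULAR = [("began", "PAST_VERB", "begin"), ("begun", "PSP_VERB", "begin")]


def analyse(word):
    """Return every (stem, affix_info) analysis licensed by the two lexicons."""
    analyses = [(stem, info) for form, info, stem in IRREGULAR if form == word]
    for affix, info in AFFIXES:
        if word.endswith(affix) and len(word) > len(affix):
            analyses.append((word[: -len(affix)], info))
    return analyses


def generate(stem, info):
    """Return the inflected form, preferring an irregular entry if one exists."""
    for form, irr_info, irr_stem in IRREGULAR:
        if irr_stem == stem and irr_info == info:
            return form
    for affix, affix_info in AFFIXES:
        if affix_info == info:
            return stem + affix
    raise ValueError(f"unknown affix information: {info}")


print(analyse("walked"))              # two analyses: past tense and past participle
print(analyse("began"))               # found in the irregular lexicon
print(generate("begin", "PSP_VERB"))  # begun
print(analyse("bus"))                 # spurious analysis with stem 'bu'
```

The last call returns a spurious plural analysis, which is why a stem lexicon with syntactic categories is also needed for high-precision processing.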

Give examples where the idea that morphology is purely concatenative breaks down -
answer- unkempt - "kempt" is no longer a word

feed - could be analysed as fee + -ed, but "fee" is a noun

corpus - could be analysed as corpu + -s, but there is no such stem as "corpu"

What does it mean for a generative system to "overgenerate"? - answer- It
generates invalid outputs as well as valid ones.

Why are FSTs more useful than FSAs for morpheme analysis? - answer- FSAs can be
used to recognise certain patterns but don't by themselves allow for any analysis of
word forms. Hence for morphology we use FSTs, which allow the surface structure to be
mapped into the list of morphemes. FSTs are useful for both analysis and generation
since the mapping is bidirectional. This approach is known as "two-level morphology".

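What sort of formalism do spelling rules map to? - answer- Finite state transducers.

What does 'two level morphology' mean? - answer- An approach in which FSTs map between
two levels, the underlying form (stem plus affixes) and the surface form, so the same
system is good for both analysing and generating word forms.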

Describe a finite state transducer - answer- Transducers map between two
representations, so each transition corresponds to a pair of characters. As with the
spelling rule, we use the special character 'ε' to correspond to the empty character and
'ˆ' to correspond to an affix boundary. The abbreviation 'other : other' means that any
character not mentioned specifically in the FST maps to itself.
As with the FSA example, we assume that the FST only accepts an input if the end of the input
corresponds to an accept state (i.e., no 'left-over' characters are allowed).
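
The description above can be made concrete with a small toy transducer for plural -s with e-insertion, run in the analysis direction (surface to underlying). The state numbering, the choice of s, x and z as triggering characters, and the function names are all assumptions of this sketch rather than a reproduction of any particular published FST. It uses the plain character `^` for the affix boundary.

```python
SIBILANTS = set("sxz")   # assumed context characters for e-insertion
ACCEPT = {1, 2, 4}       # 1/2: bare word, 4: just read the suffix s


def transitions(state, ch):
    """Yield (underlying_symbol, next_state) pairs for one surface character."""
    if state in (1, 2):
        # 'other : other' (and s:s, x:x, z:z): the character maps to itself.
        yield (ch, 2 if ch in SIBILANTS else 1)
        if state == 2 and ch == "e":
            yield ("^", 3)  # e : ^  (surface e pairs with the affix boundary)
    elif state in (3, 5) and ch == "s":
        yield ("s", 4)      # the suffix s after a boundary


def epsilon_transitions(state):
    """Yield pairs that consume no surface character (epsilon : ^)."""
    if state == 1:
        yield ("^", 5)      # plain boundary, only allowed after a non-sibilant


def fst_analyse(surface):
    """Return every underlying string the toy FST pairs with the surface form."""
    results = set()
    stack = [(1, 0, "")]    # (state, position in surface, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(surface) and state in ACCEPT:
            results.add(out)  # no left-over characters and an accept state
        for sym, nxt in epsilon_transitions(state):
            stack.append((nxt, pos, out + sym))
        if pos < len(surface):
            for sym, nxt in transitions(state, surface[pos]):
                stack.append((nxt, pos + 1, out + sym))
    return results


print(sorted(fst_analyse("boxes")))
print(sorted(fst_analyse("cats")))
```

For "boxes" the sketch returns "box^s" but also "boxe^s" and the unanalysed "boxes". The transducer only encodes spelling, so the candidates still have to be checked against the stem and affix lexicons.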

List some uses of finite state techniques in NLP - answer- Morpheme
analysis/generation
- Grammars for simple dialog systems
- Partial grammars for named entity recognition
- Dialogue models for spoken dialogue systems (SDS). SDS use dialogue models for a
variety of purposes, including controlling the way that the information acquired from
the user is instantiated (e.g., the slots that are filled in an underlying database) and
limiting the vocabulary to achieve higher recognition rates. FSAs can be used to record
possible transitions between states in a simple dialogue.
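
As an illustration of the last point, a simple dialogue can be recorded as an FSA whose transitions are labelled with dialogue moves. The states and move names below describe an invented appointment-booking dialogue and are assumptions of this sketch.

```python
# (current state, user move) -> next state; a hypothetical booking dialogue.
DIALOGUE_FSA = {
    ("start", "greet"): "ask_day",
    ("ask_day", "give_day"): "ask_time",
    ("ask_time", "give_time"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_day",   # start over if the user rejects the summary
}
ACCEPT_STATES = {"done"}


def accepts(moves, start="start"):
    """Return True if the sequence of moves is a complete, valid dialogue."""
    state = start
    for move in moves:
        key = (state, move)
        if key not in DIALOGUE_FSA:
            return False            # move not allowed in this state
        state = DIALOGUE_FSA[key]
    return state in ACCEPT_STATES


print(accepts(["greet", "give_day", "give_time", "yes"]))  # True
print(accepts(["greet", "give_time"]))                     # False
```

Knowing the current state also tells the system which slot is being filled and which small set of answers to expect, which is how such a model can restrict the recogniser's vocabulary.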

What useful additions can be made to FSAs? - answer- Transition probabilities.
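
A minimal sketch of an FSA with transition probabilities follows: the probability of a whole sequence is the product of the probabilities of the transitions it takes. The states, moves and probability values are invented for illustration.

```python
import math

# state -> {move: (next state, probability)}; values are made-up assumptions.
TRANSITIONS = {
    "ask_day": {"give_day": ("ask_time", 1.0)},
    "ask_time": {"give_time": ("confirm", 1.0)},
    "confirm": {"yes": ("done", 0.8), "no": ("ask_day", 0.2)},
}


def path_probability(moves, start="ask_day"):
    """Multiply the probabilities along the path; 0.0 if a move is impossible."""
    state, prob = start, 1.0
    for move in moves:
        if move not in TRANSITIONS.get(state, {}):
            return 0.0
        state, p = TRANSITIONS[state][move]
        prob *= p
    return prob


direct = path_probability(["give_day", "give_time", "yes"])
retry = path_probability(
    ["give_day", "give_time", "no", "give_day", "give_time", "yes"]
)
print(direct, retry)
print(math.isclose(retry, 0.2 * direct))  # the retry path costs one extra 'no'
```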

Define 'corpus' - answer- A body of text that has been collected for some purpose.

Define 'balanced corpus' - answer- A corpus which contains texts representing
different genres (newspapers, fiction, textbooks, parliamentary reports, cooking recipes,
scientific papers etc.). Early examples were the Brown corpus (US English: 1960s)
and the Lancaster-Oslo-Bergen (LOB) corpus (British English: 1970s), which are each
about 1 million words. The more recent British National Corpus (BNC: 1990s) contains
approximately 100 million words, including about 10 million words of spoken English.

Why did many mainstream linguists discount the use of corpora? - answer- Mainstream
linguists in the past mostly dismissed their use in favour of reliance on intuitive
judgements about whether or not an utterance is grammatical (a corpus can only
directly provide positive evidence about grammaticality). However, many linguists do
now use corpora.

What is a 'Wizard of Oz' experiment? - answer- For interface applications in particular,
collecting a corpus requires a simulation of the actual application. This has often been
done by a Wizard of Oz experiment, where a human pretends to be a computer.

Why are corpora needed in NLP? - answer- Firstly, we have to evaluate algorithms on
real language: corpora are required for this purpose for any style of NLP. Secondly,
corpora provide the data source for many machine-learning approaches.

Why do we want to use prediction in NLP? - answer- Some machine learning systems
can be trained using prediction on general text corpora in a way that also makes them
useful on other tasks where there is limited training data.
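
The simplest form of prediction is guessing the next word from the previous one. The sketch below trains bigram counts on a tiny invented corpus; it only illustrates the prediction task itself, not the large machine learning systems the answer refers to, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of word, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))    # cat
print(predict_next(model, "sat"))    # on
print(predict_next(model, "zebra"))  # None
```

No labels are needed to train such a model: the text itself supplies the correct answer for every position, which is what makes prediction usable on large general corpora.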
