NLP TEST 1
classic parsing methods - answer 1. parsing as search: top-down or bottom-up
2. shift-reduce
3. CKY
4. Earley
CKY parser - answer bottom-up; requires a binarized grammar (Chomsky normal form)
Earley parser - answer top-down; complex, but does not require a binarized grammar
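A minimal sketch of a CKY recognizer, to show why the grammar must be binarized: each table cell is built by combining exactly two smaller spans. The toy grammar, the `lexical`/`binary` dictionary layout, and the function name are assumptions made for illustration.

```python
from itertools import product

def cky_recognize(words, lexical, binary, start="S"):
    """Return True if `words` can be derived from `start`.

    lexical: maps a word to the set of nonterminals A with a rule A -> word
    binary:  maps a pair (B, C) to the set of nonterminals A with a rule A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(lexical.get(word, ()))
    for span in range(2, n + 1):              # width of the span, small to large
        for i in range(0, n - span + 1):      # left edge
            j = i + span                      # right edge
            for k in range(i + 1, j):         # split point between the two children
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n]

# Toy grammar (hypothetical, for illustration only)
lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}

accepted = cky_recognize(["she", "eats", "fish"], lexical, binary)
rejected = cky_recognize(["eats", "she"], lexical, binary)
print(accepted, rejected)
```

The table is filled bottom-up, from single words to the full sentence, which is the sense in which CKY is a bottom-up parser.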
generative classifier - answer e.g. Naive Bayes. Builds a model of each class; given an
observation, it returns the class most likely to have generated the observation.
discriminative classifier - answer e.g. logistic regression (MaxEnt). Learns which features of
the input are most useful for discriminating between the different classes.
10-fold cross-validation - answer If too much data is kept for the training set, the test set
is small and may not be representative. Cross-validation therefore uses all the data for both
training and testing.
1. Randomly divide the data into a training set and a test set, train the classifier, and
compute the error rate on the test set.
2. Repeat with a different randomly selected training set and test set (in k-fold
cross-validation, each of the 10 disjoint folds serves as the test set once).
3. Do this 10 times.
4. Average the 10 runs to get an average error rate.
However, because all the data is used for testing, we cannot look at any of the data to
analyze which features to use. To avoid this:
create a fixed training set and test set, do 10-fold cross-validation inside the
training set, and compute the error rate the normal way on the test set.
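The procedure can be sketched as follows. This is a minimal illustration that assumes disjoint folds; `k_fold_error`, `train_majority`, and `error_rate` are hypothetical names, and the majority-class "classifier" is only a stand-in for a real model.

```python
import random
from collections import Counter

def k_fold_error(examples, train_fn, error_fn, k=10, seed=0):
    """Average error over k folds; every example is tested exactly once."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]          # k disjoint folds
    errors = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_fn(train)
        errors.append(error_fn(model, test))
    return sum(errors) / k

def train_majority(train):
    """Stand-in classifier: always predict the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def error_rate(model, test):
    """Fraction of test examples whose label differs from the prediction."""
    return sum(1 for _, label in test if label != model) / len(test)

# Toy data set: 80 positive and 20 negative examples
data = [(i, "pos") for i in range(80)] + [(i, "neg") for i in range(20)]
avg_error = k_fold_error(data, train_majority, error_rate)
print(avg_error)
```

To follow the fixed-test-set variant, `data` here would be the training portion only, with the final error rate computed once on the untouched test set.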
overfitting - answer A model that has learned the noise instead of the signal is overfit:
it fits the training data well but generalizes poorly to new data.
two common architectures for corpus-based chatbots - answer 1. information retrieval
2. machine-learned sequence transduction
types of chatbots - answer rule-based, corpus-based, frame-based (task-based)
domain ontology - answer Modern frame-based dialogue systems are based on a domain
ontology.
The ontology defines one or more frames, each a collection of slots, and
defines the values that each slot can take.
frame-based chatbot/GUS architecture - answer based on a hand-designed finite-state automaton (FSA).
NLU goals for filling frame-based chatbot slots - answer 1. domain classification
2. user intent determination
3. slot filling
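As a rough illustration of a frame and of slot filling, the sketch below models a frame as a named set of slots and fills them with hand-written patterns. The `Frame` class, the flight-booking slots, and the regular expressions are all hypothetical; real systems typically use trained classifiers or sequence models for this step.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Frame:
    name: str
    slots: dict = field(default_factory=dict)   # slot name -> value, or None if unfilled

    def missing(self):
        """Slots the system still has to ask the user about."""
        return [s for s, v in self.slots.items() if v is None]

# Hypothetical hand-written patterns, one per slot
PATTERNS = {
    "origin": re.compile(r"\bfrom (\w+)"),
    "destination": re.compile(r"\bto (\w+)"),
    "date": re.compile(r"\bon (\w+)"),
}

def fill_slots(frame, utterance):
    """Fill any empty slot whose pattern matches the utterance."""
    text = utterance.lower()
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match and slot in frame.slots and frame.slots[slot] is None:
            frame.slots[slot] = match.group(1)
    return frame

frame = Frame("book_flight", {"origin": None, "destination": None, "date": None})
fill_slots(frame, "A flight from Boston to Denver please")
print(frame.slots, frame.missing())
```

After this utterance the `date` slot is still empty, so a dialogue manager would ask a follow-up question for it.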
language models - answer Models that assign probabilities to sequences of words. The
simplest is the N-gram model.
N-gram model - answer Instead of computing the probability of a word given its entire
history, we approximate the history by just the last few words. It is based on the Markov
assumption: the probability of a word depends only on the previous N-1 words (only the
previous word, for a bigram model).
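Written out, with $w_1, \dots, w_n$ for the word sequence and $N$ for the N-gram order (notation assumed here, not taken from the notes), the approximation is:

```latex
% General N-gram approximation of the full history
P(w_n \mid w_1, \dots, w_{n-1}) \approx P(w_n \mid w_{n-N+1}, \dots, w_{n-1})

% Bigram case (N = 2): condition on the previous word only
P(w_n \mid w_1, \dots, w_{n-1}) \approx P(w_n \mid w_{n-1})
```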
Markov models - answer The class of probabilistic models that assume we can predict
the probability of some future unit without looking too far into the past.
maximum likelihood estimation - answer Choosing the parameter values that give the
training data the highest likelihood. For an N-gram model this means normalizing counts,
e.g. P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1}) for bigrams.
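For a bigram model, the maximum likelihood estimate is a relative frequency, which the sketch below computes. The function name, the `<s>`/`</s>` boundary markers, and the three-sentence toy corpus are assumptions for illustration.

```python
from collections import Counter

def bigram_mle(sentences):
    """Return a function prob(word, prev) giving the MLE of P(word | prev)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        context_counts.update(tokens[:-1])                  # every token that has a successor
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))  # adjacent pairs

    def prob(word, prev):
        if context_counts[prev] == 0:
            return 0.0                                      # unseen context: no estimate
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob

corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
p = bigram_mle(corpus)
print(p("I", "<s>"), p("am", "I"), p("Sam", "am"))
```

Any bigram that never occurs in the corpus gets probability zero under this estimate, which is the problem smoothing addresses.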
evaluating language models - answer 1. extrinsic evaluation: embed the model in an
application and measure how much the application improves. Expensive.
2. intrinsic evaluation: measure the quality of the model independent of any application.
A typical data split: 80% training set, 10% development set, 10% test set.
perplexity - answer In practice, we don't use raw probability as our metric for evaluating
language models but a variant called perplexity. It is the inverse probability of the test
set, normalized by the number of words: PP(W) = P(w_1 ... w_N)^(-1/N). The lower the
perplexity, the higher the probability the model assigns to the test set.
Perplexity can also be thought of as the weighted average branching factor of a language
(not just the branching factor).
The branching factor of a language is the number of possible next words that can follow
any word.
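A small sketch of the computation, assuming the model has already assigned a probability to each word of the test sequence. It works in log space to avoid underflow on long sequences; the function name and the toy input are illustrative.

```python
import math

def perplexity(word_probs):
    """Perplexity of a test sequence, given the model probability of each word."""
    n = len(word_probs)
    log_prob = sum(math.log(p) for p in word_probs)   # log P(w_1 ... w_N)
    return math.exp(-log_prob / n)                    # P(w_1 ... w_N) ** (-1 / N)

# Ten words, each predicted with probability 1/10 (e.g. uniformly random digits)
uniform_pp = perplexity([0.1] * 10)
print(uniform_pp)
```

When every word is equally likely among 10 choices, the perplexity equals the branching factor of 10 (up to floating-point rounding); a model that puts more probability on the words that actually occur scores lower.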
OOV - answer Out of vocabulary: words that we haven't seen before (they are not in the vocabulary).
The percentage of OOV words that appear in the test set is called the OOV rate.
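A minimal sketch of the OOV rate, assuming the vocabulary is simply the set of training tokens and that the rate is counted per test token rather than per distinct word type. The function name and example sentences are made up for illustration.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training data."""
    vocab = set(train_tokens)
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)

train = "the cat sat on the mat".split()
test = "the dog sat".split()
rate = oov_rate(train, test)
print(rate)
```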
Smoothing - answer To keep a language model from assigning zero probability to
unseen events, we shave off a bit of probability mass from some more
frequent events and give it to the events we've never seen. This modification is called
smoothing or discounting.
Methods: Laplace (add-one) smoothing, add-k smoothing, stupid backoff, Kneser-Ney
smoothing (the most useful for language modeling).
(Add-one and add-k are not good for language modeling, but work well for classification.)
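Add-k smoothing is the simplest of these to write down: add k to every bigram count and k times the vocabulary size to the denominator. The sketch below assumes a bigram model with `<s>`/`</s>` boundary markers and a vocabulary made of every token that can follow another; the names and the toy corpus are illustrative. Setting `k=1.0` gives Laplace smoothing.

```python
from collections import Counter

def add_k_bigram(sentences, k=1.0):
    """Return prob(word, prev) for an add-k smoothed bigram model."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens[1:])                            # tokens that can be predicted
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    v = len(vocab)

    def prob(word, prev):
        # Every bigram, seen or not, receives k extra counts
        return (bigram_counts[(prev, word)] + k) / (context_counts[prev] + k * v)

    return prob

smoothed = add_k_bigram(["I am Sam", "Sam I am"], k=1.0)
print(smoothed("am", "I"), smoothed("Sam", "I"))
```

The unseen bigram "I Sam" now has a nonzero probability, paid for by the seen bigram "I am", whose estimate drops below its unsmoothed value of 1.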