NLP Quiz #4
When to use re.match and re.search - answer: re.match() and re.search() are two
functions in Python's re module that look for a pattern in a string. The main
difference between the two is that re.match() only matches the pattern at the
beginning of the string, while re.search() scans the string and returns the first
match found anywhere in it.
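A minimal sketch of the match/search difference, using a made-up example sentence (the string and variable names are assumptions for illustration only):

```python
import re

text = "the cat sat on the mat"  # hypothetical example string

# re.match only tries the pattern at position 0 of the string.
at_start = re.match(r"cat", text)    # None, because the string starts with "the"

# re.search scans forward and returns the first match anywhere.
anywhere = re.search(r"cat", text)   # match object for "cat"

print(at_start)                          # None
print(anywhere.span())                   # (4, 7)
print(re.match(r"the", text).group())    # the
```

Both functions return None when nothing matches, so check the result before calling .group() or .span() on it.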
Differences between the regex quantifiers (? * + {}), and what |, $, and . mean - answer:
?: Matches the preceding character or group zero or one time.
*: Matches the preceding character or group zero or more times.
+: Matches the preceding character or group one or more times.
{n,m}: Matches the preceding character or group between n and m times (inclusive).
.: Matches any single character except a newline (unless the re.DOTALL flag is set).
|: Specifies alternatives. It matches either the expression before or the expression
after the |.
$: Matches at the end of the string (in Python, also just before a trailing newline).
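The quantifiers and special characters above can be checked with a few toy patterns. This is only a sketch: the helper name `matches` and the test strings are assumptions chosen for illustration, and re.fullmatch is used so that each pattern has to cover the whole string.

```python
import re

def matches(pattern, s):
    """Hypothetical helper: True if pattern matches the whole string s."""
    return re.fullmatch(pattern, s) is not None

# ?  : zero or one "u"
optional = [matches(r"colou?r", s) for s in ("color", "colour", "colouur")]
# *  : zero or more "b"
star = [matches(r"ab*", s) for s in ("a", "ab", "abbb")]
# +  : one or more "b"
plus = [matches(r"ab+", s) for s in ("a", "ab", "abbb")]
# {n,m} : between 2 and 3 "a"
bounded = [matches(r"a{2,3}", s) for s in ("a", "aa", "aaa", "aaaa")]
# .  : any single character except a newline
dot = [matches(r"c.t", s) for s in ("cat", "c t", "c\nt")]
# |  : either alternative
either = [matches(r"cat|dog", s) for s in ("cat", "dog", "cow")]
# $  : anchor at the end of the string (used with re.search here)
end = [re.search(r"mat$", s) is not None for s in ("on the mat", "mat on")]

print(optional)  # [True, True, False]
print(star)      # [True, True, True]
print(plus)      # [False, True, True]
print(bounded)   # [False, True, True, False]
print(dot)       # [True, True, False]
print(either)    # [True, True, False]
print(end)       # [True, False]
```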
What's a word embedding? - answer: A word embedding is a mapping of a word into a
vector space, so that each word is represented by a vector of numbers.
One-hot encoding - answer: Turning a feature/observation into a vector where exactly one
dimension has a non-zero value (1) and every other dimension is 0, which gives each word
a unique ID. Mostly used for categorical data.
For a vocabulary of size N, how big do the vectors need to be? - answer: For one-hot
encoding in a vocabulary of size N, each vector needs to be of size N.
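A minimal sketch of one-hot encoding for a toy vocabulary. The four words, the `word_to_index` mapping, and the `one_hot` function are assumptions for illustration, not part of the quiz material:

```python
vocab = ["cat", "dog", "mat", "sat"]   # toy vocabulary, so N = 4
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a length-N list with a single 1 at the word's index."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("dog"))        # [0, 1, 0, 0]
print(len(one_hot("sat")))   # 4
```

Each vector has as many dimensions as there are words in the vocabulary, which is why one-hot vectors get very large and very sparse for realistic vocabularies.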
Sparse vs Dense vectors - answer:
Dense Vectors:
Representation: Dense vectors store all the elements of a vector, even if they are zero.
Storage: They use a fixed amount of memory equal to the size of the vector, regardless
of the actual number of non-zero elements.
Example: The dense vector [2, 0, 0, 5] is stored as [2, 0, 0, 5], without omitting any
zeros.
Sparse Vectors:
Representation: Sparse vectors store only the non-zero elements along with their
indices.
Storage: They use memory proportional to the number of non-zero elements, which can
be much smaller than the size of the vector.
Example: The same vector [2, 0, 0, 5] would be represented sparsely as [(0, 2), (3, 5)],
indicating that the non-zero elements are at indices 0 and 3, with values 2 and 5,
respectively.
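The two representations can be converted into each other. A minimal sketch in plain Python, where the function names `to_sparse` and `to_dense` are assumptions made up for this example:

```python
def to_sparse(dense):
    """Keep only (index, value) pairs for the non-zero elements."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]

def to_dense(sparse, size):
    """Rebuild the full vector; size is needed because zeros are not stored."""
    dense = [0] * size
    for i, v in sparse:
        dense[i] = v
    return dense

dense = [2, 0, 0, 5]
sparse = to_sparse(dense)

print(sparse)                          # [(0, 2), (3, 5)]
print(to_dense(sparse, len(dense)))    # [2, 0, 0, 5]
```

Note that the sparse form has to keep the vector length separately, since trailing zeros leave no trace in the list of (index, value) pairs.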