Python – Complete Study Notes
Introduction
Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that focuses on
enabling machines to understand, interpret, and generate human language.
It is widely used in applications such as chatbots, translation systems, and sentiment
analysis.
Python provides powerful libraries like NLTK and spaCy for NLP tasks.
Text Preprocessing
Text preprocessing prepares raw text for analysis.
It includes steps such as tokenization, stopword removal, and stemming.
Example Code:
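import nltk
from nltk.tokenize import word_tokenize
# One-time download of tokenizer data (newer NLTK releases ask for 'punkt_tab' instead)
nltk.download('punkt')
text = 'NLP is interesting'
print(word_tokenize(text))  # ['NLP', 'is', 'interesting']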
Tokenization
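Tokenization splits text into smaller units called tokens, usually words or sentences.
It is typically the first step in an NLP pipeline, because later steps operate on tokens.
It also helps in analyzing the structure of a text.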
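To make the difference between sentence-level and word-level tokenization concrete, here is a minimal sketch that uses only Python's standard library. The function names `split_sentences` and `split_words` are made up for this illustration, and the regular expressions are deliberately naive. This is not how NLTK tokenizes internally, and abbreviations such as 'Dr.' would be split incorrectly.

```python
import re

def split_sentences(text):
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(sentence):
    # A token is a run of letters, digits or apostrophes,
    # or any other single non-space character (punctuation).
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", sentence)

text = "NLP is interesting. It powers chatbots!"
sentences = split_sentences(text)
words = [split_words(s) for s in sentences]

print(sentences)
print(words)
```

Punctuation is kept as separate tokens here, so a later step can decide whether to keep or discard it.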
Stopword Removal
Stopwords are very common words such as 'is', 'the', and 'and' that carry little meaning on their own.
They are removed so that the analysis focuses on the more informative words.
Example Code:
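import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')  # one-time download of the stopword lists
stop_words = set(stopwords.words('english'))  # a set gives fast membership tests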
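The code above only builds the stopword set. The filtering step itself can be sketched as follows. To keep the sketch runnable without any downloads, it uses a tiny hand-written set (`STOP_WORDS`, an assumption made for illustration) and a hypothetical helper `remove_stopwords`. NLTK's English list is much longer.

```python
# Hypothetical mini stopword list for illustration only.
STOP_WORDS = {"is", "the", "and", "a", "of"}

def remove_stopwords(tokens, stop_words=STOP_WORDS):
    # Compare in lowercase so that 'The' and 'the' are treated alike.
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["The", "model", "is", "fast", "and", "accurate"]
filtered = remove_stopwords(tokens)

print(filtered)
```

Passing the NLTK-based `stop_words` set as the second argument should work the same way, since the helper only needs a collection that supports membership tests.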
Stemming and Lemmatization
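Stemming reduces words to a root form by stripping suffixes, so the result is not always a real word (for example, 'studies' becomes 'studi').
Lemmatization uses vocabulary and grammar to convert words to their dictionary base form, called the lemma (for example, 'better' becomes 'good').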
Example Code:
from nltk.stem import PorterStemmer
ps = PorterStemmer()  # rule-based stemmer, needs no extra data download
print(ps.stem('running'))  # run
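The contrast between the two approaches can be sketched with a toy example. Everything below is a simplification made for illustration: `toy_stem` is a crude suffix stripper (not the Porter algorithm), and `LEMMAS` is a hypothetical three-entry lookup table standing in for a real dictionary such as WordNet.

```python
def toy_stem(word):
    # Crude suffix stripping: remove the first matching suffix,
    # as long as at least three characters remain.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lookup table; a real lemmatizer consults a full dictionary.
LEMMAS = {"studies": "study", "better": "good", "ran": "run"}

def toy_lemmatize(word):
    # Return the dictionary form if known, otherwise the word unchanged.
    return LEMMAS.get(word, word)

results = [(w, toy_stem(w), toy_lemmatize(w)) for w in ["studies", "better", "ran"]]
for word, stem, lemma in results:
    print(word, stem, lemma)
```

The stemmer can only cut characters off, so it produces 'studi' and leaves irregular forms like 'better' and 'ran' untouched, while the dictionary lookup maps each word to a real base form.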
Bag of Words
Bag of Words represents a text by how often each word occurs in it.
It ignores grammar and word order and keeps only the word counts.
It is widely used in text classification.
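As a minimal sketch of the idea, the following builds Bag of Words count vectors with the standard library only. The two sample documents and the whitespace tokenization are assumptions chosen to keep the example short.

```python
from collections import Counter

docs = ["the cat sat", "the cat saw the dog"]

# Tokenize by whitespace (a simplification) and count words per document.
counts = [Counter(doc.split()) for doc in docs]

# The vocabulary is every distinct word, sorted so the column order is stable.
vocab = sorted(set(word for c in counts for word in c))

# One count vector per document, one column per vocabulary word.
vectors = [[c[word] for word in vocab] for c in counts]

print(vocab)
print(vectors)
```

Each document becomes a fixed-length vector of counts. The order of the words inside a document has no effect on its vector, which is exactly the information Bag of Words discards.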
TF-IDF
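TF-IDF (Term Frequency-Inverse Document Frequency) measures how important a word is to a document within a collection of documents.
It lowers the weight of words that are common across many documents and raises the weight of words that are rare.
It is used in search engines and document ranking.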
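A small worked example may help here. The sketch below assumes one common textbook variant: term frequency is the word's count divided by the document length, and inverse document frequency is the natural log of (number of documents / number of documents containing the word). The function name `tf_idf` and the sample corpus are made up for illustration. Libraries often use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

# Three tiny, already tokenized documents (illustrative data).
docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the bird flew".split(),
]

def tf_idf(term, doc, corpus):
    # Term frequency: share of the document's tokens that are `term`.
    tf = Counter(doc)[term] / len(doc)
    # Document frequency: number of documents that contain `term`.
    df = sum(1 for d in corpus if term in d)
    # Inverse document frequency: rarer terms get a larger value.
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

scores = {term: round(tf_idf(term, docs[0], docs), 3) for term in docs[0]}
print(scores)
```

In this corpus 'the' appears in every document, so its IDF is log(1) = 0 and its score is zero, while 'cat', which appears in only one document, scores highest.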
Sentiment Analysis
Sentiment analysis determines the emotion or opinion expressed in a text.
It typically classifies text as positive, negative, or neutral.
Example Code:
from textblob import TextBlob
analysis = TextBlob('I love this product')
print(analysis.sentiment)  # named tuple: polarity (-1 to 1) and subjectivity (0 to 1)
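TextBlob reports a numeric polarity score rather than a label, so turning it into positive, negative, or neutral needs a rule. One possible rule is sketched below. The helper name `label_from_polarity` and the cut-off of 0.1 are assumptions made for illustration, not TextBlob defaults.

```python
def label_from_polarity(polarity, threshold=0.1):
    # `threshold` is an assumed cut-off: scores close to zero count as neutral.
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

# Sample polarity scores in the range -1 to 1 (illustrative values).
labels = [label_from_polarity(score) for score in (0.5, -0.4, 0.0)]
print(labels)
```

With TextBlob installed, the helper could be called as label_from_polarity(TextBlob(text).sentiment.polarity). A wider threshold makes the neutral band larger.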
Named Entity Recognition (NER)
NER identifies and labels named entities in text, such as people, locations, and organizations.
It is used in information extraction.
Example Code: