Class notes

APPLICATION PROBLEMS-MACHINE LEARNING

Uploaded on

26-04-2025

Written in

2024/2025

These notes focus on real-world application problems in machine learning. They include detailed case studies on churn analysis and prediction using Cox proportional hazards models and other churn prediction techniques. They also cover credit card fraud detection, with emphasis on handling imbalanced data and the use of neural networks. Sentiment analysis and topic mining from New York Times articles are addressed using methods such as cosine similarity, chi-square tests, and N-gram models. Additional Natural Language Processing (NLP) techniques such as part-of-speech tagging, stemming, and chunking are discussed, along with sales funnel analysis. The notes are aimed at students and professionals who want practical insight into applying ML techniques to real-world problems.

UNIT V APPLICATION PROBLEMS

Case studies - churn analysis and prediction using Cox-proportional models, and churn prediction
techniques - credit card fraud analysis with a focus on handling imbalanced data and neural
networks - sentiment analysis and topic mining from the New York Times using similarity measures
like cosine similarity, chi-square, and N-grams - part-of-speech tagging, stemming, chunking -
sales funnel analysis, A/B testing, and campaign effectiveness - web page layout effectiveness -
recommendation systems with collaborative filtering - customer segmentation strategies and
lifetime value - portfolio risk conformance and optimization - Uber alternative routing with
graph construction and route optimization.

5.1 CHURN ANALYSIS
Churn occurs when a customer discontinues a service or stops using a product. The definition can vary
depending on the industry. For instance, in telecommunications, a churner might be someone who
cancels their subscription, while in retail, it could be a customer who hasn't made a purchase in a set
period.
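As a minimal sketch of the retail-style definition, the snippet below labels a customer as churned when no purchase has been made within a fixed inactivity window. The customer IDs, dates, the 90-day window, and the function name label_churn are all illustrative assumptions, not part of the notes.

```python
from datetime import date

# Hypothetical data: customer id -> date of the most recent purchase.
last_purchase = {
    "C001": date(2025, 3, 20),
    "C002": date(2024, 11, 2),
    "C003": date(2025, 1, 15),
}

def label_churn(last_purchase, as_of, inactivity_days=90):
    """Return {customer_id: 1 if inactive for more than inactivity_days, else 0}."""
    return {
        cid: int((as_of - last).days > inactivity_days)
        for cid, last in last_purchase.items()
    }

# Label every customer relative to a fixed observation date.
labels = label_churn(last_purchase, as_of=date(2025, 4, 1))
print(labels)
```

The window length is a business decision; a shorter window flags more customers as churned, so the same data can yield different churn rates under different definitions.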
Features for Churn Prediction:
• Customer Demographics: Age, gender, location, etc.
• Behavioral Data: Frequency of usage, time since the last interaction, customer support
interactions, etc.
• Transaction Data: Purchase history, average purchase value, payment methods.
• Subscription Information: Type of subscription, renewal dates, discounts applied, etc.
• Customer Feedback: Survey responses, ratings, reviews.

5.2 CHURN ANALYSIS AND PREDICTION USING COX-PROPORTIONAL MODELS
The Cox Proportional Hazards model, often referred to as the Cox model, is a widely used tool in
survival analysis for modeling the time until an event of interest occurs, such as customer churn,
equipment failure, or patient death. Unlike standard regression models, the Cox model explicitly
accounts for censored data, where the event has not occurred for some individuals by the end of the
study period.

Survival Analysis:
• Survival Time: The time duration until the event occurs.
• Censoring: This occurs when the event has not been observed for some subjects during the study
period. For example, if a customer hasn't churned by the end of the observation period, their data is
censored.
• Hazard Function: The hazard function h(t) represents the instantaneous rate of occurrence
of the event at time t, given that the individual has survived up to time t.
• Survival Function: The survival function S(t) gives the probability that the event has not
occurred by time t.
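To make censoring and the survival function concrete, here is a small sketch of the Kaplan-Meier product-limit estimate of S(t), written by hand on toy data. The durations, event flags, and the function name kaplan_meier are assumptions made for illustration; in practice a library such as lifelines would normally be used.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time where at least one event occurred.

    durations: observed time for each subject
    events:    1 if the event (churn) was observed, 0 if the subject was censored
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        observed = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)  # events plus censored at t
        if observed:
            # Multiply by the fraction of at-risk subjects that survive past t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Toy data: months of tenure and whether churn was observed.
durations = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"S({t}) = {s:.3f}")
```

Censored subjects never trigger a drop in the curve, but they still count in the at-risk set up to the time they leave, which is how the estimate uses their partial information.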
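Cox Proportional Hazards Model:
• The Cox model assumes that the hazard function for an individual at time t is the product of a
baseline hazard function h0(t) and a function of the explanatory variables (covariates):

$$h(t \mid X) = h_0(t)\,\exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)$$

where X1, X2, …, Xp are the covariates (features), and β1, β2, …, βp are the coefficients to be estimated.
The quantity exp(βj) is the hazard ratio associated with a one-unit increase in Xj, holding the other
covariates fixed.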
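The following sketch illustrates how the coefficients are read, assuming the usual exponential form h(t | X) = h0(t) · exp(β1X1 + … + βpXp). The coefficient values and covariate names are hypothetical and chosen only for illustration. Because the baseline hazard h0(t) cancels when two customers are compared, their hazard ratio does not depend on t.

```python
import math

def relative_hazard(coefs, x):
    """exp(beta . x): the factor that multiplies the baseline hazard h0(t)."""
    return math.exp(sum(coefs[name] * x[name] for name in coefs))

# Hypothetical fitted coefficients (not taken from any real model).
coefs = {"monthly_charges_std": 0.30, "two_year_contract": -1.20}

# Two customers who differ only in contract type.
customer_a = {"monthly_charges_std": 1.0, "two_year_contract": 0}
customer_b = {"monthly_charges_std": 1.0, "two_year_contract": 1}

# h0(t) cancels in the ratio, leaving exp(beta) for the covariate that differs.
hazard_ratio = relative_hazard(coefs, customer_b) / relative_hazard(coefs, customer_a)
print(f"Hazard ratio (b vs a): {hazard_ratio:.3f}")
```

A ratio below 1 means the second customer has a lower instantaneous churn rate at every point in time, which is the "proportional hazards" assumption in action.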
The following program fits a Cox proportional hazards model to a telecom churn dataset:

import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter, CoxPHFitter
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

# Load the data, drop rows with missing values, and encode the event as 1/0
df = pd.read_csv("telco_churn.csv").dropna()
df['Churn'] = df['Churn'].map({'Yes': 1, 'No': 0})

# 'tenure' is the duration column and 'Churn' the event indicator
features = ['MonthlyCharges', 'Contract', 'InternetService']
train, test = train_test_split(df[features + ['tenure', 'Churn']], test_size=0.2, random_state=42)

# Scale the numeric covariate and one-hot encode the categorical ones
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['MonthlyCharges']),
    ('cat', OneHotEncoder(drop='first', sparse_output=False), ['Contract', 'InternetService'])
])
X_train = preprocessor.fit_transform(train.drop(columns=['tenure', 'Churn']))
train_data = pd.DataFrame(X_train, columns=preprocessor.get_feature_names_out())
train_data[['tenure', 'Churn']] = train[['tenure', 'Churn']].values
train_data.dropna(inplace=True)  # Ensure no NaNs before fitting

# Kaplan-Meier estimate of the overall survival curve
KaplanMeierFitter().fit(train['tenure'], train['Churn']).plot_survival_function()
plt.title("Kaplan-Meier Survival Curve")

# Fit the Cox model and report how each covariate shifts the hazard
cph = CoxPHFitter().fit(train_data, duration_col='tenure', event_col='Churn')
print(f"\nModel Concordance Index: {cph.concordance_index_:.2f}")
for feature, exp_coef in zip(cph.summary.index, cph.summary["exp(coef)"]):
    effect = "MORE" if exp_coef > 1 else "LESS"
    print(f"Customers with higher '{feature}' values are {effect} likely to churn. (Factor: {exp_coef:.2f})")
plt.show()

5.3 CHURN PREDICTION TECHNIQUES

Churn prediction is a critical task in various industries, especially in businesses where customer
retention is crucial, such as telecom, finance, and subscription-based services. Several techniques can be
used for churn prediction, ranging from traditional statistical methods to advanced machine learning
models. Below is an overview of the most commonly used techniques:
1. Logistic Regression
Description: Logistic regression is a simple and interpretable method used for binary classification
problems like churn prediction. It models the probability that a customer will churn based on various
input features.
How It Works: The model predicts the probability of churn using a sigmoid function applied to a linear
combination of input features.
Advantages: Easy to interpret, fast to train, and works well with linearly separable data.
Disadvantages: May not perform well with complex relationships between features.
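The "sigmoid of a linear combination" step can be sketched in a few lines. The weights, bias, and feature names below are hypothetical values chosen for illustration; a real model would learn them from labeled data.

```python
import math

def churn_probability(features, weights, bias):
    """Logistic regression prediction: sigmoid(bias + sum_j w_j * x_j)."""
    z = bias + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned parameters.
weights = {"support_calls": 0.8, "tenure_years": -0.5}
bias = -1.0

# One customer: three support calls, one year of tenure.
customer = {"support_calls": 3, "tenure_years": 1}
p = churn_probability(customer, weights, bias)
print(f"Predicted churn probability: {p:.2f}")
```

Each weight shifts the log-odds of churn linearly, which is what makes the model easy to interpret: a positive weight raises the predicted probability as the feature grows, and a negative weight lowers it.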
2. Decision Trees
Description: Decision trees split the data into subsets based on the most significant features, creating a
tree-like model of decisions.
How It Works: The model recursively splits the data based on the feature that provides the best split
(usually measured by metrics like Gini impurity or information gain).
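As a rough sketch of how a single split is chosen, the snippet below computes the Gini impurity of binary churn labels and picks the threshold on one numeric feature that minimizes the weighted impurity of the two child nodes. The toy tenure values, labels, and function names are assumptions for illustration only.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p ** 2 - (1 - p) ** 2

def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy data: tenure in months and whether the customer churned.
tenure = [1, 2, 3, 10, 12, 24]
churned = [1, 1, 1, 0, 1, 0]

# Candidate thresholds: every distinct value except the largest,
# so that both child nodes are non-empty.
candidates = sorted(set(tenure))[:-1]
best = min(candidates, key=lambda t: split_impurity(tenure, churned, t))
print(f"Best split: tenure <= {best}")
```

A full tree repeats this search over all features at every node and recurses into the child nodes until a stopping rule is met.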
Advantages: Easy to interpret, handles non-linear relationships well, and can manage both numerical

Written for

Institution: Chennai Institute Of Technology
Course: AM3403 (AM3403)

Document information

Uploaded on: April 26, 2025
Number of pages: 17
Written in: 2024/2025
Type: Class notes
Professor(s): Dr.gowri
Contains: All classes

Subjects

stemming chunking
sales funnel
part of speech tagging
sentiment analysis
chi square
n grams
churn analysis and prediction using cox proportion
credit card fraud analysis


Seller: gvarshini (Chennai Institute Of Technology)
