In approaching the power company case study, I find it best to break the problem into three smaller,
more manageable parts. For each part we can then define the specific data needs, model usage, and
expected results. These individually addressed pieces can then be brought together into a cohesive
analysis of the case. The three primary parts are the following:
1) Classifying a non-paying household that never intends to pay
2) Predicting the benefits and costs of shutoffs
3) Optimizing the shutoff process within the company’s operational capacity
Part 1: Classification of Households
In addressing this portion of the case, it is critical to have a broad database of customer/household
records, information, and usage, collected within legal boundaries. If such a centralized source of data
has not yet been established within the company, building a customer database and a system to collect
and clean the data would be the first priority. Assuming such a database is in place, we would begin
building the classification model by first aggregating a dataset of features that could help separate the
classes. Some examples of features that would likely aid in model development include the following:
- Unique customer ID (numeric)
- Address (tuple; latitude, longitude) *store as latitude/longitude for mapping with the Python Kepler.gl library*
- Account Status (binary; 1 = non-current account; 0 = current account)
- Cumulative Months Served (numeric)
- Cumulative Amount Past Due (numeric)
- Days Since Last Payment (numeric)
- Credit Score (numeric)
- Residential Address (binary; 1 = household/residential address, 0 = commercial address)
- 12-Month Usage History, kWh (numeric time series) (non-customer/missing = NaN)
- 12-Month Payment History (numeric time series) (non-customer/missing = NaN)
- 12-Month On-Peak % Usage (numeric time series) (non-customer/missing = NaN)
- Number of Residents (numeric)
- Economic Stress Indicator (binary; 0 = non-contractionary, 1 = contractionary)
- Labeled Historical Classes (target variable for training)
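To make the list concrete, the following is a minimal sketch, assuming pandas and NumPy, of how per-household records might be flattened into one model-ready row per customer. The `build_feature_frame` helper, all column names, and the example record are hypothetical. The 12-month series are assumed to be ordered oldest to newest, so customers with shorter histories are left-padded with NaN, matching the "missing = NaN" convention above.

```python
import numpy as np
import pandas as pd

MONTHS = 12  # length of each history window
SERIES_FIELDS = ("usage_kwh", "payments", "on_peak_pct")  # hypothetical field names


def build_feature_frame(records: list[dict]) -> pd.DataFrame:
    """Flatten hypothetical household records into one row per customer."""
    rows = []
    for rec in records:
        lat, lon = rec["address"]  # (latitude, longitude) tuple
        row = {
            "customer_id": rec["customer_id"],
            # Separate lat/lon columns suit point maps (e.g. Kepler.gl).
            "latitude": lat,
            "longitude": lon,
            "non_current": int(rec["non_current"]),  # 1 = non-current account
            "months_served": rec["months_served"],
            "amount_past_due": rec["amount_past_due"],
            "days_since_last_payment": rec["days_since_last_payment"],
            "credit_score": rec["credit_score"],
            "residential": int(rec["residential"]),  # 1 = residential address
            "num_residents": rec["num_residents"],
            "economic_stress": int(rec["economic_stress"]),  # 1 = contractionary
            "label": rec.get("label"),  # historical class; None if unlabeled
        }
        for name in SERIES_FIELDS:
            raw = rec.get(name)
            # Keep at most the 12 most recent months (series ordered oldest -> newest).
            series = [] if raw is None else list(raw)[-MONTHS:]
            # Left-pad the missing early months with NaN.
            series = [np.nan] * (MONTHS - len(series)) + series
            for month, value in enumerate(series, start=1):
                row[f"{name}_m{month:02d}"] = value  # m12 = most recent month
        rows.append(row)
    return pd.DataFrame(rows)


# Hypothetical example: a newer customer with only three months of usage
# and no payment or on-peak history on file.
example = [
    {
        "customer_id": 1001,
        "address": (40.0, -75.0),
        "non_current": True,
        "months_served": 3,
        "amount_past_due": 250.0,
        "days_since_last_payment": 45,
        "credit_score": 610,
        "residential": True,
        "num_residents": 2,
        "economic_stress": False,
        "usage_kwh": [900.0, 850.0, 870.0],
    }
]
df = build_feature_frame(example)
print(df.shape)  # (1, 48): 12 scalar columns + 3 series x 12 months
```

Flattening each series into twelve monthly columns keeps the frame usable by standard tabular classifiers, at the cost of discarding explicit temporal structure. Keeping latitude and longitude as separate columns also makes the frame easy to map: with the keplergl package installed, something like `m = KeplerGl(height=500)` followed by `m.add_data(data=df, name="households")` should plot the households as points.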
Because we do not have actual data, this list is only a back-of-the-envelope set of factors that appear, at
a surface level, to be analytically important for classifying customers. Some of the factors above would
likely yield insignificant coefficients or be removed by feature shrinkage techniques (e.g., LASSO). I also
find it important to split the dataset before moving on to classification, using the binary current vs.
non-current account status and whether days since last payment exceeds 30. Doing so simplifies the
classification task. There are three groups: