Problem 1
You are hired by Blue Moon Consulting (BMC) to conduct a data mining project to improve the
targeting of a new financial planning service. The service has been quite successful so far, being
marketed over the last 6 months only via a very inexpensive word-of-mouth campaign, and BMC
has already garnered a large customer base without any targeting. However, the CIO of BMC
believes that very accurate targeting might cost-effectively expand your audience to consumers
that word-of-mouth would not reach.
You are given an abridged version of BMC’s data science proposal (found below). Identify 3
things that you believe are weaknesses/flaws in their plan and explain why (3-5 sentences each).
I encourage you to refer to the questions associated with the CRISP-DM process and the
Appendices in the back of the book (attached as well) if you are struggling to come up with feedback.
“We will build a logistic regression (LR) model to predict service uptake for a consumer using
data on BMC’s existing customers, including their demographics and their past usage of the
service. We believe that logistic regression is the best choice of method because it is a tried-and-
true statistical technique, and we can check if the results make sense. If they do make sense, then
we can have confidence that the model will be accurate in predicting service uptake. We will
then apply the model to BMC’s large database of existing customers and target those whom the
LR model predicts to be the most likely to use the service.”
Problem 2
In a departure from your usual work, you are contracted by none other than Disney to investigate
fan-inspired accusations that Marvel superheroes and stories are becoming too “predictable.”
Specifically, fans claim that “good” and “bad” heroes share too many similar traits and that more creative
diversity is needed to keep audiences engaged.
You are provided access to a data set that contains a few attributes on a selection of the most
popular superheroes within the Marvel universe. The target variable (alignment) indicates
whether the hero is considered “good” or “bad” in the Marvel universe. There are also (when
applicable) the following attributes on each hero: gender, race (Human/Non-Human), eye color,
hair color, height (in inches), and weight (in pounds).
Run a logistic regression to predict the probability that a hero’s alignment is “good” or “bad.”
Here is the link for the model:
https://bigml.com/shared/logisticregression/gq1lboPQYVy39YYmKm3EMVrrzmH
To generate predictions, use the Predict option in the upper-right corner (see the screenshot below).
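For orientation, here is a minimal sketch of how a logistic regression turns a hero's attributes into a probability. The intercept, the coefficients, and the feature names (gender=Male, race=Human, height_in, weight_lb) are hypothetical values chosen for illustration; they are not taken from the BigML model linked above, and eye color and hair color are left out to keep the sketch short.

```python
import math

# Hypothetical values for illustration only; NOT the coefficients of the BigML model.
INTERCEPT = 0.5
COEFFICIENTS = {
    "gender=Male": -0.4,   # indicator: 1 if the hero is male, else 0
    "race=Human": 0.3,     # indicator: 1 if the hero is human, else 0
    "height_in": 0.01,     # height in inches
    "weight_lb": -0.005,   # weight in pounds
}


def sigmoid(z):
    """Logistic function: maps any real-valued score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def probability_good(features):
    """Return P(alignment = good) for a dict of feature name -> numeric value."""
    # Linear score: intercept plus the weighted sum of the feature values.
    z = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return sigmoid(z)


# A made-up hero: male, human, 72 inches tall, 180 pounds.
hero = {"gender=Male": 1, "race=Human": 1, "height_in": 72, "weight_lb": 180}
p_good = probability_good(hero)
p_bad = 1.0 - p_good  # with two classes, the probabilities sum to 1
print(f"P(good) = {p_good:.3f}, P(bad) = {p_bad:.3f}")
```

With only two classes, the probability of "bad" is one minus the probability of "good", so a single score determines both values.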