Appendix A. Proposal Review Guide
Effective data-analytic thinking should allow you to assess potential data mining projects systematically. The material in this book
should give you the necessary background to assess proposed data mining projects, and to uncover potential flaws in proposals. This
skill can be applied both as a self-assessment for your own proposals and as an aid in evaluating proposals from internal data science
teams or external consultants.
What follows is a set of questions to keep in mind when considering a data mining project. The questions are
framed by the data mining process discussed in detail in Chapter 2, and used as a conceptual framework throughout the book. After
reading this book, you should be able to apply these questions conceptually to a new business problem. The list that follows is not meant to be
exhaustive (in general, the book isn’t meant to be exhaustive). However, the list contains a selection of some of the most important
questions to ask.
Throughout the book we have concentrated on data science projects where the focus is to mine some regularities, patterns, or models
from the data. The proposal review guide reflects this. There may be data science projects in an organization where these regularities are
not so explicitly defined. For example, many data visualization projects initially do not have crisply defined objectives for modeling.
Nevertheless, the data mining process can help to structure data-analytic thinking about such projects — they simply resemble
unsupervised data mining more than supervised data mining.
Business and Data Understanding
What exactly is the business problem to be solved?
Is the data science solution formulated appropriately to solve this business problem? NB: sometimes we have to make judicious
approximations.
What business entity does an instance/example correspond to?
Is the problem a supervised or unsupervised problem?
  If supervised,
    Is a target variable defined?
    If so, is it defined precisely?
    Think about the values it can take.
Are the attributes defined precisely?
  Think about the values they can take.
For supervised problems: will modeling this target variable actually help solve the stated business problem, or only an important
subproblem? If the latter, is the rest of the business problem addressed?
Does framing the problem in terms of expected value help to structure the subtasks that need to be solved?
If unsupervised, is there a well-defined “exploratory data analysis” path? (That is, where is the analysis going?)
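
As one illustration of the expected value question in the list above, the sketch below breaks a hypothetical targeting decision into the pieces a proposal would have to supply: an estimated probability of response (typically the output of a model) and the value of each outcome (typically from business knowledge). The function names and all dollar figures are assumptions made up for this example; they are not taken from any particular proposal.

```python
def expected_value(p_response: float,
                   value_response: float,
                   value_no_response: float) -> float:
    """Expected value of acting on a single instance.

    p_response        -- estimated probability that the instance responds
    value_response    -- net value if it responds
    value_no_response -- net value if it does not (usually a cost, so negative)
    """
    if not 0.0 <= p_response <= 1.0:
        raise ValueError("p_response must be a probability in [0, 1]")
    return p_response * value_response + (1.0 - p_response) * value_no_response


def break_even_probability(value_response: float,
                           value_no_response: float) -> float:
    """Response probability at which the expected value is exactly zero."""
    return -value_no_response / (value_response - value_no_response)


# Hypothetical figures: a response nets $99, a non-response costs $1.
VALUE_RESPONSE = 99.0
VALUE_NO_RESPONSE = -1.0

# Expected value of targeting someone with an estimated 5% response rate.
ev = expected_value(0.05, VALUE_RESPONSE, VALUE_NO_RESPONSE)

# Minimum response probability at which targeting is worthwhile.
threshold = break_even_probability(VALUE_RESPONSE, VALUE_NO_RESPONSE)

print(f"expected value per targeted instance: {ev:.2f}")
print(f"break-even response probability: {threshold:.4f}")
```

Writing the decision this way makes the review questions concrete: does the proposal say where each probability estimate will come from, and where each value will come from?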