CRISP-DM project
Study guide, definitions & notes
With funding from the European Commission, the CRISP-DM (CRoss-Industry
Standard Process for Data Mining) project developed a data-mining process model.
Starting from the knowledge discovery processes used in early data-mining
projects and responding directly to user requirements, this project defined and
validated a data-mining process that is applicable in diverse industry sectors.
CRISP-DM 1.0 (1999) is a methodology that aims to make data mining and
predictive analytics projects more efficient, better organized, more reproducible,
more manageable, and more likely to yield business success [75]. Partners of the
CRISP-DM Consortium include NCR Systems Engineering Copenhagen (the US
and Denmark), DaimlerChrysler AG (Germany), SPSS, Inc. (the US) and OHRA
Verzekeringen en Bank Groep B.V. (the Netherlands). In addition, over 300
organizations have contributed to the process model and more than 200
organizations worldwide (including AirTouch, Deloitte & Touche, Capgemini,
and Lloyds Bank) are members of the CRISP-DM Special Interest Group (SIG).
The main purpose of creating a standard data-mining process is to make the
process reliable and repeatable even for companies with little data-mining
background.
Business understanding: This initial phase focuses on understanding the project
objectives and requirements from a business perspective, and then converting this
knowledge into a data-mining problem definition and a preliminary plan designed
to achieve the objectives.
Data understanding: The data-understanding phase starts with an initial data
collection and proceeds with activities to get familiar with the data, identify data
quality problems, discover first insights into the data or detect interesting subsets to
form hypotheses for hidden information.
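
To illustrate what "identify data quality problems" can mean in practice, the following is a minimal sketch of a first profiling pass. It is not part of the CRISP-DM specification: the sample records, the attribute names (customer_id, age, region) and the helper functions profile and count_duplicates are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "region": "north"},
    {"customer_id": 2, "age": None, "region": "south"},
    {"customer_id": 3, "age": 51, "region": "north"},
    {"customer_id": 3, "age": 51, "region": "north"},  # exact duplicate row
]


def profile(rows):
    """Return, per attribute, the number of missing and distinct values."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


def count_duplicates(rows):
    """Count rows that exactly repeat an earlier row."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in counts.values())


report = profile(records)
duplicates = count_duplicates(records)
print(report)
print("duplicate rows:", duplicates)
```

A report like this is only a starting point; which counts matter depends on the business objectives set in the previous phase.
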
Data preparation: The data preparation phase covers all activities to construct the
final dataset (data that will be fed into the modeling tool(s)) from the initial raw
data. Data preparation tasks are likely to be performed multiple times, and not in
any prescribed order. Tasks include table, record, and attribute selection as well as
transformation and cleaning of data for modeling tools.
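
The tasks named above (attribute selection, cleaning and transformation, record selection) can be sketched as small, composable steps. This is an illustrative sketch under stated assumptions, not a prescribed CRISP-DM procedure: the RAW records, the SELECTED attribute list and the three helper functions are hypothetical, and the order in which they are applied is just one possible choice, since the tasks follow no prescribed order.

```python
# Hypothetical raw data as it might arrive from a source system.
RAW = [
    {"customer_id": 1, "age": "34", "region": " North ", "notes": "call back"},
    {"customer_id": 2, "age": "", "region": "south", "notes": ""},
    {"customer_id": 3, "age": "51", "region": "NORTH", "notes": "vip"},
]

# Attribute selection: keep only the fields the model is assumed to need.
SELECTED = ("customer_id", "age", "region")


def select_attributes(rows, attributes):
    """Keep only the chosen attributes in each record."""
    return [{a: row[a] for a in attributes} for row in rows]


def clean(rows):
    """Normalize text fields and convert age to an integer (None if empty)."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the input records are left untouched
        row["region"] = row["region"].strip().lower()
        row["age"] = int(row["age"]) if row["age"].strip() else None
        cleaned.append(row)
    return cleaned


def select_records(rows):
    """Record selection: drop records that lack an age value."""
    return [row for row in rows if row["age"] is not None]


prepared = select_records(clean(select_attributes(RAW, SELECTED)))
print(prepared)
```

Keeping each task in its own function makes it easy to rerun or reorder steps, which fits the observation that data preparation is usually performed several times.
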