John wiley sons data mining techniques for marketing sales_19 potx - Pdf 14

470643 c17.qxd 3/8/04 11:29 AM Page 584
Table 17.6 Potential of Six Credit Card Customers

             CREDIT                                                ACTUAL    ACTUAL AS %
             LIMIT     RATE    INTEREST  TRANSACTION  POTENTIAL    REVENUE   OF POTENTIAL
Customer 1   $500      14.9%   $6.21     $5.00        $6.21        $5.47     88%
Customer 2   $5,000    4.9%    $20.42    $50.00       $50.00       $18.38    37%
Customer 3   $6,000    11.9%   $59.50    $60.00       $60.00       $33.73    56%
Customer 4   $10,000   14.9%   $124.17   $100.00      $124.17      $25.00    20%
Customer 5   $8,000    12.9%   $86.00    $80.00       $86.00       $65.00    76%
Customer 6   $5,000    17.9%   $74.58    $50.00       $74.58       $67.13    90%
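The figures in Table 17.6 can be reproduced with a few lines of Python. This is a minimal sketch under assumptions inferred from the table values rather than stated in this excerpt: interest potential is taken as the credit limit times the annual rate divided by 12, transaction potential as 1 percent of the credit limit, and potential revenue as the larger of the two. The function and variable names are illustrative.

```python
# Sketch only: the formulas are inferred from the numbers in Table 17.6.
CUSTOMERS = [
    # (name, credit limit, annual rate, actual revenue)
    ("Customer 1", 500, 0.149, 5.47),
    ("Customer 2", 5000, 0.049, 18.38),
    ("Customer 3", 6000, 0.119, 33.73),
    ("Customer 4", 10000, 0.149, 25.00),
    ("Customer 5", 8000, 0.129, 65.00),
    ("Customer 6", 5000, 0.179, 67.13),
]


def potential_revenue(credit_limit, annual_rate):
    """Larger of the assumed interest and transaction potentials."""
    interest = credit_limit * annual_rate / 12   # assumed: one month of interest
    transaction = credit_limit * 0.01            # assumed: 1% of the credit limit
    return max(interest, transaction)


def percent_of_potential(actual, credit_limit, annual_rate):
    """Actual revenue as a percentage of potential revenue."""
    return 100 * actual / potential_revenue(credit_limit, annual_rate)


for name, limit, rate, actual in CUSTOMERS:
    print(f"{name}: potential ${potential_revenue(limit, rate):,.2f}, "
          f"actual is {percent_of_potential(actual, limit, rate):.0f}% of potential")
```

Dividing by potential is what removes the effect of the credit limit: a small account and a large one can be compared on the same 0 to 100 percent scale.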
584 Chapter 17
Preparing Data for Mining 585
There is another aspect of comparing actual revenue to potential revenue;
it normalizes the data. Without this normalization, wealthier customers appear
to have the most potential, although this potential is not fully utilized. So, the
customer with a $10,000 credit line is far from meeting his or her potential.
Customer 1, with the smallest credit line, comes close to achieving his or her
potential value (88 percent), second only to Customer 6 (90 percent). Such a
definition of value eliminates the wealth

[Charts not reproduced. Each plots Payment and Minimum by month (Jan through Dec) on a dollar axis from $0 to $2,000. Surviving annotations: “A typical revolver only pays on or near the minimum balance every month. This revolver has maintained an average balance of $1,070, with new charges of about $200.” “This convenience user has an average balance of $524.” Two further annotations are truncated in the source: “... larger than the minimum payment, except in months ...” and “A typical convenience user uses the card when ...”]
Figure 17.16 These three charts show actual and minimum payments for three credit card
customers with a credit line of $2,000.
Manually looking at shapes is an inefficient way to categorize the behavior
of several million customers. Shape is a vague, qualitative notion. What is
needed is a score. One way to create a score is by looking at the area between
the “minimum payment” curve and the actual “payment” curve. For our pur-
poses, the area is the sum of the differences between the payment and the min-
imum. For the revolver, this sum is $112; for the convenience user, $559.10; and
for the transactor, a whopping $13,178.90.
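The area score described above amounts to one sum per customer. The following is a minimal sketch; the function name and the sample figures are hypothetical and are not the data behind Figure 17.16.

```python
def payment_area_score(payments, minimums):
    """Sum of (payment - minimum payment) over the months supplied."""
    if len(payments) != len(minimums):
        raise ValueError("payments and minimums must cover the same months")
    return sum(p - m for p, m in zip(payments, minimums))


# Hypothetical customer who pays $5 over a flat $25 minimum every month.
payments = [30] * 12
minimums = [25] * 12
print(payment_area_score(payments, minimums))
```

A score near zero suggests revolver-like behavior, and a very large score suggests a transactor; where to draw the lines between them is a business decision this sketch does not make.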

a customer. All of these are based on the important variables relevant to the
customer and measurements taken over several months. Different measures
are more valuable for identifying various aspects of behavior.
The Ideal Convenience User
The measures in the previous section focused on the extremes of customer
behavior, as typified by revolvers and transactors. Convenience users were
just assumed to be somewhere in the middle. Is there a way to develop a score
that is optimized for the ideal convenience user?
[Chart not reproduced: Payment as Multiple of Min Payment (vertical axis, 0 to 120) by month, Jan through Dec, with one curve each for TRANSACTOR, REVOLVER, and CONVENIENCE.]
Figure 17.17 Comparing the amount paid as a multiple of the minimum payment shows
distinct curves for transactors, revolvers, and convenience users.
First, let’s define the ideal convenience user. This is someone who, twice a
year, charges up to his or her credit line and then pays the balance off over 4
months. There are few, if any, additional charges during the other 10 months of
the year. Table 17.7 illustrates the monthly balances for two convenience users
as a ratio of their credit lines.
This table also illustrates one of the main challenges in the definition of con-

sures how each customer’s behavior compares to the ideal.
[Chart not reproduced: Ratio of Balance to Credit Line (vertical axis, 0% to 100%) by Month (Sorted from Highest Balance to Lowest), months 1 through 12, with curves for IDEAL CONVENIENCE, CONVENIENCE 1, CONVENIENCE 2, and IDEAL TRANSACTOR.]
Figure 17.18 Comparison of two convenience users to the ideal, by sorting the months by
the balance ratio.
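Sorting the months by balance ratio makes customers comparable even when their big purchases fall in different months. The sketch below illustrates the idea under two loudly stated assumptions that are not taken from the text: the ideal convenience profile (two charges up to the credit line, each paid down in equal steps) and the use of mean absolute difference as the comparison measure.

```python
# Assumed ideal: two peaks at 100% of the line, each paid off in equal steps,
# already sorted from highest balance ratio to lowest.
IDEAL_SORTED = [1.0, 1.0, 0.75, 0.75, 0.5, 0.5, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0]


def sorted_profile(balance_ratios):
    """Monthly balance-to-credit-line ratios, highest first."""
    return sorted(balance_ratios, reverse=True)


def distance_from_ideal(balance_ratios, ideal=IDEAL_SORTED):
    """Mean absolute gap between a sorted profile and the assumed ideal."""
    profile = sorted_profile(balance_ratios)
    if len(profile) != len(ideal):
        raise ValueError("expected one ratio per month in the ideal profile")
    return sum(abs(a - b) for a, b in zip(profile, ideal)) / len(ideal)


# Hypothetical customer whose two peaks fall in February and August.
customer = [0.0, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0, 1.0, 0.75, 0.5, 0.25, 0.0]
print(distance_from_ideal(customer))
```

Because only the sorted values are compared, the calendar position of each peak does not affect the result.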
The Dark Side of Data
Working with data is a critical part of the data mining process. What does the
data mean? There are many ways to answer this question—through written
documents, in database schemas, in file layouts, through metadata systems,
and, not least, via the database administrators and systems analysts who know
what is really going on. No matter how good the documentation, the real story
lies in the data.

whether the failure to match is at the record level or the field level.
Because customer signatures use so much aggregated data, they often con-
tain “0” for various features. So, missing data in the customer signatures is not
the most significant issue for the algorithms. However, this can be taken too
far. Consider a customer signature that has 12 months of billing data. Cus-
tomers who started in the past 12 months have missing data for the earlier
months. In this case, replacing the missing data with some arbitrary value is
not a good idea. The best thing is to split the model set into two pieces—those
with 12 months of tenure and those who are more recent.
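Splitting the model set by tenure can be sketched as follows. The record layout (a `tenure_months` field on each customer) is a hypothetical one chosen for illustration.

```python
def split_by_tenure(customers, required_months=12):
    """Separate customers with a full billing history from more recent ones."""
    full_history, recent = [], []
    for customer in customers:
        if customer["tenure_months"] >= required_months:
            full_history.append(customer)
        else:
            recent.append(customer)
    return full_history, recent


# Hypothetical customer records.
customers = [
    {"id": 1, "tenure_months": 30},
    {"id": 2, "tenure_months": 5},
    {"id": 3, "tenure_months": 12},
]
full_history, recent = split_by_tenure(customers)
print(len(full_history), "customers with 12 months of data;", len(recent), "more recent")
```

Each piece can then get its own signature: 12 monthly billing fields for the first group, and a shorter or differently aggregated history for the second, with no arbitrary fill values in either.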
When missing data is a problem, it is important to find its cause. For
instance, one database we encountered had missing data for customers’ start
dates. With further investigation, it turned out that these were all customers
who had started and ended their relationship prior to March 1999. Subsequent
use of this data source focused on either customers who started after this date
or who were active on this date. In another case, a transaction table was miss-
ing a particular type of transaction before a certain date. During the creation of
the data warehouse, different transactions were implemented at different
times. Only carefully looking at crosstabulations of transaction types by time
made it clear that one type was implemented much later than the rest.
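A cross-tabulation of transaction type by month is easy to produce. The sketch below uses only the Python standard library; the transaction types, the `YYYY-MM` month strings, and the function names are hypothetical.

```python
from collections import Counter


def crosstab(transactions):
    """Count transactions for each (type, month) pair."""
    return Counter(transactions)


def first_month_by_type(transactions):
    """Earliest month in which each transaction type appears."""
    first = {}
    for tx_type, month in transactions:
        if tx_type not in first or month < first[tx_type]:
            first[tx_type] = month
    return first


def print_crosstab(transactions):
    """Print a type-by-month grid of counts; gaps show up as zeros."""
    counts = crosstab(transactions)
    months = sorted({month for _, month in transactions})
    types = sorted({tx_type for tx_type, _ in transactions})
    print("type".ljust(10) + "".join(m.rjust(9) for m in months))
    for t in types:
        print(t.ljust(10) + "".join(str(counts[(t, m)]).rjust(9) for m in months))


# Hypothetical data: refunds only start appearing in 1999-01.
SAMPLE = [
    ("purchase", "1998-11"), ("purchase", "1998-12"), ("purchase", "1999-01"),
    ("refund", "1999-01"), ("purchase", "1999-02"), ("refund", "1999-02"),
]
print_crosstab(SAMPLE)
print(first_month_by_type(SAMPLE))
```

A row that is all zeros before some month and populated afterward is the signature of a transaction type that was implemented late, rather than one that customers suddenly began using.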
In another case, the missing data in a data warehouse was just that—
missing because the data warehouse had failed to load it properly. When
there is such a clear cause, the database should be fixed, especially since mis-
leading data is worse than no data at all.
One approach to dealing with missing data is to try to fill in the values—for
example, with the average value or the most common value. Either of these
substitutions changes the distribution of the variable and may lead to poor
models. A more clever variation of this approach is to try to calculate the value
based on other fields, using a technique such as regression or neural networks.
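The effect of mean substitution on a distribution is easy to see on a small example. This sketch uses made-up values and marks missing entries with `None`; it illustrates the shrinking spread and is not a recommendation for any particular imputation method.

```python
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = statistics.mean(known)
    return [mean if v is None else v for v in values]


# Hypothetical variable in which half of the values are missing.
observed = [10.0, 20.0, 30.0, 40.0, None, None, None, None]
known = [v for v in observed if v is not None]
filled = impute_mean(observed)

# The mean is preserved, but the spread is understated after filling.
print("std dev of known values: ", statistics.pstdev(known))
print("std dev after mean fill: ", statistics.pstdev(filled))
```

The filled variable has the same mean but a noticeably smaller standard deviation, which is one way a substitution can mislead a model that relies on the variable's spread.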

tomers with the exact same birthday.
The attempt to collect accurate data often runs into conflict with efforts to
manage the business. Many stores offer discounts to customers who have
membership cards. What happens when a customer does not have a card? The
business rules probably say “no discount.” What may really happen is that a
store employee may enter a default number, so that the customer can still qualify.
This friendly gesture leads to certain member numbers appearing to have
exceptionally high transaction volumes.
One company found several customers in Elizabeth, NJ with the zip code
07209. Unfortunately, the zip code does not exist, which was discovered when
analyzing the data by zip code and appending zip code information. The error
had not been discovered earlier because the post office can often figure out
how to route incorrectly addressed mail. Such errors can be fixed by using
software or an outside service bureau to standardize the address data.
What looks like dirty data might actually provide insight into the business.
A telephone number, for instance, should consist only of numbers. The billing
system for one regional telephone company stored the number as a string (this
is quite common actually). The surprise was several hundred “telephone num-
bers” that included alphabetic characters. Several weeks (!) after being asked
about this, the systems group determined that these were essentially calling
card numbers, not attached to a telephone line, that were used only for third-
party billing services.
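Values like these are simple to surface once the field is treated as a string. A minimal sketch, using Python's `re` module and made-up sample values:

```python
import re

# Matches any ASCII letter anywhere in the string.
HAS_LETTER = re.compile(r"[A-Za-z]")


def numbers_with_letters(phone_strings):
    """Return the stored 'telephone numbers' that contain alphabetic characters."""
    return [p for p in phone_strings if HAS_LETTER.search(p)]


# Hypothetical values from a billing system that stores numbers as strings.
sample = ["2125550101", "212555CC17", "6175550199", "A017550123"]
print(numbers_with_letters(sample))
```

The point of a check like this is to produce a list to ask questions about, not to delete the offending rows.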
Another company used media codes to determine how customers were
acquired. So, media codes starting with “W” indicated that customers came
from the Web, “D” indicated response to direct mail, and so on. Additional
characters in the code distinguished between particular banner ads and par-
ticular email campaigns. When looking at the data, it was surprising to dis-
cover Web customers starting as early as the 1980s. No, these were not
bleeding-edge customers. It turned out that the coding scheme for media
codes was created in October 1997. Earlier codes were essentially gibberish.


should be close to each other. However, there are always exceptions. The best
solution is to include all these dates, since they can all shed light on the busi-
ness. For instance, when are there long delays between the time a customer
signs up for the service and the time the service actually becomes effective?
Is this related to churn? A more common solution is to choose one of the dates
and call that the start date.
Another reason has to do with the good intentions of systems developers.
For instance, a decision-support system might keep a current snapshot of cus-
tomers, including a code for why the customer stopped. One code value might
indicate that some customers stopped for nonpayment; other code values
might represent other reasons—going to a competitor, not liking the service,
and so on. However, it is not uncommon for customers who have stopped vol-
untarily to not pay their last bill. In this data source, the actual stop code was
simply overwritten. The longer ago that a customer stopped, the greater the
chance that the original stop reason was subsequently overwritten when the
company determined, at a later time, that a balance was owed. The problem
here is that one field is being used for two different things—the stop reason
and nonpayment information. This is an example of poor data modeling that
comes back to bite the analysts.
A problem that arises when using data warehouses involves the distinction
between the initial loads and subsequent incremental loads. Often, the initial
load is not as rich in information, so there are gaps going back in time. For
instance, the start date may be correct, but there is no product or billing plan
for that date. Every source of data has its peculiarities; the best advice is to get
to know the data and ask lots of questions.
Computational Issues
Creating useful customer signatures requires considerable computational
power. Fortunately, computers are up to the task. The question is more which

These queries are also killer queries, although databases are becoming increas-
ingly powerful and able to handle them. On the plus side, databases do take
advantage of parallel hardware, a big advantage for transforming data.
Extraction Tools
Extraction tools (often called ETL tools for extract-transform-load) are gener-
ally used for loading data warehouses and data marts. In most companies,
business users do not have ready access to these tools, and most of their func-
tionality can be found in other tools. Extraction tools are generally on the
expensive side because they are intended for large data warehousing projects.
In Mastering Data Mining (Wiley, 1999), we discuss a case study using a suite
of tools from Ab Initio, Inc., a company that specializes in parallel data trans-
formation software. This case study illustrates the power of such software
when working on very large volumes of data, something to consider in an
environment where such software might be available.
Special-Purpose Code
Coding is the tried-and-true way of implementing data transformations. The
choice of tool is really based on what the programmer is most familiar with
and what tools are available. For the transformations needed for a customer
signature, the main statistical tools all have sufficient functionality.
One downside of using special-purpose code is that it adds an extra layer to
the data transformation process. Data must still be extracted from source systems
(one possible source of error) and then passed through code (another source of
error). It is a good idea to write code that is well documented and reusable.
Data Mining Tools
Increasingly, data mining tools have the ability to transform data within the
tool. Most tools have the ability to extract features from fields and to combine
multiple fields in a row, although the support for non-numeric data types
varies from tool to tool and release to release. Some tools also support sum-

wrong for some reason.
Many data mining efforts have to use data that is less than perfect. As with
old cars that spew blue smoke but still manage to chug along the street, these
efforts produce results that are good enough. Like the vagabonds in Samuel
Beckett’s play Waiting for Godot, we can choose to wait until perfection arrives.
That is the path to doing nothing; the better choice is to plow ahead, to learn,
and to make incremental progress.
CHAPTER 18
Putting Data Mining to Work
You’ve reached the last chapter of this book, and you are ready to start putting
data mining to work for your company. You are convinced that when data
mining has been woven into the fabric of your organization, the whole enter-
prise will benefit from an increased understanding of its customers and mar-
ket, from better-focused marketing, from more-efficient utilization of sales
resources, and from more-responsive customer support. You also know that
there is a big difference between understanding something you have read in a
book and actually putting it into practice. This chapter is about how to bridge
that gap.
At Data Miners, Inc., the consulting company founded by the authors of this
book, we have helped many companies through their first data mining pro-
jects. Although this chapter focuses on a company’s first foray into data min-
ing, it is really about how to increase the probability of success for any data
mining project, whether the first or the fiftieth. It brings together ideas from
earlier chapters and applies them to the design of a data mining pilot project.
The chapter begins with general advice about integrating data mining into the
enterprise. It then discusses how to select and implement a successful pilot
project. The chapter concludes with the story of one company’s initial data
mining effort and its success.

have in common with one another and locate new markets where simi-
lar customers can be found.
■■ Build a model to identify market research segments among the customers
in our corporate data warehouse, so we can target messages to the right
customers.
■■ Forecast the expected level of debt collection for the next several
months, so we can manage to a plan.
These examples show the diversity of problems that data mining can
address. In each case, the data mining challenge is to find and analyze the
appropriate data to solve the business problem. However, this process starts
by choosing the right demonstration project in the first place.
What to Expect from a Proof-of-Concept Project
When the proof-of-concept project is complete, the following are available:
■■ A prototype model development system (which might be outsourced or
might be the kernel of the production system)
■■ An evaluation of several data mining techniques and tools (unless the
choice of tool was foreordained)
■■ A plan for modifying business processes and systems to incorporate
data mining
■■ A description of the production data mining environment
■■ A business case for investing in data mining and customer analytics
Even when the decision has already been made to invest in data mining, the
proof-of-concept project is an important way to step through the virtuous
cycle of data mining for the first time. You should expect challenges and hic-
cups along the way, because such a project is touching several different parts
of the organization—both technical and operational—and needs them to work
together in perhaps unfamiliar ways.
Identifying a Proof-of-Concept Project

required. Where data is already being warehoused, study the data dictionaries
and database schemas. When the source systems are operational systems,
study the record layouts that will be supplying the data and get to know the
people who are familiar with how the systems process and store information.
As part of the proof-of-concept selection process, do some initial profiling of
the available records and fields to get a preliminary understanding of relation-
ships in the data and to get some early warnings of data problems that may
hinder the data mining process. This effort is likely to require some amount of
data cleansing, filtering, and transformation.
Once several candidate projects have been identified, evaluate them in
terms of the ability to act on the results, the usefulness of the potential results,
the availability of data, and the level of technical effort. One of the most impor-
tant questions to ask about each candidate project is “how will the results be
used?” As illustrated by the example in the sidebar “A Successful Proof of
Concept?” a common fate of data mining pilot projects is to be technically suc-
cessful but underappreciated because no one can figure out what to do with
the results.
There are certainly many examples of successful data mining projects that
originated in IT. Nevertheless, when the people conducting the data mining
are not located in marketing or some other group that communicates directly
with customers, sponsorship or at least input from such a group is important
for a successful project. Although data mining requires interaction with data-
bases and analytic software, it is not primarily an IT project and should rarely
be attempted in isolation from the owners of the business problem being
addressed.
TIP A data mining pilot project may be based in any of several groups within
the company, but it must always include active participation from the group
that feels ownership of the business problem to be addressed.
Marketing campaigns make good proof-of-concept projects because in most
companies there is already a culture of measuring the results of such cam-

droves. However, nothing was done because the charter of the group
sponsoring the data mining project was to explore new technologies rather
There is another organizational challenge with these customers. As long as
they remain, the mismatched customers are quite profitable, paying for
expensive overcalls or on a too-expensive rate plan. Moving them to a rate plan
but also decrease profitability. Which is more important, churn or profitability?
Data mining often raises as many questions as it answers, and the answers to
some questions depend on business strategy more than on data mining results.
a demonstration project that goes beyond evaluating models to actually mea-
suring the results of a campaign based on the models. Where that is not possi-
ble, careful thought must be given to how to attach a dollar value to the results
of the demonstration project. In some cases, it is sufficient to test the new mod-
els derived from data mining against historical data.
Implementing the Proof-of-Concept Project
Once an appropriate business problem has been selected, the next step is to
identify and collect data that can be transformed into actionable information.
Data sources have already been identified as part of the process of selecting the
proof-of-concept project. The next step is to extract data from those sources
and transform it into customer signatures, as described in the previous chap-
ter. Designing a good customer signature is tricky the first few times. This is an
area where the help of experienced data miners can be valuable.
In addition to constructing the initial customer signature, there needs to be
a prototype data exploration and model development environment. This envi-
ronment could be provided by a software company or data mining consul-
tancy, or it can be constructed in-house as part of the pilot project. The data
mining environment is likely to consist of a data mining software suite
installed on a dedicated analytic workstation. The model development envi-
ronment should be rich enough to allow the testing of a variety of data mining

valuable in a company that already has a culture of doing such experiments.
Finally, use the results of modeling (whether from historical testing or an
actual experiment) to build a business case for integrating data mining into the
business operations on a permanent basis.
Sometimes, the result of the pilot project is insight into customers and the
market. In this case, success is determined more subjectively, by providing
insight to business people. Although this might seem the easier proof-of-concept
project, it is quite challenging to find results in a span of weeks that make a
favorable impression on business people with years of experience.
Many data mining proof-of-concept projects are not ambitious because they
are designed to assess the technology rather than the results of its application.
It is best when the link between better models and better business results is not
hypothetical, but is demonstrated by actual results. Statisticians and analysts
may be impressed by theoretical results; senior management is not.
A graph showing the lift in response rates achieved by a new model on a test
dataset is impressive; however, new customers gained because of the model
are even more impressive.
Measure the Results of the Actions
It is important to measure both the effectiveness of the data mining models
themselves and the actual impact on the business of the actions taken as a
result of the models’ predictions.
Lift is an appropriate way to measure the effectiveness of the models them-
selves. Lift measures the change in concentration of records of some particular
type (such as responders or defaulters) relative to model scores. To measure
the impact on the business requires more information. If the pilot project
builds a response model, keep track of the following costs and benefits:
■■ What is the fixed cost of setting up the campaign and the model that
supports it?
■■ What is the cost per recipient of making the offer?
■■ What is the cost per respondent of fulfilling the offer?
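Lift can be computed directly from model scores and observed outcomes. The sketch below uses one common formulation, which is an assumption on our part rather than a definition given here: the response rate among the top-scoring fraction of records divided by the overall response rate. The sample data is made up.

```python
def lift_at(scored_outcomes, fraction=0.1):
    """Lift in the top `fraction` of records when ranked by model score.

    scored_outcomes: iterable of (score, outcome) pairs, where outcome is
    1 for the record type of interest (e.g. a responder) and 0 otherwise.
    """
    ranked = sorted(scored_outcomes, key=lambda pair: pair[0], reverse=True)
    if not ranked:
        raise ValueError("no records supplied")
    overall_rate = sum(outcome for _, outcome in ranked) / len(ranked)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no positive outcomes")
    n_top = max(1, int(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:n_top]) / n_top
    return top_rate / overall_rate


# Hypothetical test set: 10 customers, 3 responders.
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 0), (0.5, 0),
          (0.4, 1), (0.3, 0), (0.2, 0), (0.1, 0), (0.0, 0)]
print(lift_at(scored, fraction=0.2))
```

A lift of 1.0 means the model concentrates responders no better than random selection would; the dollar impact still depends on the campaign costs and benefits being tracked separately.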

[Diagram not reproduced. It crosses model selection with campaign inclusion to give four groups: Modeled & Included (Group A), Random & Included (Group B), Random & Excluded (Group C), and Modeled & Excluded (Group D). Surviving box labels: Randomly Selected Customers Included in Campaign; High Model Score Customers Excluded from Campaign; Randomly Selected Customers Excluded from Campaign. Comparison labels: “How well does message work on modeled customers?” and “How well does model work for measuring propensity?”]
Figure 18.1 Tracking four different groups makes it possible to determine both the effect
of the campaign and the effect of the model.
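The four-group comparison can be sketched in code. The group letters follow Figure 18.1; the counts are hypothetical, and the pairing of groups to questions is our reading of the figure labels rather than a formula given in the text.

```python
def response_rate(responders, group_size):
    """Fraction of a group that responded."""
    return responders / group_size


def campaign_and_model_effects(rates):
    """Compare the four tracked groups (keys 'A', 'B', 'C', 'D')."""
    return {
        # Does the message add anything for customers the model picked?
        "message_on_modeled": rates["A"] - rates["D"],
        # Does the message add anything for randomly chosen customers?
        "message_on_random": rates["B"] - rates["C"],
        # Does the model find likely responders, with no message involved?
        "model_without_message": rates["D"] / rates["C"],
        # Does the model find likely responders among those contacted?
        "model_with_message": rates["A"] / rates["B"],
    }


# Hypothetical results: (responders, group size) for each group.
counts = {"A": (60, 1000), "B": (20, 1000), "C": (10, 1000), "D": (50, 1000)}
rates = {group: response_rate(r, n) for group, (r, n) in counts.items()}
for name, value in campaign_and_model_effects(rates).items():
    print(f"{name}: {value:.3f}")
```

With these made-up numbers the model is doing most of the work: modeled customers respond at five times the rate of random ones even without the message, while the message adds only one percentage point.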
This latter situation does occur. One Canadian bank used a model to pick
customers who should be targeted with a direct mail campaign to open invest-
ment accounts. The people picked by the model did, in fact, open investment
accounts at a higher rate than other customers—whether or not they received
the promotional material. In this case there is a simple reason. The bank had
flooded its customers with messages about investment accounts—advertising,

less likely to churn. The additional requirement to identify separate segments
of subscribers at risk and understand what motivates each group to leave sug-
gests the use of decision trees and clever derived variables.
Each leaf of the decision tree has a label, which in this case would be “not
likely to churn” or “likely to churn.” Each leaf in the tree has a different
proportion of churners, and this proportion can be used as a
churn score. Each leaf also has a set of rules describing who ends up there. With
skill and creativity, an analyst may be able to turn these mechanistic rules into
comprehensible reasons for leaving that, once understood, can be counteracted.
Decision trees often have more leaves than desired for the purpose of developing
special offers and telemarketing scripts. To combine leaves into larger
groups, take whole branches of the tree as the groups, rather than single leaves.
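Using the proportion of churners in each leaf as a churn score can be sketched without any particular tree-building tool. Here the leaf assignments are taken as given, and the leaf labels and records are hypothetical.

```python
from collections import defaultdict


def leaf_churn_scores(records):
    """Proportion of churners in each leaf.

    records: iterable of (leaf_id, churned) pairs, one per subscriber in the
    training data, where churned is True or False.
    """
    totals = defaultdict(int)
    churners = defaultdict(int)
    for leaf_id, churned in records:
        totals[leaf_id] += 1
        if churned:
            churners[leaf_id] += 1
    return {leaf_id: churners[leaf_id] / totals[leaf_id] for leaf_id in totals}


# Hypothetical leaf assignments for eight subscribers.
training = [
    ("leaf_1", True), ("leaf_1", True), ("leaf_1", False), ("leaf_1", True),
    ("leaf_2", False), ("leaf_2", False), ("leaf_2", True), ("leaf_2", False),
]
scores = leaf_churn_scores(training)
print(scores)
```

A new subscriber who lands in a leaf simply inherits that leaf's score. Applying the same function to branch identifiers instead of leaf identifiers gives scores for the larger, combined groups.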
Note that our preference for decision-tree methods in this case stems from
the desire to understand the reasons for attrition and our desire to treat sub-
groups differentially. If the goal were simply to do the best possible job of pre-
dicting the subscribers at risk, without worrying about the reasons, we might
select a different approach. Different business goals suggest different data
mining techniques. If the goal were to estimate next month’s minutes of use for
each subscriber, neural networks or regression would be better choices. If the
goal were to find naturally occurring customer segments, an undirected clustering
technique or profiling and hypothesis testing would be appropriate.
Determine the Relevant Characteristics of the Data
Once the data mining tasks have been identified and used to narrow the range
of data mining methods under consideration, the characteristics of the avail-
able data can help to refine the selection further. In general terms, the goal is to
select the data mining technique that minimizes the number and difficulty of
the data transformations that must be performed in order to coax good results
from the data.
As discussed in the previous chapter, some amount of data transformation

contributes the most information at each node and bases the next segment of
the rule on that field alone. Dozens or hundreds of other fields can come along
for the ride, but won’t be represented in the final rules unless they contribute
to the solution.
TIP When faced with a large number of fields for a directed data mining
problem, it is a good idea to start by building a decision tree, even if the final
model is to be built using a different technique. The decision tree will identify a
good subset of the fields to use as input to another technique that might be
swamped by the original set of input variables.
Free-Form Text
Most data mining techniques are incapable of directly handling free-form text.
But clearly, text fields often contain extremely valuable information. When
analyzing warranty claims submitted to an engine manufacturer by indepen-
dent dealers, the mechanic’s free-form notes explaining what went wrong and
what was done to fix the problem are at least as valuable as the fixed fields that
show the part numbers and hours of labor used.
One data mining technique that can deal with free text is memory-based
reasoning, one of the nearest neighbor methods discussed in Chapter 8. Recall
that memory-based reasoning is based on the ability to measure the distance
from one record to all the other records in a database in order to form a neigh-
borhood of similar records. Often, finding an appropriate distance metric is a
stumbling block that makes it hard to apply the technique, but researchers in
the field of information retrieval have come up with good measures of the dis-
tance between two blocks of text. These measurements are based on the over-
lap in vocabulary between the documents, especially of uncommon words and
proper nouns. The ability of Web search engines to find appropriate articles is
one familiar example of text mining.
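A distance based on vocabulary overlap, with extra weight for uncommon words, can be sketched as follows. The specific choices here are assumptions made for illustration and are not the measures used in Chapter 8: words are weighted by a smoothed inverse document frequency, and distance is one minus the weighted share of vocabulary the two texts have in common. The sample notes are made up.

```python
import math
import re


def tokens(text):
    """Lowercased set of alphanumeric words in a block of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def idf_weights(documents):
    """Weight each word by how rare it is across the documents."""
    doc_freq = {}
    for doc in documents:
        for word in tokens(doc):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    n = len(documents)
    return {word: math.log(n / df) + 1.0 for word, df in doc_freq.items()}


def text_distance(a, b, weights):
    """0.0 for identical vocabularies, 1.0 for no shared words."""
    words_a, words_b = tokens(a), tokens(b)
    shared = sum(weights.get(w, 1.0) for w in words_a & words_b)
    total = sum(weights.get(w, 1.0) for w in words_a | words_b)
    return 1.0 - shared / total if total else 0.0


# Hypothetical free-form notes.
notes = [
    "replaced the turbocharger gasket",
    "the turbocharger failed",
    "the customer called",
    "the engine stalled",
]
weights = idf_weights(notes)
print(text_distance(notes[0], notes[1], weights))  # share "the", "turbocharger"
print(text_distance(notes[0], notes[2], weights))  # share only "the"
```

Sharing a rare word such as “turbocharger” pulls two notes closer together than sharing a word that appears in every note, which is the behavior the vocabulary-overlap idea calls for.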
As described in Chapter 8, memory-based reasoning on free-form text has
