Data Mining and Knowledge Discovery Handbook, 2 Edition part 7 - Pdf 16

40 Jerzy W. Grzymala-Busse and Witold J. Grzymala-Busse
3.2.7 Replacing Missing Attribute Values by the Attribute Mean Restricted to
a Concept
Similarly as the previous method, this method is restricted to numerical attributes. A
missing attribute value of a numerical attribute is replaced by the arithmetic mean of
all known values of the attribute restricted to the concept. For example from Table
3.7, case 3 has missing attribute value for Temperature. Case 3 belong to the concept
{3, 5, 6, 7}. The arithmetic mean of known values of Temperature restricted to the
concept, i.e., 99.8, 96.4, and 96.6 is 97.6, so the missing attribute value is replaced
by 97.6. On the other hand, case 8 belongs to the concept {1, 2, 4, 8}, the arith-
metic mean of 100.2, 102.6, and 99.6 is 100.8, so the missing attribute value for case
8 should be replaced by 100.8. The table with missing attribute values replaced by
the mean restricted to the concept is presented in Table 3.9. For symbolic attributes
Headache and Nausea, missing attribute values were replaced using the most com-
mon value of the attribute restricted to the concept.
Table 3.9. Data set in which missing attribute values are replaced by the attribute mean and
the most common value, both restricted to the concept
Case Attributes Decision
Temperature Headache Nausea Flu
1 100.2 yes no yes
2 102.6 yes yes yes
3 97.6 no no no
4 99.6 yes yes yes
5 99.8 no yes no
6 96.4 yes no no
7 96.6 no yes no
8 100.8 yes yes yes
3.2.8 Global Closest Fit
The global closes ﬁt method (Grzymala-Busse et al., 2002) is based on replacing a
missing attribute value by the known value in another case that resembles as much
as possible the case with the missing attribute value. In searching for the closest ﬁt

i
= y
i
,
1ifx and y are symbolic and x
i
= y
i
,
or x
i
=?ory
i
=?,
|x
i
−y
i
|
r
if x
i
and y
i
are numbers and x
i
= y
i
,
where r is the difference between the maximum and minimum of the known values

merged into the same table. In our example from Table 3.7, the ﬁnal, merged table is
presented in Table 3.14.
42 Jerzy W. Grzymala-Busse and Witold J. Grzymala-Busse
Table 3.11. Data Set Processed by the Global Closest Fit Method.
Case Attributes Decision
Temperature Headache Nausea Flu
1 100.2 yes no yes
2 102.6 yes yes yes
3 100.2 no no no
4 99.6 yes yes yes
5 99.8 yes yes no
6 96.4 yes no no
7 96.6 no yes no
8 102.6 yes yes yes
Table 3.12. Dataset Restricted to the Concept {1, 2, 4, 8}.
Case Attributes Decision
Temperature Headache Nausea Flu
1 100.2 ? no yes
2 102.6 yes yes yes
4 99.6 yes yes yes
8 ? yes ? yes
Table 3.13. Dataset Restricted to the Concept {3, 5, 6, 7}.
Case Attributes Decision
Temperature Headache Nausea Flu
3 ? no no no
5 99.8 ? yes no
6 96.4 yes no no
7 96.6 no yes no
3.2.10 Other Methods
There is a number of other methods to handle missing attribute values. One of them is

Learning missing attribute values from summary constraints was reported in (Wu
and Barbara, 2002,Wu and Barbara, 2002). Yet another approach to handling missing
attribute values was presented in (Greco et al., 2000).
There is a number of statistical methods of handling missing attribute values,
usually known under the name of imputation (Allison, 2002,Little and Rubin, 2002,
Schikuta, 1996), such as maximum likelihood and the EM algorithm. Recently mul-
tiple imputation gained popularity. It is a Monte Carlo method of handling missing
attribute values in which missing attribute values are replaced by many plausible
values, then many complete data sets are analyzed and the results are combined.
3.3 Parallel Methods
In this section we will concentrate on handling missing attribute values in parallel
with rule induction. We will distinguish two types of missing attribute values: lost
and do not care conditions (for respective interpretation, see Introduction). First we
will introduce some useful ideas, such as blocks of attribute-value pairs, character-
istic sets, characteristic relations, lower and upper approximations. Later we will
explain how to induce rules using the same blocks of attribute-value pairs that were
used to compute lower and upper approximations. Input data sets are not prepro-
cessed the same way as in sequential methods, instead, the rule learning algorithm is
modiﬁed to learn rules directly from the original, incomplete data sets.
44 Jerzy W. Grzymala-Busse and Witold J. Grzymala-Busse
3.3.1 Blocks of Attribute-Value Pairs and Characteristic Sets
In this subsection we will quote some basic ideas of the rough set theory. Any deci-
sion table deﬁnes a function
ρ
that maps the direct product of the set U of all cases
and the set A of all attributes into the set of all values. For example, in Table 3.1,
ρ
(1,Temperature)=high. In this section we will assume that all missing attribute
values are denoted either by ”?” or by ”*”, lost values will be denoted by ”?”, ”do not
care” conditions will be denoted by ”*”. Thus, we assume that all missing attribute

[(Temperature, high)] = {1, 4, 5},
[(Temperature, very
high)] = {2},
[(Temperature, normal)] = {6, 7},
[(Headache, yes)] = {2, 4, 6, 8},
[(Headache, no)] = {3, 7},
[(Nausea, no)] = {1, 3, 6},
[(Nausea, yes)] = {2, 4, 5, 7},
and for Table 3.15
3 Handling Missing Attribute Values 45
[(Temperature, high)] = {1, 3, 4, 5, 8},
[(Temperature, very
high)] = {2, 3, 8},
[(Temperature, normal)] = {3, 6, 7, 8},
[(Headache, yes)] = {1, 2, 4, 5, 6, 8},
[(Headache, no)] = {1, 3, 5, 7},
[(Nausea, no)] = {1, 3, 6, 8},
[(Nausea, yes)] = {2, 4, 5, 7, 8}.
The characteristic set K
B
(x) is the intersection of blocks of attribute-value pairs
(a,v) for all attributes a from B for which
ρ
(x,a) is known and
ρ
(x,a)=v. For Table
3.1 and B = A,
K
A
(1)={1,4,5}∩{1,3,6} = {1},

(3)={1,3,5,7}∩{1,3, 6, 8} = {1,3},
K
A
(4)={1,3,4,5,8}∩{1, 2, 4, 5,6, 8}∩{2,4,5,7,8}= {4,5,8},
K
A
(5)={1,3,4,5,8}∩{2, 4, 5, 7,8} = {4,5,8},
K
A
(6)={3,6,7,8}∩{1,2, 4, 5, 6,8}∩{1, 3,6,8}= {6,8},
K
A
(7)={3,6,7,8}∩{1,3, 5, 7}∩{2, 4,5, 7,8}= {7}, and
K
A
(8)={1,2,4,5,6, 8}.
The characteristic set K
B
(x) may be interpreted as the smallest set of cases that
are indistinguishable from x using all attributes from B, using a given interpre-
tation of missing attribute values. Thus, K
A
(x) is the set of all cases that cannot
be distinguished from x using all attributes. For further properties of characteristic
sets see (Grzymala-Busse, 2003,Grzymala-Busse, 2004A, Grzymala-Busse, 2004B,
Grzymala-Busse, 2004C). Incomplete decision tables in which all attribute values
are lost, from the viewpoint of rough set theory, were studied for the ﬁrst time
in (Grzymala-Busse and Wang, 1997), where two algorithms for rule induction, mod-
iﬁed to handle lost attribute values, were presented. This approach was studied later
in (Stefanowski, 2001, Stefanowski and Tsoukias, 1999, Stefanowski and Tsoukias,

B
(x)|x ∈X, K
B
(x) ∩X = /0} = ∪{K
B
(x)|x ∈X}.
For the decision table presented in Table 3.1, the concept A-lower and A-upper
approximations are
A
{1,2,4,8} = {1, 2, 4},
A
{3,5,6,7} = {3, 6, 7},
A{1,2,4,8} = {1, 2, 4, 6,8},
A{3,5,6,7} = {3, 4, 5, 6,7},
and for the decision table from Table 3.15, the concept A-lower and A-upper approx-
imations are
A
{1,2,4,8} = {2, 8},
A
{3,5,6,7} = {7},
A{1,2,4,8} = {1, 2, 3, 4,5,6,8},
A{3,5,6,7} = {1, 3, 4, 5,6,7,8}.
3 Handling Missing Attribute Values 47
3.3.3 Rule Induction—MLEM2
The MLEM2 rule induction algorithm is a modiﬁed version of the algorithm LEM2,
see chapter 12.6 in this volume. Rules induced from the lower approximation of the
concept certainly describe the concept, so they are called certain. On the other hand,
rules induced from the upper approximation of the concept describe the concept only
possibly (or plausibly), so they are called possible (Grzymala-Busse, 1988). MLEM2
may induce both certain and possible rules from a decision table with some missing

1, 2, 2
(Headache, no) -> (Flu, no)
Rule sets induced from the decision table presented in Table 3.15 are:
certain rule set:
2, 2, 2
48 Jerzy W. Grzymala-Busse and Witold J. Grzymala-Busse
(Temperature, very high) & (Nausea, yes) -> (Flu, yes)
3, 1, 1
(Temperature, normal) & (Headache, no) & (Nausea, yes) -> (Flu, no)
and possible rule set:
1, 4, 6
(Headache, yes) -> (Flu, yes)
1, 2, 3
(Temperature, very
high) -> (Flu, yes)
1, 2, 5
(Temperature, high) -> (Flu, no)
1, 3, 4
(Temperature, normal) -> (Flu, no)
3.3.4 Other Approaches to Missing Attribute Values
Through this section we assumed that the incomplete decision tables may only con-
sist of lost values or do not care conditions. Note that the MLEM2 algorithm is able
to handle not only these two types of tables but also decision tables with a mixture
of these two cases, i.e., tables with some lost attribute values and with other missing
attribute values being do not care conditions. Furthermore, other interpretations of
missing attribute values are possible as well, see (Grzymala-Busse, 2003,Grzymala-
Busse, 2004A).
3.4 Conclusions
In general, there is no best, universal method of handling missing attribute values. On
the basis of existing research on comparison such methods

Grzymala-Busse J.W. Knowledge acquisition under uncertainty—A rough set approach.
Journal of Intelligent & Robotic Systems 1 (1988) 3–16.
Grzymala-Busse J.W. On the unknown attribute values in learning from examples. Proc. of
the ISMIS-91, 6th International Symposium on Methodologies for Intelligent Systems,
Charlotte, North Carolina, October 16–19, 1991. Lecture Notes in Artiﬁcial Intelligence,
vol. 542, Springer-Verlag, Berlin, Heidelberg, New York, 1991, 368–377.
Grzymala-Busse J.W. LERS—A system for learning from examples based on rough sets. In
Intelligent Decision Support. Handbook of Applications and Advances of the Rough Sets
Theory, ed. by R. Slowinski, Kluwer Academic Publishers, Dordrecht, Boston, London,
1992, 3–18.
Grzymala-Busse J.W. A new version of the rule induction system LERS, Fundamenta Infor-
maticae 31 (1997) 27–39.
Grzymala-Busse J.W. MLEM2: A new algorithm for rule induction from imperfect data.
Proceedings of the 9th International Conference on Information Processing and Man-
agement of Uncertainty in Knowledge-Based Systems, IPMU 2002, Annecy, France,
July 1–5, 2002, 243–250.
Grzymala-Busse J.W. Rough set strategies to data with missing attribute values. Proceedings
of the Workshop on Foundations and New Directions in Data Mining, associated with the
third IEEE International Conference on Data Mining, Melbourne, FL, November 1922,
2003, 56–63.
Grzymala-Busse J.W. Data with missing attribute values: Generalization of indiscernibility
relation and rule induction. Transactions on Rough Sets, Lecture Notes in Computer
Science Journal Subline, Springer-Verlag, vol. 1 78–95, 2004A.
Grzymala-Busse J.W. Characteristic relations for incomplete data: A generalization of the
indiscernibility relation. Proceedings of the RSCTC’2004, the Fourth International Con-
ference on Rough Sets and Current Trends in Computing, Uppsala, Sweden, June 15,
2004. Lecture Notes in Artiﬁcial Intelligence 3066, Springer-Verlag pp.244–253, 2004B.
Grzymala-Busse J.W. Rough set approach to incomplete data. Proceedings of the
ICAISC’2004, the Seventh International Conference on Artiﬁcial Intelligence and Soft
Computing, Zakopane, Poland, June 711, 2004. Lecture Notes in Artiﬁcial Intelligence

Nhờ tải bản gốc

Tài liệu, ebook tham khảo khác

Data Mining and Knowledge Discovery Handbook, 2 Edition part 7 - Pdf 16

Tài liệu, ebook tham khảo khác

Học thêm