A Classification Algorithm Based on Association Rule Mining - pdf 21

Free download

Abstract—The main difference of the associative classification
algorithms is how to mine frequent item setsǃ analyze the
rules exported and use for classification. This paper presents
an associative classification algorithm based on Trie-tree that
named CARPT, which remove the frequent items that cannot
generate frequent rules directly by adding the count of class
labels. And we compress the storage of database using the two-
dimensional array of vertical data format, reduce the number
of scanning the database significantly, at the same time, it is
convenient to count the support of candidate sets. So, time and
space can be saved effectively. The experiment results show
that the algorithm is feasible and effective.
Keywords-data mining; associative classification;
classification algorithm; Trie-tree
I. INTRODUCTION
The classification rules mining and association rules
mining are two important areas of data mining
[1]
. The classic
associative classification algorithm based on class
association rules named CBA[2]
which integrated the above
two important mining technologies was presented by Bing
Liu of National University of Singapore in the knowledge
discovery in databases (KDD) International Conference held
in New York, 1998. Since then the prelude of the associative
classification was opened. Good classification accuracy of
associative classification algorithm has been confirmed in
the past ten years through a number of studies and
experiments.
The earliest Associative classification algorithm CBA
generates classification association rules using the iterative
method which similar to the Apriori
[3]
algorithm. In order to
generate and test longer item sets, the database needs to be
scanned many times, so the number of rules increases
exponentially, and more system resources are consumed. For
the rules which have the same support and confidence, CBA
algorithm sort and select them randomly, this reduces the
classification accuracy in many cases. The IRARC algorithm
presented in [4] introduces into the importance of properties
in the rough theory, improved the randomness of the CBA
algorithm in the choice of rules.
In [5], CMAR algorithm was presented based on the
CBA algorithm, using the deformation method of FP-Growth
[6]
. The CMAR algorithm finds frequent patterns and
generates classification association rules simultaneously,
uses χ 2
weights to measure the strength of the rules and
then classify a new instance, overcomes the bias of using a
single rule. It greatly improves the efficiency of the
algorithm using CR-tree, a high degree compression tree, to
store, back, pruning the classification rules. While the
CMAR algorithm does not take full advantage of the
characteristics of classification, there are many redundant
nodes in the FP-tree.
Trie, also known as dictionary tree or word search tree, is
a variant of the hash tree. The typical applications of this tree
structure is used to store a large number of strings (but not
limited to). The core idea of Trie-tree is using the common
prefix of the string to reduce the cost of the query time to
improve efficiency.
This paper presents a classification algorithm based on
the trie-tree of association rules that named CARPT, which
effectively reduces the number of scanning during the stage
of association rule mining by changing the data storage
structure and storage manner. It removes the frequent items
that cannot generate frequent rules directly by adding the
count of class labels during the construction of Trie-tree, so,
time and space can be saved effectively. Based on this, the
algorithm it also draws the pruning idea of CDDP
[7]

algorithm, reduces the number of candidate frequent item
sets once again.
II. THEORIES AND DEFINITIONS
A. Association Rules Mining
Association rules mining in a transaction database can be
described as follows:
Set I = {i1, i2, ..., im} is a collection of items, D = {t1, t2, ...,
tn} is a transaction database consisted of a series of
transactions with a unique identifier TID, each transaction ti
(i = 1, 2, ..., n) corresponds to a subset of I. Each ik (k = 1,
2, ..., m) is an “attribute – value” pairs, known as data item

Link download
4eT8LU59iO3oN8f
Music ♫

Copyright: Tài liệu đại học © DMCA.com Protection Status