Data Analysis Machine Learning and Applications Episode 3 Part 2 - Pdf 20

Applying Small Sample Tests for Behavior-based Recommendations 549
Collaborative Tag Recommendations
Leandro Balby Marinho and Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL)

cording to what they think is more suitable for the given resource.
534 Leandro Balby Marinho and Lars Schmidt-Thieme
Tag recommender systems recommend relevant tags for a user's untagged resource. Relevance here can be viewed from different perspectives. For example, a tag can be judged relevant to a given resource from the point of view of society, through the opinion of domain experts, or even based on the personal profile of an individual user. The question, then, is which notion of relevance users prefer most when using tag recommender services. This paper attempts to address this question through the following contributions: (i) a formulation of the tag recommendation problem and the introduction of a collaborative filtering-based tag recommender algorithm, (ii) a simple protocol for tag recommender evaluation, and (iii) a grounded, quantitative evaluation on real-life data comparing different tag recommender algorithms.
2 Related work
The literature on the specific problem of collaborative tag recommendation is still sparse. Most recent research on collaborative tagging systems and folksonomies is concerned with devising approaches to better structure the data for browsing and searching. In that work, the recommendation problem is sometimes only mentioned as a potential property to be explored in the future (Mika (2005), Hotho et al. (2006), Brooks and Montanez (2006), Heymann and Garcia-Molinay (2006)). Below, we briefly describe the works that specifically investigate the problem of collaborative tag recommendation.
Autotag (Mishne (2006)) is a tool that suggests tags for weblog posts using collaborative filtering methods. Given a new weblog post, similar posts are identified with traditional information retrieval similarity measures. The tags assigned to these posts are then aggregated into a ranked list of likely tags. Despite the collaborative filtering scenario, there is no real personalization, because the user is not taken directly into account. Furthermore, the evaluation is semi-automatic: human experts partly define which tags count as relevant for a given resource.

• A set of items $I$
• A set $S \subseteq \mathbb{R}$ of possible ratings, where $r: U \times I \to S$ is a partial function that associates ratings with user/item pairs. In datasets, $r$ is typically represented as a list of tuples $(u, i, r(u, i))$ with $u \in U$, $i \in I$, and $r$ defined on the domain $\mathrm{dom}_r \subseteq U \times I$
• Task: for a given user $u \in U$, a recommender system recommends a set $\tilde{I}(u) \subseteq I$ of items. Usually, $\tilde{I}(u)$ is computed by first ranking the set of items according to some quality or relevance criterion and then selecting the top $n$ elements (see Eq. 2 below).
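To make the formulation concrete, the following minimal Python sketch turns a list of tuples $(u, i, r(u,i))$ into a dense user-item matrix of the kind CF works on. The function name `build_user_item_matrix` is hypothetical. Storing 0.0 for pairs outside $\mathrm{dom}_r$ is also an assumption, since the text does not say how missing ratings are encoded.

```python
from typing import Dict, List, Tuple


def build_user_item_matrix(
    ratings: List[Tuple[str, str, float]],
) -> Tuple[List[List[float]], Dict[str, int], Dict[str, int]]:
    """Turn (u, i, r(u, i)) tuples into a dense m x n matrix.

    Pairs outside dom_r are stored as 0.0 (an assumption for this sketch).
    Returns the matrix plus the user -> row and item -> column indices.
    """
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    row = {u: k for k, u in enumerate(users)}
    col = {i: k for k, i in enumerate(items)}
    matrix = [[0.0] * len(items) for _ in users]
    for u, i, r in ratings:
        matrix[row[u]][col[i]] = float(r)
    return matrix, row, col


X, row, col = build_user_item_matrix(
    [("alice", "item1", 5), ("alice", "item3", 3), ("bob", "item2", 4)]
)
print(X)  # [[5.0, 0.0, 3.0], [0.0, 4.0, 0.0]]
```

Sorting users and items only makes the row and column order deterministic. Any fixed order would serve.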
In CF, for $m$ users and $n$ items, the user profiles are represented in a user-item matrix $X \in \mathbb{R}^{m \times n}$. The matrix can be decomposed into row vectors:
$$X := [x_1, \ldots, x_m]^{\top}, \quad \text{with } x_u := [x_{u,1}, \ldots, x_{u,n}]$$
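The pairwise similarities between users are usually computed by means of vector (cosine) similarity:
$$\mathrm{sim}(\mathrm{prof}_u, \mathrm{prof}_v) := \frac{\langle \mathrm{prof}_u, \mathrm{prof}_v \rangle}{\lVert \mathrm{prof}_u \rVert \, \lVert \mathrm{prof}_v \rVert} \qquad (1)$$
where $u, v \in U$ are two users and $\mathrm{prof}_u$ and $\mathrm{prof}_v$ are their profile vectors.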
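Eq. (1) is the cosine of the angle between two profile vectors. The minimal Python sketch below assumes dense profiles stored as equal-length lists, for instance rows of $X$. Returning 0.0 when a profile is all zeros is our own convention; the text does not cover that case.

```python
import math
from typing import Sequence


def sim(prof_u: Sequence[float], prof_v: Sequence[float]) -> float:
    """Cosine similarity of Eq. (1): <prof_u, prof_v> / (||prof_u|| ||prof_v||)."""
    dot = sum(a * b for a, b in zip(prof_u, prof_v))
    norm_u = math.sqrt(sum(a * a for a in prof_u))
    norm_v = math.sqrt(sum(b * b for b in prof_v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # empty profile: no evidence of similarity (our convention)
    return dot / (norm_u * norm_v)


print(sim([1, 0, 1], [1, 1, 0]))  # about 0.5 (up to floating-point rounding)
```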
Let $B \subseteq I$ be the basket of items of the active user $u \in U$, and let $N_u$ be his/her best neighbors. The top-$N$ recommendations usually consist of a list of items ranked by decreasing frequency of occurrence in the ratings of the neighbors:
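The ranking formula itself is cut off in this copy. The following Python sketch therefore only illustrates the idea stated in the text: count how often each item occurs among the neighbors' ratings and keep the most frequent ones. Several details are assumptions of the sketch:

- skipping items that are already in the basket $B$;
- breaking ties alphabetically;
- the hypothetical input format `neighbor_items`, which maps each neighbor to its set of rated items.

```python
from collections import Counter
from typing import Dict, List, Set


def top_n_by_neighbor_frequency(
    basket: Set[str], neighbor_items: Dict[str, Set[str]], n: int
) -> List[str]:
    """Rank items by how many neighbors in N_u rated them and return the top n."""
    counts = Counter(
        item
        for items in neighbor_items.values()
        for item in items
        if item not in basket  # assumption: do not re-recommend items in B
    )
    # Decreasing frequency; ties broken alphabetically (assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


neighbors = {"v1": {"a", "b", "c"}, "v2": {"b", "d"}, "v3": {"b", "c"}}
print(top_n_by_neighbor_frequency({"a"}, neighbors, 2))  # ['b', 'c']
```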
¹), pictures (Flickr²), music (Last.fm³), etc. A tag recommender system can be formulated as follows:
• A set of users $U$
• A set of resources $R$
• A set of tags $T$
• A function $s: U \times R \to \tilde{T}$ associating tags with user/resource pairs, where $\tilde{T} \subseteq T$ and $s$ is defined on the domain $\mathrm{dom}_s \subseteq U \times R$
• Task: for a given user $u \in U$ and a resource $r \in R$, a tag recommender system recommends a set $\tilde{T}(u,r) \subseteq T$ of tags. As in the traditional formulation (section 3), $\tilde{T}(u,r)$ can be computed by first ranking the set of tags according to some quality or relevance criterion and then selecting the top $n$ elements (see Algo. 1 below).
When comparing the formulation above with the one in section 3, we observe that
CF cannot be applied directly. This is due to the additional dimension represented by
Next, the weights of each particular tag are summed, and the recommendation list is ranked by decreasing value of the summed weights. Ties are broken in favor of the tag with the smaller index. The overall CF procedure for tag recommendations is summarized in Algo. 1.
Algorithm 1 CF for tag recommendations
• Given a new and/or untagged resource $r \in R$ for the active user $u \in U$
• Let $A := \{v \in U \mid s_{v,r} \neq \emptyset\}$ denote the set of users who have tagged $r$, where $s$ is a function associating tags with user/resource pairs
– Find the $k$ best neighbors: $N_u := \operatorname{argmax}^{k}_{v \in A} \mathrm{sim}(\mathrm{prof}_u, \mathrm{prof}_v)$
– Output the top $n$ tags: $\tilde{T}(u,r) := \operatorname{argmax}^{n}_{t \in T} \ldots$
recommender used in practice (Fig.1).
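Algorithm 1 above can be sketched in Python as follows. The summed weight in its last step is cut off in this copy. Following the description of summed tag weights, we assume that each neighbor $v$ contributes $\mathrm{sim}(\mathrm{prof}_u, \mathrm{prof}_v)$ to every tag it assigned to $r$. This is our assumption, not a confirmed detail of the authors' method. Several other elements are hypothetical:

- the sparse profile dictionaries (user to {feature: weight});
- the `tag_index` mapping used for tie-breaking;
- all function names.

```python
import math
from typing import Dict, List, Set, Tuple

Profile = Dict[str, float]  # sparse profile vector: feature -> weight


def sim(p: Profile, q: Profile) -> float:
    """Cosine similarity (Eq. 1) of two sparse profile vectors."""
    dot = sum(w * q.get(f, 0.0) for f, w in p.items())
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(
        sum(w * w for w in q.values())
    )
    return dot / norm if norm else 0.0


def recommend_tags(
    u: str,
    r: str,
    s: Dict[Tuple[str, str], Set[str]],
    prof: Dict[str, Profile],
    tag_index: Dict[str, int],
    k: int,
    n: int,
) -> List[str]:
    # A: users who have tagged r, i.e. s_{v,r} is non-empty
    A = [v for (v, res), tags in s.items() if res == r and tags and v != u]
    # N_u: the k users in A most similar to u
    sims = {v: sim(prof[u], prof[v]) for v in A}
    N_u = sorted(A, key=lambda v: -sims[v])[:k]
    # Sum weights per tag (assumed: each neighbor adds sim(u, v) for every tag it gave r)
    score: Dict[str, float] = {}
    for v in N_u:
        for t in s[(v, r)]:
            score[t] = score.get(t, 0.0) + sims[v]
    # Rank by decreasing summed weight, breaking ties by the smaller tag index
    return sorted(score, key=lambda t: (-score[t], tag_index[t]))[:n]


s = {("v1", "r1"): {"python", "code"}, ("v2", "r1"): {"python"}, ("v3", "r1"): {"cooking"}}
prof = {"u": {"x": 1.0, "y": 1.0}, "v1": {"x": 1.0, "y": 1.0}, "v2": {"x": 1.0}, "v3": {"z": 1.0}}
tag_index = {"code": 0, "python": 1, "cooking": 2}
print(recommend_tags("u", "r1", s, prof, tag_index, k=2, n=2))  # ['python', 'code']
```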
To evaluate the recommenders we used a variant of the leave-one-out holdout
estimation that we named leave-tags-out. The idea is to choose a resource at random
for each user in the test set and hide the tags attached to it. The algorithm must try to
predict the hidden tags. To count the hits made by the algorithms we used the usual
recall measure,
$$\mathrm{recall}_{\mathrm{macro}}(D) := \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|Y_i \cap Z_i|}{|Y_i|} \qquad (3)$$
where $D$ is the test set, $Y_i$ the true tags, and $Z_i$ the predicted ones. Since the precision is forced by taking into account only a restricted number $n$ of recommendations
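The leave-tags-out protocol and the macro recall of Eq. (3) can be sketched in Python as follows. The data layout (a dictionary from (user, resource) pairs to tag sets) and the function names are hypothetical. Choosing from a sorted resource list with a seeded `random.Random` only makes the split reproducible.

```python
import random
from typing import Dict, List, Set, Tuple

Assignments = Dict[Tuple[str, str], Set[str]]  # (user, resource) -> tags


def leave_tags_out(
    s: Assignments, test_users: List[str], rng: random.Random
) -> Tuple[Assignments, Assignments]:
    """Hide the tags of one randomly chosen resource per test user.

    Returns (training assignments, hidden tags Y keyed by (user, resource)).
    """
    train = {key: set(tags) for key, tags in s.items()}
    hidden: Assignments = {}
    for u in test_users:
        tagged = sorted(r for (v, r), tags in s.items() if v == u and tags)
        if tagged:
            r = rng.choice(tagged)
            hidden[(u, r)] = train.pop((u, r))
    return train, hidden


def recall_macro(Y: List[Set[str]], Z: List[Set[str]]) -> float:
    """Eq. (3): mean over test cases i of |Y_i & Z_i| / |Y_i|."""
    if not Y or len(Y) != len(Z):
        raise ValueError("Y must be non-empty and match Z in length")
    return sum(len(y & z) / len(y) for y, z in zip(Y, Z)) / len(Y)


print(recall_macro([{"a", "b"}, {"c"}], [{"a", "x"}, {"c"}]))  # 0.75
```

In an evaluation run, $Z_i$ would be the top-$n$ output of a recommender queried for each hidden (user, resource) pair, with the model built from `train` only.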

quent tags recommender, which indicates its adequacy for cold-start related prob-
lems, where just a few tags are available in the system.
In future work, we plan to reproduce the same experiments with other datasets from different domains to confirm the results presented here. We also want to refine the CF algorithms by exploring different combinations of the user similarities obtained from the two profile matrices, i.e., user-resources and user-tags. Moreover, we will compare the CF approach with more complex models such as multi-label and relational classifiers.
⁵ T-test for a significance level of 0.05.
7 Acknowledgments
This work is supported by CNPq, an institution of the Brazilian Government for scientific and technological development.
References
BENZ, D., TSO, K. and SCHMIDT-THIEME, L. (2006): Automatic Bookmark Classification: A Collaborative Approach. In: Proceedings of the Second Workshop on Innovations in Web Infrastructure (IWI 2006), Edinburgh, Scotland.
BERNERS-LEE, T., HENDLER, J. and LASSILA, O. (2001): Semantic Web. Scientific American, May 2001.
BROOKS, C.H. and MONTANEZ, N. (2006): Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering. In: WWW '06: Proceedings of the 15th International Conference on World Wide Web. ACM Press, New York, NY, USA, 625–632.
DESHPANDE, M. and KARYPIS, G. (2004): Item-Based Top-N Recommendation Algorithms. ACM Transactions on Information Systems, 22(1), 1–34.
GOLDER, S. and HUBERMAN, B.A. (2005): The Structure of Collaborative Tagging Systems. Information Dynamics Lab, HP Labs, Palo Alto, USA, available at:
HEYMANN, P. and GARCIA-MOLINAY, H. (2006): Collaborative Creation of Communal

Department of Computer Science, University of Freiburg
Georges-Koehler-Allee 51, 79110 Freiburg, Germany

2
Information Systems and Machine Learning Lab, University of Hildesheim
Samelsonplatz 1, 31141 Hildesheim, Germany
{tso,schmidt-thieme}@ismll.uni-hildesheim.de
Abstract. Recommender systems are used by an increasing number of e-commerce websites to help customers find suitable products in a large database. One of the most popular techniques for recommender systems is collaborative filtering. Several collaborative filtering algorithms claim to be able to solve two problems:
i) the new-item problem, which arises when a new item is introduced to the system and only a few or no ratings have been provided;
ii) the user-bias problem, which arises when it is not possible to distinguish two items that have the same historical ratings from users but different content.
However, for most algorithms the evaluations are not satisfactory, because suitable evaluation metrics and protocols are lacking; thus, a fair comparison of the algorithms is not possible.
In this paper, we introduce new methods and metrics for evaluating the user-bias and new-item problems for collaborative filtering algorithms that consider attributes. In addition, we conduct an empirical analysis and compare the results of existing collaborative filtering algorithms on these two problems, using several public movie datasets in a common setting.
1 Introduction
A recommender system is a type of customization tool in e-commerce that generates personalized recommendations matching the users' tastes. Collaborative filtering (CF) (Sarwar et al. (2000, 2001)) is a popular technique used in recommender systems. It predicts a user's interest in a given item based on user profiles. The idea behind this technique is that a user who receives recommendations for certain sorts of items will prefer the same items as other individuals with a similar mindset.
However, despite its simplicity, CF has shortcomings, one of which is the new-item or cold-start problem. If no ratings are given for new items, it is difficult for standard

Evaluating CF algorithms is nothing new, as relatively standard measures for evaluating CF algorithms already exist. Most evaluations of CF focus on the overall performance of the algorithms (Breese et al. (1998), Sarwar et al. (2000), Herlocker et al. (2004)). However, as mentioned in the previous section, CF suffers from several shortcomings, namely the new-item problem (also known as the cold-start problem) and the user-bias problem. It has been claimed that incorporating attributes could help to alleviate these drawbacks (Kim and Li (2004)). In fact, there exist many approaches for combining content
Comparison of RS Algorithms on the New-Item and User-Bias Problem 527
information with CF (Burke (2002), Melville et al. (2002), Kim and Li (2004), Tso and Schmidt-Thieme (2005)). However, suitable evaluations are still lacking that compare attribute-aware and non-attribute-aware CF algorithms with a focus on these two problems.
Schein et al. (2002) have already discussed methods and metrics for the new-item problem and introduced a performance metric called the CROC curve. However, this metric is only suitable for the new-item problem. In this paper, we use a standard performance metric but introduce new protocols for evaluating the new-item and user-bias problems. This evaluation setting therefore allows results to be compared with standard CF evaluation metrics. It is also not restricted to the new-item problem but covers the user-bias problem as well. In addition, we compare the prediction accuracy of various collaborative filtering algorithms in this evaluation setting.
3 Observed approaches
In this section, we present a brief description of the two state-of-the-art CF models:
the aspect model by Hofmann (2004) and the approach by Kim & Li (2004).
Aspect model by Hofmann
Hofmann (2004) specified different versions of the aspect model for the collaborative filtering domain. In this paper, we focus on the Gaussian model, because it shows the best prediction accuracy for non-specific problems. He uses the aspect model to identify the hidden semantic relationship between items y and users u by us-

As z is unobserved, Hofmann used the Expectation Maximization (EM) algorithm to learn the two model parameters, P(v|y,z) and P(z|u). The EM algorithm has two main steps. The first step is the computation of the expectation (E-Step), which is
528 Stefan Hauger, Karen H. L. Tso and Lars Schmidt-Thieme
done by computing the variational distribution Q over the latent variable z. The second step is the maximization (M-Step), in which the model parameters are updated using the Q distribution computed in the previous E-Step. These two steps are repeated until the algorithm converges to a local optimum. The EM steps for the Gaussian pLSA model are:
E-Step:
$$Q(z; u, y, v, \hat{\theta}) = \frac{P(z \mid u)\, P(v; \mu_{y,z}, \sigma_{y,z})}{\sum_{z'} P(z' \mid u)\, P(v; \mu_{y,z'}, \sigma_{y,z'})}$$
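The E-Step above normalizes, over the latent classes, the product of the class prior $P(z \mid u)$ and the Gaussian likelihood of the observed vote. A minimal Python sketch for a single observed vote follows. The symbols $\mu_{y,z}$, $\sigma_{y,z}$ (mean and standard deviation of item $y$'s votes in class $z$) are our reading of symbols that are garbled in this copy. The M-Step is not shown, and the function names are hypothetical.

```python
import math
from typing import List


def gaussian_pdf(v: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution N(mu, sigma^2) at v."""
    return math.exp(-((v - mu) ** 2) / (2.0 * sigma**2)) / (math.sqrt(2.0 * math.pi) * sigma)


def e_step(v: float, p_z_u: List[float], mu_y: List[float], sigma_y: List[float]) -> List[float]:
    """Q(z; u, y, v) for one observed vote v of user u on item y.

    p_z_u[z] = P(z|u); mu_y[z] and sigma_y[z] parameterize the Gaussian
    P(v; mu_{y,z}, sigma_{y,z}) of item y in latent class z.
    """
    joint = [p * gaussian_pdf(v, m, s) for p, m, s in zip(p_z_u, mu_y, sigma_y)]
    total = sum(joint)  # the denominator: sum over all classes z'
    return [j / total for j in joint]


print(e_step(3.0, [0.5, 0.5], [1.0, 5.0], [1.0, 1.0]))  # [0.5, 0.5] by symmetry
```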

the new-item with the help of item attributes. They incorporated movie attributes such as genre, actors, year, etc. into collaborative filtering. The expectation is that, when attributes are considered, a new item can be recommended based solely on how much the user likes its attributes, even though no user has voted for it.
The model of Kim & Li is rather similar to the aspect model by Hofmann, yet there are several differences. First, in contrast to the pLSA model by Hofmann, class z is associated only with the items, not with the users. Note that the latent class z in this approach is regarded as an item cluster rather than a user community. Furthermore, they applied heuristic techniques to compute the corresponding model parameters in two steps. First, they used the attributes to cluster the items into different cliques with a simple K-means clustering algorithm. After clustering, they computed for every item the probability of belonging to each clique, i.e., a value indicating how strongly the item belongs to that clique. These probabilities form an item-clique matrix. In the second step, the original item-user matrix is extended with the item-clique matrix, so that the attribute cliques are treated just like ordinary users.
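The second step, extending the item-user matrix with the item-clique matrix, amounts to appending the clique-membership probabilities as extra "user" columns. In the Python sketch below, the K-means step is omitted and the membership probabilities are taken as given. Encoding a missing vote as 0.0 is our assumption, and the function name is hypothetical. The third example item has no votes, which shows why a new item still receives a non-empty row.

```python
from typing import List


def extend_with_cliques(
    item_user: List[List[float]], item_clique: List[List[float]]
) -> List[List[float]]:
    """Append the item-clique probabilities as extra "user" columns.

    item_user[y]:   votes of all users for item y (0.0 = no vote, assumed encoding)
    item_clique[y]: probability that item y belongs to each attribute clique
    """
    if len(item_user) != len(item_clique):
        raise ValueError("both matrices need one row per item")
    return [votes + probs for votes, probs in zip(item_user, item_clique)]


extended = extend_with_cliques(
    [[5.0, 0.0], [0.0, 3.0], [0.0, 0.0]],  # third item: new, no votes yet
    [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]],
)
print(extended[2])  # [0.0, 0.0, 0.7, 0.3]
```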
Class z is built with the help of the extended item-user matrix. Every class z consists of a number of highly similar items. The quality of class z determines the accuracy of the subsequent prediction of the user vote. The classes are computed with a K-Medoids clustering algorithm that uses Pearson's correlation. After the items have been clustered into classes, a new item is created for each class z as the arithmetic mean of its members. This new item is then the representative vector of class z.
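The two ingredients named above, Pearson's correlation between item rows and the arithmetic-mean representative of a class, can be sketched in Python as follows. The K-Medoids loop itself is omitted. Using $1 - \text{correlation}$ as the distance inside K-Medoids would be a common choice, but that is our assumption, not a detail given in the text. The function names are hypothetical.

```python
import math
from typing import List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson's correlation between two item rows of the extended matrix."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b) if sd_a and sd_b else 0.0


def representative(members: List[List[float]]) -> List[float]:
    """Arithmetic mean of the item rows assigned to one class z."""
    return [sum(column) / len(members) for column in zip(*members)]


print(representative([[1.0, 3.0], [3.0, 5.0]]))  # [2.0, 4.0]
```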
With the help of these representative items and a group matrix, which stores the class membership of every item in the item-user matrix, it is possible to compute the expected vote for a user. In calculating the prediction, class z is assumed to follow a Gaussian distribution. Let $V_y$ be the rating vector of item $y$, $V_z$ the representative
$$\ldots = \frac{\sum_{y \in U_z} v_{u,y}\, p(z \mid y)}{\sum_{y \in U_z} p(z \mid y)}$$
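The expression above is a weighted mean of user $u$'s votes $v_{u,y}$, weighted by the class memberships $p(z \mid y)$. The definition of $U_z$ is lost in this copy. The Python sketch below reads it as the items of class $z$ that user $u$ has voted on, which is an assumption. The function and argument names are hypothetical.

```python
from typing import Dict


def predict_vote(v_u: Dict[str, float], p_z: Dict[str, float]) -> float:
    """Weighted mean sum_y v_{u,y} p(z|y) / sum_y p(z|y) over y in U_z.

    v_u: votes v_{u,y} of user u; p_z: p(z|y) for the items y of class z.
    U_z is read here as the items of class z that u has voted on (assumption).
    """
    U_z = [y for y in p_z if y in v_u]
    weight = sum(p_z[y] for y in U_z)
    if weight == 0.0:
        raise ValueError("user u has no votes on items of class z")
    return sum(v_u[y] * p_z[y] for y in U_z) / weight


print(predict_vote({"a": 4.0, "b": 2.0, "c": 5.0}, {"a": 0.75, "b": 0.25, "d": 0.5}))  # 3.5
```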
4 Evaluation protocols
New-item problem
To evaluate the prediction accuracy, we use a protocol that randomly deletes one vote from every user in the dataset, the so-called AllBut1 protocol (Breese et al. (1998)). The new-item problem is evaluated by a protocol similar to AllBut1. This protocol likewise deletes existing votes and builds the model to be evaluated from the reduced dataset. The new items are created by deleting all votes for a randomly selected item. After this has been done for the required number of items, one vote is deleted from each user as in the AllBut1 protocol. This protocol has the advantage that the results for the new items can be compared with the results for previously rated items. Mean Absolute Error (MAE) is used as the metric in our experiments.
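The new-item protocol described above can be sketched in Python as follows. The function splits the votes into a training set, the votes of the artificial new items, and the AllBut1 votes of previously rated items, so that MAE can be reported separately for the two test sets. The vote dictionary layout and the function names are hypothetical. The seeded `random.Random` only makes the split reproducible.

```python
import random
from typing import Dict, Tuple

Votes = Dict[Tuple[str, str], float]  # (user, item) -> vote


def new_item_split(votes: Votes, n_new: int, rng: random.Random) -> Tuple[Votes, Votes, Votes]:
    """Return (train, test_new, test_old) following the new-item protocol."""
    train = dict(votes)
    test_new: Votes = {}
    test_old: Votes = {}
    # 1) Turn n_new random items into "new" items by deleting all their votes.
    for item in rng.sample(sorted({i for _, i in votes}), n_new):
        for key in [k for k in train if k[1] == item]:
            test_new[key] = train.pop(key)
    # 2) AllBut1: delete one random remaining vote from every user.
    for user in sorted({u for u, _ in votes}):
        own = sorted(k for k in train if k[0] == user)
        if own:  # users whose only votes were on new items are skipped
            key = rng.choice(own)
            test_old[key] = train.pop(key)
    return train, test_new, test_old


def mae(predicted: Votes, actual: Votes) -> float:
    """Mean Absolute Error over the hidden votes."""
    return sum(abs(predicted[k] - v) for k, v in actual.items()) / len(actual)
```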
User-bias problem
The user-bias problem occurs when two items have the same rating but belong to different groups. One item belongs to a group of items that the user has not rated well, whereas the other belongs to a group that the user has rated well. In this case, the item from the well-rated group should be recommended.
To find such a pair of items for a user, all items rated by the user are taken into consideration and grouped twice: once into item groups with equal

New-item problem
The results for the new-item problem are presented in Figures 4 and 5. Comparing the algorithms that use no attributes with the Kim & Li approach, we see that adding more new items affects the Kim & Li approach only negligibly. In contrast, the prediction accuracy of the other approaches degrades considerably. This is in line with our expectations, because algorithms that do not take attributes into account cannot find any relation between new items and already rated items. The Kim & Li approach, on the other hand, has no difficulty assigning an unrated item to an item cluster, because it uses the attributes. The average standard deviation is about 0.03.
User-bias problem
In the user-bias experiments, the items used for prediction make up between 60% and 70% of the total number of items, which is a representative amount.
Fig. 4. New-Item using EachMovie.
Fig. 5. New-Item using MovieLens.
Fig. 6. User-Bias using EachMovie.
Fig. 7. User-Bias using MovieLens.
As shown in Figures 6 and 7, our expectations are confirmed. Only the approach by Kim & Li can detect the difference between two items that have the same historical rating but different attributes. The other approaches have no way to find out which type of items the user likes, because they do not take attributes into consideration. Interestingly, the aspect model, which performs best in general, performs worse than user- and item-based CF on special problems such as the user-bias and new-item problems.
6 Conclusion
The aim of this paper is to show that the new-item and user-bias problems can be solved with the help of attributes. In our evaluation, we used three CF algorithms that do not use any attributes and one approach that takes attribute information into account to compute the recommendations. Our evaluations have

on Electronic Commerce (EC'00), 2000, 285–295.
SARWAR, B.M., KARYPIS, G., KONSTAN, J.A. and RIEDL, J. (2001): Item-Based Collaborative Filtering Recommendation Algorithms. In: Proceedings of the 10th International Conference on World Wide Web. ACM Press, New York, NY, USA, 285–295.
SCHEIN, A.I., POPESCUL, A., UNGAR, L.H. and PENNOCK, D.M. (2002): Methods and Metrics for Cold-Start Recommendations. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, NY, USA, 253–260.
TSO, K. and SCHMIDT-THIEME, L. (2005): Attribute-Aware Collaborative Filtering. In: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation (GfKl) 2005, Magdeburg. Springer.
A Two-Stage Approach for Context-Dependent
Hypernym Extraction
Berenike Loos¹ and Mario DiMarzo²
¹ European Media Laboratory GmbH,
Schloss-Wolfsbrunnenweg 33, 69118 Heidelberg, Germany
² Universität Heidelberg, Institute of Computer Science, Germany

Abstract. In this paper, we present an unsupervised method for classifying out-of-vocabulary words in open-domain spoken dialog systems. This classification is vital for improving human-computer interaction and for extracting additional information that can be presented to the user. We propose a two-stage approach for interpreting named entities in a document corpus. First, documents dealing with a particular named entity are clustered. Second, the named entity is classified with the help of structural and contextual information in these documents.

In the Single-Link algorithm, a document only needs to be similar to one of the documents in the cluster. The result is comparatively few, large clusters. (See Subsection 2.2 for a detailed description of the individual steps performed by the algorithms.)
The evaluation will therefore show which direction to take with respect to clustering approaches in the future.
2.1 Data preparation
In the preprocessing step, standard term vectors were built using the Porter stemmer. The similarity is calculated with the cosine coefficient, as shown in Formula (1).
$$\cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{|\vec{x}| \cdot |\vec{y}|} \qquad (1)$$
2.2 The clustering algorithms
The clustering algorithms applied with the DDSM in this work are Single-Link and
Clique.
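The term vectors and the cosine coefficient of Formula (1) in Subsection 2.1 can be sketched in Python as follows. Stemming is left out to keep the sketch self-contained. A full pipeline would stem each token first, for example with NLTK's `PorterStemmer`. The tokenization by `\w+` and the function names are our assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Term-frequency vector of a document (lower-cased words, no stemming here)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cos(x: Counter, y: Counter) -> float:
    """Formula (1): x . y / (|x| |y|) over sparse term vectors."""
    dot = sum(count * y[term] for term, count in x.items())
    norm_x = math.sqrt(sum(c * c for c in x.values()))
    norm_y = math.sqrt(sum(c * c for c in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


d1 = term_vector("Hotel in Heidelberg")
d2 = term_vector("Heidelberg hotel")
print(round(cos(d1, d2), 4))  # 0.8165
```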
The Single-Link algorithm as described by Kowalski (1997) works in four steps. First, it chooses a document d from the remaining documents and adds it to a new document cluster. Second, it adds all documents that are similar to d according to the DDSM to the current cluster. Third, it performs the second step for each document that was added to the current cluster. Finally, when no more documents can be added to the current cluster, it returns to the first step as long as unclustered documents remain; otherwise, it terminates.
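The four steps above can be sketched in Python as follows. The boolean `similar` function is a hypothetical stand-in for the DDSM similarity decision, which is not defined in this excerpt. The example shows the chaining typical of Single-Link: "a" and "c" land in the same cluster although only "a"–"b" and "b"–"c" are similar.

```python
from typing import Callable, List, Sequence

Similar = Callable[[str, str], bool]  # hypothetical stand-in for the DDSM decision


def single_link(docs: Sequence[str], similar: Similar) -> List[List[str]]:
    remaining = list(docs)
    clusters: List[List[str]] = []
    while remaining:
        seed = remaining.pop(0)  # step 1: open a new cluster with document d
        cluster, frontier = [seed], [seed]
        while frontier:  # steps 2-3: pull in documents similar to any added member
            d = frontier.pop()
            added = [e for e in remaining if similar(d, e)]
            for e in added:
                remaining.remove(e)
            cluster.extend(added)
            frontier.extend(added)
        clusters.append(cluster)  # step 4: start over while documents remain
    return clusters


pairs = {frozenset(p) for p in [("a", "b"), ("b", "c")]}


def similar(x: str, y: str) -> bool:
    return frozenset((x, y)) in pairs


print(single_link(["a", "b", "c", "d"], similar))  # [['a', 'b', 'c'], ['d']]
```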
The Clique clustering algorithm described by Koch (2001) finds document clusters by building a seed list of similar documents, starting from an initial document. As soon as the seed list contains all similar documents, it is declared a cluster. This procedure is carried out for all documents, so every document ends up in at least one of the created clusters.
These two algorithms were chosen because the resulting clusters differ considerably: Clique produces more and smaller clusters than Single-Link. A cluster established by Clique always consists of pairwise similar documents, so all documents within the cluster are similar to each other. In contrast, to add a document d to a Single-Link cluster, it is sufficient that d is similar to only one of the documents already in the current cluster.
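To contrast the two algorithms, here is a greedy Python sketch of the seed-list idea. Starting from each document, it admits another document only if that document is similar to every document already in the list. This simplification is ours and is not Koch's (2001) exact procedure; in particular, a greedy pass can miss some maximal cliques. The `similar` function is again a hypothetical stand-in for the DDSM decision. On the same toy data as in a Single-Link run, it yields more and smaller clusters, and a document ("b") can belong to several of them.

```python
from typing import Callable, List, Sequence


def clique_clusters(docs: Sequence[str], similar: Callable[[str, str], bool]) -> List[List[str]]:
    clusters: List[List[str]] = []
    for start in docs:  # run the seed-list procedure from every document
        seed_list = [start]
        for d in docs:
            # admit d only if it is similar to every document already in the list
            if d not in seed_list and all(similar(d, m) for m in seed_list):
                seed_list.append(d)
        members = sorted(seed_list)
        if members not in clusters:  # the same clique can be reached from several starts
            clusters.append(members)
    return clusters


pairs = {frozenset(p) for p in [("a", "b"), ("b", "c")]}


def similar(x: str, y: str) -> bool:
    return frozenset((x, y)) in pairs


print(clique_clusters(["a", "b", "c", "d"], similar))  # [['a', 'b'], ['b', 'c'], ['d']]
```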
3 Hypernym extraction
According to Lyons (1977), hyponymy is the relation that holds between a more specific lexeme (i.e., a hyponym) and a more general one (i.e., a hypernym). For example, animal is a hypernym of cat. Hypernym Extraction (HE) is applied when the hypernym of a given noun or named entity has to be found, for example as part of an ontology learning framework.
After the documents of the corpus have been divided into clusters, HE can be performed separately for each cluster. For this approach a Part-of-Speech Tagger

For each cluster, a unique list of the nouns occurring in the documents of that cluster is extracted. This list contains all possible nouns (hypernym candidates)
588 Berenike Loos and Mario DiMarzo
and, therefore, serves as the basis for establishing the Named-Entity-Term-Vector (NETV). The NETV is a vector that contains a value for each noun (hypernym candidate) in the unique list. The value is calculated with the cosine coefficient (as shown in Formula 1) and indicates the co-occurrence of a hypernym candidate and the named entity based on term frequency.
3.2 Term distance
The term distance approach is based on the notion that a smaller distance between a hypernym candidate and the named entity signifies a more probable hypernym relation. Hence, smaller distances are considered more valuable and are therefore preferred.
An example is the following German sentence:
• Das Hotel Auerstein befindet sich verkehrstechnisch günstig im nördlichen Heidelberger Stadtteil Handschuhsheim. (In English: The Hotel Auerstein has convenient transport links and is located in Handschuhsheim, a district in the north of Heidelberg.)
Thus, a NETV of dimension p can be built, where p is the number of terms in the unique list. The entries of the vector are computed from distance weights as follows. First, a parameter value for the maximum allowed distance between a hypernym candidate and the named entity is determined, as shown in Figure 3 in the Evaluation Section. The results were most promising for a maximum distance of 8.
The average distance weight $v_n$ of the pairwise occurrence of a hypernym $n$ and the named entity $i$ is calculated according to Formula 2, where $w_i$ is the weight of the

3.3 Lexico-syntactic patterns
To go beyond purely statistical methods, we also tested lexico-syntactic patterns following Hearst (1992). For this purpose, we developed a Boolean named-entity-term-vector. Even though lexico-syntactic patterns are detected only rarely, a pattern, once found, is very likely to be correct.
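A Boolean NETV of the kind described above can be sketched in Python with regular expressions. The two English patterns used here ("<hypernym>s such as <NE>" and "<NE> and other <hypernym>s") are well-known Hearst patterns. They are chosen purely for illustration: the authors' corpus is German and their exact pattern set is not given in this excerpt. The function name and example text are hypothetical.

```python
import re
from typing import List


def pattern_vector(text: str, named_entity: str, candidates: List[str]) -> List[bool]:
    """Boolean NETV: True where a Hearst-style pattern links candidate and entity."""
    ne = re.escape(named_entity)
    vector = []
    for cand in candidates:
        c = re.escape(cand)
        patterns = [
            rf"\b{c}\w*\s+such\s+as\s+{ne}\b",  # "<hypernym>s such as <NE>"
            rf"\b{ne}\s+and\s+other\s+{c}\w*\b",  # "<NE> and other <hypernym>s"
        ]
        vector.append(any(re.search(p, text, re.IGNORECASE) for p in patterns))
    return vector


text = "We stayed in hotels such as Auerstein. Heidelberg and other towns are popular."
print(pattern_vector(text, "Auerstein", ["hotel", "town"]))  # [True, False]
print(pattern_vector(text, "Heidelberg", ["hotel", "town"]))  # [False, True]
```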
3.4 Weighting and consolidation
The three described methods for hypernym extraction yield three NETVs of the same dimension, which are consolidated into one vector. Since lexico-syntactic patterns, once found, are very likely to be correct, they receive a high weight. Nonetheless, the other two vectors are still taken into account even if a lexico-syntactic pattern is found.
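The consolidation can be sketched in Python as a component-wise weighted sum $k = w_1 \cdot h + w_2 \cdot f + w_3 \cdot b$, where $h$, $f$ and $b$ are the pattern, term-frequency and term-distance NETVs. The highest-scoring candidates are then returned. The weights in the example are illustrative only, since the text treats them as evaluation parameters without giving values here. The `amount` argument is a hypothetical counterpart of the amountOfExtractedHypernyms parameter, and the function names are also hypothetical.

```python
from typing import List, Sequence


def consolidate(h: Sequence[float], f: Sequence[float], b: Sequence[float],
                w1: float, w2: float, w3: float) -> List[float]:
    """Consolidated NETV k = w1*h + w2*f + w3*b, computed component-wise."""
    if not len(h) == len(f) == len(b):
        raise ValueError("the three NETVs must have the same dimension")
    return [w1 * hj + w2 * fj + w3 * bj for hj, fj, bj in zip(h, f, b)]


def top_hypernyms(candidates: Sequence[str], k: Sequence[float], amount: int) -> List[str]:
    """The `amount` hypernym candidates with the highest consolidated score."""
    order = sorted(range(len(candidates)), key=lambda j: -k[j])
    return [candidates[j] for j in order[:amount]]


candidates = ["hotel", "district", "city"]
k = consolidate(h=[1.0, 0.0, 0.0], f=[0.5, 0.25, 0.5], b=[0.5, 0.75, 0.25],
                w1=2.0, w2=1.0, w3=1.0)
print(k)                                # [3.0, 1.0, 0.75]
print(top_hypernyms(candidates, k, 2))  # ['hotel', 'district']
```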
The consolidated NETV $k$ is calculated with the following formula, where $h$ is the NETV for the lexico-syntactic patterns, $f$ the NETV for the term frequency, $b$ the NETV for the term distance, and $w_1$, $w_2$, $w_3$ are the weights, which are used as parameters in the evaluation:
$$k = w_1 \cdot h + w_2 \cdot f + w_3 \cdot b$$
as for the recall (as shown in Figure 2). For the recall, we calculated how many of the documents assigned to a cluster actually belong there.
To determine an optimal threshold value, only clusterings whose clusters are confirmed as actual clusters by manual annotation may be analyzed. The precision of the clustering task has to be 100%, as only this can yield reliable results for hypernym extraction.
Fig. 2. Recall for Single-Link and Clique
4.2 Evaluation of the Hypernym Extraction task
For the Hypernym Extraction (HE) task, the formula for weighting the nouns in the neighborhood of the NE yields the best results. The corresponding parameter is referred to as neighborRelevance.
The evaluation of the neighborRelevance parameter showed that a window of eight words surrounding the NE yields the best results, as shown in Figure 3. Nonetheless, shorter snippets are cheaper to analyze, so the comparatively good results for a value of 4 are worth keeping in mind for performance reasons. The formula for the calculations is described in Subsection 3.2.
Fig. 3. Evaluation for neighbor relevance
The precision of the HE task depends on the value of the parameter amountOfExtractedHypernyms, i.e., the number of hypernyms returned by the HE module. It was 64.47% for a value of 1, 77.63% for 2, and 84.21% for 3. These results differ from those of the neighbor-relevance evaluation because of slightly changed parameter values. Overall, our results outperformed the earlier hypernym extraction methods described in Faulhaber et al. (2006) by about 4% (absolute).
Table 1 shows the results for the best parameter choice, obtained by empirical evaluation, for the combination of the clustering and HE modules. These parameter values are not only of interest for

Theoretical Computer Science, 250(1-2):1–30.
KOWALSKI, G. (1997): Information Retrieval Systems: Theory and Implementation. Kluwer
Academic Publishers, USA.
LYONS, J. (1977): Semantics. Cambridge University Press, Cambridge.
MILLER, G., BECKWITH, R., FELLBAUM, C., GROSS, D. and MILLER, K. (1990): Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, 3(4), 235–244.
