Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1346–1356,
Uppsala, Sweden, 11-16 July 2010.
© 2010 Association for Computational Linguistics
Multilingual Pseudo-Relevance Feedback: Performance Study of
Assisting Languages
Manoj K. Chinnakotla Karthik Raman Pushpak Bhattacharyya
Department of Computer Science and Engineering
Indian Institute of Technology, Bombay,
Mumbai, India
{manoj,karthikr,pb}@cse.iitb.ac.in
Abstract
In our previous work (Chinnakotla et al., 2010), we introduced a novel framework for Pseudo-Relevance Feedback (PRF) called MultiPRF. Given a query in one language, called the source language, we used English as the assisting language to improve the performance of PRF for the source language. MultiPRF showed remarkable improvement over plain Model Based Feedback (MBF) uniformly for 4 languages, viz., French, German, Hungarian and Finnish, with English as the assisting language. This inspired us to study the effect of any source-assistant pair on MultiPRF performance, drawn from a set of languages with widely different characteristics, viz., Dutch, English, Finnish, French, German and Spanish. Carrying this further, we looked into the

retrieval as being relevant to the query. Based on
the above assumption, the terms in the feedback
document set are analyzed to choose the most dis-
tinguishing set of terms that characterize the feed-
back documents and as a result the relevance of
a document. Query reﬁnement is done by adding
the terms obtained through PRF, along with their
weights, to the actual query.
Although PRF has been shown to improve re-
trieval, it suffers from the following drawbacks:
(a) the type of term associations obtained for query
expansion is restricted to co-occurrence based re-
lationships in the feedback documents, and thus
other types of term associations such as lexical and
semantic relations (morphological variants, syn-
onyms) are not explicitly captured, and (b) due to
the inherent assumption in PRF, i.e., relevance of
top k documents, performance is sensitive to that
of the initial retrieval algorithm and as a result is
not robust.
Multilingual Pseudo-Relevance Feedback
(MultiPRF) (Chinnakotla et al., 2010) is a novel
framework for PRF to overcome both the above
limitations of PRF. It does so by taking the help of
a different language called the assisting language.
In MultiPRF, given a query in source language L_1, the query is automatically translated into
the assisting language L

neously. Interestingly, the performance improve-
ment is more pronounced when the source and as-
sisting languages are closely related, e.g., French
and Spanish.
The paper is organized as follows. Section 2 discusses related work. Section 3 explains the Language Modeling (LM) based PRF approach. Section 4 describes the MultiPRF approach. Section 5 discusses the experimental setup. Section 6 presents the results, studies the effect of varying the assisting language, and examines the use of multiple assisting languages. Finally, Section 7 concludes the paper by summarizing the findings and outlining future work.
2 Related Work
PRF has been successfully applied in various IR
frameworks like vector space models, probabilis-
tic IR and language modeling (Buckley et al.,
1994; Jones et al., 2000; Lavrenko and Croft,
2001; Zhai and Lafferty, 2001). Several approaches have been proposed to improve the performance and robustness of PRF. Some of the representative techniques are (i) refining the feedback document set (Mitra et al., 1998; Sakai et al., 2005), (ii) refining the terms obtained through PRF by selecting good expansion terms (Cao et al., 2008), (iii) using selective query expansion (Amati et al., 2004; Cronen-Townsend et al., 2004), and (iv) varying the importance of documents in the feedback set (Tao and Zhai, 2006).

of query processing (for example term conﬂation)
in those languages.
Jourlin et al. (1999) use parallel blind relevance
feedback, i.e. they use blind relevance feedback on
a larger, more reliable parallel corpus, to improve
retrieval performance on imperfect transcriptions
of speech. A related idea is due to Xu et al. (2002), who learn a statistical thesaurus using probabilistic bilingual dictionaries from Arabic to English and from English to Arabic. Meij et al. (2009) try to expand a query in a different language using language models for domain-specific retrieval, but in a very different setting.
Since our method uses a corpus in the assisting
language from a similar time period, it can be
likened to the work by Talvensaari et al. (2007)
who used comparable corpora for Cross-Lingual
Information Retrieval (CLIR). Other work pertaining to document alignment in comparable corpora, such as Braschler and Schäuble (1998) and Lavrenko et al. (2002), also shares certain common themes with our approach. Recent work by Gao et al. (2008) uses English to improve the performance
over a subset of Chinese queries whose transla-
tions in English are unambiguous. They use inter-
document similarities across languages to improve
the ranking performance. However, cross lan-

accurately using the query alone. In PRF, the top k documents obtained through the initial ranking algorithm are assumed to be relevant and used as feedback for improving the estimation of Θ_Q. The feedback documents contain both relevant and noisy terms from which the feedback language model is inferred based on a Generative Mixture Model (Zhai and Lafferty, 2001).
Let D_F = {d_1, d_2, ..., d_k} be the top k documents retrieved using the initial ranking algorithm. Zhai and Lafferty (2001) model the feedback document set D_F as a mixture of two distributions: (a) the feedback language model and (b) the collection model P(w|C). The feedback language model is inferred using the EM Algorithm (Dempster et al., 1977), which iteratively accumulates probability mass on the most distinguishing terms, i.e. terms which are more fre-

[Figure 1: Schematic of the MultiPRF approach. Box labels: Query in L; L_1 Index; L_2 Index; Top 'k' Results; PRF (Model Based Feedback); Feedback Model θ_L1; Feedback Model θ_L2; Relevance Model Translation; Feedback Model Interpolation; KL-Divergence Ranking Function; Final Ranked List of Documents in L_1]

Symbol             Description
Θ^F_{L_2}          Feedback language model obtained from PRF in L_2
Θ^{Trans}_{L_1}    Feedback model translated from L_2 to L_1
t(f|e)             Probabilistic bi-lingual dictionary from L_2 to L_1
β, γ               Interpolation coefficients used in MultiPRF
Table 2: Glossary of symbols used in explaining MultiPRF
to the above technique as Model Based Feedback
(MBF).
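The mixture-model estimation behind MBF can be illustrated with a minimal Python sketch. It assumes a fixed background weight (called `lam` here, presumably playing the role of the MBF parameter λ), a toy collection model, and a hypothetical function name `estimate_feedback_model`; it is a simplified illustration under those assumptions, not the implementation used in the experiments.

```python
from collections import Counter


def estimate_feedback_model(feedback_docs, collection_model, lam=0.5, iterations=30):
    """EM estimate of a feedback language model from the top-k documents.

    Each word occurrence is assumed to come either from the feedback model
    (weight 1 - lam) or from the collection model P(w|C) (weight lam).
    `lam`, the iteration count, and the probability floor are illustrative.
    """
    counts = Counter(w for doc in feedback_docs for w in doc)
    total = sum(counts.values())
    # Start from the maximum-likelihood estimate over the feedback set.
    theta = {w: c / total for w, c in counts.items()}
    floor = 1e-9  # assumed floor for words missing from the collection model

    for _ in range(iterations):
        # E-step: probability that an occurrence of w came from the feedback model.
        resp = {}
        for w in counts:
            topical = (1 - lam) * theta[w]
            background = lam * collection_model.get(w, floor)
            resp[w] = topical / (topical + background)
        # M-step: re-estimate the feedback model from the expected counts.
        norm = sum(counts[w] * resp[w] for w in counts)
        theta = {w: counts[w] * resp[w] / norm for w in counts}
    return theta


# Toy data (hypothetical): two tokenized feedback documents and P(w|C).
docs = [["genetic", "engineering", "the", "of"], ["genetic", "crop", "the", "the"]]
collection = {"the": 0.5, "of": 0.3, "genetic": 0.01, "engineering": 0.01, "crop": 0.01}
theta_f = estimate_feedback_model(docs, collection)
```

In this toy run, probability mass shifts away from the common word "the" towards "genetic", even though "the" is the most frequent word in the feedback set.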
4 Multilingual PRF (MultiPRF)
The schematic of the MultiPRF approach is shown in Figure 1. Given a query Q in the source language L_1, we automatically translate the query into the assisting language L_2. We then rank the
documents in the L

→ L_1 as follows:

$$P(f \mid \Theta^{Trans}_{L_1}) = \sum_{\forall e \in L_2} t(f \mid e) \cdot P(e \mid \Theta^{F}_{L_2}) \qquad (1)$$
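As a rough illustration of Equation 1, the following Python sketch pushes an L_2 feedback model through a probabilistic dictionary. The function name `translate_feedback_model`, the nested-dict layout `dictionary[e][f] = t(f|e)`, the toy probabilities, and the final renormalization (which compensates for L_2 terms absent from the dictionary) are all assumptions made for the example.

```python
def translate_feedback_model(theta_l2, dictionary):
    """Translate a feedback model from L2 into L1.

    theta_l2:   {e: P(e | Theta^F_L2)} over assisting-language terms e
    dictionary: {e: {f: t(f|e)}} probabilistic bi-lingual dictionary, L2 -> L1
    Returns {f: P(f | Theta^Trans_L1)} over source-language terms f.
    """
    translated = {}
    for e, p_e in theta_l2.items():
        for f, t_fe in dictionary.get(e, {}).items():
            # Accumulate the term t(f|e) * P(e | Theta^F_L2) of the sum over e.
            translated[f] = translated.get(f, 0.0) + t_fe * p_e
    total = sum(translated.values())
    if total == 0:
        return {}
    # Assumed step: renormalize in case some L2 terms had no dictionary entry.
    return {f: p / total for f, p in translated.items()}


# Hypothetical toy model and dictionary (English -> German); values are made up.
theta_en = {"aircraft": 0.6, "flight": 0.4}
toy_dictionary = {
    "aircraft": {"flugzeug": 0.9, "maschine": 0.1},
    "flight": {"flug": 0.7, "flugzeug": 0.3},
}
theta_trans = translate_feedback_model(theta_en, toy_dictionary)
```

Here "flugzeug" receives mass from both English terms (0.9 · 0.6 + 0.3 · 0.4), which is how terms related to several assisting-language feedback terms are reinforced.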
The probabilistic bi-lingual dictionary t(f |e) is
Language | CLEF Collection Identifier | Description | No. of Documents | No. of Unique Terms | CLEF Topics (No. of Topics)
English | EN-00+01+02 | [...]
[...]
 | FR-03+05 | Le Monde 94, French SDA 94-95 | 129806 | 182214 | 141-200, 251-300 (99)
 | FR-06 | Le Monde 94-95, French SDA 94-95 | 177452 | 231429 | 301-350 (48)
German | DE-00 | Frankfurter Rundschau 94, Der Spiegel 94-95 | 153694 | 791093 | 1-40 (33)
 | DE-01+02 | Frankfurter Rundschau 94, Der Spiegel 94-95, German SDA 94 | 225371 | 782304 | 41-140 (85)
 | DE-02+03 | Frankfurter Rundschau 94, Der Spiegel 94-95, German SDA 94-95 | 294809 | 867072 | 91-200 (67)
 | DE-03 | Frankfurter Rundschau 94, Der Spiegel 94-95, [...]

Source Term | Top Aligned Terms in Target
French → English
américain | american, us, united, state, america
nation | nation, un, united, state, country
étude | study, research, assess, investigate, survey
German → English
flugzeug | aircraft, plane, aeroplane, air, flight
spiele | play, game, stake, role, player
verhältnis | relationship, relate, balance, proportion
Table 3: Top translation alternatives for some sample words in the probabilistic bi-lingual dictionary
learned from a parallel sentence-aligned corpus in L_1−L_2 based on word level alignments. Tiedemann (2001) has shown that the translation alternatives found using word alignments could be used to infer various morphological and semantic relations between terms. In Table 3, we show the top translation alternatives for some sample words. For example, the French word
am

$$\ldots\; \Theta^{F}_{L_1} + \gamma \cdot \Theta^{Trans}_{L_1} \qquad (2)$$
Since we want to retain the query focus during back translation, the feedback model in L_2 is interpolated with the translated query before translation of the L_2 feedback model. The parameters β and γ control the relative importance of the original query model, the feedback model of L_1 and the translated feedback model obtained from L_2, and are tuned based on the choice of L_1 and L_2.
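The interpolation of the three models can be sketched in Python as below. This is only an illustration: it assumes that the query model receives the remaining weight 1 − β − γ, which the surviving part of Equation 2 does not show, and the function name `interpolate_models` and the toy distributions are hypothetical.

```python
def interpolate_models(theta_q, theta_f_l1, theta_trans_l1, beta, gamma):
    """Linear interpolation of query, L1 feedback and translated feedback models.

    Assumption: the query model weight is 1 - beta - gamma, so that the
    result is again a probability distribution when the inputs are.
    """
    if beta < 0 or gamma < 0 or beta + gamma > 1:
        raise ValueError("beta and gamma must be non-negative and sum to at most 1")
    query_weight = 1.0 - beta - gamma
    vocabulary = set(theta_q) | set(theta_f_l1) | set(theta_trans_l1)
    return {
        w: query_weight * theta_q.get(w, 0.0)
        + beta * theta_f_l1.get(w, 0.0)
        + gamma * theta_trans_l1.get(w, 0.0)
        for w in vocabulary
    }


# Hypothetical toy models over German terms.
query_model = {"flugzeug": 1.0}
feedback_l1 = {"flugzeug": 0.5, "pilot": 0.5}
translated = {"flugzeug": 0.66, "flug": 0.28, "maschine": 0.06}
theta_multi = interpolate_models(query_model, feedback_l1, translated, beta=0.4, gamma=0.4)
```

With β = γ = 0.4, the term "flug", which appears only in the translated model, enters the final model with weight 0.4 · 0.28, showing how back-translated terms can reach the final query model even when they are absent from the L_1 feedback documents.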
5 Experimental Setup
We evaluate the performance of our system us-

Collection | Assist. Lang. | P@5 MBF | P@5 MultiPRF | P@5 % Impr. | P@10 MBF | P@10 MultiPRF | P@10 % Impr. | MAP MBF | MAP MultiPRF | MAP % Impr. | GMAP MBF | GMAP MultiPRF | GMAP % Impr.
FR-00 | EN | 0.4690 | 0.5241 | 11.76‡ | 0.4000 | 0.4000 | 0.00 | 0.4220 | 0.4393 | 4.10 | 0.2961 | 0.3413 | [...]
[...] | 0.4535 | 4.43‡ | 0.2395 | 0.2721 | 13.61
 | ES | | 0.4977 | 7.35‡ | | 0.4363 | 7.26‡ | | 0.4416 | 1.70 | | 0.2349 | -1.92
 | NL | | 0.4818 | 3.92 | | 0.4409 | 8.38‡ | | 0.4375 | 0.76 | | 0.2534 | 5.80
FR-03+05 | EN | 0.4545 | [...]
[...] | 0.1319 | -0.38
FR-06 | EN | 0.4917 | 0.5083 | 3.39 | 0.4625 | 0.4729 | 2.25 | 0.3837 | 0.4104 | 6.97 | 0.2174 | 0.2810 | 29.25
 | ES | | 0.5083 | 3.39 | | 0.4687 | 1.35 | | 0.3918 | 2.12 | | 0.2617 | 20.38
 | NL | | 0.5083 | 3.39 | | 0.4646 | 0.45 | [...]
[...] | 434.78
 | NL | | 0.3151 | 36.82‡ | | 0.2818 | 17.71‡ | | 0.2331 | 8.00 | | 0.0122 | 430.43
DE-01+02 | EN | 0.5341 | 0.6000 | 12.34‡ | 0.4864 | 0.5318 | 9.35‡ | 0.4229 | 0.4576 | 8.2‡ | 0.1765 | 0.2721 | 9.19
 | ES | [...]
[...] | 0.4274 | 0.4355 | 1.91 | 0.1243 | 0.1771 | 42.48
 | ES | | 0.5647 | 10.77‡ | | 0.4980 | 4.10 | | 0.4568 | 6.89‡ | | 0.1645 | 32.34
 | NL | | 0.5529 | 8.45‡ | | 0.4941 | 3.27 | | 0.4347 | 1.72 | | 0.1490 | 19.87
FI-02+03+04 | EN | 0.3782 | [...]
[...] | 0.1839 | 36.83
Table 4: Results comparing the performance of MultiPRF over baseline MBF on CLEF collections with English (EN), Spanish (ES) and Dutch (NL) as assisting languages. Results marked with ‡ indicate that the improvement was found to be statistically significant over the baseline at 90% confidence level (α = 0.01) when tested using a paired two-tailed t-test.
since some function words like l’, d’ etc., occur as prefixes to a word, we strip them off during indexing and query processing, which significantly improves the baseline performance. We use standard evaluation measures, namely MAP, P@5 and P@10. Additionally, for assessing robustness, we use the Geometric Mean Average Precision (GMAP) metric (Robertson, 2006), which is also used in the TREC Robust Track (Voorhees, 2006). The probabilistic bi-lingual dictionary used in MultiPRF was learnt automatically by running GIZA++, a word alignment tool (Och and Ney, 2003), on a parallel sentence-aligned corpus. For all the above language pairs we used the Europarl Corpus (Koehn, 2005). We use Google Translate as the query translation system as it has been shown to perform well for the task (Wu et al., 2008). We use the MBF approach explained in Section 3 as a baseline for comparison. We use two-stage Dirichlet smoothing with the optimal parameters tuned based on the collection (Zhai and Lafferty, 2004). We tune the parameters of MBF, specifically λ and α, and choose the values which
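To make the difference between MAP and the GMAP robustness measure concrete, here is a small Python sketch. The function names and the toy per-topic scores are hypothetical, and the small floor `eps` applied before taking logarithms is an assumption made so that topics with zero average precision do not make the geometric mean undefined.

```python
import math


def mean_average_precision(ap_scores):
    """Arithmetic mean of per-topic average precision values."""
    return sum(ap_scores) / len(ap_scores)


def geometric_mean_average_precision(ap_scores, eps=1e-5):
    """Geometric mean of per-topic average precision values.

    eps is an assumed floor applied to each score before the logarithm.
    """
    log_sum = sum(math.log(max(ap, eps)) for ap in ap_scores)
    return math.exp(log_sum / len(ap_scores))


# Toy per-topic scores (made up): two easy topics and one hard topic.
ap_per_topic = [0.5, 0.5, 0.02]
map_score = mean_average_precision(ap_per_topic)
gmap_score = geometric_mean_average_precision(ap_per_topic)
```

Because the geometric mean is dominated by the weakest topics, GMAP falls well below MAP on these scores, and an improvement on the hard topic moves GMAP far more than the same absolute improvement on an easy topic. This is why large relative GMAP gains can accompany modest MAP gains.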

gains achieved by MultiPRF are primarily due to one of three reasons: (a) Good Feedback in Assisting Language: If the feedback model in the assisting language contains good terms, then the back-translation process will introduce the corresponding feedback terms in the source language, thus leading to improved performance. As an example of this phenomenon, consider the French query “Maladie de Creutzfeldt-Jakob”. In this case the original feedback model also performs
Topic No. | Assist. Lang. | Source Language Query | Translated Query | Query Meaning | MBF MAP | MPRF MAP | MBF: Top Representative Terms (With Meaning), Excl. Query Terms | MultiPRF: Top Representative Terms (With Meaning), Excl. Query Terms
[...] | asthma, allergi, krankheit (disease), allerg (allergenic), chronisch, hauterkrank (illness of skin), arzt (doctor), erkrank (ill)
FRENCH '02: TOPIC 107 | NL | Ingénierie génétique | Genetische Manipulatie | Genetic Engineering | 0.145 | 0.357 | développ (developed), évolu (evolved), product, produit (product), moléculair (molecular) | genetic, gen, engineering, développ, product
FRENCH '06: TOPIC 256 | EN | Maladie de Creutzfeldt-Jakob | Creutzfeldt-Jakob | Creutzfeldt-Jakob Disease | 0.507 | 0.688 | malad (illness), produit (product), [...]
[...] América Latina | AI in Latin America | 0.456 | 0.098 | international, amnesty, strassenkind (street child), kolumbi (Columbian), land, brasili (Brazil), menschenrecht (human rights), polizei (police) | karib (Caribbean), land, brasili, schuld (blame), amerika, kalt (cold), welt (world), forschung (research)
GERMAN '03: TOPIC 196 | EN | Fusion japanischer Banken | Fusion of Japanese banks | Merger of Japanese Banks | 0.572 | 0.264 | daiwa, tokyo, filial (branch), zusammenschluss (merger) | kernfusion (nuclear fusion), zentralbank (central bank), daiwa, weltbank (world bank), investitionsbank (investment bank)

consider the French query “Ingénierie génétique”. While the feedback model was unable to find any of the synonyms of the query terms, due to their lack of co-occurrence with the query terms, the MultiPRF model was able to obtain these terms, which are introduced primarily during the back-translation process. Thus terms like {genetic, gen, engineering}, which are synonyms of the query words, are found, resulting in improved performance. (c) Combination of Above Factors: Sometimes a combination of the above two factors causes improvements in the performance, as in the German query “Ölkatastrophe in Sibirien”. For this query, MultiPRF finds good feedback terms such as {russisch, russland} while also obtaining semantically related terms such as {olverschmutz, erdol, olunfall}.
Although all of the previously described exam-
ples had good quality translations of the query
in the assisting language, as mentioned in (Chin-
nakotla et al., 2010), the MultiPRF approach is
robust to suboptimal translation quality as well.
To see how MultiPRF leads to improvements even
with errors in query translation consider the Ger-
man Query “Siegerinnen von Wimbledon”. When
this is translated to English, the term “Lady” is

Figure 2: Results showing the sensitivity of MultiPRF performance to parameters β and γ for French, German and Finnish.
tion affects performance as well.
6.1 Parameter Sensitivity Analysis
The MultiPRF parameters β and γ in Equation 2 control the relative importance assigned to the original feedback model in source language L_1, the translated feedback model obtained from assisting language L_2 and the original query terms. We varied the β and γ parameters for the French, German and Finnish collections with English, Dutch and Spanish as assisting languages and studied their effect on the MAP of MultiPRF. The results are shown in Figure 2. They show that, in all three collections, the optimal value of the parameters remains almost the same and lies in the range of 0.4-0.48. For this reason, we arbitrarily choose the parameters in the above range and do not use any technique to learn these parameters.
6.2 Effect of Assisting Language Choice
In this section, we discuss the effect of varying
the assisting language. Besides, we also study
the inter and intra familial behaviour of source-
assisting language pairs. In order to ensure that
the results are comparable across languages, we
indexed the collections from the years 2002, 2003
and use common topics from the topic range 91-

est Monolingual MBF performance. The results
show that Finnish is the least helpful of assist-
ing languages, with performance similar to those
of the baselines. We also observe that the three
best performing assistant languages, i.e. English,
French and Spanish, have the highest monolingual
performances as well, thus further validating the
claim. One possible reason for this is the relative
Source Lang. | Metric | Assisting Language: English | German | Dutch | Spanish | French | Finnish | Source Lang. MBF
English | MAP | - | 0.4464 (-0.7%) | 0.4471 (-0.5%) | 0.4566 (+1.6%) | 0.4563 (+1.5%) | 0.4545 (+1.1%) | 0.4495
English | P@5 | [...]
[...] | 0.5284 (+11.3%) | 0.5209 (+9.8%) | 0.5179 (+9.1%) | 0.5149 (+8.5%) | 0.5075 (+6.9%) | 0.4746
Dutch | MAP | 0.4317 (+4%) | 0.4453 (+7.2%) | - | 0.4275 (+2.9%) | 0.4241 (+2.1%) | 0.3971 (-4.4%) | 0.4153
Dutch | P@5 | 0.5642 (+11.8%) | 0.5731 (+13.6%) | - | 0.5343 (+5.9%) | 0.5582 (+10.6%) | 0.5045 (0%) | 0.5045
Dutch | P@10 | 0.5075 (+9%) | 0.4925 (+5.8%) | - | 0.4896 (+5.1%) | 0.5015 (+7.7%) | 0.4806 (+3.2%) | 0.4657
Spanish | [...]
[...] | 0.4451 (+2.2%) | 0.4356
[...] | P@5 | 0.4925 (+3.1%) | 0.4806 (+0.6%) | 0.4567 (-4.4%) | 0.4925 (+3.1%) | 0.4836 (+1.3%) | 0.4776
[...] | P@10 | 0.4358 (+3.9%) | 0.4239 (+1%) | 0.4224 (+0.7%) | 0.4388 (+4.6%) | 0.4209 (+0.4%) | 0.4194
Finnish | MAP | 0.3411 (-4.7%) | 0.3796 (+6.1%) | 0.3722 (+4%) | 0.369 (+3.1%) | 0.3553 (-0.7%) | - | 0.3578
Finnish | P@5 | 0.394 (+3.1%) | 0.403 (+5.5%) | 0.406 (+6.3%) | 0.4119 (+7.8%) | [...]

strong monolingual MBF performance.
6.3 Effect of Language Family on Back
Translation Performance
As already mentioned, the performance of Multi-
PRF is good if the source and assisting languages
belong to the same family. In this section, we ver-
ify the above intuition by studying the impact of
language family on back translation performance.
The experiment is designed as follows: Given a query in source language L_1, the ideal translation in assisting language L_2 is used to compute the query model in L_2 using only the query terms. Then, without performing PRF, the query model
Source Lang. | Assisting Language: FR | ES | DE | NL | EN | FI | MBF | MPRF
[...] | 0.2902 | - | 0.2757 | 0.2372 | 0.3968 | 0.3989
Table 8: Effect of language family on back-translation performance, measured through MultiPRF MAP. 100 topics from the years 2001 and 2002 were used for all languages.
is directly back translated from L_2 into L_1 and finally documents are re-ranked using this translated feedback model. Since the automatic query translation and PRF steps have been eliminated, the only factor which influences the MultiPRF performance is the back-translation step. This means that the source-assisting language pairs for which the back-translation is good will score a higher performance. The results of the above experiment are shown in Table 8. For each source language, the best performing assisting languages have been highlighted.
The results show that the performance of closely related language pairs like French-Spanish and German-Dutch is higher than that of other source-assistant language pairs. This shows that in case of closely related languages, the back-

[...] | 0.5343 (+7.8%) | 0.5403 (+9.0%) | 0.4806 (-3.0%) | 0.4955
[...] | P@10 | 0.4373 (+1.0%) | 0.4358 (+0.7%) | 0.4597 (+6.2%) | 0.4582 (+5.9%) | 0.4164 (-3.8%) | 0.4328
German | MAP | 0.4427 (+9.8%) | - | 0.4306 (+6.8%) | 0.4404 (+9.2%) | 0.4104 (+1.8%) | 0.3993 (-1.0%) | 0.4033
German | P@5 | 0.606 (+18%) | 0.5672 (+10.5%) | 0.594 (+15.7%) | 0.5761 (+12.2%) | 0.5552 (+8.1%) | 0.5134
German | P@10 | 0.5373 (+13.2%) | 0.503 (+6.0%) | [...]
[...] | 0.4773 (-0.7%) | 0.4733 (-1.5%) | - | 0.4839 (+0.7%) | 0.4412 (-8.2%) | 0.4805
[...] | P@5 | 0.6507 (+1.8%) | 0.6448 (+0.9%) | 0.6507 (+1.8%) | 0.6478 (+1.4%) | 0.597 (-6.5%) | 0.6388
[...] | P@10 | 0.5791 (+1.0%) | 0.5791 (+1.0%) | 0.5761 (+0.5%) | 0.5866 (+2.4%) | 0.5567 (-2.9%) | 0.5731
French | MAP | 0.4591 (+5.4%) | 0.4514 (+3.6%) | 0.4409 (+1.2%) | 0.4712 (+8.2%) | - | 0.4354 (0%) | 0.4356
French | P@5 | [...]
[...] | 0.3567 (+14.9%) | 0.31 (-0.2%) | 0.3253 (+4.8%) | 0.32 (+3.1%) | 0.3239 (+4.3%) | 0.3105
Table 7: Results showing the performance of MultiPRF without using automatic query translation, i.e. by using the corresponding original queries in the assisting collection. The results show the potential of MultiPRF by establishing a performance upper bound.
guage which shows good performance across both
families.
6.4 Multiple Assisting Languages
So far, we have only considered a single assisting language. A natural extension to the method is to use multiple assisting languages, in other words, to combine the evidence from the feedback models of more than one assisting language to obtain a feedback model which is better than that obtained using a single assisting language. To check how this simple extension works, we performed experiments using a pair of assisting languages. In these experiments, for a given source language (from amongst the 6 previously mentioned languages), we tried using all pairs of assisting languages (for each source language, 10 pairs are possible). To obtain the final model, we simply interpolate all the feedback models with the initial query model, in a manner similar to MultiPRF. The results for these experiments are given in Table 9.
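The counts quoted above follow from choosing 2 assisting languages out of the 5 remaining ones for each of the 6 source languages. The short Python sketch below enumerates the configurations; the language codes and the helper name `assisting_pairs` are chosen for the example only.

```python
from itertools import combinations

# The six languages studied, written as two-letter codes for the example.
LANGUAGES = ["EN", "DE", "NL", "ES", "FR", "FI"]


def assisting_pairs(source):
    """All unordered pairs of assisting languages for a given source language."""
    others = [lang for lang in LANGUAGES if lang != source]
    return list(combinations(others, 2))


pairs_for_french = assisting_pairs("FR")
total_configurations = sum(len(assisting_pairs(src)) for src in LANGUAGES)
```

Each source language has C(5, 2) = 10 assisting pairs, so the six source languages together give 60 source-pair combinations.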
As we see, out of the 60 possible combinations

EN – 3 Pairs; FR – 6 Pairs; DE – 10 Pairs; ES – 8 Pairs; NL – 4 Pairs; FI – 1 Pair
Table 9: Summary of MultiPRF results with two assisting languages. The improvements described above are with respect to the maximum MultiPRF MAP obtained using either L_1 or L_2 alone as assisting language.
lead to improvements. A more detailed study of
this observation needs to be done to explain this.
7 Conclusion and Future Work
We studied the effect of different source-assistant
pairs and multiple assisting languages on the per-
formance of MultiPRF. Experiments across a wide
range of language pairs with varied degree of fa-
milial relationships show that MultiPRF improves
performance in most cases with the performance
improvement being more pronounced when the
source and assisting languages are closely related.
We also notice that the results are mixed when two
assisting languages are used simultaneously. As
part of future work, we plan to vary the model
interpolation parameters dynamically to improve
the performance in case of multiple assisting lan-
guages.
Acknowledgements
The ﬁrst author was supported by a fellowship
award from Infosys Technologies Ltd., India. We
would like to thank Mr. Vishal Vachhani for his

pages 704–711. ACM.
Steve Cronen-Townsend, Yun Zhou, and W. Bruce Croft.
2004. A Framework for Selective Query Expansion. In
CIKM ’04, pages 236–237. ACM.
Ido Dagan, Alon Itai, and Ulrike Schwall. 1991. Two Lan-
guages Are More Informative Than One. In ACL ’91,
pages 130–137. ACL.
A. Dempster, N. Laird, and D. Rubin. 1977. Maximum Like-
lihood from Incomplete Data via the EM Algorithm. Jour-
nal of the Royal Statistical Society, 39:1–38.
Susan T. Dumais, Todd A. Letsche, Michael L. Littman, and
Thomas K. Landauer. 1997. Automatic Cross-Language
Retrieval Using Latent Semantic Indexing. In AAAI ’97,
pages 18–24.
Wei Gao, John Blitzer, and Ming Zhou. 2008. Using English
Information in Non-English Web Search. In iNEWS ’08,
pages 17–24. ACM.
David Hawking, Paul Thistlewaite, and Donna Harman.
1999. Scaling Up the TREC Collection. Inf. Retr., 1(1-
2):115–137.
Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Richard
Zens, Alexandra Constantin, Marcello Federico,
Nicola Bertoldi, Chris Dyer, Brooke Cowan, Wade
Shen, Christine Moran, and Ondřej Bojar. 2007. Moses:
Open Source Toolkit for Statistical Machine Translation.
In ACL ’07, pages 177–180.
P. Jourlin, S. E. Johnson, K. Spärck Jones and P. C. Woodland.
1999. Improving Retrieval on Imperfect Speech

atic Comparison of Various Statistical Alignment Models.
Computational Linguistics, 29(1):19–51.
I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and
Johnson. 2005. Terrier Information Retrieval Platform.
In ECIR ’05, volume 3408 of Lecture Notes in Computer
Science, pages 517–519. Springer.
Philipp Koehn. 2005. Europarl: A Parallel Corpus for Statistical
Machine Translation. In MT Summit ’05.
Stephen Robertson. 2006. On GMAP: and Other Transfor-
mations. In CIKM ’06, pages 78–83. ACM.
Tetsuya Sakai, Toshihiko Manabe, and Makoto Koyama.
2005. Flexible Pseudo-Relevance Feedback Via Selective
Sampling. ACM TALIP, 4(2):111–135.
Tao Tao and ChengXiang Zhai. 2006. Regularized Esti-
mation of Mixture Models for Robust Pseudo-Relevance
Feedback. In SIGIR ’06, pages 162–169. ACM.
Tuomas Talvensaari, Jorma Laurikkala, Kalervo Järvelin,
Martti Juhola, and Heikki Keskustalo. 2007. Creating and
Exploiting a Comparable Corpus in Cross-language Information
Retrieval. ACM Trans. Inf. Syst., 25(1):4.
Jörg Tiedemann. 2001. The Use of Parallel Corpora in Monolingual
Lexicography - How word alignment can identify
morphological and semantic relations. In COMPLEX ’01,
pages 143–151.
Ellen M. Voorhees. 1994. Query Expansion Using Lexical-
Semantic Relations. In SIGIR ’94, pages 61–69. Springer-
Verlag.