The Descent of Hierarchy, and Selection in Relational Semantics
Barbara Rosario
SIMS
UC Berkeley
Berkeley, CA 94720
[email protected]
Marti A. Hearst
SIMS
UC Berkeley
Berkeley, CA 94720
[email protected]
Charles Fillmore
ICSI
UC Berkeley
Berkeley, CA 94720
fi[email protected]
Abstract
In many types of technical texts, meaning is
embedded in noun compounds. A language un-
derstanding program needs to be able to inter-
pret these in order to ascertain sentence mean-
ing. We explore the possibility of using an ex-
isting lexical hierarchy for the purpose of plac-
ing words from a noun compound into cate-
gories, and then using this category member-
ship to determine the relation that holds be-
tween the nouns. In this paper we present the
results of an analysis of this method on two-
word noun compounds from the biomedical do-
main, obtaining classification accuracy of ap-
proximately 90%. Since lexical hierarchies are
numbness, and hip pain, the first word of the NC falls into
the MeSH A01 (Body Regions) category, and the second
word falls into the C10 (Nervous System Diseases) cat-
egory. From these we can declare that the relation that
holds between the words is “located in”. Similarly, for
influenza patients and aids survivors, the first word falls
under C02 (Virus Diseases) and the second is found in
M01.643 (Patients), yielding the “afflicted by” relation.
Using this technique on a subpart of the category space,
we obtain 90% accuracy overall.
In some sense, this is a very old idea, dating back to
the early days of semantic nets and semantic grammars.
The critical difference now is that large lexical resources
and corpora have become available, thus allowing some
of those old techniques to become feasible in terms of
coverage. However, the success of such an approach de-
pends on the structure and coverage of the underlying lex-
ical ontology.
In the following sections we discuss the linguistic mo-
tivations behind this approach, the characteristics of the
lexical ontology MeSH, the use of a corpus to examine
the problem space, the method of determining the rela-
tions, the accuracy of the results, and the problem of am-
biguity. The paper concludes with related work and a
discussion of future work.
2 Linguistic Motivation
One way to understand the relations between the words
in a two-word noun compound is to cast the words into
Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (ACL), Philadelphia, July 2002, pp. 247-254.
the need to enumerate in advance all of the relations that
may hold. Rather, the corpus determines which relations
occur.
3 The Lexical Hierarchy: MeSH
MeSH (Medical Subject Headings)¹ is the National Library
of Medicine’s controlled vocabulary thesaurus; it
consists of a set of terms arranged in a hierarchical structure.
There are 15 main sub-hierarchies (trees) in MeSH,
each corresponding to a major branch of medical termi-
nology. For example, tree A corresponds to Anatomy,
tree B to Organisms, tree C to Diseases and so on. Every
branch has several sub-branches; Anatomy, for example,
consists of Body Regions (A01), Musculoskeletal System
(A02), Digestive System (A03) etc. We refer to these as
“level 0” categories.
These nodes have children; for example, Abdomen
(A01.047) and Back (A01.176) are level 1 children
of Body Regions. The longer the ID of the MeSH
term, the longer the path from the root and the more
precise the description. For example, migraine is
C10.228.140.546.800.525, that is, C (a disease), C10
(Nervous System Diseases), C10.228 (Central Nervous
System Diseases) and so on. There are over 35,000
unique IDs in MeSH 2001. Many words are assigned
more than one MeSH ID and so occur in more than one
¹ http://www.nlm.nih.gov/mesh/meshhome.html; the work
reported in this paper uses MeSH 2001.
subhierarchy at that level (and below it) behave similarly
with respect to relation assignment.
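To make the level terminology concrete, the following is a minimal sketch of how MeSH IDs can be cut back to a given level of description. The function names are illustrative assumptions and are not part of MeSH or of the system described here; the only facts used are that levels are separated by periods and that an ID such as A01 is a level 0 category.

```python
def mesh_level(tree_number):
    """Level of a MeSH ID: 'A01' is level 0, 'A01.047' is level 1."""
    return tree_number.count(".")


def truncate(tree_number, level):
    """Ancestor of the ID at the given level (the ID itself if it is shallower)."""
    return ".".join(tree_number.split(".")[: level + 1])


def ancestors(tree_number):
    """All prefixes of the ID, from its level 0 category down to the ID itself."""
    parts = tree_number.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


migraine = "C10.228.140.546.800.525"  # ID quoted in the text above
print(mesh_level(migraine))    # 5
print(truncate(migraine, 0))   # C10
print(truncate(migraine, 1))   # C10.228
print(ancestors("A01.047"))    # ['A01', 'A01.047']
```

Truncating both nouns of a compound in this way yields the category pair at whatever level of description is being examined.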
4 Counting Noun Compounds
In this and the next section, we describe how we investi-
gated the hypothesis:
For all two-word noun compounds (NCs) that
can be characterized by a category pair (CP), a
particular semantic relationship holds between
the nouns comprising those NCs.
The kinds of relations we found are similar to those
described in Section 2. Note that in this analysis we focused
on determining which sets of NCs fall into the same
relation, without explicitly assigning names to the rela-
tions themselves. Furthermore, the same relation may be
described by many different category pairs (see Section
5.5).
Figure 1: Distribution of Level 0 Category Pairs. Mark size
indicates the number of unique NCs that fall under the CP. Only
those for which NCs occur are shown.
First, we extracted two-word noun compounds from
approximately 1M titles and abstracts from the Medline
collection of biomedical journal articles, resulting
in about 1M NCs. The NCs were extracted by finding
adjacent word pairs in which both words are tagged as
nouns by a tagger and appear in the MeSH hierarchy, and
the words preceding and following the pair do not appear
in MeSH.² Of these two-word noun compounds, 79,677
category pairs then we only need to determine which re-
lations hold between a subset of the possible pairs. Thus,
the more clumped the distribution, the easier (potentially)
our task is. Figure 1 shows that some areas in the CP
space have a higher concentration of unique NCs (the
Anatomy, and the E through N sub-hierarchies, for ex-
ample), especially when we focus on those for which at
least 50 unique NCs are found.
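The extraction criterion used in this section (two adjacent nouns that both appear in MeSH, with neighbouring words that do not) can be sketched as follows. This is an illustration under stated assumptions, not the code used for the experiments: the Penn Treebank-style convention that noun tags start with "NN", the `mesh_words` set standing in for a MeSH lookup, and the function name are all hypothetical.

```python
def extract_ncs(tagged, mesh_words):
    """Return two-word noun compounds from a list of (word, tag) pairs.

    Assumption: noun tags start with 'NN'; mesh_words is a set of
    lower-cased words that appear somewhere in the hierarchy.
    """
    ncs = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        w1, w2 = w1.lower(), w2.lower()
        # Both words must be tagged as nouns ...
        if not (t1.startswith("NN") and t2.startswith("NN")):
            continue
        # ... and both must appear in the hierarchy.
        if w1 not in mesh_words or w2 not in mesh_words:
            continue
        # The neighbours must NOT appear in the hierarchy, which keeps
        # the pair from being a fragment of a longer compound.
        prev_w = tagged[i - 1][0].lower() if i > 0 else None
        next_w = tagged[i + 2][0].lower() if i + 2 < len(tagged) else None
        if prev_w in mesh_words or next_w in mesh_words:
            continue
        ncs.append((w1, w2))
    return ncs


mesh_words = {"hip", "pain", "influenza", "patients"}  # toy stand-in
sentence = [("severe", "JJ"), ("hip", "NN"), ("pain", "NN"),
            ("in", "IN"), ("influenza", "NN"), ("patients", "NNS")]
print(extract_ncs(sentence, mesh_words))
```

On this toy sentence the sketch returns the pairs hip pain and influenza patients; a three-word sequence whose words are all in the lookup set yields nothing, since each candidate pair has a neighbour in the hierarchy.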
5 Labeling NC Relations
Given the promising nature of the NC distributions, the
question remains as to whether or not the hypothesis
holds. To answer this, we examined a subset of the CPs to
see if we could find positions within the sub-hierarchies
for which the relation assignments for the member NCs
are always the same.
5.1 Method
We first selected a subset of the CPs to examine in detail.
For each of these we examined, by hand, 20% of the NCs
they cover, paraphrasing the relation between the nouns,
and seeing if that paraphrase was the same for all the NCs
in the group. If it was the same, then the current levels of
the CP were considered to be the correct levels of descrip-
tion. If, on the other hand, several different paraphrases
were found, then the analysis descended one level of the
hierarchy. This was repeated until the resulting partition of
the NCs yielded uniform relation assignments.
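The descent procedure can be pictured with the sketch below. It is a simplification under stated assumptions rather than the procedure as carried out: the paraphrasing was done by hand, whereas here the paraphrases are given as input strings; the sketch descends on the second noun only; and the paraphrase strings and the `max_level` cutoff are invented for illustration.

```python
from collections import defaultdict


def truncate(tree_number, level):
    """Ancestor of a MeSH ID at the given level."""
    return ".".join(tree_number.split(".")[: level + 1])


def derive_rules(labeled, level2=0, max_level=5):
    """labeled: list of (id1, id2, paraphrase) for hand-labeled NCs.

    Groups the NCs by category pair, keeping the first noun at level 0
    and the second noun at level2. A group whose paraphrases all agree
    becomes a rule; a mixed group is re-partitioned one level deeper.
    """
    groups = defaultdict(list)
    for id1, id2, rel in labeled:
        groups[(truncate(id1, 0), truncate(id2, level2))].append((id1, id2, rel))

    rules = {}
    for cp, members in groups.items():
        relations = {rel for _, _, rel in members}
        if len(relations) == 1:
            rules[cp] = relations.pop()
        elif level2 < max_level:
            rules.update(derive_rules(members, level2 + 1, max_level))
        else:
            rules[cp] = None  # still mixed at the deepest level tried
    return rules


# Paraphrase strings are hypothetical placeholders.
labeled = [
    ("A01", "M01.526", "specializes in"),  # chest physician
    ("A01", "M01.526", "specializes in"),  # eye nurse
    ("A01", "M01.898", "donates"),         # eye donor
    ("A01", "M01.898", "donates"),         # skin donor
    ("A01", "A07", "located in"),          # scalp arteries
]
print(derive_rules(labeled))
```

In this toy input the (A01, A07) group is uniform at level 0, while the (A01, M01) group is mixed and is split into (A01, M01.526) and (A01, M01.898) after one level of descent.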
For example, all the following NCs were mapped to the
same CP, A01 (Body Regions) and A07 (Cardiovascular
System):
scalp arteries, heel capillary, shoulder artery,
A01 H01.671 (Physics)
A01 H01.671.538 (Motion): shoulder rotations
A01 H01.671.100 (Biophysics): shoulder biomechanics
A01 H01.671.691 (Pressure): eye pressures
A01 H01.671.868 (Temp.): forehead temperature
A01 H01.671.768 (Radiation): thorax x-ray
A01 H01.671.252 (Electricity): chest electrode
A01 H01.671.606 (Optics): skin color
Figure 2: Levels of descent needed for NCs classified un-
der A01 H01.
A01, M01.526 (Occupational Groups): chest physician, eye nurse, eye physician
A01, M01.898 (Donors): eye donor, skin donor
A01, M01.150 (Disabled Persons): arm amputees, knee amputees.
In other words, to correctly assign a relationship to
these NCs, we needed to descend one level for the second
word. The resulting rules in this case are (A01, M01.643),
(A01, M01.150) etc. Figure 2 shows one CP for which we
needed to descend 3 levels.
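Once rules of this form exist, assigning a relation to a new NC amounts to finding the rule whose categories are ancestors of the two nouns' MeSH IDs. The sketch below is one possible way to do this and is an assumption, not a description of the implementation used here; the two rules and their relation names are the ones given as examples in the introduction, and the deeper IDs passed in are only sample inputs.

```python
def ancestors(tree_number):
    """All prefixes of a MeSH ID, from level 0 down to the ID itself."""
    parts = tree_number.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def classify(id1, id2, rules):
    """Return the relation of the most specific matching rule, or None."""
    for a in reversed(ancestors(id1)):        # deepest category first
        for b in reversed(ancestors(id2)):
            if (a, b) in rules:
                return rules[(a, b)]
    return None


RULES = {
    ("A01", "C10"): "located in",        # Body Regions + Nervous System Diseases
    ("C02", "M01.643"): "afflicted by",  # Virus Diseases + Patients
}

print(classify("A01.176", "C10.228.140.546.800.525", RULES))  # located in
print(classify("C02", "M01.643", RULES))                      # afflicted by
print(classify("C02", "M01.526", RULES))                      # None
```

Searching from the deepest categories upward means that a rule produced after descending takes precedence over any shallower rule that also matches.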
In our collection, a total of 2627 CPs at level 0 have at
least 10 unique NCs. Of these, 798 (30%) are classified
with A (Anatomy) for either the first or the second noun.
We randomly selected 250 of these CPs for analysis.
We also analyzed 21 of the 90 CPs for which the sec-
ond noun was H01 (Natural Sciences); we decided to ana-
For CPs with H01 as the second noun, of the 21
CPs analyzed, we observed the following (level number,
count) pairs: (0, 1) (1, 8) (2, 12).
In all but three cases, the descent was performed for the
second noun only. This may be because the second noun
usually plays the role of the head noun in two-word noun
compounds in English, thus requiring more specificity.
Alternatively, it may reflect the fact that for the exam-
ples we have examined so far, the more heterogeneous
terms dominate the second noun. Further examination is
needed to answer this decisively.
5.2 Accuracy
We tested the resulting classifications by developing a
randomly chosen test set (20% of the NCs for each
CP), entirely distinct from the labeled set, and using the
classifications (rules) found above to automatically predict
which relations should be assigned to the member
NCs. An independent evaluator with biomedical training
checked these results manually, and found high accura-
cies: For the CPs which contained a noun in the Anatomy
domain, the assignments of new NCs were 94.2% accu-
rate computed via intra-category averaging, and 91.3%
accurate with extra-category averaging. For the CPs in
the Natural Sciences (H01) we found 81.6% accuracy via
intra-category averaging, and 78.6% accuracy with extra-
category averaging. For the three CPs in the C04 category
we obtained 100% accuracy.
The total accuracy across the portions of the A, H01
and C04 hierarchies that we analyzed was 89.6% via
intra-category averaging, and 90.8% via extra-category
turns out that our 415 classification rules cover 46001
possible CP pairs.³
This, and the fact that we achieve high accuracies with
these classification rules, show that we successfully use
MeSH to generalize over unique NCs.
5.4 Ambiguity
A common problem for NLP tasks is ambiguity. In this
work we observe two kinds: lexical and “relationship”
ambiguity. As an example of the former, mortality can
refer to the state of being mortal or to death rate. As an
example of the latter, bacteria mortality can either mean
“death of bacteria” or “death caused by bacteria”.
In some cases, the relationship assignment method de-
scribed here can help disambiguate the meaning of an
ambiguous lexical item. Milk, for example, can be both
Animal Structures (A13) and Food and Beverages (J02).
Consider the NCs chocolate milk and coconut milk, which fall
under the CPs (B06 -Plants-, J02) and (B06, A13). The
CP (B06, J02) contains 180 NCs (other examples are
berry wines, cocoa beverages) while (B06, A13) has
only 6 NCs (4 of which contain milk). Assuming then that
(B06, A13) is “wrong”, we will assign only (B06, J02)
to chocolate milk and coconut milk, therefore disambiguating
the sense for milk in this context (Beverage). Analogously,
for buffalo milk and caprine milk we also have two
CPs, (B02, J02) and (B02, A13). In this case, however, it is
easy to show that only (B02 -Vertebrates-, A13) is the
correct one (i.e. yielding the correct relationship) and we
immunoglobulin or staining of immunoglobulin.
3) Multiple MeSH mappings but only one possible re-
lation. One example of this case is alcoholism treatment
where treatment is Therapeutics (E02) and alcoholism is
both Disorders of Environmental Origin (C21) and Mental
Disorders (F03). For this NC we therefore have two CPs:
(C21, E02) as in wound treatments, injury rehabilitation
and (F03, E02) as in delirium treatment, schizophrenia
therapeutics. The multiple mappings reflect the conflict-
ing views on how to classify the condition of alcoholism,
but the relationship does not change.
4) Multiple MeSH mappings and multiple relations
that can be predicted by the different CPs. For example,
bread diet can mean either that a person usually eats
bread or that a physician prescribed bread to treat a condition.
This difference is reflected by the different mappings:
diet is both Investigative Techniques (E05) and
Metabolism and Nutrition (G06), bread is Food and Beverages
(J02). In these cases, the category can help disambiguate
the relation (unlike in case 5 below); word
sense disambiguation algorithms that use context may be
helpful.
5) Multiple MeSH mappings and multiple relations
that cannot be predicted by the different CPs. As an example
of this case, bacteria mortality can mean either “death
of bacteria” or “death caused by bacteria”. The multiple
mapping for mortality (Public Health, Information Sci-
ence, Population Characteristics and Investigative Tech-
niques) does not account for this ambiguity. Similarly,
for inhibin immunization, the first noun falls under Hor-
The average number of MeSH senses is always less than two,
and increases with length of description, as is to be expected.
We observe that 3.6% of the lexical ambiguity is at levels
higher than 2, 16% at L2, 21.4% at L1 and 59% at L0.
Levels 0 and 1 combined account for more than 80% of the
lexical ambiguity. This means that when a noun has multiple
senses, those senses are more likely to come from
different main subtrees of MeSH (A and B, for exam-
ple), than from different deeper nodes in the same subtree
(H01.671.538 vs. H01.671.252). This fits nicely with our
method of describing the NCs with the higher levels of
the hierarchy: if most of the ambiguity is at the highest
levels (as these results show), information about lexical
ambiguity is not lost when we describe the NCs using the
higher levels of MeSH. Ideally, however, we would like
to reduce the lexical ambiguity for similar senses and to
retain it when the senses are semantically distinct (like,
for example, for diet in case 4). In other words, ideally,
the ambiguity left at the levels of our rules accounts for
only (and for all) the semantically different senses. Fur-
ther analysis is needed, but the high accuracy we obtained
in the classification seems to indicate that this indeed is
what is happening.
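The notion of ambiguity "at a level" can be made concrete with a small sketch: two senses of a word are distinguished at level k if their MeSH IDs first differ at that level, and truncating to a higher level merges them. The function names below are illustrative assumptions; the milk categories and the two H01.671 IDs are taken from the text, but pairing the latter as senses of one word is purely for illustration.

```python
def truncate(tree_number, level):
    """Ancestor of a MeSH ID at the given level."""
    return ".".join(tree_number.split(".")[: level + 1])


def num_senses(ids, level=None):
    """Number of distinct senses, optionally after truncating to a level."""
    if level is None:
        return len(set(ids))
    return len({truncate(i, level) for i in ids})


def divergence_level(id_a, id_b):
    """Shallowest level at which two IDs differ (None if one contains the other)."""
    for level, (x, y) in enumerate(zip(id_a.split("."), id_b.split("."))):
        if x != y:
            return level
    return None


milk = ["A13", "J02"]                    # Animal Structures vs. Food and Beverages
deep = ["H01.671.538", "H01.671.252"]    # Motion vs. Electricity

print(divergence_level(*milk))   # 0: ambiguity survives at level 0
print(divergence_level(*deep))   # 2: ambiguity disappears above level 2
print(num_senses(deep, level=1)) # 1
print(num_senses(deep, level=2)) # 2
```

Counting `num_senses` over all nouns at each truncation level gives the kind of per-level sense counts tabulated for the first noun.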
⁴ We obtained very similar results for the second noun.
# Senses          Original  L2     L1     L0
1 (Unambiguous)   51539     51766  54087  58763
2                 18637     18611  18677  17373
relations, or how often they repeat across different CPs,
until we examine the full spectrum of CPs. However, we
did a preliminary analysis to attempt to find relation repe-
tition across category pairs. As one example, we hypoth-
esized a relation afflicted by and verified that it applies
to all the CPs of the form (Disease C, Patients M01.643),
e.g.: anorexia (C23) patients, cancer (C04) survivor, in-
fluenza (C02) patients. This relation also applies to some
of the F category (Psychiatry), as in delirium (F03) pa-
tients, anxiety (F01) patient.
It becomes a judgement call whether to also include
NCs such as eye (A01) patient, gallbladder (A03) pa-
tients, and more generally, all the (Anatomy, Patients)
pairs. The question is, is “afflicted by (unspecified) Disease
in Anatomy Part” equivalent to “afflicted by Disease”?
The answer depends on one’s theory of relational
semantics. Another quandary is illustrated by the
NCs adolescent cancer, child tumors, adult dementia (in
which adolescent, child and adult are Age Groups and
the heads are Diseases). Should these fall under the afflicted
by relation, given the references to entire groups?
6 Related Work
6.1 Noun Compound Relation Assignment
Several approaches have been proposed for empirical
noun compound interpretation. Lauer & Dras (1994)
point out that there are three components to the prob-
lem: identification of the compound from within the text,
syntactic analysis of the compound (left versus right as-
sociation), and the interpretation of the underlying se-
mantics. Several researchers have tackled the syntactic
heuristics about how to classify a new NC given its simi-
larity to one that has already been seen.
In previous work (Rosario and Hearst, 2001), we
demonstrated the utility of using a lexical hierarchy for
assigning relations to two-word noun compounds. We
used machine learning algorithms and MeSH to successfully
generalize from training instances, achieving about
60% accuracy on an 18-way classification problem using
a very small training set. That approach is bottom-up
and requires good coverage in the training set; the approach
described in this paper is top-down, characterizing
the lexical hierarchies explicitly rather than implicitly
through machine learning algorithms.
6.2 Using Lexical Hierarchies
Many approaches attempt to automatically assign semantic
roles (such as case roles) by computing semantic
similarity measures across a large lexical hierarchy,
primarily using WordNet (Fellbaum, 1998). Budanitsky &
Hirst (2001) provide a comparative analysis of such
algorithms.
However, it is uncommon to simply use the hier-
archy directly for generalization purposes. Many re-
searchers have noted that WordNet’s words are classi-
fied into senses that are too fine-grained for standard NLP
tasks. For example, Buitelaar (1997) notes that the noun
book is assigned to seven different senses, including fact
and section, subdivision. Thus most users of WordNet
must contend with the sense disambiguation issue in or-
der to use the lexicon.
The most closely related use of a lexical hierarchy
and Hirst, 2001) to attempt to identify which subhierar-
chies are homogeneous. Another approach would be to
see if, after analyzing more CPs, those categories found
to be heterogeneous should be assumed to be heteroge-
neous across classifications, and similarly for those that
seem to be homogeneous.
The second major issue to address is how to extend the
technique to multi-word noun compounds. We will need
to distinguish between NCs such as acute migraine treatment
and oral migraine treatment, and handle the case
when the relation must first be found between the left-
most words. Thus additional steps will be needed; one
approach is to compute statistics to indicate likelihood of
the various CPs.
Finding noun compound relations is part of our larger
effort to investigate what we call statistical semantic pars-
ing (as in (Burton and Brown, 1979); see Grishman
(1986) for a nice overview). For example, we would like
to be able to interpret titles in terms of semantic relations,
transforming Congenital anomalies of tracheobronchial
branching patterns into a form that allows
questions to be answered such as “What kinds of irreg-
ularities can occur in lung structure?” We hope that by
compositional application of relations to entities, such in-
R. R. Burton and J. S. Brown. 1979. Toward a natural-
language capability for computer-assisted instruction.
In H. O’Neil, editor, Procedures for Instructional Sys-
tems Development, pages 273–313. Academic Press,
New York.
Christiane Fellbaum, editor. 1998. WordNet: An Elec-
tronic Lexical Database. MIT Press.
Timothy W. Finin. 1980. The Semantic Interpretation of
Compound Nominals. Ph.D. dissertation, University of
Illinois, Urbana, Illinois.
Ralph Grishman. 1986. Computational Linguistics.
Cambridge University Press, Cambridge.
Maria Lapata. 2000. The automatic interpretation of
nominalizations. In Proceedings of AAAI.
Mark Lauer and Mark Dras. 1994. A probabilistic model
of compound nouns. In Proceedings of the 7th Aus-
tralian Joint Conference on AI.
Mark Lauer. 1995. Corpus statistics meet the compound
noun. In Proceedings of the 33rd Meeting of the Asso-
ciation for Computational Linguistics, June.
Hang Li and Naoki Abe. 1998. Generalizing case frames
using a thesaurus and the MDI principle. Computa-
tional Linguistics, 24(2):217–244.
Mark Y. Liberman and Kenneth W. Church. 1992. Text
analysis and word pronunciation in text-to-speech syn-
thesis. In Sadaoki Furui and Man Mohan Sondhi, ed-
itors, Advances in Speech Signal Processing, pages
791–831. Marcel Dekker, Inc.
James Pustejovsky, Sabine Bergler, and Peter Anick.
1993. Lexical semantic techniques for corpus analy-