
Identification of novel membrane proteins by searching for patterns
in hydropathy profiles
John D. Clements and Rowena E. Martin
School of Biochemistry and Molecular Biology, Australian National University, Canberra, Australia
A technique has been developed to search a proteome
database for new members of a functional class of mem-
brane protein. It takes advantage of the highly conserved
secondary structure of functionally related membrane
proteins. Such proteins typically have the same number of
transmembrane domains located at similar relative positions
in their polypeptide sequence. This gives rise to a charac-
teristic pattern of peaks in their hydropathy proﬁles. To
conduct a search, each member of a polypeptide database is
converted to a hydropathy proﬁle, peaks are automatically
detected, and the pattern of peaks is compared with a tem-
plate. A template was designed for the acetylcholine (ACh)
and glycine receptors of the cys-loop receptor superfamily.
The key feature was a closely spaced triplet of hydropathy
peaks bracketed by deep valleys. When applied to the human
proteome the search procedure retrieved 153 proﬁles with a
receptor-like triplet of peaks. The approach was highly
selective with 70% of the retrieved proﬁles annotated as
known or putative receptors. These included ACh, glycine,
γ-amino butyric acid and serotonin receptors, which are all
related by sequence. However, ionotropic glutamate recep-
tors, which have almost no sequence homology with ACh
receptors, were also retrieved. Thus, the strategy can ﬁnd
members of a functional class that cannot be identiﬁed by
sequence alignment. To demonstrate that the strategy can
easily be extended to other membrane protein families, a
template was developed for the neurotransmitter/Na

interior of a cell, or subcellular compartment, and trans-
membrane domains provide the physical conduit for the
transfer. Typically, several transmembrane domains com-
bine to form a tightly coupled structure that is intimately
involved in the function of the protein [14]. It follows that
the number and the pattern of transmembrane domains will
be strongly conserved within a functionally related family.
Protein families within which secondary structure is highly
conserved include neurotransmitter receptors, voltage-gated
channels, connexins and transporters (Fig. 1).
The majority of neurotransmitter-activated channels can
be assigned either to the glutamate cationic receptor (iGluR)
superfamily, or the cys-loop receptor superfamily, which
includes acetylcholine (ACh), glycine, γ-amino butyric acid
(GABA) and serotonin receptors [15]. Channels from both
superfamilies are formed from subunits that have four
membrane-associated domains. These four domains are
organized as a cluster of three closely spaced domains near
the centre of the polypeptide, and a fourth well separated
domain close to the C-terminal end of the polypeptide
(Fig. 1A) [14]. Despite the similarity of their secondary
structure, there is almost no sequence homology between
the two superfamilies.
Neuronal voltage-gated Na+, Ca2+ and K+ channel

channels the subunits are
expressed as separate proteins, and the channel forms as a
tetramer of these subunits (Fig. 1B) [14].
Two separate families of membrane proteins form gap-
junctions between mammalian cells (connexins), and
between invertebrate cells (innexins). There is negligible
sequence homology between these families, but they share a
similar secondary structure. Subunits of both connexins and
innexins contain four transmembrane domains, and com-
bine to form dodecamers [14,16–19]. In contrast to ligand-
gated channels, the four transmembrane domains of
connexin and innexin are organized into two closely spaced
pairs, which are separated by an intracellular hydrophilic
loop (Fig. 2D). Many other functionally related protein
families have been identiﬁed where secondary structural
features are better conserved than the underlying amino
acid sequences [20,21].
Despite clear evidence for conservation of secondary
structure, little systematic use has been made of structural
information in proteomic analysis. Most genomic software
Fig. 1. Schematic diagram showing that the pattern of transmembrane
domains is conserved within a functional class of membrane protein.
(A) LGICs typically have a closely spaced cluster of three transmem-
brane domains (dark bars) and a fourth well-separated domain. This
secondary structure is conserved across the cys-loop superfamily and
the iGluR superfamily, even though there is no sequence homology
between these families. Selected subunits from both families are shown.
(B) Distantly related voltage-gated channels also exhibit a character-
istic pattern of transmembrane domains. Channels are formed by four
groups of six transmembrane domains. Within each group, the ﬁrst

membrane [7–11]. These programs are effective when
applied to individual amino acid sequences, but no software
tools are available to automatically analyse the pattern of
putative transmembrane domains (secondary structure).
A method for alignment of hydropathy proﬁles has been
developed [20,21], and an experimental web-based server
uses this approach to align pairs of sequences submitted
by the user, or to search a database for hydropathy proﬁles
that match a submitted sequence (Bioinformatics Unit,
Weizmann Institute of Science). At present, it is limited to
the SwissProt database, and to Hopp–Woods, or Kyte–
Doolittle hydrophobicity scales. In principle, this approach
can be used to search for proteins with conserved secondary
structure, but there are technical issues that limit its
performance. For example, a proﬁle with a similar pattern
of peaks, but differently shaped peaks and valleys may be
missed. It is equally sensitive to mismatches in both peak
(transmembrane) and valley (intra- and extracellular loop)
regions, even though evolutionary changes in valley shape
will have relatively little effect on secondary structure.
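As an illustration of the hydropathy profiles discussed above, the following minimal sketch computes a sliding-window mean on the Kyte–Doolittle scale [5]. The function name `hydropathy_profile`, the default window of 19 residues, and the choice to score unknown residues as 0.0 are assumptions made for this example only; they are not taken from the search software described in this paper.

```python
from typing import List

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> List[float]:
    """Return the mean hydropathy of every full window along the sequence.

    Assumption: residues outside the 20 standard codes score 0.0.
    The window length is a hypothetical default, not the paper's setting.
    """
    values = [KYTE_DOOLITTLE.get(res, 0.0) for res in sequence.upper()]
    if window < 1 or len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile
```

A sequence of length N yields N - window + 1 profile values, so profile index 0 corresponds to the window covering residues 1 to `window`.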
In this paper we develop and test a new automated
proteome search technique. Every member of a polypeptide
database is converted to a hydropathy proﬁle, hydropathy
peaks are automatically detected, and the pattern of peaks is
compared with a template. Sequences that match the
template are output to a new database, and their proﬁles
are displayed in a convenient format. This approach can be
used to search for new members of a family or functional
class of membrane protein. It can assist with functional
analysis, and may also be useful in proteome database

hydrophobic and hydrophilic regions, so proﬁles that do
not cross both an upper and lower threshold are also
rejected. These thresholds are the same as those used for
peak detection (Fig. 2). Next, a simple peak-detection
procedure is applied to each hydropathy proﬁle, resulting
in an estimate of the number and the locations of putative
transmembrane helices. The algorithm identiﬁes a peak
when the proﬁle rises from below a base threshold, crosses
above a peak detection threshold, then crosses back below
both the peak and base thresholds. In Fig. 2, the peak and
base thresholds are indicated with the upper two dashed
lines.
Different threshold settings are used depending on
the target protein. For example, the base threshold selected
for LGICs is higher than for connexins (Figs 2A–D). The
location and amplitude of each peak is measured at the
maximum point between the two peak threshold crossings.
The width of each peak is measured between the two base
threshold crossings. This gives a more consistent result
than measuring the width at the peak threshold level. The
location and amplitude of each valley minimum is also
measured.
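The peak-detection rule described above can be sketched as follows. This is a minimal illustration, not the AxoGraph plug-in code: the names `Peak` and `detect_peaks` are hypothetical, the thresholds are passed in by the caller, and when a profile crosses the peak threshold more than once within a single base-threshold excursion the sketch simply takes the overall maximum.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Peak:
    location: int      # index of the maximum above the peak threshold
    amplitude: float   # profile value at that index
    width: int         # number of points between the base-threshold crossings


def detect_peaks(profile: Sequence[float], base: float, peak: float) -> List[Peak]:
    """Detect peaks that rise from below `base`, exceed `peak`,
    and then fall back below `base` (and therefore below `peak`)."""
    peaks: List[Peak] = []
    start: Optional[int] = None   # index of the upward base crossing
    best: Optional[int] = None    # index of the maximum above `peak`
    for i, h in enumerate(profile):
        if start is None:
            # An excursion begins only when the profile rises from below base.
            if i > 0 and profile[i - 1] < base <= h:
                start, best = i, None
        if start is not None:
            if h > peak and (best is None or h > profile[best]):
                best = i
            if h < base:
                # Downward base crossing: keep the excursion only if it
                # exceeded the peak threshold at some point.
                if best is not None:
                    peaks.append(Peak(best, profile[best], i - start))
                start, best = None, None
    return peaks
```

Excursions that start above the base threshold at the first residue, or that never return below it, are not counted, which follows the requirement that the profile cross both thresholds in both directions.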
Comparing a proﬁle to a template
After the peaks and valleys are identiﬁed, a test is
performed to determine whether they conform to a
template. The simplest test is to count the peaks and ask
whether this number falls within a speciﬁed range. The
peak count may be adjusted by rejecting narrow peaks, or
by counting a broad peak as two merged peaks. For

false-positives. The ﬁrst goal is achieved by applying the
algorithm to a sequence database containing all proteins
that belong to the family of interest. The parameters are
reﬁned by trial and error until almost all members of the
family are selected. Next the same set of search parameters
is applied to a database containing unrelated membrane
protein sequences. If necessary, the parameters are ﬁne-
tuned until all members of the unrelated family are rejected.
Finally, the search procedure is applied to a large database,
for example one containing the proteome of an organism.
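The tuning loop described above amounts to measuring two quantities for each candidate parameter set: the fraction of known family members retrieved, and the fraction of unrelated sequences retrieved. The sketch below is an assumed helper for that bookkeeping; `evaluate_search` and the `matches` callback are hypothetical names, and the actual refinement in this study was done by trial and error rather than by any automated procedure.

```python
from typing import Callable, Sequence, Tuple

Profile = Sequence[float]


def evaluate_search(
    matches: Callable[[Profile], bool],
    family_profiles: Sequence[Profile],
    unrelated_profiles: Sequence[Profile],
) -> Tuple[float, float]:
    """Return (sensitivity, false-positive rate) for one parameter set.

    sensitivity: fraction of known family members that are retrieved.
    false-positive rate: fraction of unrelated profiles that are retrieved.
    """
    if not family_profiles or not unrelated_profiles:
        raise ValueError("both test databases must be non-empty")
    hits = sum(1 for p in family_profiles if matches(p))
    false_hits = sum(1 for p in unrelated_profiles if matches(p))
    return hits / len(family_profiles), false_hits / len(unrelated_profiles)
```

Parameters would be adjusted until the first value approaches 1 and the second approaches 0, before the search is run on a full proteome.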
The search algorithm and several related utilities were
written using a development environment that is built
into AxoGraph (Axon Instruments, CA), a scientiﬁc
data analysis and graphics program for Macintosh com-
puters ( The
AxoGraph plug-in programs that implement the search
algorithm are available on request, or from
http://johnc3.anu.edu.au/proteomic_plugins.sea. AxoGraph was
chosen for this study because it can plot and overlay several
thousand hydropathy proﬁles in a single window, and
analyse them in a single operation. It also has convenient
features for browsing and organizing the large number of
proﬁles generated by the search algorithm.
RESULTS
A search strategy was designed for LGICs. The strategy was
reﬁned by applying it to custom polypeptide databases, and
tested by applying it to a database containing the complete
human proteome. This database was chosen because it is
well annotated, which aids in the assessment of the
algorithm’s performance. The results presented below are

threshold of 0.8 HU reliably detected all four peaks in every
proﬁle. However, some of the peaks were measured as very
narrow (only two residues) because the base threshold was
set relatively high. Therefore, narrow peaks were not
rejected. A putative transmembrane domain occasionally
appeared as two narrow peaks. Therefore, a pair of peaks
separated by fewer than six residues was counted as a single
peak. We noted that the ﬁrst and last peaks in the
characteristic cluster of peaks were separated by between
55 and 66 residues. Thus, the template criterion for a LGIC
was the presence of a cluster of three peaks separated by
between 50 and 75 residues, bounded by deep valleys of
< −2.5 HU. The cluster had to be followed by at least one
additional peak, but no more than three peaks.
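The LGIC template criteria above can be expressed as a short test on a profile and its detected peak locations. The sketch below is one possible reading, under stated assumptions: "separated by" is taken as the distance between peak locations, a merged pair keeps the location of its first peak, and a "bounding valley" is the profile minimum between the cluster and the neighbouring peak (or the end of the sequence). The function names are hypothetical.

```python
from typing import List, Sequence


def merge_close_peaks(locations: Sequence[int], min_gap: int = 6) -> List[int]:
    """Count peaks closer than `min_gap` residues as a single peak."""
    merged: List[int] = []
    for loc in sorted(locations):
        if merged and loc - merged[-1] < min_gap:
            continue  # absorbed into the previous peak
        merged.append(loc)
    return merged


def matches_lgic_template(
    profile: Sequence[float],
    peak_locations: Sequence[int],
    min_span: int = 50,
    max_span: int = 75,
    valley_depth: float = -2.5,
    min_after: int = 1,
    max_after: int = 3,
) -> bool:
    """True if some triplet of consecutive peaks spans 50-75 residues,
    is bracketed by valleys deeper than -2.5 HU, and is followed by
    between one and three further peaks."""
    peaks = merge_close_peaks(peak_locations)
    for i in range(len(peaks) - 2):
        first, last = peaks[i], peaks[i + 2]
        if not (min_span <= last - first <= max_span):
            continue
        n_after = len(peaks) - (i + 3)
        if not (min_after <= n_after <= max_after):
            continue
        # Valley before the cluster: back to the previous peak or sequence start.
        left_start = peaks[i - 1] if i > 0 else 0
        # Valley after the cluster: forward to the next peak or sequence end.
        right_end = peaks[i + 3] if i + 3 < len(peaks) else len(profile)
        left_valley = min(profile[left_start:first + 1])
        right_valley = min(profile[last:right_end])
        if left_valley < valley_depth and right_valley < valley_depth:
            return True
    return False
```

The default values mirror the figures quoted in the text; any other protein family would need its own spans, depths and peak counts.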
Testing the LGIC search strategy
A search of the AChR and GlyR database using the above
detection criteria correctly retrieved every one of the 119
proﬁles. Thus, the search strategy exhibits excellent sensi-
tivity, as it was able to detect 100% of known GlyR and
AChR across a range of species.
The accuracy and sensitivity of the search strategy were
tested by applying it to a custom database containing
GABA-A receptor sequences retrieved via a text search of
the Entrez database. GABA-A receptors are also members of
the cys-loop superfamily, but they were not used during the
selection and tuning of the search parameters. The algo-

cluster separated by deep valleys (Fig. 3A). It was noted
that the valleys between the triplet peaks were usually
2104 J. D. Clements and R. E. Martin (Eur. J. Biochem. 269) © FEBS 2002
deeper for potassium channels and transporters than for
LGICs. The receptor detection algorithm was reﬁned
to eliminate proﬁles where the deeper of the two valleys
between the triplet peaks extended below −1.5 HU. This
reﬁned algorithm was still able to detect 99% of known
GlyR and AChR. It retrieved 87 proﬁles from the human
proteome, of which 90% were receptors. Although this
reﬁned search procedure increased the selectivity for recep-
tors, it also failed to retrieve any iGluRs. This illustrates the
inevitable trade-off between the selectivity of the search
algorithm and the likelihood of detecting distantly related
functional homologues.
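The refinement described above adds one further test on the two valleys inside the triplet. A minimal sketch, assuming the valleys are measured as the profile minima between adjacent triplet peaks (the function name and argument layout are hypothetical):

```python
from typing import Sequence, Tuple


def passes_inner_valley_filter(
    profile: Sequence[float],
    triplet: Tuple[int, int, int],
    limit: float = -1.5,
) -> bool:
    """Reject a profile when the deeper of the two valleys between the
    triplet peaks extends below `limit` (in hydropathy units)."""
    first, middle, last = triplet
    valley_1 = min(profile[first:middle + 1])
    valley_2 = min(profile[middle:last + 1])
    deeper = min(valley_1, valley_2)
    return deeper >= limit
```

Tightening `limit` raises selectivity for receptors at the cost of losing more distantly related profiles, which is the trade-off noted in the text.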
The search strategy’s sensitivity to membrane proteins
that were related to the target group by function but not
by sequence was investigated further. A custom database
containing 84 sequences from the iGluR superfamily was
constructed using Entrez. It included the NMDA, kainate
and α-amino-3-hydroxy-5-methyl-4-isoxazole propionate
(AMPA) receptor subtypes. These receptors are function-
ally related to GlyRs and AChRs, but share almost no
sequence homology. Also, iGluRs are thought to form
tetrameric channels, in contrast with the cys-loop super-
family that forms pentameric channels. Despite these
differences, the search algorithm retrieved 30 sequences
(36%) from the iGluR database. By subtype, 90% of the
kainate receptors in the database were detected, but only
36% of the NMDA receptors, and 1% of the AMPA

BLAST search revealed weak homology
with a section of an intrinsic factor-vitamin B12
receptor. The proﬁle is quite similar to a typical LGIC,
although a small narrow peak precedes the main triplet
(Fig. 3D). These ﬁndings demonstrate how the hydropathy
Fig. 3. Hydropathy proﬁles of four proteins that were retrieved from the
human proteome by a search strategy designed to detect LGICs, but were
not annotated as receptors. (A) A voltage-gated potassium channel was
incorrectly retrieved because its ﬁrst two hydropathy peaks fell just
below the detection threshold. Potassium channels typically have a
cluster of five peaks followed by a sixth well-separated peak. Note that
although only one peak following the valley is highlighted, the tem-
plate will accept up to three peaks. (B) An ancient conserved domain
protein with no known function was retrieved because of its receptor-
like cluster of three transmembrane peaks bracketed by deep valleys.
The separation between the cluster and the fourth peak was larger than
for a typical LGIC, but otherwise the secondary structure is strikingly
similar. (C) An uncharacterized hypothalamus protein is unlikely to be
a LGIC, despite the fact that it is expressed in a brain region. It has two
or three extra peaks before and after the triplet, giving it a secondary
structure that has more in common with a voltage-gated channel or a
transporter. (D) A retrieved protein that was simply annotated
'unknown', but which has weak sequence homology with an intrinsic
factor-vitamin B12 receptor.
peak detection algorithm may be used to search for truly
novel members of a functional class of membrane proteins.
Search strategy for neurotransmitter/Na+

2+ antiporters (9%), Na+/glucose symporters (7%), K+/Cl− symporters (5%),
Na+/nucleoside transporters (3%), and organic ion transporters
(3%) (Fig. 4C). Thus, the search algorithm again succeeded
in identifying proteins that were functionally related to the
target group, but were not related by sequence homology.
DISCUSSION
We have developed and tested an algorithm that can scan a
large polypeptide database, and retrieve membrane proteins
on the basis of secondary structure rather than sequence
homology. The algorithm locates putative transmembrane
domains in each sequence, and tests whether their spatial
pattern matches a template. In the past this process has been
performed manually, by visual inspection of hydropathy
plots generated one at a time. Our major innovation was to
automate the process, and apply it on the proteome scale. A
computer program performs the peak detection and tem-
plate matching. The complete proteome of an organism can
be scanned in about 1 min using a desktop personal
computer. This represents a qualitative increase in the
power of the technique, and it permits new questions to be

membrane-associated domains represent an important
component of the highly conserved secondary structure
Fig. 4. The conserved secondary structure of neurotransmitter/Na+
symporters is reflected in a characteristic pattern of peaks in their
hydropathy proﬁles. (A) The hydropathy proﬁle of a rat dopamine
symporter reveals a pair of peaks followed by a deep valley, then a
cluster of nine peaks. The peak, base and valley threshold levels used
by the search algorithm are shown as horizontal dashed lines. (B) A
similar pattern of peaks and valleys is seen in the proﬁle of a closely
related rat GABA symporter. (C) A human Na+-independent organic
anion transporter retrieved by the NSS symporter template exhibits a
similar pattern of peaks, although it has no sequence homology with
the neurotransmitter symporters.
of voltage-gated potassium channels, and similar hairpin
structures may also be present in other membrane proteins
[22]. A sophisticated α-helix-detection algorithm may reject
or misinterpret such regions.
Our approach is loosely analogous with a strategy that
uses alignment of hydropathy proﬁles to search for
conserved secondary structural features in polypeptide
sequences [20,21]. This alignment technique is based on
the same algorithm that is used in standard peptide and
nucleotide sequence alignment, but is applied to sequences
of hydropathy values. Proﬁle alignment will generally
provide a more stringent test for conserved structure than
our template-matching approach. However, a more strin-

niques. We have demonstrated how this can be achieved for
LGICs, and for neurotransmitter symporters. Other candi-
date families include voltage-gated ion channels, G-protein
coupled receptors, connexins and a wide variety of trans-
porters.
ACKNOWLEDGEMENTS
This work was supported by a Senior Research Fellowship from the
Australian Research Council (J. D. C.) and an Australian Postgradu-
ate Award (R. E. M.).
REFERENCES
1. Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B.C. &
Herrmann, R. (1996) Complete sequence analysis of the genome
of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24,
4420–4449.
2. Frishman, D. & Mewes, H.W. (1997) Protein structural classes in
ﬁve complete genomes. Nat. Struct. Biol. 4, 626–628.
3. Wallin, E. & von Heijne, G. (1998) Genome-wide analysis of
integral membrane proteins from eubacterial, archaean, and
eukaryotic organisms. Protein Sci. 7, 1029–1038.
4. Deisenhofer, J., Remington, S.J. & Steigemann, W. (1985)
Experience with various techniques for the reﬁnement of protein
structures. Methods Enzymol. 115, 303–323.
5. Kyte, J. & Doolittle, R.F. (1982) A simple method for displaying
the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.
6. Engelman, D.M., Steitz, T.A. & Goldman, A. (1986) Identifying
nonpolar transbilayer helices in amino acid sequences of
membrane proteins. Annu. Rev. Biophys. Biophys. Chem. 15,
321–353.
7. Jones, D.T., Taylor, W.R. & Thornton, J.M. (1994) A model
recognition approach to the prediction of all-helical membrane

17. Unger, V.M., Kumar, N.M., Gilula, N.B. & Yeager, M. (1999)
Three-dimensional structure of a recombinant gap junction
membrane channel. Science 283, 1176–1180.
18. Bennett, M.V., Barrio, L.C., Bargiello, T.A., Spray, D.C.,
Hertzberg, E. & Saez, J.C. (1991) Gap junctions: new tools, new
answers, new questions. Neuron 6, 305–320.
19. Ganfornina, M.D., Sanchez, D., Herrera, M. & Bastiani, M.J.
(1999) Developmental expression and molecular characterization
of two gap junction channel proteins expressed during embry-
ogenesis in the grasshopper Schistocerca americana. Dev. Genet.
24, 137–150.
20. Lolkema, J.S. & Slotboom, D.J. (1998) Estimation of structural
similarity of membrane proteins by hydropathy proﬁle alignment.
Mol. Membr. Biol. 15, 33–42.
21. Lolkema, J.S. & Slotboom, D.J. (1998) Hydropathy proﬁle
alignment: a tool to search for structural homologues of mem-
brane proteins. FEMS Microbiol. Rev. 22, 305–322.
22. Wood, M.W., VanDongen, H.M. & VanDongen, A.M. (1995)
Structural conservation of ion conduction pathways in K channels
and glutamate receptors. Proc. Natl. Acad. Sci. USA 92, 4882–
4886.