
Analysis of ancient sequence motifs in the H+-PPase family
Joel Hedlund1,*, Roberto Cantoni2,3,4,*, Margareta Baltscheffsky2, Herrick Baltscheffsky2 and Bengt Persson1,4
1 IFM Bioinformatics, Linköping University, Sweden
2 Department of Biochemistry and Biophysics, Arrhenius Laboratories, Stockholm University, Sweden
3 Department of Physical Sciences, ‘Federico II’ University of Naples, Italy
4 Department of Cell and Molecular Biology (CMB), Programme for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
Membrane-bound inorganic pyrophosphatase/pyrophosphate synthase (H+-PPase/H+-PPi synthase) [1,2] activities were first described in chromatophores from the purple nonsulphur photosynthetic bacterium, Rhodospirillum rubrum, where the enzyme functions as a proton pump [3]. The gene for H

uence motifs are denoted ‘primitive’ because they have
a high content of the four ‘very early’ proteinaceous
amino acids G (glycine), A (alanine), D (aspartic acid)
and V (valine). In 1978, Eigen & Schuster [8] proposed
Keywords
bioinformatics; hidden Markov models;
molecular evolution; proteinaceous amino
acids; pyrophosphatase
Correspondence
B. Persson, IFM Bioinformatics, Linköping University, S-581 83 Linköping, Sweden
Fax: +46 13 137 568
Tel: +46 13 282 983
E-mail:
*These authors contributed equally to this
work
(Received 9 August 2006, revised 26 September 2006, accepted 27 September 2006)
doi:10.1111/j.1742-4658.2006.05514.x
The unique family of membrane-bound proton-pumping inorganic pyrophosphatases, involving pyrophosphate as the alternative to ATP, was investigated by characterizing 166 members of the UniProtKB/Swiss-Prot + UniProtKB/TrEMBL databases and available completed genomes, using sequence comparisons and a hidden Markov model based upon a conserved 57-residue region in the loop between transmembrane segments 5 and 6. The hidden Markov model was also used to search the approxi-

dence supporting the role of these four proteinaceous amino acids as 'very early' [5] has been further confirmed by Trifonov [9]. Notably, experiments by Miller [10], leading to the synthesis of amino acids under certain 'simulated prebiotic' conditions, gave these four amino acids in the highest yield, and they all were among the amino acids found in the Murchison meteorite [11]. Both the high content of the 'very early' amino acids in these sequence motifs, and the fact that the motifs have remained essentially unchanged during billions of years of biological evolution, from Archaea and Bacteria to Eukaryotes, provide a background for our fascination with various aspects of their evolution and function.
As pointed out previously [5], the GGG motif of the R. rubrum H+-PPase may be of special functional significance in a possible conformational change mechanism for the physiological coupling between the light-induced pumping of protons and the photophosphorylation of inorganic phosphate (Pi) to PPi [2] or its reversal, the dark hydrolysis of PPi to Pi

served energy-transferring enzyme family.
The possibility has been considered that PPi could have been a predecessor to ATP, and that H+-PPases could have been direct or indirect evolutionary ancestors of ATPases [5,13]. From numerous genome projects, a large number of both prokaryotic and eukaryotic H+-PPase sequences are now known. Our new overview of the high content of strongly conserved, very early amino acids in the three putative active site motifs shown above, in the loop between transmembrane segments 5 and 6, deserves a closer look for existing sequence similarities in other known polypeptides. We thus also evaluate the possibility that the putative active site may have evolved from an original enzyme structure involved in an emerging metabolism of energy-rich phosphate [6]. A recent, additional indication in this direction was given by the discovery that shifting the growth conditions of R. rubrum from aerobic/dark to anaerobic/light resulted in concerted transcriptional activation of both H+-PPase and photosynthetic genes, by the same anaerobic regulatory factor [14], pointing towards PP

FEBS Journal 273 (2006) 5183–5193 © 2006 The Authors Journal compilation © 2006 FEBS
genomes and the UniProtKB protein sequence database [15] were searched using fasta [16], resulting in characterization of over 100 different members. The sequences were multiply aligned, revealing a number of well-conserved regions, especially the highly conserved 57-residue segment in the loop between transmembrane segments 5 and 6. This segment was used to create a hidden Markov model (HMM), which subsequently was used to search for further homologues in the databases of UniProtKB and the currently available genomes, in order to identify further family members. Using this HMM, 166 sequences were found (supplementary Table S1). Remarkably, only other H+-PPases were found using this strategy, indicating high specificity of the model. The poorest scoring H+-PPase sequence had an E-value of 8.3e-25 (i.e. under the circumstances of the search, only 8.3e-25 unrelated sequences could be expected to attain the same match quality by chance alone) [17]. Thus, the E-value is highly statistically significant. Furthermore, the best scoring non-H+
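The E-value quoted above can be turned into a probability under one extra assumption that is not stated in the text: that chance hits follow a Poisson distribution, so that the probability of at least one chance hit is 1 − exp(−E). The following is a minimal Python sketch of that conversion; the function name is illustrative only.

```python
import math


def prob_at_least_one_chance_hit(e_value: float) -> float:
    """P(>= 1 chance hit) = 1 - exp(-E), assuming Poisson-distributed chance hits."""
    # expm1 keeps precision for tiny E, where 1 - exp(-E) would round to zero.
    return -math.expm1(-e_value)


# E-value of the poorest-scoring H+-PPase sequence, as reported in the text.
worst_hppase_e = 8.3e-25
p_worst = prob_at_least_one_chance_hit(worst_hppase_e)
print(f"E = {worst_hppase_e:.1e} -> P = {p_worst:.1e}")
```

For E-values this small the probability and the E-value are numerically indistinguishable, which is why the E-value alone is usually reported.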

in plants and in a few blood parasites, such as Tetrahymena and Plasmodium.
The fact that four ‘very early’ amino acid residues
have been retained indicates very early optimization.
This may be a rather unusual situation compared with
early motifs in other proteins, where stepwise evolution
of the motifs may have provided further optimization,
through introduction of later amino acids.
Different H+-PPase subfamilies
In order to characterize the interfamily relationships, a dendrogram was calculated (Fig. 1) based upon the multiple sequence alignment. The tree shows that H+-PPases form two subfamilies, corresponding to type 1 and type 2 [19,20]. Several species present multiple forms of H+-PPases, which in many cases are distantly related (37–39% pairwise residue identity) and belong to the separate types. For cress (Arabidopsis thaliana), there are five type 1 and six type 2 enzymes, whereas for rice (Oryza sativa) there exist 18 enzymes of type 1 and two enzymes of type 2. Among plants, the multiplicity has so far only been seen in organisms for which the complete genomes are available, and this distantly related multiplicity can thus be expected to occur also in further plants. The blood parasites Tetra-

by bullet symbols). Functional impact has already been confirmed for one of these differences (position 507 in HPPA_STRCO), as an Ala/Lys mutation introduced at the corresponding position in Carboxydothermus hydrogenoformans type 1 H+-PPase has been shown to confer the potassium independence of type 2 H+-PPases to the enzyme [22]. At position 253 in the loop between transmembrane segments 5 and 6, close to one of the conserved nonapeptides, the type 1 enzymes have predominantly hydrophobic Ile or Val, while type 2 enzymes have polar Cys or Ser.
Furthermore, there are two exchanges within weakly conserved clustalw groups of residues [23] (Fig. 2, open ring symbols). At position 266 in transmembrane segment 6, the type 1 residues are Glu or Gly, while the type 2 residues are Val or Ala. At position 510, in loop 11–12 close to the membrane, type 1 enzymes prefer Gly, while type 2 enzymes prefer Thr or Ala.
The remaining three residue exchanges are within clustalw groups, and these are all located in transmembrane segments (Fig. 2).
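The type-specific residue preferences listed above can be written down as a small lookup table. The sketch below is only an illustration of how such diagnostic positions might be used for a rough type assignment; it is not the authors' procedure. It assumes that residues have already been mapped to HPPA_STRCO numbering through an alignment, and the majority-vote rule and all names are hypothetical. Because the text describes these residues as predominant or preferred, not strict, any such vote is a hint only.

```python
# Residue preferences quoted in the text (HPPA_STRCO numbering).
DIAGNOSTIC = {
    253: {"type1": set("IV"), "type2": set("CS")},
    266: {"type1": set("EG"), "type2": set("VA")},
    510: {"type1": set("G"), "type2": set("TA")},
}


def vote_type(residues):
    """Majority vote over diagnostic positions.

    `residues` maps a reference position to a one-letter residue code.
    Positions not listed in DIAGNOSTIC are ignored.
    """
    votes = {"type1": 0, "type2": 0}
    for pos, aa in residues.items():
        prefs = DIAGNOSTIC.get(pos)
        if prefs is None:
            continue
        for label, allowed in prefs.items():
            if aa in allowed:
                votes[label] += 1
    if votes["type1"] == votes["type2"]:
        return "undetermined"
    return max(votes, key=votes.get)


print(vote_type({253: "I", 266: "E", 510: "G"}))
```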
Conserved regions within H+-PPases
From the multiple sequence alignment of all H+

groups of residues'. Bullet symbols on the backbone denote conserved substitutions between two residues that do not occur together in any of the CLUSTALW groups. This latter form of substitution implies functional impact. Also, in order to count as a conserved substitution, none of the residue types A or B can be identical to the residue types C or D.
Fig. 1. Dendrogram of H+-PPases. The dendrogram is based upon the multiple sequence alignment of all PPase sequences found in UniProtKB/Swiss-Prot, UniProtKB/TrEMBL and GenomeLKPG databases, with removal of sequences showing 90% or more identity to any of the other sequences in the alignment. Two sequences – Q420R8_DESHA and Q4CED6_CLOTM – show unclear relationships and were excluded without affecting the general tree topology. Red marks at branch points indicate bootstrap values below 888/1000, the bootstrap value at the branch point separating Type 1 (K+-dependent) from Type 2 (K+-independent) enzymes. The horizontal bar shows the branch length corresponding to 5% residue differences. Each branch end-point is designated with identifiers from UniProtKB/Swiss-Prot, UniProtKB/TrEMBL or the respective genome project, prefixed with indicators of kingdom (A, archaeal; B, bacterial; E, eukaryotic), species group (A, Alveolata; E, Euglenozoa; P, Plant; V, Vertebrates), and PPase type (1 or 2). Archaea are labelled yellow, bacteria blue, and plants green. The red-labelled sequences originate from primitive eukaryotes (alveolates and euglenozoans). Two sequences from the unfinished Xenopus tropicalis genome project are shown in purple. Accession numbers are given within parentheses after the UniProtKB/Swiss-Prot ID. For UniProtKB/TrEMBL IDs, the accession number forms the first part of the ID. All type 1 PPases are found in the upper part of the dendrogram, while type 2 PPases are in the lower part. Several plants and protozoan organisms have both type 1 and type 2 PPases, whereas most of the bacterial forms only present one of the types.
membrane, while only few residues on the noncytosolic side are conserved. Furthermore, there are five segments that are seen as distinct peaks, indicating strong

residues, of which the first, fifth and ninth are charged.
Fig. 3. Conservation plot for H+-PPases and conserved sequence motifs. From the multiple sequence alignment of H+-PPases, CLUSTALW column scores were averaged for ungapped 11-residue windows of the reference sequence from Streptomyces coelicolor, HPPA_STRCO, and plotted in green. The predicted membrane topology is shown in blue, where high values indicate the noncytosolic side and low values the cytosolic side, whereas medium values correspond to transmembrane regions. The plot shows that the conserved regions coincide with cytoplasmic and transmembrane regions. There are five regions with distinct peaks above 55% column score (dotted line), indicating strong conservation. The sequences for these regions in S. coelicolor are shown below the plot. In region 1, the well-conserved patterns, including the two nonapeptides, are highlighted in red. Distantly related nonapeptides in regions 4 and 5 are also highlighted in red, together with a GGS pattern, possibly corresponding to the GGG pattern in region 1 (cf. text).
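The windowed averaging described in the caption of Fig. 3 can be sketched as follows. This is a hedged illustration, not the authors' script: it assumes the per-column scores have already been restricted to the ungapped columns of the reference sequence and are expressed in percent, and the function names are hypothetical. The window size of 11 and the 55% threshold are taken from the caption.

```python
def windowed_mean(scores, window=11):
    """Mean of every full window; returns len(scores) - window + 1 values."""
    if window <= 0 or window > len(scores):
        return []
    total = sum(scores[:window])
    means = [total / window]
    for i in range(window, len(scores)):
        # Slide the window by one column: add the new score, drop the oldest.
        total += scores[i] - scores[i - window]
        means.append(total / window)
    return means


def regions_above(means, threshold=55.0):
    """Inclusive (start, end) index ranges of consecutive windows above threshold."""
    regions = []
    start = None
    for i, value in enumerate(means):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(means) - 1))
    return regions
```

Calling `regions_above(windowed_mean(column_scores))` on a real score vector would give the window ranges that correspond to the peaks in the plot; `column_scores` here stands for data that is not part of this text.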
The number of amino acid residues separating the three motifs is remarkably constant in all the H+-PPases.
The nonapeptide DVGADLVGK was seen to have a partial counterpart in the P-loop [24] of the active β-subunit of ATP synthase. In an alignment of PPi synthase from R. rubrum with the P-loop in animal mitochondrial ATP synthase, four of eight amino acid residues were found to be identical [5].
In order to investigate further the evolutionary variation of this 57-residue region, we applied the HMM

enzymes, possibly being visible traces of an ancient gene duplication. This second region is located at residues 738–785 (numbering according to the A. thaliana sequence with accession number Q9FWR2 (UniProtKB Q9FWR2_ARATH)). This region forms loop 15–16, located at the cytosolic side according to experimental investigations [26] (cf. Fig. 3). The patterns of this second region are also seen in further species variants, but are most clearly distinguishable in the plant sequences. According to the three well-conserved sequence segments of H+-PPases, the second nonapeptide motif is well conserved, with all three aspartic acid residues unchanged (Fig. 3, marked residues in regions 1 and 5). For the first nonapeptide motif, charges are found at positions 1, 5 and 9 in the order Asp, Asp, Lys in region 1, whereas the order is Asp, Lys, Asp in region 4. Notably, the positional spacing between the two nonapeptides in region 1 and region 4+5 is identical (26 residues). Furthermore, the GGG motif preceding the nonapeptides in region 1 could correspond to a GGA or GGS motif, preceding the nonapeptide in region 4 (Fig. 3, marked residues in peptides 1 and 4). Thus, these observations, taken together, might reflect an ancient gene duplication event, as previously suggested [5].
Occurrence of the typical H+

not related to the H+-PPases, and two hypothetical proteins from Neurospora crassa (Q871A9_NEUCR and Q7RZ15_NEUCR). Furthermore, as seen in Table 1, four H+-PPases are not detected by the first nonapeptide pattern, because those proteins have one atypical amino acid residue in this pattern. Thus, the nonapeptide patterns are, with these few exceptions, specific for the H+-PPases. As seen in Table 1, the number of non-H+-PPase hits increases dramatically when the patterns are extended to DXXXDXXGK and DXXGDNXGD, respectively, where X represents any amino acid residue.
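As an illustration of how relaxed patterns such as DXXXDXXGK and DXXGDNXGD match, the sketch below translates them into regular expressions with Python's standard `re` module. This is a stand-in for illustration only: the paper's pattern searches were run with the ps_scan utility, and the function names and the short test sequences here are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Compile a one-letter pattern, where X means any residue, into a regex."""
    body = "".join("." if ch == "X" else re.escape(ch) for ch in pattern)
    # The lookahead lets overlapping matches be reported as well.
    return re.compile("(?=(" + body + "))")


def find_hits(pattern, sequence):
    """Return (1-based start position, matched segment) for every match."""
    regex = pattern_to_regex(pattern)
    return [(m.start(1) + 1, m.group(1)) for m in regex.finditer(sequence)]


# The two strict nonapeptides from the text, embedded in made-up flanks.
print(find_hits("DXXXDXXGK", "MDVGADLVGKT"))
print(find_hits("DXXGDNXGD", "AAADNVGDNVGDAAA"))
```

Relaxing a position to X widens the set of matching peptides, which is consistent with the sharp rise in non-H+-PPase hits reported in Table 1.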
We extended the pattern search to screen the UniProtKB/Swiss-Prot database for very simple motifs of possible ancestral significance, with an alternation of Asp and one of the other 'very early' amino acids (e.g. DADADADAD) (Table 2). It can be seen that the pattern VDVDV is under-represented compared with the patterns ADADA and GDGDG, even when considering the general frequencies of the residues (V, 6.7%; A, 7.9%; G, 7.0%). Similarly, the DGDGD pattern is over-represented compared with the patterns DADAD and DVDVD. This over-representation is still present

Second nonapeptide
DNVGDNVGD 29 29 29 93 93 93
DLVGDNVGD 1 1 1 15 15 15
DCTGDNAGD 1 1 1 5 5 5
DCIGDNVGD 0 0 0 7 7 7
DFVGDNVGD 0 0 0 2 2 2
DCAGDNAGD 0 0 0 2 2 2
DLTGDNAGD 0 0 0 2 2 0
DppGDNpGD 31 31 31 126 126 124
DXXGDNXGD 31 31 31 160 175 125
after homology reduction (at the 80% and 60% levels) and might well reflect structural properties.
We also searched the complete UniProtKB (Swiss-Prot + TrEMBL) database for patterns with alternations of any two 'very early' amino acid residues. The largest number of proteins was found for the sequences AGAGA (6185) and GAGAG (6203), in agreement with the assumption that G and A are both very early, flexible and frequent amino acids. Close to 6000 results were also reported for the pattern AVAVA (5865), while much smaller numbers were found for GVGVG (3386), ADADA (2383), DGDGD (2100) and DVDVD (852). The small difference in frequencies between V and D (6.7% and 5.3%, respectively) in known proteins does not fully explain the discrepancy between AVAVA and ADADA.
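The last statement can be checked with a back-of-the-envelope calculation. The sketch below assumes that residues occur independently at the background frequencies quoted in the text, which is a simplifying assumption made here and not a model used by the authors, and it compares the expected AVAVA/ADADA ratio with the observed ratio of protein counts.

```python
# Background frequencies and protein counts as quoted in the text.
FREQ = {"A": 0.079, "G": 0.070, "V": 0.067, "D": 0.053}
OBSERVED = {"AVAVA": 5865, "ADADA": 2383}


def expected_probability(pattern):
    """Probability of the pattern at one site, assuming independent residues."""
    p = 1.0
    for aa in pattern:
        p *= FREQ[aa]
    return p


expected_ratio = expected_probability("AVAVA") / expected_probability("ADADA")
observed_ratio = OBSERVED["AVAVA"] / OBSERVED["ADADA"]
print(f"expected {expected_ratio:.2f}, observed {observed_ratio:.2f}")
```

Under this simple model the frequency difference between V and D predicts a ratio of roughly 1.6, whereas the observed ratio is roughly 2.5, in line with the statement that the frequencies alone do not fully explain the discrepancy.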

-PPase. The only patterns found in the proteins were DNVGDNVGD, unique to the H+-PPase family, and DNNNDNNND, in the spindle assembly checkpoint component MAD1 from Saccharomyces cerevisiae (Mitotic arrest deficient protein 1; UniProtKB/Swiss-Prot ID MAD1_YEAST). We thus concluded that the charged residues (1, 5 and 9) of the two nonapeptides form a unique and unaltered pattern, presumably with critical function and characteristics of the H+-PPase family.
Putative metal-binding patterns
Asp residues are strictly conserved in the H+-PPase nonapeptide motifs DVGADLVGK and DNVGDNVGD. The residues aspartic acid (Asp) and glutamic acid (Glu) can act as metal ligands in various proteins [28,29]. UniProtKB/Swiss-Prot was screened for patterns of nine amino acid residues with either Asp or Glu at every fourth position (1, 5 and 9) and allowing any residue at the remaining positions. If the sequence formed an α-helix, with one turn every 3.6 residues, the charged residues would be facing the same side, to facilitate metal-binding properties at the active site.
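The geometric argument can be made explicit with a short calculation. Assuming an ideal α-helix with 3.6 residues per turn, each residue advances 100° around the helix axis, so positions 1, 5 and 9 sit at about 0°, 40° and 80°. The sketch below computes these angles; the function name is illustrative, and the idealised geometry is an assumption that real helices only approximate.

```python
RESIDUES_PER_TURN = 3.6
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # about 100 degrees


def helical_angles(positions):
    """Angle of each position around the helix axis, relative to the first one."""
    first = positions[0]
    return [((p - first) * DEGREES_PER_RESIDUE) % 360.0 for p in positions]


angles = helical_angles([1, 5, 9])
print([round(a, 1) for a in angles])
```

All three side chains fall within an arc of roughly 80°, that is, on the same face of the helix, which is what the spacing at positions 1, 5 and 9 is meant to achieve.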

-PPase family. Our analyses with bioinformatic methods have
Table 2. Sequence patterns containing the four 'primitive' amino acid residues searched in the UniProtKB/Swiss-Prot database.
Pattern              Number of proteins   Number of occurrences
A-D-A-D-A            162                  173
D-A-D-A-D            104                  112
A-D-A-D-A-D-A-D-A    2                    2
D-A-D-A-D-A-D-A-D    2                    2
D-A-A-A-D-A-A-A-D    0                    0
D-V-D-V-D            74                   79
V-D-V-D-V            55                   59
D-V-D-V-D-V-D-V-D    1                    2
V-D-V-D-V-D-V-D-V    1                    2
D-V-V-V-D-V-V-V-D    0                    0
D-G-D-G-D            140                  145
G-D-G-D-G            101                  108
D-G-D-G-D-G-D-G-D    0                    0
G-D-G-D-G-D-G-D-G    0                    0
D-G-G-G-D-G-G-G-D    0                    0
shown that the H+

the dynamic aspects of H+-PPase function.
Experimental procedures
Pyrophosphatase sequences were searched using blast [30] against UniProtKB, version 6.3 (October 2005, http://www.uniprot.org) [15], and an in-house database of all genomes in the public domain (ftp.ensemble.org; ftp.ncbi.nih.gov; ftp.tigr.org), denoted GenomeLKPG (A. Bresell and J. Hedlund, Linköping University, Sweden, personal communication). The searches were complemented by HMM-based screenings based upon the 'H_PPase' model from Pfam [31]. We also built and calibrated our own HMM, based upon 86 sequences. For the creation of the HMM and the screenings, the hmmer software [17] was used with default parameters for building and calibrating (commands 'hmmbuild' and 'hmmcalibrate').
General sequence comparisons were made using the program fasta [16] and pattern searches using the ps_scan utility from the Prosite database [32].
In the multiple sequence alignments, sequences annotated as fragments, and those shorter than 300 residues, were removed to improve the alignment quality. In the phylogenetic trees, sequences with pairwise residue identity of more than 90% to any other sequence were excluded. Multiple sequence alignments were calculated using dialign [33], and dendrograms were generated using the neighbour joining method, as implemented in clustalx [23].
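The two filtering rules above (a minimum length of 300 residues and exclusion of sequences more than 90% identical to another one) can be sketched as follows. This is an illustration under stated assumptions and not the authors' pipeline: it combines both filters in one pass although the text applies them at different stages, it uses a greedy first-come rule, it defines identity over alignment columns where both sequences have a residue, and it does not handle the 'fragment' annotation. All function names are hypothetical.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def filter_sequences(aligned, min_len=300, max_identity=0.90):
    """Greedy filter over a dict of name -> aligned sequence (gaps as '-')."""
    kept = {}
    for name, seq in aligned.items():
        # Length is counted on the ungapped sequence.
        if len(seq.replace("-", "")) < min_len:
            continue
        # Drop the sequence if it is too similar to one already kept.
        if any(pairwise_identity(seq, other) > max_identity for other in kept.values()):
            continue
        kept[name] = seq
    return kept
```

Because the filter is greedy, which member of a near-identical pair survives depends on input order; a production workflow would normally use a dedicated redundancy-reduction tool.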

4 Baltscheffsky M, Nadanaciva S & Schultz A (1998) A pyrophosphate synthase gene: molecular cloning and sequencing of the cDNA encoding the inorganic pyrophosphate synthase from Rhodospirillum rubrum. Biochim Biophys Acta 1364, 301–306.
5 Baltscheffsky M, Schultz A & Baltscheffsky H (1999) H+-PPases: a tightly membrane-bound family. FEBS Lett 457, 527–533.
6 Baltscheffsky H, Schultz A, Persson B & Baltscheffsky M (2001) Tetra- and nonapeptidyl motifs in the origin and evolution of photosynthetic bioenergy conversion. In First Steps in the Origin of Life in the Universe (Chela Flores J, Owen T & Raulin F, eds), pp. 173–178. Kluwer, Dordrecht.
7 Nakanishi Y, Saijo T, Wada Y & Maeshima M (2001) Mutagenic analysis of functional residues in putative substrate-binding site and acidic domains of vacuolar H+-pyrophosphatase. J Biol Chem 276, 7654–7660.
8 Eigen M & Schuster P (1978) The hypercycle. Naturwissenschaften 65, 341–369.
9 Trifonov EN (2000) Consensus temporal order of amino acids and evolution of the triplet code. Gene 261, 139–151.
10 Miller SL (1953) A production of amino acids under possible primitive earth conditions. Science 117, 528–529.

requirements and discovery of a Na+-dependent enzyme, Dissertation, University of Turku, Finland.
19 Drozdowicz YM, Kissinger JC & Rea PA (2000) AVP2, a sequence-divergent, K(+)-insensitive H(+)-translocating inorganic pyrophosphatase from Arabidopsis. Plant Physiol 123, 353–362.
20 Belogurov GA, Turkina MV, Penttinen A, Huopalahti S, Baykov AA & Lahti R (2002) H+-pyrophosphatase of Rhodospirillum rubrum. High yield expression in Escherichia coli and identification of the Cys residues responsible for inactivation by mersalyl. J Biol Chem 277, 22209–22214.
21 Tammenkoski M, Benini S, Magretova NN, Baykov AA & Lahti R (2005) An unusual, His-dependent family I pyrophosphatase from Mycobacterium tuberculosis. J Biol Chem 280, 41819–41826.
22 Belogurov GA & Lahti R (2002) A lysine substitute for K+. A460K mutation eliminates K+ dependence in H+-pyrophosphatase of Carboxydothermus hydrogenoformans. J Biol Chem 277, 49651–49654.
23 Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG & Thompson JD (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 31, 3497–3500.

issue).
32 Sigrist C, Cerutti L, Hulo N, Gattiker A, Falquet L,
Pagni M, Bairoch A & Bucher P (2002) PROSITE: a
documented database using patterns and proﬁles as
motif descriptors. Brief Bioinform 3, 265–274.
33 Morgenstern B (1999) DIALIGN 2: improvement of the
segment-to-segment approach to multiple sequence
alignment. Bioinformatics 15, 211–218.
34 Crooks GE, Hon G, Chandonia JM & Brenner SE
(2004) WebLogo: a sequence logo generator. Genome
Res 14, 1188–1190.
Supplementary material
The following supplementary material is available
online:
Table S1. List of the 166 proton-pumping inorganic pyrophosphatase (H+-PPase) sequences found in UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, and GenomeLKPG databases.
Table S2. List of the 164 proton-pumping inorganic pyrophosphatase (H+-PPase) partial sequences from the Sargasso Sea sequencing project.
This material is available as part of the online article
from