
Genome-wide analysis of clustering patterns and flanking
characteristics for plant microRNA genes
Meng Zhou1,*, Jie Sun1,*, Qiang-Hu Wang1,*, Li-Qun Song2, Guang Zhao1, Hong-Zhi Wang2, Hai-Xiu Yang1 and Xia Li1
1 College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
2 Department of Internal Medicine, Affiliated Hospital of Heilongjiang University of Chinese Medicine, Harbin, China
Introduction
MicroRNAs (miRNAs), 21–24 nucleotides in
length, are a large class of endogenous, noncoding
small RNA molecules that regulate gene expression
at the post-transcriptional level in animals and plants
[1–4]. The first microRNA – lin-4 – was discovered
in 1993 in Caenorhabditis elegans through forward
genetic screens [5]. The first plant miRNA was dis-
covered in Arabidopsis thaliana in 2002 [6,7]. Plant
miRNA genes are mostly transcribed into primary

plant miRNA genes, we performed genome-wide analyses of Arabidopsis
thaliana, Populus trichocarpa, Oryza sativa and Sorghum bicolor. Our
results showed that miRNA pair distances were significantly higher than
would have been expected to occur at random and that the number of
miRNA gene pairs separated by very short distances of < 1 kb was higher
than of protein-coding gene pairs. Analysis of the promoter architecture of
different miRNA genes in plants revealed significant differences in the
number and distribution of core promoters between intergenic miRNAs
and intragenic miRNAs, and between highly conserved miRNAs and low
conserved or nonconserved miRNAs. We applied two motif-finding algo-
rithms to search for over-represented, statistically significant sequence
motifs, and discovered six species-specific motifs across the four plant spe-
cies studied. Moreover, we also identified, for the first time, several signifi-
cantly over-represented motifs that were associated with conserved
miRNAs, and these motifs may be useful for understanding the mechanism
of origin of new plant miRNAs. The results presented provide a new
insight into the transcriptional regulation and processing of plant miRNAs.
Abbreviations
miRNA, microRNA; Pol II, RNA polymerase II; pri-miRNAs, primary miRNA transcripts; TSSs, transcription start sites.
FEBS Journal 278 (2011) 929–940 © 2011 The Authors Journal compilation © 2011 FEBS
into AGO proteins to carry out the silencing reactions
[1,2].
In plants, Xie et al. [8] identified transcription start
sites (TSSs) for 63 miRNA primary transcripts in
A. thaliana and found the TATA box motif in their
core promoter regions. Unlike animal miRNAs, the
vast majority of plant miRNAs are intergenic but not
intronic [2,9]. Several studies have characterized the
upstream sequences of intergenic miRNAs in model
organisms and found the same type of promoters as in

lation mechanisms for plant miRNAs. In our study,
we performed computational approaches, based on
genome-wide analyses, to examine the clustering pat-
terns of plant miRNAs. In addition, we analyzed
regions, up to 2 kb upstream and up to 1 kb down-
stream, of miRNA stem–loop sequences in four plant
species, to identify characteristic sequence motifs. We
hope that the present results can improve the current
understanding of transcriptional regulation and pro-
cessing of plant miRNAs and provide useful knowledge
for understanding the mechanism of the origin and
computational identification of new miRNAs in plants.
Results and Discussion
Analysis of clustering patterns of miRNA genes
in four plant genomes
Many previous studies have shown that miRNA genes
tend to be present as clusters within a region of several
kilobases in animal genomes [17–20]. In contrast, plant
miRNA genes are rarely arranged in tandem [1]. To
further explore the clustering patterns of miRNAs in
plant genomes, we computed the distances between
same-strand consecutive miRNA genes of four plant
species to analyze the distance distribution of miRNA
genes in different plant species based on reported miR-
Base coordinates. The cumulative distance distribution
of the miRNA gene pairs is presented in Fig. 1 and
shows that 17.71%, 26.94% and 29.07% of the miR-
NA gene pairs are separated by regions of < 1, 10
and 100 kb, respectively, which are much smaller than
the regions separating animal miRNA gene pairs. Fur-

cific clustering pattern of miRNAs may be indicative
of functional divergence of the miRNA cluster in
miRNA-mediated gene regulation between monocots
and eudicots. Furthermore, miRNA clusters in plants
are frequently smaller than miRNA clusters in animals
(P < 0.01; two-sample t-test). In animals, a large proportion of
known miRNAs are arranged in clusters. For example,
48% of human miRNAs appear as clusters within a
maximum inter-miRNA distance of 10 kb [21] and
50% of miRNAs appear as clusters within a maximum
inter-miRNA distance of 3 kb in the zebrafish genome
[22]. In contrast to patterns of clustering found in ani-
mal miRNAs, only a small proportion of plant miR-
NAs (25.35% in A. thaliana, 17.09% in P. trichocarpa,
22.29% in O. sativa and 21.62% in S. bicolor) were
found to be clustered within a 10-kb region in our
study. It has been demonstrated that miRNA families
are preferentially expressed in eudicots relative to
monocots [23]. Our analysis further indicated that
most plant miRNA clusters are composed of family
members and are located in intergenic regions, which
is consistent with previous studies in plants [10,24,25].
Our results imply that the size of the miRNA cluster
may contribute to preferential expression in eudicots
relative to monocots. Li et al. [25] suggested that the
co-transcription of similar or identical miRNAs in
clusters for plants may be involved in gene dosage

revealed that several intronic miRNA genes in rice
have a class II promoter, and rice miRNAs with more
than one promoter appear to be conserved [10], thus
implying that different sequence characteristics may be
presented in upstream regions of different miRNA
genes in plants. To further explore the promoter archi-
tecture of different miRNAs in plants and the relation-
ship between the number of Pol II promoters and the
degree of conservation of miRNAs, we classified the miRNA
genes of the four plant species into two types (intergenic
miRNAs and intragenic miRNAs) based on their genomic
locations. Then, the miRNAs from the four plant spe-
cies studied were divided into three groups (based on
evolutionary conservation across all plant species, as
described in the Materials and methods): highly con-
served miRNAs, low conserved miRNAs and noncon-
served miRNAs. The results are summarized in Fig. 2.
As shown in Fig. 2A, we found a significant difference
between intergenic miRNAs and intragenic miRNAs
in the numbers of class II promoters in the upstream
regions (P < 0.001; two-sample t-test). The miRNAs
lying between protein-coding genes usually contained
more class II promoters in their upstream sequences
(on average 1.4 per miRNA) than those miRNAs lying
within the introns (on average 0.7 per miRNA) in the
four plant species studied. These results strongly indi-
cate that most intergenic miRNAs are transcribed by
RNA polymerase II in plants, and provide additional
evidence that a significant proportion of intragenic
miRNAs have Pol II promoters. It suggests that these

Species         miRNAs(a) | Cluster(b)  Members(c)  Average(d)  Distances(e)  Percentage(f) | Cluster(b)  Members(c)  Average(d)  Distances(e)  Percentage(f) | Cluster(b)  Members(c)  Average(d)  Distances(e)  Percentage(f)
A. thaliana     213       | 15          32          2.13        0.67          15.02%        | 17          42          2.47        1.40          19.72%        | 21          54          2.57        2.69          25.35%
P. trichocarpa  234       | 9           18          2           0.45          7.69%         | 14          33          2.36        2.33          14.1%         | 17          40          2.35        3.11          17.09%
O. sativa       462       | 26          74          2.85        0.48          16.02%        | 28          90          3.21        1.08          19.48%        | 33          103         3.12        1.82          22.29%
S. bicolor      148       | 6           18          3           0.36          12.16%        | 10          27          2.7         1.14          18.24%        | 12          32          2.67        1.86          21.62%
(a) The number of miRNA genes studied in four plant species.
(b) The number of predicted clusters.
(c) The number of miRNA genes located in clusters.
(d) The average number of miRNA genes in a cluster for the four plant species.
(e) The average distance between two miRNA genes in a cluster.
(f) The percentage of miRNA genes located in clusters.
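The summary columns of this table can be cross-checked against its footnotes: the average (d) should equal members (c) divided by clusters (b), and the percentage (f) should equal members (c) divided by the number of miRNA genes (a). The following is a minimal sketch of that arithmetic for the A. thaliana row; the function name `summarize` is hypothetical and used only for illustration.

```python
# Values transcribed from the A. thaliana row of the table above.
total_mirnas = 213
# (number of clusters, number of clustered miRNA genes) for the three column groups
groups = [(15, 32), (17, 42), (21, 54)]


def summarize(total, clusters, members):
    """Return (average genes per cluster, percentage of genes in clusters)."""
    average = members / clusters
    percentage = 100.0 * members / total
    return round(average, 2), round(percentage, 2)


for clusters, members in groups:
    print(summarize(total_mirnas, clusters, members))
```

The three printed pairs reproduce the Average and Percentage values listed for A. thaliana, including the 25.35% quoted in the text for clustering within a 10-kb region.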
served miRNAs (P < 0.001; Fisher’s exact test) and
23.94% of nonconserved miRNAs (P < 0.0001; Fish-
er’s exact test) have at least two Pol II promoters.
However, there was no significant difference in the
number of Pol II promoters in upstream sequences
between low conserved and nonconserved miRNAs in

[Fig. 2 plot: panels for A. thaliana and P. trichocarpa; bars for intragenic and intergenic miRNAs; percentage axis]

occurring between protein-coding genes or
within the introns in four plant species.
(B) The percentage of miRNA genes with
different degrees of conservation.
plant miRNAs, we examined the distribution of the
putative core promoter in 2-kb upstream regions of
miRNAs in the four species of plant studied. In these
four plant species, the vast majority of the predicted
core promoters of the Pol II promoters were found to
lie within a 900-bp region upstream of the miRNAs.
Distribution analysis of core promoter localization in
2-kb regions upstream of the miRNAs from the four
species of plants studied showed that 50.4% of the
putative core promoters of the Pol II promoter were
located within 0–1 kb, 26.8% were located within
1–1.5 kb and 22.8% were located within 1.5–2 kb
upstream of the miRNA. A recent study on rice
(O. sativa) suggested that the majority of TSSs and
TATA-boxes are found within 0–400 bp upstream of
the miRNA [10]. Here, we found a similar distribution
of the putative core promoter in upstream regions of
miRNAs in four plant species. As shown in Fig. 3A, a
significant number of putative core promoters of the
Pol II promoter were found to be located within the
400-bp upstream regions in three plant species,
although the putative promoters in O. sativa were dis-
tributed mainly from 0 to 0.4 kb and from 1.6 to 2 kb.
Together, these results indicate that this distribution

sented and statistically significant motifs in the flank-
ing regions up to 2 kb upstream and 1 kb downstream
from the miRNA stem–loop sequences. First of all, we
used RepeatMasker with default settings to mask
repeats in all upstream and downstream sequences,
and then used two motif-finding tools – MEME and
MotifSampler – to identify over-represented motifs.
Finally, we carried out whole-genome Monte Carlo
simulation analysis to assess the specificity and signifi-
cance of motifs identified, as described in the Materials
and methods. Motifs whose Z-scores were > 2.0 were
considered as over-represented and statistically signifi-
cant motifs. Several significantly over-represented spe-
cies-specific motifs were identified in the flanking
regions of four plant species. All the species-specific
[Fig. 3 plot, panels A and B: series for S. bicolor, O. sativa, A. thaliana and P. trichocarpa; percentage axis]

plant promoters. We found that M5, with the consen-
sus sequence GCATGCATGC, is an RY cis-acting
regulatory element involved in seed-specific regulation
in both monocot and eudicot species of plants [32,33].
Although the functions of other species-specific motifs
are still unknown, we found that some motifs have
repeat sequences in their consensus. M5 has two copies
of GCAT, and M3 can be considered as GCA repeats.
Palindromic patterns have been found in the
binding sites of some transcription factors in plants
and animals [34,35]. In contrast to A. thaliana,
P. trichocarpa and S. bicolor, we could not detect any
significant species-specific motifs in the flanking
regions of miRNAs in O. sativa, although a previous
study has identified three specific motifs in the promot-
ers of miRNAs in O. sativa [11]. Our analysis suggests
that these species-specific motifs are associated with
different specific functions, and may play an important
role in species-specific transcriptional regulation net-
works of miRNA genes or contribute to the formation
of species-specific miRNAs in plants. However, their
functions need to be investigated in further studies.
Furthermore, these species-specific motifs will be useful
in the computational identification of species-specific
miRNAs in plants.
Table 2. Significantly over-represented species-specific sequence motifs identified in the flanking regions of the three plant species studied.
Species Index
Consensus
sequence
a

nate is not fully understood. It is believed that the ori-
gin of new plant miRNAs is dependent on duplication
and inversion events [36–38]. However, several lines of
evidence have also suggested that new plant miRNA
genes can arise from foldback sequences, which are
under the control of transcriptional regulatory
sequences [39,40]. In order to determine whether some
significantly over-represented sequence motifs are
related to the degree of conservation of miRNA genes
in plants, we classified the miRNA genes of four plant
species into highly conserved miRNAs, low conserved
miRNAs and nonconserved miRNAs, as described in
the Materials and methods. We then examined the
upstream sequences and downstream sequences of
these miRNA genes to reveal characteristic sequence
motifs. Several significantly over-represented motifs
associated with the degree of miRNA conservation are
identified and listed in Table 3. Two motifs (CAT-
GCATGCA and CTAGCTAGCT; M1 and M2,
respectively), which have repetitive and palindromic
patterns in their consensus sequences, were found to
be significantly over-represented in highly conserved
plant miRNAs and therefore these motifs can be con-
sidered as CATG repeats and CTAG repeats, respec-
tively. However, we did not find any significantly over-
represented sequence motifs in the flanking sequences
of nonconserved miRNAs in the four plant species. In
contrast to nonconserved miRNA genes that have a
single copy, conserved miRNA genes are usually multi-
copy [25]. miRNAs that are highly conserved across

(P < 0.001). Comparison of the miRNA pair distances
with the pair distances of protein-coding genes
revealed that plant miRNAs are more clustered than
Table 3. Significantly over-represented sequence motifs related to the conservation of miRNAs.
Conservation  Index  Consensus sequence(a)  Motif logo(b)  E-value(c)  Z-score(d)
Highly        M1     CATGCATGCA             [image]        3.6e-019    7.82
              M2     CTAGCTAGCT             [image]        1.6e-024    5.76
Low           M3     TGGCGGGAAA             [image]        24e-014     4.32
(a) The consensus sequence represents a sequence of the most frequent base at each position.
(b) The motif logos show the information content present at each position in the sequence.
(c) The expected frequencies of motifs in a random database of the same size.
(d) The Z-score value obtained by whole-genome Monte Carlo simulation analysis.
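The repetitive and palindromic structure described for M1 and M2 can be illustrated with a short check. This is a minimal sketch, assuming that "palindromic" means a sequence equal to its own reverse complement; the helper names are hypothetical and are not taken from the study.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def count_overlapping(seq, unit):
    """Count (possibly overlapping) occurrences of unit in seq."""
    width = len(unit)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == unit)


for motif, unit in [("CATGCATGCA", "CATG"), ("CTAGCTAGCT", "CTAG")]:
    print(motif, unit, is_palindromic(unit), count_overlapping(motif, unit))
```

Under this definition each 4-nt unit (CATG, CTAG) is palindromic and occurs twice in its motif, which matches the description of M1 and M2 as CATG repeats and CTAG repeats; the full 10-nt consensus sequences are not themselves reverse-complement palindromes.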
protein-coding genes in the very short pairwise dis-

conservation, and examined characteristic sequence
motifs in the flanking sequences of these miRNA
genes. Several significant motifs appeared to be related
to the degree of miRNA conservation. We hope that
our results can contribute to gaining a better under-
standing of transcriptional regulation and process-
ing of miRNAs and provide useful data for further
computational identification of miRNAs in plants.
Also, we anticipate that these motifs related to the
degree of miRNA conservation may be useful for
understanding the mechanism of the origin of new
plant miRNAs.
Materials and methods
Data sets
To obtain the upstream and downstream sequences of plant
miRNA genes, we chose four species of plant (A. thaliana,
P. trichocarpa, O. sativa and S. bicolor) to study clustering
patterns and sequence characteristics in the flanking regions
of plant miRNA genes because the number of miRNA
genes in these four plant species is relatively large and the
genome sequences are relatively complete. All known
miRNAs and genome coordinates in these four plant species
were downloaded from the miRBase Sequence Database,
release 16 [45]. The genome
sequences and the protein-coding genes of A. thaliana and
S. bicolor were downloaded from MapViewer in National
Center for Biotechnology Information (i.
nlm.nih.gov/). The genome sequences of P. trichocarpa and
O. sativa and the protein-coding genes were downloaded
from the Poplar site on Phytozome v6.0 (P. trichocarpa

No. of upstream
sequences
No. of downstream
sequences
A. thaliana TAIR9 213 21 167 167
P. trichocarpa JGI_Poptr2.0 234 17 163 163
O. sativa MSU6.0 462 33 326 326
S. bicolor JGI_sbi1 148 12 125 125
variations [19]. Finally, we divided the miRNAs of the four
plant species into three groups: the miRNAs whose homo-
logues were found simultaneously in monocots and eudicots
were considered as highly conserved miRNAs; those found
only in monocots or eudicots were considered as low con-
served miRNAs; and those found only in one species were
considered as nonconserved miRNAs.
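The three-group rule above can be written as a small function. This is a minimal sketch under stated assumptions: the input is the set of species in which homologues of a miRNA were found, `clade_of` is a hypothetical lookup from species to "monocot" or "eudicot", and the single-species case is checked before the single-clade case.

```python
def classify_conservation(species_with_homologues, clade_of):
    """Assign a miRNA to one of the three conservation groups.

    species_with_homologues: species in which a homologue was found.
    clade_of: mapping from species name to "monocot" or "eudicot"
              (an assumed input, not a data structure from the study).
    """
    species = set(species_with_homologues)
    if not species:
        raise ValueError("at least one species is required")
    clades = {clade_of[s] for s in species}
    if {"monocot", "eudicot"} <= clades:
        return "highly conserved"   # found in both monocots and eudicots
    if len(species) == 1:
        return "nonconserved"       # found in one species only
    return "low conserved"          # several species, one clade only


# Example lookup covering only the four species analysed here.
CLADE_OF = {
    "A. thaliana": "eudicot",
    "P. trichocarpa": "eudicot",
    "O. sativa": "monocot",
    "S. bicolor": "monocot",
}

print(classify_conservation({"A. thaliana", "O. sativa"}, CLADE_OF))
print(classify_conservation({"O. sativa", "S. bicolor"}, CLADE_OF))
print(classify_conservation({"P. trichocarpa"}, CLADE_OF))
```

The lookup here is limited to the four species studied for brevity; the classification in the text is based on conservation across all plant species, so a real lookup would need to cover every species with annotated homologues.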
Analysis of clustering patterns
To study the clustering patterns of miRNA genes in differ-
ent plant species, we computed the neighbour distances
between every two same-strand consecutive miRNA genes
in the same chromosome. The average distance of the
neighbour miRNA pairs was calculated across all chromosomes
in the four plant species studied. To evaluate the
statistical significance of miRNA clustering patterns in the
four plant species, we used a random-sampling approach.
First, we selected random positions
whose number was equal to the number of miRNA genes
on each chromosome. Then we computed the neighbour
distances between consecutive random points and the aver-
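The neighbour-distance calculation and the random-position sampling described in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: genes are given as (chromosome, strand, start, end) tuples, the distance between neighbours is taken as the gap from the end of one gene to the start of the next, random positions are drawn uniformly along each chromosome, and all function names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def neighbour_distances(genes):
    """Distances between consecutive same-strand genes on each chromosome.

    genes: iterable of (chromosome, strand, start, end) tuples.
    """
    groups = defaultdict(list)
    for chrom, strand, start, end in genes:
        groups[(chrom, strand)].append((start, end))
    distances = []
    for intervals in groups.values():
        intervals.sort()
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            # Assumed definition: gap between neighbours, never negative.
            distances.append(max(0, next_start - prev_end))
    return distances


def fraction_within(distances, threshold):
    """Fraction of neighbour pairs separated by less than threshold."""
    if not distances:
        return 0.0
    return sum(d < threshold for d in distances) / len(distances)


def random_average_distances(counts, chrom_lengths, n_iter=1000, seed=0):
    """Average neighbour distance for random positions, one value per iteration.

    counts: {chromosome: number of miRNA genes on that chromosome}.
    chrom_lengths: {chromosome: length in bp}.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(n_iter):
        distances = []
        for chrom, n in counts.items():
            points = sorted(rng.randrange(chrom_lengths[chrom]) for _ in range(n))
            distances.extend(b - a for a, b in zip(points, points[1:]))
        if distances:
            averages.append(mean(distances))
    return averages
```

Comparing the observed average neighbour distance with the distribution returned by `random_average_distances` gives an empirical significance estimate; the number of iterations and the seed are arbitrary choices for this sketch.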

based on Gibbs sampling [50], to find over-represented
motifs. MotifSampler is a stochastic algorithm and the
results may vary for different runs. Therefore, we carried
out 50 repeated runs of MotifSampler for each analysis.
The number of different motifs was set to 10 and the
width of the motifs was set to 10. All other options were
set at a variety of arguably sensible settings. The results of
these two programs were integrated to identify motifs that
are frequently reported to have a low E-value among these
settings and among both motif-finding tools in the flanking
regions of the microRNA genes from the four plant spe-
cies. Sequence logos for all motifs found by these two pro-
grams were created using WebLogo Version 2.8.2 (http://
weblogo.berkeley.edu) [51].
In order to determine whether a motif is statistically sig-
nificant in the flanking regions of plant miRNA genes,
whole-genome Monte Carlo simulation, resulting in a
Z-score, was used to take into account the specificity and
significance of a motif, as previously described by
Zhou et al. [11]. For a given motif, we first obtained the
average number of occurrences per target sequence, denoted
as Nt, and then randomly generated the same number of ref-
erence sets from protein-coding genes and an intergenic
sequence, far upstream of the miRNA, as an appropriate
background. Next, the MEME motifs were individually
aligned using the MAST program with default values [52] to
the reference sets to compute the average number of
occurrences of a motif, Nr, and its standard deviation, σr, over
the reference sets. The Z-score was computed as
Z = (Nt − Nr)/σr, which measures the normalized difference
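As a worked illustration of this score, the sketch below takes Z to be the difference between Nt and the mean Nr over the reference sets, divided by the standard deviation over those sets. It assumes that the per-sequence occurrence averages have already been obtained (for example from MAST output) and uses the sample standard deviation; the text does not say which estimator was used, and the function name is hypothetical.

```python
from statistics import mean, stdev


def motif_z_score(target_mean, reference_means):
    """Z-score of a motif's occurrence rate against reference sets.

    target_mean: average occurrences per target sequence (Nt).
    reference_means: average occurrences per sequence in each reference set.
    """
    n_r = mean(reference_means)        # Nr: mean over the reference sets
    sigma_r = stdev(reference_means)   # sample standard deviation (assumption)
    if sigma_r == 0:
        raise ValueError("reference sets show no variation; Z is undefined")
    return (target_mean - n_r) / sigma_r


# Toy numbers for illustration only, not data from the study.
print(motif_z_score(3.0, [1.0, 2.0, 3.0]))
```

With the toy values the reference mean is 2.0 and the standard deviation is 1.0, so the score is 1.0, which would fall below the Z > 2.0 cut-off used in this study.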

CARPEL FACTORY, a Dicer homolog, and HEN1,
a novel protein, act in microRNA metabolism in
Arabidopsis thaliana. Curr Biol 12, 1484–1495.
7 Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B &
Bartel DP (2002) MicroRNAs in plants. Genes Dev 16,
1616–1626.
8 Xie Z, Allen E, Fahlgren N, Calamar A, Givan SA &
Carrington JC (2005) Expression of Arabidopsis
MIRNA genes. Plant Physiol 138, 2145–2154.
9 Bartel DP (2004) MicroRNAs: genomics, biogenesis,
mechanism, and function. Cell 116, 281–297.
10 Cui X, Xu SM, Mu DS & Yang ZM (2009) Genomic
analysis of rice microRNA promoters and clusters. Gene
431, 61–66.
11 Zhou X, Ruan J, Wang G & Zhang W (2007)
Characterization and identification of microRNA core
promoters in four model species. PLoS Comput Biol 3,
e37.
12 Megraw M, Baev V, Rusinov V, Jensen ST, Kalantidis
K & Hatzigeorgiou AG (2006) MicroRNA promoter
element discovery in Arabidopsis. RNA 12, 1612–1619.
13 Heikkinen L, Asikainen S & Wong G (2008) Identifica-
tion of phylogenetically conserved sequence motifs
in microRNA 5′ flanking sites from C. elegans and
C. briggsae. BMC Mol Biol 9, 105.
14 Inouchi A, Shinohara S, Inoue H, Kita K & Itakura M
(2007) Identification of specific sequence motifs in the
upstream region of 242 human miRNA genes. Comput
Biol Chem 31, 207–214.
15 Ohler U, Yekta S, Lim LP, Bartel DP & Burge CB

riev I, Hellsten U, Putnam N, Ralph S, Rombauts S,
Salamov A et al. (2006) The genome of black cotton-
wood, Populus trichocarpa (Torr. & Gray). Science 313,
1596–1604.
24 Zhang B, Pan X, Cannon CH, Cobb GP &
Anderson TA (2006) Conservation and divergence of
plant microRNA genes. Plant J 46, 243–259.
25 Li A & Mao L (2007) Evolution of plant microRNA
gene families. Cell Res 17, 212–218.
26 Baskerville S & Bartel DP (2005) Microarray profiling
of microRNAs reveals frequent coexpression with
neighboring miRNAs and host genes. RNA 11,
241–247.
27 Kim YK & Kim VN (2007) Processing of intronic
microRNAs. EMBO J 26, 775–783.
28 Lindow M & Krogh A (2005) Computational evidence
for hundreds of non-conserved plant microRNAs. BMC
Genomics 6, 119.
29 Hofmann NR (2010) MicroRNA evolution in the genus
Arabidopsis. Plant Cell 22, 994.
30 Todesco M, Rubio-Somoza I, Paz-Ares J & Weigel D
(2010) A collection of target mimics for comprehensive
analysis of microRNA function in Arabidopsis thaliana.
PLoS Genet 6, e1001031.
31 Lescot M, Dehais P, Thijs G, Marchal K, Moreau Y,
Van de Peer Y, Rouze P & Rombauts S (2002)
PlantCARE, a database of plant cis-acting regulatory
elements and a portal to tools for in silico analysis of
promoter sequences. Nucleic Acids Res 30, 325–327.
32 Baumlein H, Nagy I, Villarroel R, Inze D & Wobus U

& Weigel D (2008) Evolution of Arabidopsis thaliana
microRNAs from random sequences. RNA 14, 2455–
2459.
40 Axtell MJ (2008) Evolution of microRNAs and their
targets: are all microRNAs biologically relevant?
Biochim Biophys Acta 1779, 725–734.
41 Haberer G, Hindemitt T, Meyers BC & Mayer KF
(2004) Transcriptional similarities, dissimilarities, and
conservation of cis-elements in duplicated genes of
Arabidopsis. Plant Physiol 136, 3009–3022.
42 Wang Y, Hindemitt T & Mayer KF (2006) Significant
sequence similarities in promoters and precursors of
Arabidopsis thaliana non-conserved microRNAs.
Bioinformatics 22, 2585–2589.
43 Guddeti S, Zhang DC, Li AL, Leseberg CH, Kang H,
Li XG, Zhai WX, Johns MA & Mao L (2005) Molecu-
lar evolution of the rice miR395 gene family. Cell Res
15, 631–638.
44 Allen E, Xie Z, Gustafson AM & Carrington JC (2005)
microRNA-directed phasing during trans-acting siRNA
biogenesis in plants. Cell 121, 207–221.
45 Griffiths-Jones S, Saini HK, van Dongen S
& Enright AJ (2008) miRBase: tools for microRNA
genomics. Nucleic Acids Res 36, D154–158.
46 Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M,
Childs K, Thibaud-Nissen F, Malek RL, Lee Y, Zheng
L et al. (2007) The TIGR Rice Genome Annotation
Resource: improvements and new features. Nucleic
Acids Res 35, D883–887.
47 Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ,

