
A study on genomic distribution and sequence features
of human long inverted repeats reveals species-specific
intronic inverted repeats
Yong Wang and Frederick C. C. Leung
School of Biological Sciences and Genome Research Centre, The University of Hong Kong, China
An inverted repeat consists of two repeat copies (here-
after termed arms) that are approximately complemen-
tary to each other. Generally, there is a spacer between
the arms, and the full structure of an inverted repeat
can form a stem-loop or palindrome. The potential to
form a stable stem-loop is determined by the arm size,
spacer size and the matching degree of the arms [1,2].
For example, a relatively large spacer makes it difficult
for the two arms to pair and form a stem.
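The arm, spacer and arm layout described above can be illustrated with a minimal sketch. The function names, the toy sequence and the ungapped position-by-position comparison are assumptions made for illustration and are not the software used in this study.

```python
# Illustrative sketch only: arms are compared without gaps, which is a
# simplification (real inverted repeats may contain indels).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_inverted_repeat(seq: str, arm_len: int, spacer_len: int):
    """Split a sequence into left arm, spacer and right arm of given sizes."""
    left = seq[:arm_len]
    spacer = seq[arm_len:arm_len + spacer_len]
    right = seq[arm_len + spacer_len:arm_len + spacer_len + arm_len]
    return left, spacer, right


def arm_identity(left_arm: str, right_arm: str) -> float:
    """Fraction of positions at which the left arm can pair with the right arm."""
    left = left_arm.upper()
    partner = reverse_complement(right_arm).upper()
    longest = max(len(left), len(partner))
    if longest == 0:
        return 0.0
    matches = sum(a == b for a, b in zip(left, partner))
    return matches / longest


# Toy example: an 8-bp arm, a 4-bp spacer and the complementary 8-bp arm.
example = "GGATCCTA" + "TTTT" + "TAGGATCC"
left, spacer, right = split_inverted_repeat(example, arm_len=8, spacer_len=4)
print(left, spacer, right, arm_identity(left, right))
```

An identity of 1.0 corresponds to a perfect palindrome with a spacer; lower values indicate mismatches between the arms.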
Studies of inverted repeats show that they can promote
genomic instability and, on the other hand, regulate
gene expression in both prokaryotes and eukaryotes.
Being capable of forming secondary structures
[3], inverted repeats can induce genomic instability via
gene amplification, recombination, DNA double-strand
breaks and rearrangement [1,2,4–8]. Moreover,
inverted repeats provide sites for the integration of
viruses into eukaryotic genomes [9,10] and also com-
prise replication stall sites, as shown in a recent study
in which evidence obtained in vivo demonstrated repli-
cation stalling by hairpins formed by inverted repeats
in bacteria, yeast and mammalian cells [11]. As a
result, their occurrence in a genome is restricted to some extent.
For example, neighboring repetitive elements, such as
Alu repeats, are generally found to occur in the same
direction, and those arranged head-to-head and

terms. The functional pathways related to the development and functions
of the nervous system are not enriched in chimpanzee and mouse ortho-
logs. The findings of the present study provide insight into the role of
intronic LIRs in gene regulation and primate speciation.
Abbreviations
FDR, false discovery rate; GO, Gene Ontology; LIR, long inverted repeat; siRNA, small interfering RNA; TIR, terminal inverted repeat.
FEBS Journal 276 (2009) 1986–1998 © 2009 The Authors Journal compilation © 2009 FEBS
gene experiment, the introduction of a large palin-
drome was followed by numerous rearrangements,
which were assumed to comprise a solution for attenu-
ating the impact of the palindrome in the progeny [13].
Inverted repeats also regulate gene expression. The
stem-loops and palindromes constructed by inverted
repeats are involved in RNA interference, transcription
initiation of genes, initiation of DNA replication and
alternative splicing of exons. The small interfering
(si)RNA genes active in RNA interference comprise
inverted repeats capable of forming a stem-loop motif
longer than 22 bp. Some are derived from miniature
inverted-repeat transposable elements [14]. To date,
siRNA genes have been identified in species ranging from
Caenorhabditis elegans to humans. RNA interference was initially
discovered as an efficient mechanism for inhibiting the
expression of specific genes [15,16], and later was
found to be responsible for developmental regulation
[17,18] and heterochromatin maintenance [19,20]. In
promoters, inverted repeats can facilitate the recogni-
tion process and the subsequent binding of RNA poly-
merase during gene transcription [21,22]. Moreover,
the inverted repeats in a cruciform structure will

[32,33]. Human inverted repeats were investigated in a
previous study, in which a majority of them were
found to be weak with respect to their capacity to
form a simple stem-loop or hairpin in terms of their
structural features [32]. Genome-wide distribution of
human palindromes has also been surveyed, and a
database has been created for public use [30]. How-
ever, the palindromes with mismatches and indels were
not collected in the database.
In the present study, we first located all the long
inverted repeats (LIRs) characterized by long arms,
high arm similarity and a short internal spacer in the
human genome. They were termed recombinogenic
LIRs in our previous study on human chromosomes
21 and 22 [33], although their distribution and fre-
quency had not been fully surveyed in the whole
human genome. The present study aims to provide a
panoramic view of recombinogenic LIRs. On the basis
of evidence obtained in vivo [1,2,11], the LIRs identi-
fied in the present study can easily form stem-loops or
palindromes. Their presence in the human genome by
itself implies that they are functional in some manner.
The results obtained showed that 37% of the LIRs
were located in intronic regions and some were
primate-specific. TG/CA-rich repeats are the most
frequently observed feature in LIR arms. Considering
that the LIRs probably have essential functions and
drive the speciation of primates, we studied the degree
of conservation and species specificity of the LIRs
among orthologous genes from the mouse (Mus mus-

) (Fig. 3). The point
denoting chromosome 19 is notably far from the
regression line, which can be attributed to the gene clusters
that contribute to the two-fold higher gene density of
chromosome 19 compared to the genomic average [34].
We found that the negative correlation is due to the
high frequency of LIRs in long genes. A total of 956
LIRs (in 702 genes) were located in genic regions, and
1595 in intergenic regions. In other words, 37% of
the LIRs were found within genes. However, our
Fig. 1. Characteristics of human LIRs. The 2551 LIRs were classified according to spacer size, mismatch rate and arm length.
Fig. 2. Distribution of LIRs in the human
genome. The density is represented by the
number of LIRs per 1 Mb of sequence. The
shortest bars denote one LIR per 1 Mb.
calculation of the coverage of genes in the human gen-
ome was 26.9%, which is consistent with the value
reported previously [35]. When introns were taken into
account, the percentage was 25.1%. This implies that
the distribution of the LIRs is not random. Statistical
analysis performed on the results shows that the pres-
ence of LIRs is significantly biased to be within genes
(chi-square test; P < 0.0001). The LIRs that have long
arms (> 400 bp) and the associated genes are listed in
Table S1. Surprisingly, over half of them were found
within genes. There are two cases of partial overlap
between LIRs and exons. In one case, the left arm of
an intronic LIR extends into an exon of c14orf165

groups, the numbers of LIRs in the ratio ranges do
not show any significant difference (chi-square tests;
d.f. = 4; P > 0.1) (see Fig. S1). Therefore, the LIRs
do not avoid the boundaries of exonic or
genic regions. The median distance to the exon bound-
aries is 7.8 kb, and that to the gene boundaries is
69 kb.
Strikingly, pseudogenes were frequently found
around the intergenic LIRs. A total of 803 intergenic
LIRs (50%) have one or two neighboring pseudogenes,
of which 422 are RNA pseudogenes. According to the
annotation in the Ensembl database (http://www.
ensembl.org), approximately 27% of the human genes
are pseudogenes. The occurrence of pseudogenes
adjacent to LIRs is statistically significant (chi-square
test; P < 0.0001).
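The chi-square test for the bias of LIRs towards genes, reported earlier in this section, can be approximated with the counts given in the text (956 genic LIRs out of 2551, against a gene coverage of 26.9%). The exact contingency setup used by the authors is not stated, so the sketch below assumes a two-category goodness-of-fit test with one degree of freedom; the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_two_bins(observed_in: int, total: int, expected_fraction: float):
    """Goodness-of-fit chi-square test for two categories (1 d.f.).

    Assumption: under the null hypothesis, LIRs fall inside genes in
    proportion to the fraction of the genome covered by genes.
    """
    expected_in = total * expected_fraction
    expected_out = total - expected_in
    observed_out = total - observed_in
    statistic = ((observed_in - expected_in) ** 2 / expected_in
                 + (observed_out - expected_out) ** 2 / expected_out)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


# Counts taken from the text: 956 of 2551 LIRs are genic; genes cover 26.9%.
statistic, p_value = chi_square_two_bins(956, 2551, 0.269)
print(f"chi-square = {statistic:.1f}, P = {p_value:.2e}")
```

Under these assumptions the P value falls far below the reported threshold of 0.0001, in line with the conclusion that LIRs are over-represented within genes.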
Sequence features of the human LIRs
We found that over half (51%) of the identified LIRs
could be clustered into groups consisting of at least three
members on the basis of sequence similarity. The
group members comprise simple repeats,
known repetitive elements, amplified genes or dupli-
cated genomic fragments. The largest group consists of
LIRs formed by stretches of TG/CA dinucleotides and
interspersed TA dinucleotides. We defined them as
TG/CA-rich LIRs, accounting for 33% and 39% of
all the LIRs in the intronic and intergenic regions,
respectively (Fig. 4). By contrast, we also identified
TC/GA-rich LIRs that account for only 3% in both
regions. Thus, the frequency of TG/CA-rich LIRs is at

Fig. 3. Negative correlation between gene density and LIR fre-
quency. The black dots show the correlation between gene density
and LIR density in the 22 chromosomes.
spacer and long terminal inverted repeats (TIRs). In
the present study, they were considered as LIRs in
cases of high identity between TIRs. Within both
intronic and intergenic LIRs, they account for 6% in total.
Alu repeats in the LIRs are mostly partial in structure
and are arranged head-to-head or
tail-to-tail. In some large LIRs, more than one Alu
was included in one arm, and the complete structure
of Alu could be retained therein. The proportion of
inverted Alus within the LIRs is 6% for intronic
regions and 3% for intergenic regions.
The grouping of the LIRs is also a result of gene
amplification or fragmental duplication, although the
numbers of such groups and the members inside the
groups are not large. We identified 20 LIRs in genes
encoding a novel protein similar to septin (NPS-Sep-
tin) and eight in genes encoding POTE. The genes
belong to gene families and their duplication is coupled
with the spread of the LIRs inside the gene. The
remaining LIRs aside from the above groups show
similarity either to one or none of the others. They are
labeled as rare LIRs, accounting for 49% of all the
LIRs in the human genome.
We explored the LIRs in the NPS-Septin gene
family in more detail. A BLAT search in the University

sus monkey-specific. For the nonspecific LIRs, 13
groups of orthologs from the three primate species all
have at least one LIR, and 104 ortholog pairs from
humans and chimpanzees possess LIR(s). This suggests
that most of the nonspecies-specific LIRs are shared
by the primates, and some LIRs were specifically
developed in the primate lineage.
We next obtained the biological profile of the human
orthologs that have human-specific LIR(s). Compared
to randomly-selected human genes, the orthologs are
significantly enriched with GO terms within the catego-
ries of development, binding, membrane, cell communi-
cation and signal transduction (Table 1). An important
finding is that a number of the terms are related to the
nervous system, including neurotransmitter receptor
activity (GO:0030594), central nervous system development
(GO:0007417), GABA receptor activity (GO:0016917),
axonogenesis (GO:0007409), projection,
generation, differentiation and development of neurons
(GO:0043005, GO:0048699, GO:0030182, GO:0048666),
synapse (GO:0045202), and so on. The GO term that is
under-represented in these genes is GO:0006955 for
immune response [false discovery rate (FDR) = 0.048].
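The enrichment test summarized in Table 1 (Fisher's exact test followed by FDR control) can be sketched as follows. The study used BLAST2GO for this step; the code below is an independent illustration, the Benjamini-Hochberg procedure is an assumed choice of FDR method, and the per-term gene counts are hypothetical. Only the group sizes (352 test genes, 296 reference genes) come from the table legend.

```python
from math import comb


def fisher_over_representation(test_hits, test_size, ref_hits, ref_size):
    """One-sided Fisher's exact test: P(at least `test_hits` annotated test genes)."""
    total = test_size + ref_size
    annotated = test_hits + ref_hits
    upper = min(annotated, test_size)
    numerator = sum(
        comb(annotated, x) * comb(total - annotated, test_size - x)
        for x in range(test_hits, upper + 1)
    )
    return numerator / comb(total, test_size)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts of genes annotated with each term: (test, reference).
term_counts = {"term_A": (40, 10), "term_B": (20, 17), "term_C": (5, 12)}
p_values = [fisher_over_representation(t, 352, r, 296)
            for t, r in term_counts.values()]
fdr = benjamini_hochberg(p_values)
for name, p, q in zip(term_counts, p_values, fdr):
    print(name, f"P = {p:.3g}", f"FDR = {q:.3g}", "enriched" if q < 0.05 else "")
```

Testing for under-representation, as done for the immune response term, would use the opposite tail of the same hypergeometric distribution.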
We also performed the same test on 104 orthologs
with human- and chimpanzee-specific LIRs. The GO
Fig. 4. Composition of LIRs in the human
genome. The LIRs in POTE and NPS-Septin
families occupy 3% of all the intronic LIRs.
The ‘other’ LIRs, occupying 49% of all LIRs,
refer to those with unique sequence

are listed in Table S2. Essentially, the arms
of the LIRs are approximately 1–48 and
97–143 bp, and can be extended into the
spacers in some LIRs.
for homologous fragments in other mammalian
genomes. The species specificity of the LIRs was
demonstrated in several cases, where we found
half-sized LIRs in the rhesus monkey genome. One
case is the LIR in the human gene c9orf52 that has 19
ORFs and four transcript variants. Positioned
Table 1. GO terms over-represented in human genes having human-specific LIRs. The genes are human orthologs that have at least one
human-specific LIR. Reference genes are randomly selected from the list of orthologs, and are used for comparison with the test human
genes with specific LIRs. The GO terms in 352 test genes were compared with those in 296 reference genes, using Fisher’s exact test in
BLAST2GO. FDR was applied to obtain significantly over-represented (FDR < 0.05) GO terms in the test genes. Several GO terms belonging to
levels 1 or 2 are not included.
GO term Name FDR GO term Name FDR
GO:0060089 Molecular transducer activity 6.49E-05 GO:0005524 ATP binding 0.011768
GO:0007165 Signal transduction 6.49E-05 GO:0048731 System development 0.013028
GO:0005886 Plasma membrane 1.22E-04 GO:0043412 Biopolymer modification 0.013028
GO:0044425 Membrane part 2.44E-04 GO:0005887 Integral to plasma membrane 0.013136
GO:0007154 Cell communication 2.65E-04 GO:0005230 Extracellular ion channel activity 0.014854
GO:0032501 Multicellular organismal process 2.65E-04 GO:0031175 Neurite development 0.014854
GO:0045202 Synapse 3.10E-04 GO:0031402 Sodium ion binding 0.014854
GO:0004888 Transmembrane receptor activity 7.24E-04 GO:0030030 Cell projection organization 0.016226
GO:0031224 Intrinsic to membrane 7.24E-04 GO:0022804 Active transmembrane transporter 0.016354
GO:0016021 Integral to membrane 8.83E-04 GO:0004672 Protein kinase activity 0.017032
GO:0032502 Developmental process 0.001316 GO:0048856 Anatomical structure development 0.017199
GO:0030695 GTPase regulator activity 0.001397 GO:0043687 Post-translational protein modification 0.017406

GO:0006814 Sodium ion transport 0.005868 GO:0048468 Cell development 0.025037
GO:0005509 Calcium ion binding 0.006231 GO:0022891 Transmembrane transporter activity 0.026669
GO:0016773 Phosphotransferase activity 0.006277 GO:0009790 Embryonic development 0.028903
GO:0017076 Purine nucleotide binding 0.00637 GO:0006793 Phosphorus metabolic process 0.030622
GO:0022008 Neurogenesis 0.006378 GO:0022892 Substrate-specific transporter 0.030713
GO:0048699 Generation of neurons 0.006378 GO:0043005 Neuron projection 0.033582
GO:0000902 Cell morphogenesis 0.007098 GO:0005096 GTPase activator activity 0.033582
GO:0032989 Cellular structure morphogenesis 0.007098 GO:0015698 Inorganic anion transport 0.033582
GO:0030234 Enzyme regulator activity 0.007765 GO:0065007 Biological regulation 0.034756
GO:0051056 Regulation of small GTPase 0.009344 GO:0005089 Rho guanyl-nucleotide exchange factor 0.041165
GO:0048869 Cellular developmental process 0.009344 GO:0005088 Ras guanyl-nucleotide exchange factor 0.041165
GO:0030154 Cell differentiation 0.009344 GO:0007010 Cytoskeleton organization 0.041874
GO:0005215 Transporter activity 0.009344 GO:0007417 Central nervous system development 0.043131
GO:0032559 Adenyl ribonucleotide binding 0.009362 GO:0030594 Neurotransmitter receptor activity 0.043131
GO:0032555 Purine ribonucleotide binding 0.01047 GO:0004674 Protein serine/threonine kinase 0.045995
GO:0032553 Ribonucleotide binding 0.01047 GO:0008092 Cytoskeletal protein binding 0.046358
GO:0031226 Intrinsic to plasma membrane 0.011022 GO:0005515 Protein binding 0.047541
between exons 17 and 18 (5.3 kb to exon 18; 36.59 kb
to exon 17) (Fig. 6), it has a homolog in the chimpan-
zee genome. However, all homologous fragments from
the rhesus monkey correspond to one arm of the LIR.
Moreover, motif conservation was exhibited at the
flanking sequences of the LIR in primates (Fig. 6). In
other words, the half-sized LIR represents the
ancestral state, and the full-sized LIR developed
in the chimpanzee and human lineages. We did not
find fragments homologous to the LIR in the mouse
genome. Instead, a half-sized LIR was observed in the

LIRs with a large spacer (see Table S1). In previous
studies, the methods employed for inverted repeat
identification could not search the inverted repeats by
freely defining arm similarity, spacer size and indels
[30,32,36]. Thus, the map of the LIRs obtained in the
present study provides a more detailed distribution of
stem-loops in the human genome, and confirms that
the LIRs are mostly located in long introns and inter-
genic regions. Furthermore, the inverted repeats in the
present study are more likely to be functional than
those of previous studies because functional inverted
repeats such as siRNA genes are rarely palindromes
showing 100% arm similarity [30,32,36].
Because of the difficulties encountered in the design
of the algorithm for LIR searching at the genome
level and the complex folding structures of inverted
repeats, we could not target all the inverted repeats
with a strong potential to form a stem-loop or a pal-
indrome. Particularly, there are a large number of
AT-rich regions in the human genome, and the frequency
of (TA)n repeats is 19.4 per Mb [37]. The
self-complementary (TA)n repeats can by themselves
form variant secondary structures. To remove these
simple repeats, we set the GC content of the arm
sequences at > 20%. This step, however, unavoidably
deleted AT-rich LIRs, and some of them have been

will not greatly disturb the coding parts of the genes.
Knowing the genomic distribution and sequence fea-
tures of the LIRs enables us to speculate about the
biological functions of the LIRs.
First, there are a large number of TG/CA-rich LIRs
in our collection, and these intronic TG and CA tracts
are probably involved in the alternative splicing of
genes. One study revealed that intronic TG tracts,
particularly in hairpin structure, are important in the
intron knockout process and help to create
complicated splicing patterns [39]. On the other hand,
CA-tracts and CA-rich sequences are confirmed to be
regulators for alternative splicing. One study showed
that the insertion of a CA repeat into different intronic
places will result in variant splicing patterns in a
human gene [40]. Perhaps splicing sites at intron–exon
boundaries can be recognized easily by a signal of sec-
ondary structure. Taken together, this allows us to
propose that the TG/CA-rich LIRs are regulators in
human genes.
Second, approximately half of the LIRs are unique
in sequence features, and some of them are probably
unidentified siRNA genes. In the present study, the
LIRs are longer than the minimal length required for
an siRNA. Although arm similarity is higher than that
observed in most siRNAs, some of them are still candidates
for siRNA genes. We used EMBOSS sirna
( to
identify the candidates with a threshold score of 8, and
found that 267 of the LIRs are potential siRNA genes.

NPS-Septin genes are spread in the human genome
possibly due to interchromosomal recombination and
fragmental duplication. One study showed that inter-
chromosomal recombination frequently occurs at the
subtelomeric regions in humans [42]. NPS-Septin was
assumed to be one of the gene families that amplified
themselves by this mechanism. The result of gene
amplification is concurrent duplication of the intronic
LIRs, as observed in chromosomes 1 and Y in the
present study. By contrast, only two chimpanzee NPS-
Septin LIRs were identified, which is in accordance
with the low frequency of subtelomeric duplications in
the chimpanzee genome [42]. Regarding the spread of
LIRs in the POTE family, at least those on chromo-
some 2 subtelomeres are most likely the result of intra-
chromosomal recombination, as inferred from genomic
locations. Similarly, the chimpanzee genome has two
POTE homologs: one on chromosome 12 and another
one on chromosome 22. In addition, the LIRs in
POTE and NPS-Septin families were entirely absent
from other mammals in current genomic assemblies.
Probable role of the LIRs in primate speciation
Among the orthologous genes, we found that human
and chimpanzee genes contain more LIRs than rhesus
monkey and mouse orthologs. Our data suggest that
most of the LIRs shared by human and chimpanzee
orthologs were developed and maintained by the com-
mon ancestor of humans and chimpanzees. However,
the difference in LIR frequency among the primates
could be narrowed to some extent. Our search for

ance of the LIRs probably provides humans with an
evolutionary advantage and contributes to the specia-
tion of primates.
Experimental procedures
Identification of LIRs
The human genome (Build 35) was downloaded from the
NCBI ( and the locations of
all human genes and their exons (for protein-coding genes)
were obtained from the Ensembl database (http://www.
ensembl.org). From the gene list, we obtained the locations
of the boundaries of the genic and nongenic regions. Exons
belonging to the same genes were sorted again according to
their genomic locations, and the introns were defined as the
intervals between the exons. From the list, the boundaries
of exons and introns were determined.
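The derivation of introns from sorted exon coordinates can be sketched as below. The coordinate convention (1-based, inclusive) and the merging of overlapping or adjacent exons from alternative transcripts are assumptions made for this illustration; the source only states that exons were sorted by genomic location and that introns are the intervals between them.

```python
def introns_from_exons(exons):
    """Return intron intervals for one gene from its exon intervals.

    `exons` is a list of (start, end) pairs, 1-based and inclusive.
    Overlapping or directly adjacent exons are merged first, so that no
    empty or negative-length intron is produced.
    """
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1] + 1:
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    # An intron spans from the base after one exon to the base before the next.
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(merged, merged[1:])
    ]


# Hypothetical gene with unsorted exons, two of which overlap.
exons = [(500, 600), (100, 200), (300, 350), (340, 360)]
print(introns_from_exons(exons))
```

The gene boundaries then follow from the first exon start and the last exon end, and everything outside them is treated as intergenic.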
We first searched for inverted repeats across the human
genome using bespoke software [33]. The settings for this
step were: arm length > 30 bp; arm identity > 85%; and
spacer < 2 kb. In addition, inverted repeats with a GC
content of the arms of < 20% were filtered out. This
aimed to exclude an abundance of inverted repeats formed
by (TA)n simple repeats as shown in our primary study. A
(TA)n by itself is an inverted repeat, and can form variant
secondary structures rather than an exclusive and stable
stem-loop. Therefore, (TA)n
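The search settings listed above (arm length > 30 bp, arm identity > 85%, spacer < 2 kb, and removal of repeats whose arms have a GC content below 20%) can be expressed as a simple filter. This is a sketch, not the bespoke software cited as [33]: the `InvertedRepeat` record and the function names are hypothetical, and the arm length is taken as the shorter of the two arms by assumption.

```python
from dataclasses import dataclass


@dataclass
class InvertedRepeat:
    left_arm: str      # sequence of the left arm
    right_arm: str     # sequence of the right arm
    spacer_len: int    # spacer length in bp
    identity: float    # arm identity as a fraction between 0 and 1


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def passes_lir_filter(ir, min_arm=30, min_identity=0.85,
                      max_spacer=2000, min_gc=0.20):
    """Apply the thresholds quoted in the text to one candidate repeat."""
    arm_len = min(len(ir.left_arm), len(ir.right_arm))
    return (arm_len > min_arm
            and ir.identity > min_identity
            and ir.spacer_len < max_spacer
            and gc_content(ir.left_arm + ir.right_arm) >= min_gc)


# Toy candidates: a mixed-composition repeat and a pure (TA)n repeat.
mixed = InvertedRepeat("ACGTTGCA" * 5, "TGCAACGT" * 5, 100, 0.90)
ta_only = InvertedRepeat("TA" * 20, "TA" * 20, 0, 1.0)
print(passes_lir_filter(mixed), passes_lir_filter(ta_only))
```

The same function covers the ortholog comparison described later in this section by passing `min_identity=0.75` and `max_spacer=500`.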

was considered to be formed by a known repetitive element
if the identity of the homologous part (> 20 bp) was
higher than 75%. For the results obtained, LIRs formed by
inverted Alu repeats were further confirmed by RepeatMasker
(). Second, LIRs homologous
to each other were searched. Similarly, the criteria were:
homologous part > 20 bp and identity of homologous part
> 75%. Put simply, the algorithm for searching the
homologous part aimed to find an identical seed of 5 bp
and then extend the seed at both ends until two consecutive
mismatches occur on each side.
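The seed-and-extend search for a homologous part can be sketched as follows. This is a minimal reading of the description above, not the authors' program: the extension is ungapped, trailing mismatches are trimmed so that each hit ends on a match, and all function names are hypothetical. The thresholds (5-bp seed, homologous part > 20 bp, identity > 75%) are those given in the text.

```python
def extend(a, b, i, j, step):
    """Extend from (i, j) in direction `step` until two consecutive mismatches.

    Returns the number of accepted positions; the extension always ends on a match.
    """
    accepted = walked = mismatch_run = 0
    while 0 <= i < len(a) and 0 <= j < len(b):
        walked += 1
        if a[i] == b[j]:
            mismatch_run = 0
            accepted = walked
        else:
            mismatch_run += 1
            if mismatch_run == 2:
                break
        i += step
        j += step
    return accepted


def homologous_hits(a, b, seed=5):
    """Yield (length, identity) for every seed match extended in both directions."""
    index = {}
    for j in range(len(b) - seed + 1):
        index.setdefault(b[j:j + seed], []).append(j)
    for i in range(len(a) - seed + 1):
        for j in index.get(a[i:i + seed], []):
            left = extend(a, b, i - 1, j - 1, -1)
            right = extend(a, b, i + seed, j + seed, +1)
            length = left + seed + right
            start_a, start_b = i - left, j - left
            matches = sum(a[start_a + k] == b[start_b + k] for k in range(length))
            yield length, matches / length


def is_homologous(a, b, min_len=20, min_identity=0.75):
    """True if any extended seed gives a part longer than 20 bp with identity > 75%."""
    return any(length > min_len and identity > min_identity
               for length, identity in homologous_hits(a, b))


# Toy example: `b` contains a copy of `a` with one substitution and extra flanks.
a = "ACGTTGCAAGCTTAGGCTAACGTCA"
b = "TT" + "ACGTTGCAAGCTGAGGCTAACGTCA" + "GG"
print(is_homologous(a, b))
```

The same routine applies to the comparison against known repetitive elements, since the text gives identical criteria for that step.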
LIRs in mammalian orthologous genes
We obtained orthologous genes for human–chimpanzee,
human–rhesus monkey and human–mouse species pairs
from the BIOMART database ( />biomart/), which employs the Ensembl 42 Homology Data-
base. By searching the same human gene IDs in the three
ortholog tables, we created a new ortholog table containing
12 723 groups of orthologous genes from the four species.
In the BIOMART database, some orthologous genes are of
the types ‘one-to-many’ and ‘many-to-many’ that denote a
multiple orthologous relationship between the genes. In the
present study, we kept the orthologs in the ‘one-to-one’
type to allow an easier comparison among the orthologs in
terms of the absence/presence of the LIRs.
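The construction of the four-species ortholog table can be sketched as a join on human gene IDs restricted to one-to-one relationships. The input format (lists of human and ortholog ID pairs), the gene identifiers and the function names are hypothetical; the actual tables were exported from BIOMART.

```python
from collections import Counter


def one_to_one(pairs):
    """Keep pairs in which both the human gene and its ortholog occur exactly once."""
    human_counts = Counter(human for human, _ in pairs)
    other_counts = Counter(other for _, other in pairs)
    return {human: other for human, other in pairs
            if human_counts[human] == 1 and other_counts[other] == 1}


def ortholog_groups(chimp_pairs, rhesus_pairs, mouse_pairs):
    """Join the three pairwise tables on the human gene ID."""
    tables = [one_to_one(p) for p in (chimp_pairs, rhesus_pairs, mouse_pairs)]
    shared = set(tables[0]) & set(tables[1]) & set(tables[2])
    return {human: tuple(table[human] for table in tables)
            for human in sorted(shared)}


# Hypothetical identifiers: H2 is one-to-many in chimpanzee and is dropped.
chimp = [("H1", "C1"), ("H2", "C2"), ("H2", "C3"), ("H3", "C4")]
rhesus = [("H1", "R1"), ("H2", "R2"), ("H3", "R3")]
mouse = [("H1", "M1"), ("H3", "M3")]
print(ortholog_groups(chimp, rhesus, mouse))
```

Each retained group can then be scored for the absence or presence of LIRs in every species.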
The criteria for searching inverted repeats among the
orthologs were: arm length > 30 bp; arm identity > 75%;
spacer < 500 bp. Again, the inverted repeats with a GC
content of the arms of < 20% were excluded. We then

repeats unstable in yeast are excluded from the human
genome. EMBO J 19, 3822–3830.
2 Lobachev KS, Shor BM, Tran HT, Taylor W, Keen
JD, Resnick MA & Gordenin DA (1998) Factors affect-
ing inverted repeat stimulation of recombination and
deletion in Saccharomyces cerevisiae. Genetics 148,
1507–1524.
3 Gordenin DA, Lobachev KS, Degtyareva NP, Malkova
AL, Perkins E & Resnick MA (1993) Inverted DNA
repeats: a source of eukaryotic genomic instability. Mol
Cell Biol 13, 5315–5322.
4 Tanaka H, Tapscott SJ, Trask BJ & Yao M-C (2002)
Short inverted repeats initiate gene amplification
through the formation of a large DNA palindrome
in mammalian cells. Proc Natl Acad Sci USA 99, 8772–
8777.
5 Lin C-T, Lin W-H, Lyu YL & Whang-Peng J (2001)
Inverted repeats as genetic elements for promoting
DNA inverted duplication: implications in gene amplifi-
cation. Nucleic Acids Res 29, 3529–3538.
6 Gotter AL, Nimmakayalu MA, Jalali GR, Hacker AM,
Vorstman J, Conforto Duffy D, Medne L & Emanuel
BS (2007) A palindrome-driven complex rearrangement
of 22q11.2 and 8q24.1 elucidated using novel technolo-
gies. Genome Res 17, 470–481.
7 Nag DK & Kurst A (1997) A 140-bp-long palindromic
sequence induces double strand breaks during meiosis
in the yeast Saccharomyces cerevisiae. Genetics 146,
835–847.
8 Collick A, Drew J, Penberth J, Bois P, Luckett J, Scae-

ference in Caenorhabditis elegans. Proc Natl Acad Sci
USA 95, 15502–15507.
16 Bartel DP (2004) MicroRNAs: genomics, biogenesis,
mechanism, and function. Cell 116, 281.
17 Moss EG, Lee RC & Ambros V (1997) The cold shock
domain protein LIN-28 controls developmental timing
in C. elegans and is regulated by the lin-4 RNA. Cell
88, 637–646.
18 Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE,
Bettinger JC, Rougvie AE, Horvitz HR & Ruvkun G
(2000) The 21-nucleotide let-7 RNA regulates developmental
timing in Caenorhabditis elegans. Nature 403,
901–906.
19 Schramke V, Sheedy DM, Denli AM, Bonila C, Ekwall
K, Hannon GJ & Allshire RC (2005) RNA-interfer-
ence-directed chromatin modification coupled to RNA
polymerase II transcription. Nature 435, 1275–1279.
20 Volpe TA, Kidner C, Hall IM, Teng G, Grewal SIS &
Martienssen RA (2002) Regulation of heterochromatic
silencing and histone H3 lysine-9 methylation by RNAi.
Science 297, 1833–1837.
21 Glucksmann MA, Markiewicz P, Malone C & Roth-
man-Denes LB (1992) Specific sequences and a hairpin
structure in the template strand are required for N4
virion RNA polymerase promoter recognition. Cell 70,
491–500.
22 Kim EL, Peng H, Esparza FM, Maltchenko SZ &
Stachowiak MK (1998) Cruciform-extruding regulatory

B, Svetec I-K, Šarić H, Nikolić I & Zgaga Z
(2005) Palindrome content of the yeast Saccharomyces
cerevisiae genome. Curr Genet 47, 289–297.
30 Lu L, Jia H, Dröge P & Li J (2007) The human
genome-wide distribution of DNA palindromes. Funct
Integr Genomics 7, 221–227.
31 Zhao G, Chang KY, Varley K & Stormo GD (2007)
Evidence for active maintenance of inverted repeat
structures identified by a comparative genomic
approach. PLoS ONE 2, e262.
32 Warburton PE, Giordano J, Cheung F, Gelfand Y &
Benson G (2004) Inverted repeat structure of the human
genome: the X-chromosome contains a preponderance
of large, highly homologous inverted repeats that con-
tain testes genes. Genome Res 14, 1861–1869.
33 Wang Y & Leung FCC (2006) Long inverted repeats in
eukaryotic genomes: recombinogenic motifs determine
genomic plasticity. FEBS Lett 580, 1277–1284.
34 Grimwood J, Gordon LA, Olsen A, Terry A, Schmutz
J, Lamerdin J, Hellsten U, Goodstein D, Couronne O,
Tran-Gyamfi M et al. (2004) The DNA sequence and
biology of human chromosome 19. Nature 428, 529–

recombination in mammalian cells. Genetics 153, 1873–
1883.
42 Linardopoulou EV, Williams EM, Fan Y, Friedman C,
Young JM & Trask BJ (2005) Human subtelomeres are
hot spots of interchromosomal recombination and seg-
mental duplication. Nature 437, 94–100.
43 Myers S, Bottolo L, Freeman C, McVean G & Donnelly
P (2005) A fine-scale map of recombination rates and
hotspots across the human genome. Science 310, 321–
324.
44 Bird AP (1995) Gene number, noise reduction and bio-
logical complexity. Trends Genet 11, 94–100.
45 Claverie J-M (2001) Gene number: what if there are
only 30,000 human genes? Science 291, 1255–1257.
46 Bacolla A, Larson JE, Collins JR, Li J, Milosavljevic
A, Stenson PD, Cooper DN & Wells RD (2008)
Abundance and length of simple repeats in vertebrate
genomes are determined by their structural properties.
Genome Res 18, 1545–1553.
47 Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon
M & Robles M (2005) Blast2GO: a universal tool for
annotation, visualization and analysis in functional
genomics research. Bioinformatics 21, 3674–3676.
48 Blüthgen N, Brand K, Cajavec B, Swat M, Herzel H &
Beule D (2005) Biological profiling of gene groups
utilizing gene ontology. Genome Inform 16, 106–115.

