Seed-based systematic discovery of specific transcription
factor target genes
Ralf Mrowka 1,2,3, Nils Blüthgen 4 and Michael Fähling 1,3
1 Paul-Ehrlich-Zentrum für Experimentelle Medizin, Berlin, Germany
2 AG Systems Biology – Computational Physiology, Berlin, Germany
3 Johannes-Müller-Institut für Physiologie, Charité-Universitätsmedizin Berlin, Germany
4 School of Chemical Engineering and Analytical Sciences, Manchester Interdisciplinary Biocentre, University of Manchester, UK
The prediction and analysis of the regulatory networks
underlying gene expression is a central challenge in
systems biology and functional genomics [1,2]. Regula-
tion of transcription is the initial mechanism for con-
trolling the expression of genes. Key regulators of
doi:10.1111/j.1742-4658.2008.06471.x
Reliable prediction of specific transcription factor target genes is a major challenge in systems biology and functional genomics. Current sequence-based methods yield many false predictions, due to the short and degenerate DNA-binding motifs. Here, we describe a new systematic genome-wide approach, the seed-distribution-distance method, that searches large-scale genome-wide expression data for genes that are expressed similarly to known targets. This method is used to identify genes that are likely targets, allowing sequence-based methods to focus on a subset of genes, giving rise to fewer false-positive predictions. We show by cross-validation that this method is robust in recovering specific target genes. Furthermore, this method identifies genes with typical functions and binding motifs of the seed. The method is illustrated by predicting novel targets of the transcription factor nuclear factor kappaB (NF-κB). Among the new targets is optineurin, which plays a key role in the pathogenesis of acquired blindness caused by adult-onset primary open-angle glaucoma. We show experimentally that the optineurin gene and other predicted genes are targets of NF-κB. Thus, our data provide a missing link in the signalling of NF-κB and the damping function of optineurin in signalling feedback of NF-κB. We present a robust and reliable method to enhance the genome-wide prediction of specific transcription factor target genes that exploits the vast amount of expression information available in public databases today.
Abbreviations
CASP4, caspase 4; ChIP, chromatin immunoprecipitation; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; HEK, human embryonic
kidney; HIF-1, hypoxia-inducible factor 1; HNF4, hepatocyte nuclear factor 4; IKK, IκB kinase; NEMO, nuclear factor kappaB essential
modulator; NF-κB, nuclear factor kappaB; OPTN, optineurin; RGA, reporter gene analysis; STAT5A, signal transducer and activator of
transcription 5A; TNF-α, tumor necrosis factor-α.
3178 FEBS Journal 275 (2008) 3178–3192 © 2008 The Authors Journal compilation © 2008 FEBS
applies for the transcription factors analysed in this
study. The major problem is the short length and high
relevant biological functions of transcription factors,
which directly influence mRNA concentrations in the
cell. Well-designed, small-scale expression profile
experiments have been successfully used to identify
transcription factors involved in certain pathways
[16,17]. Especially when applied to time-series data,
seed-based clustering methods have been very success-
ful in identifying novel targets by comparing expres-
sion kinetics with known targets for p53 and for
picking up genes regulated in different cell-cycle phases
[18,19]. However, these approaches require dedicated
microarray experiments. We addressed the question as
to whether it is feasible to explore the large body of
expression information that is already stored in public
databases. These datasets might contain information
about expression at different time points for different
cell lines that might be only marginally related to the
transcription factor under investigation, and we won-
dered whether these datasets would allow us to extract
the relevant information about the action of transcrip-
tion factors on their targets.
In recent years, several microarray techniques have
been developed to measure mRNA concentration on a
genome-wide scale [20]. In addition, efforts have been
made to store individual microarray experiments in
databases. Microarray expression data have been used
in recent times to improve transcription factor target
prediction [21]. In this work, we developed a method
to exploit a dataset of approximately 1200 microarray
experiments in conjunction with a seed group of
genes with the genes in the microarray set resulted in
81 genes, which were used as the seed. We obtained
these large-scale microarray expression data [26]
(detailed description of data in supplementary Doc S1)
from the Stanford microarray database [27]. The set
contains genome-wide data from 1202 hybridization
experiments from human tissues and cell lines. Subse-
quently, we ranked each gene x according to its
similarity L(x) of expression to the seed group
(detailed results given in supplementary Doc S2). We
defined similarity L(x) for a gene x by taking the
median correlation of gene x to the seed and subtract-
ing its median correlation to all genes (typical distribu-
tions of correlations of genes to the seed group are
shown in supplementary Fig. S1). Thus, if L(x) showed high values, the particular gene was regulated similarly to the seed gene group. In contrast, if the absolute value of the similarity measure was low, the median of the gene's correlation distribution with the seed was close to that of its correlation distribution with a randomly selected group. Using the similarity measure L, we then sorted
all remaining human genes and thereby obtained a
ranking of the genes according to their similarity to
the seed group. To avoid a circular argument, we
would like to stress that for all statistical analyses and
characterization of rank, the seed group was excluded.
A schematic representation of this procedure is given
in Fig. 1. The essence of the method is that if a gene’s
We next analysed the top members of the obtained
rank with regard to their gene ontology classification.
For the top 600 genes, we examined whether any gene
ontology classification is significantly enriched using
rigorous statistics [12]. It turns out that the list of significant gene functions of the top 600 genes as shown in supplementary Table S1 is congruent with the functions of NF-κB described in the literature.
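As an illustration of this kind of overrepresentation test, the following minimal Python sketch computes a one-sided hypergeometric (Fisher's exact) P-value for a single gene ontology term among the top-ranked genes. The function names are hypothetical, and the Benjamini–Hochberg adjustment is used here only as a common stand-in for multiple-testing correction; it is not necessarily the procedure of [12].

```python
from math import comb


def hypergeom_upper_tail(k, n_top, n_term, n_genes):
    """P(X >= k): chance of seeing at least k annotated genes in the top group.

    n_top   genes in the top group (e.g. 600)
    n_term  genes annotated with the term among all ranked genes
    n_genes all ranked genes
    """
    total = comb(n_genes, n_top)
    upper = min(n_term, n_top)
    favourable = sum(
        comb(n_term, i) * comb(n_genes - n_term, n_top - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (assumed stand-in for the FDR step)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In such a sketch, one P-value would be computed per gene ontology term and the adjusted values compared against the chosen false-discovery threshold.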
We further analysed the occurrences of typical NF-κB functions within the rank. We found that there was a steep increase of the density of genes involved in ‘immune response’, starting at approximately rank 700 when moving from lowest to highest ranks. The probability of a gene being involved in the immune response is therefore greatly increased for the top members in the rank, as seen in Fig. 2.
High density of putative NF-κB DNA-binding sites in promoters in the top group of the rank
As the overrepresentation of typical NF-κB-related
biological functions might be due to coexpression
mediated by different transcription factors, we decided
to analyse the sequences of putative promoter regions
of the high-ranking genes.
We predicted binding sites for all vertebrate tran-
scription factors contained in the transfac database
in the 500 bp putative promoter region of all genes in
the ranking. We derived the 500 bp sequences
upstream of the transcriptional start site from the
ensembl database. We chose to limit our search to
500 bp, because we and others observed earlier that
might also give reliable results if the seed was substan-
tially smaller. We applied a cross-validation strategy
by randomly dividing the original 81 targets into
two groups, one group being the seed, and the remain-
ing genes constituting the other group, named the test
group, t. Several sizes of the seed were used (1, 10, 20
[Figure 2. Panel title: Genes involved in immune response. x-axis: position of gene in the ranking; y-axis: density of occurrence. Inset: density versus position, with "high rank" and "low rank" marked.]
Fig. 2. Density of occurrences of genes annotated with the term ‘immune response’ in the ranking after applying the seed-distribution-distance method. Immune response genes are highly enriched in the top members of the rank (P < 0.0001, two-sided Mann–Whitney rank-sum test). Red, individual occurrences of immune
terms of the factors for E2F and HNF4. For ETS1,
HIF-1 and c-Myc, this ontology enrichment is not as
clear as for the other three tested factors. One reason
could be the considerably lower number of gene onto-
logy annotated genes for the specific terms and, in the
case of c-Myc, the broad-spectrum ontologies [34].
The results of this jack-knife procedure also provide an estimate of how many of the true positives will lie in the upper 5%: about 18–39% of all targets would be in the upper 5% of genes of the rank (26% for NF-κB, 39% for E2F, 29% for ETS1, 18% for HIF-1, 36% for HNF4, and 20% for c-Myc). Thus, applying the seed-distribution-distance method will enrich the true targets in the top 5% of the rank by a factor of 4–8.
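The factor of 4–8 follows from dividing the fraction of targets recovered in the top 5% by the 5% that would be expected for a random ranking. A small sketch of this arithmetic, using the percentages quoted above (the function names are illustrative only):

```python
def top_fraction(positions, n_ranked, top=0.05):
    """Fraction of held-out targets whose 1-based rank lies in the top share."""
    cutoff = top * n_ranked
    return sum(1 for p in positions if p <= cutoff) / len(positions)


def enrichment_factor(fraction_in_top, top=0.05):
    """Observed fraction in the top group relative to the random expectation."""
    return fraction_in_top / top


# Fractions of targets found in the upper 5% of the rank, as reported in the text.
reported = {
    "NF-kappaB": 0.26,
    "E2F": 0.39,
    "ETS1": 0.29,
    "HIF-1": 0.18,
    "HNF4": 0.36,
    "c-Myc": 0.20,
}
factors = {tf: enrichment_factor(f) for tf, f in reported.items()}
```

The lowest and highest reported fractions (18% and 39%) give factors of about 3.6 and 7.8, which the text rounds to 4–8.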
[Figure: "Sites enriched"; y-axis: enrichment of putative transcr… (label truncated).]
[Figure 4. Title: Histogram of recovery test. x-axis: recovered position in gradient; y-axis: relative occurrence. Series: original seed n = 81, seed n = 50, seed n = 20, seed n = 10, seed n = 1.]
Fig. 4. Recovery of target genes in a cross-validation test: the origi-
nal seed was divided into two parts: (a) a group of members for
rank construction; and (b) a test group with the remaining members
of the original seed. Histograms of the recovery position of the test
group are shown for the newly constructed ranks using the seed
without the test group (median: s,
, h, ). If, for example, 10
genes are used as a seed (71 in the test group), the relative occur-
rence of the recovered positions are still very high (h), i.e. the
enrichment capability of the seed-distribution-distance method is
plasmids in which the predicted consensus sequence of the NF-κB-binding site was deleted. A widely used method to induce NF-κB is stimulation by means of TNF-α. Human HEK293 cells were transiently transfected with the reporter plasmids, and TNF-α stimulation (1.25–20 ng·mL⁻¹) was applied. For all three unmodified promoters, luciferase activity was strongly induced in a concentration-dependent manner under TNF-α stimulation in the undeleted plasmid, very similar to our positive control NFKBIA. In contrast, in the experiment with the plasmids in which we had deleted the putative NF-κB sites, the concentration-dependent stimulation effect was not seen for the OPTN and CASP4 promoters, and was strongly reduced for the Spi-B promoter (Fig. 6), indicating that the NF-κB action was blocked in the deletion mutant. The negative control (DARS) did not show any significant dose-dependent change in expression.
Furthermore, we applied chromatin immunoprecipitation (ChIP) analysis in order to verify NF-κB interaction with the predicted NF-κB-binding sites. A positive ChIP signal was obtained for OPTN and SPI-B as well as for NFKBIA in stimulated cells (Fig. 6). NF-κB-dependent activation of the CASP4 promoter was not indicated by ChIP analysis in HEK293 cells (Fig. 6Be). This correlates well with a very low basal promoter activity, and therefore may be attributed to a silenced CASP4 promoter in the cellular model used.
microarray experiments to generate a distribution-dis-
tance-derived target prediction based on a seed set of
known target genes of a specific transcription factor.
The target prediction is based on a combination of
transcription factor-binding site information and the
distribution distance. We took especial care to keep
our method simple and the number of free parameters
as low as possible, so our results do not depend on
[Figure: for each transcription factor, panels "Cross-validation" and "Gene ontology" with y-axes "Density (%)". Gene ontology terms shown: immune response, liver development, blood coagulation, lipid metabolic process, response to hypoxia, angiogenesis, extracellular matrix, cell cycle, cell proliferation. Factor label recovered: c-Myc.]
extraordinarily high amount of false-positives obtained
with purely sequence-based methods [5,7,35]. More
sophisticated clustering methods might even improve
the prediction quality further. We provide both statisti-
cal and biological evidence that the seed-distribution-
distance method is robust and applicable to other
transcription factors and is hence very useful in pre-
dicting specific transcription factor target genes.
Top rank members are involved in typical NF-κB-regulated functions and are enriched with putative NF-κB-binding sites
The distance criterion for generating the rank is a kind of expression profile similarity measure with respect to the seed group. It is not a priori clear that similarly regulated genes share the same gene function. The NF-κB analysis, however, reveals that the seed-distribution-distance method highly enriches genes in the top ranks that share typical NF-κB-regulated functions. For instance, the processes immune response, complement activation, regulation of T-cell differentiation and immune cell activation are significantly overrepresented in the top group (supplementary Table S1). Moreover, we found specific enrichment of predicted binding motifs for NF-κB 50 and NF-κB 65 in the top 5% of the genes, among three others. We would expect the other factors to be functionally related to NF-κB. This is the case for STAT5A, which has been reported to be involved in severe combined immunodeficiency [36] and is involved in the immune response [37]. Please note that these statistics were obtained without
tension glaucoma [42]. We show that a deletion of a putative NF-κB-binding site in the promoter region of OPTN completely abolishes the enhancing action and modulatory effect of NF-κB on OPTN (Fig. 6).
Our experiments show clearly that OPTN is a direct target of NF-κB. Recent findings indicated that TNF-α potentiates glutamate neurotoxicity through the blockade of glutamate transporter activity [43,44]. Furthermore, it was shown that OPTN and NF-κB essential modulator (NEMO) are competitive inhibitors of one another [45]. NEMO represents the regulatory subunit of IKK, which is essential for NF-κB activation [46]. Together with our data, this makes it apparent that OPTN is part of a negative feedback system that is important for NF-κB action. Elevated OPTN expression reduces induced NF-κB activation [45], and is therefore protective against induced neuronal cell death, which depends on NF-κB activity. This is in line with findings indicating that the protective function of OPTN is lost upon truncation resulting from the insertion of a premature stop codon, and when the OPTN protein is changed to the mutated form E50K, which is markedly reduced in patients suffering from glaucoma [42]. Our data provide the missing link in the signalling of NF-κB and the damping function of OPTN in signalling feedback of NF-κB.
The knowledge about the direct action of NF-κB on
OPTN will greatly enhance our understanding of the
signalling pathways relevant for antiapoptosis, and will
be helpful in designing possible new cell survival strate-
[Figure 6 graphics. (A) Reporter gene activity: luciferase activity (relative values) for the NFKBIA and DARS promoters and for the OPTN, SPI-B and CASP4 promoters, with and without deletion of the putative NF-κB site (construct position −409), for control and 1.25–20 ng·mL⁻¹ TNF-α. Significance labels shown: P < 10⁻¹⁵, P < 0.003, P < 0.01, P < 0.03, P = 0.94, n.s. ChIP panels: relative values for Input, anti-rabbit-AB and anti-NF-κB-AB, control versus TNF-α.]
involved in important physiological processes related
to typical known functions of NF-κB. It is known
that the Spi-B transcription factor is expressed in
adult pro-T cells, with Spi-B being maximal in the
newly committed cells at the DN3 stage [47].
Furthermore, Spi-B can interfere with T-cell develop-
ment [47]. CASP4 can function as an endoplasmic
reticulum stress-specific caspase in humans, and may
be involved in pathogenesis of Alzheimer’s disease
[48].
When does the seed-distribution-distance
method work?
The major assumption of our method is that genes
that are regulated by the same factor show at least
some coregulation. We use a genome-wide similarity measure L(x) based on the comparison of the median values of two correlation distributions. For each gene (x) in the genome, we calculate L(x), which is the median correlation of gene x with all the genes within the seed set minus the median correlation of gene x with all the rest of the genes in the genome. Our approach is able to ‘add up’ contributions from all the genes in the seed set, and by the use of the median and not the mean, it can discard a reasonable
(supplementary Fig. S4).
A second consideration relates to the expression
dataset. The seed-distribution-distance method relies
on the assumption that the transcription factor of
interest shows some biological activity in the data. If,
for example, the transcription factor of interest is com-
pletely shut down in all experiments, one would not
expect to be able to recover the regulation response of
that factor. This issue might be of importance for genes that are active only during narrow periods of development. One solution to this problem would be to
generate expression experiments with artificial expres-
sion of that transcription factor or to include native
material from that developmental period in the micro-
array analysis.
The third consideration relates to the size of the
seed. One would expect that if the seed is too small to
define the target response adequately, the rank will be
poorly defined. However, our bootstrapping test
showed that 10 seed genes are capable of enriching
Fig. 6. Experimental validation of predicted NF-κB targets by functional analyses and physical NF-κB interaction with the predicted NF-κB-binding sites in the nuclear chromatin context. (A) RGA. HEK293 cells were transfected and treated for 24 h with TNF-α in a dose-dependent manner (n = 4). (a) Schematic illustration of experimental design. RGA was measured with unmodified native promoter constructs (left column) and in constructs where the putative NF-κB-binding sites were deleted (right column, NF-κB del). (b) Promoter activity for NFKBIA, which is known to be a target of NF-κB, and a negative control (DARS). Only the NFKBIA promoter responded in a dose-dependent manner under stimulation with TNF-α. (c, d, e) RGA for the (c) OPTN, (d) SPI-B and (e) CASP4 promoter: All experiments showed a dose-dependent increase in promoter activity under stimulation with TNF-α. Deletion of the putative NF-κB-binding site resulted in significantly attenuated dose-dependent responses. (B) ChIP analysis. HEK293 cells were cultured with TNF-α (10 ng·mL⁻¹) or without (control) for 24 h prior to
suggest that the huge body of transcriptome data
available in databases can be used to strongly enhance
the prediction of transcription factor targets for cases
in which targeted microarray experiments are not
available or are too cost-intensive. The described sys-
tematic genome-wide approach for identification of
transcription factor targets is robust and efficient, and
systematically identifies new target genes for any given
transcription factor. We predict that the exploitation
of the expression data stored in public databases with
our or similar seed-based methods will improve the
search for new target genes of transcription factors.
Experimental procedures
Definition of the rank
For all gene pairs in the expression dataset, we calculated
the correlation coefficient in their expression:
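$$ r_{x,y} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}} $$
with $x_i$ and $y_i$ being the expression values of gene x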
and gene y in experiment i. We omitted experiments in the
calculation of the correlation coefficient where one of the
genes had no expression value, and discarded correlation
coefficients for further analysis if the number of common
experiments that could be used to calculate the correlation
coefficient was 10 or fewer. Given a seed group, we then calculated a score L(x) for all genes x outside the seed by
taking the median correlation to the seed and subtracting
the median correlation to all genes (i.e. its random median
correlation).
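The scoring step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: expression profiles are held in a dictionary of equal-length lists with None marking missing values, "all genes" is taken to mean all other genes in the dataset, and the names pearson, seed_scores and rank_genes are hypothetical.

```python
from math import sqrt
from statistics import median

MIN_COMMON = 11  # correlations from 10 or fewer shared experiments are discarded


def pearson(x, y):
    """Pearson correlation over experiments where both genes have a value.

    Returns None if too few common experiments remain or a profile is constant.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < MIN_COMMON:
        return None
    sx = sum(a for a, _ in pairs)
    sy = sum(b for _, b in pairs)
    sxy = sum(a * b for a, b in pairs)
    vx = n * sum(a * a for a, _ in pairs) - sx * sx
    vy = n * sum(b * b for _, b in pairs) - sy * sy
    if vx <= 0 or vy <= 0:
        return None
    return (n * sxy - sx * sy) / (sqrt(vx) * sqrt(vy))


def seed_scores(expr, seed):
    """Return {gene: L(gene)} for every gene outside the seed.

    L(x) = median correlation to the seed minus median correlation to all
    other genes (assumption: 'all genes' excludes only the gene itself).
    """
    scores = {}
    for gene, profile in expr.items():
        if gene in seed:
            continue
        to_seed = [pearson(profile, expr[s]) for s in seed]
        to_all = [pearson(profile, expr[h]) for h in expr if h != gene]
        to_seed = [r for r in to_seed if r is not None]
        to_all = [r for r in to_all if r is not None]
        if to_seed and to_all:
            scores[gene] = median(to_seed) - median(to_all)
    return scores


def rank_genes(expr, seed):
    """Genes outside the seed, sorted from highest to lowest L(x)."""
    scores = seed_scores(expr, seed)
    return sorted(scores, key=scores.get, reverse=True)
```

Computing all pairwise correlations this way scales quadratically with the number of genes, so a genome-wide run would in practice rely on a vectorised implementation; the sketch is meant only to make the definition of L(x) concrete.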
Molecular cloning
The OPTN promoter and a part of the 5′-UTR were cloned into the pGL3-Basic (Cat. no. E1751; Promega GmbH, Mannheim, Germany) plasmid at the SacI and HindIII sites using the tailed primers opti423F (5′-ACTGAGCTCGGCATTCTCCTCTTTCTGTGG-3′) and opti423R (5′-ACGTAAGCTTGGTGCCTAGGGCTGATGCGC-3′).
The predicted NF-κB-binding site, corresponding to ccgggaaattcccc, was deleted from the reporter gene construct by means of a PCR strategy. The following primer
inserts were verified by DNA sequencing.
The controls DARS and NFKBIA were generated using the MluI/XhoI sites of pGL3-Basic. Inserts were generated by PCR using the MluI/XhoI tailed primers DARSfw (5′-ACTACGCGTAGTCCAAGAGAGGAGAAACC-3′) and DARSrv (5′-ACTCTCGAGCCCGGAGCGCTGGCGGCCGC-3′), and NFKBIAfw (5′-ACTGAGCTCCCGACGACCCCAATTCAAATCG-3′) and NFKBIArv (5′-ACT
Cotransfections were performed with the firefly luciferase pGL3-Basic vector (Promega), as well as its transformed promoter variants, and the Renilla luciferase phRL-TK vector using the RotiFect Reagent (Carl Roth GmbH, Karlsruhe, Germany), according to the manufacturer’s protocol. After 6 h, the transfection medium was removed, and medium supplemented with TNF-α solvent (controls) or medium supplemented with TNF-α (1.25–20 ng·mL⁻¹, n = 4 each) was added, and cells were incubated for 24 h.
Luciferase assays
Cells were lysed after 24 h of treatment using 30 µL of passive lysis buffer (Cat. no. E1941; Promega) after medium removal and gentle washing with NaCl/Pi. The assays were performed on a Luminoskan RS (Labsystems Luminoscan RS, Helsinki, Finland) plate-luminometer using the injector system. The firefly luminescence was measured by injecting 100 µL of buffer 1 (470 µM D-luciferin, 270 µM CoA, 33.3 mM dithiothreitol, 530 µM ATP, 2.67 mM MgSO4, 20 mM Tricine, 0.1 mM EDTA), and the Renilla luminescence was measured after injecting 100 µL of buffer 2 (1.1 M NaCl, 2.2 mM Na2
the ChIP assay kit [Cat. no. 17-295; Millipore GmbH (Upstate), Schwalbach/Ts, Germany] was applied according to the manufacturer’s protocol, and an anti-(rabbit serum) (Cat. no. sc-2317, negative antibody control; Santa Cruz Biotechnology, Inc., Heidelberg, Germany) and an antibody to NF-κB p65 (A) (Cat. no. sc-109; Santa Cruz Biotechnology, Inc.) were used. The immunoprecipitated DNA was purified and then quantified by real-time PCR (GeneAmp 5700; Applied Biosystems, Darmstadt, Germany) using SYBR green and the ready-to-use heat-activated ImmoMix (Cat. no. 25020; Bioline, Luckenwalde, Germany). The following primers, bridging the predicted NF-κB-binding sites, were used for ChIP analysis: NFKBIA forward, 5′-ACCCCAGCTCAGGGTTTAGGCTTCT-3′; NFKBIA reverse, 5′-TGGCTGGGGATTTCTCTGGG-3′; OPTN forward, 5′-ACCCGGGTCCCAGCCTCGAC-3′; OPTN reverse, 5′-GACAGCCAGCCGCTCCCTGC-3′; SPI-B forward, 5′-TCCAGCTCCTGTCCCATCTC-3′; SPI-B reverse, 5′-TGTCACATGGCAGGGATGGC-3′; and CASP4 forward, 5′-GTCTGGCAACCCCTGTTGAAT-3′; CASP4 reverse, 5′-GCCTGCTGGCTCTGAAGAGTATC-3′. Amplification of a coding region part of the intron-less gene encoding glyceraldehyde-3-phosphate dehydrogenase (GAPDH: forward, 5′-CACCATCTTCCAGGAGCGAG-3′; and reverse, 5′-GCAGGAGGCATTGCTGAT-3′) served as control DNA.
Databases
The sequences of the 500 bp upstream regions of all human
genes, as well as the annotation with terms from the gene
ontology, were obtained from ensembl [54], using the tool
The binomial test was used to test for binding site enrich-
ment in the top group. The two categories for the binomial
were: gene having a specific binding site is in the top group,
gene is not in the top group. The null hypothesis was that
there is no deviation of the observed distribution from the
theoretical distribution that would be present if there was
no preference. The alternative hypothesis was that there is
a deviation in a one-tailed manner (enrichment, depletion).
The consensus sequences for vertebrate transcription factors
from transfac version 6.1 [56] were used for predic-
tion, and transcription factors with a minimum genome-
wide promoter hit count of 30 were included in the
analysis.
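A minimal sketch of such a one-tailed binomial test is given below. It assumes that, under the null hypothesis, a gene carrying a predicted site falls into the top group with probability equal to the share of the rank that the top group covers (for example 0.05). The function and parameter names are hypothetical, and the tail is summed in log space so that large hit counts do not overflow.

```python
from math import exp, lgamma, log

MIN_HITS = 30  # factors with fewer genome-wide promoter hits are skipped


def _log_binom_pmf(i, m, p):
    """log P(X = i) for X ~ Binomial(m, p), valid for 0 < p < 1."""
    log_choose = lgamma(m + 1) - lgamma(i + 1) - lgamma(m - i + 1)
    return log_choose + i * log(p) + (m - i) * log(1.0 - p)


def binom_upper_tail(k, m, p):
    """One-tailed enrichment P-value: P(X >= k) for X ~ Binomial(m, p)."""
    tail = sum(exp(_log_binom_pmf(i, m, p)) for i in range(k, m + 1))
    return min(1.0, tail)


def site_enrichment_pvalue(hits_in_top, hits_total, top_share=0.05):
    """P-value for enrichment of one factor's predicted sites in the top group.

    hits_total  genes with a predicted site for this factor (genome-wide)
    hits_in_top how many of those genes lie in the top group
    Returns None when the factor has fewer than MIN_HITS hits.
    """
    if hits_total < MIN_HITS:
        return None
    return binom_upper_tail(hits_in_top, hits_total, top_share)
```

Depletion could be tested analogously by summing the lower tail from 0 to the observed count.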
Reporter gene activity
For the concentration- and group-dependent analysis, we
applied a two-way anova with repeated measurement sta-
tistics, and the null hypothesis was rejected at the 0.05 level.
Reporter gene activity is presented as mean and standard
deviation.
Gene ontology overrepresentation
Genes were annotated with gene ontology annotation using
ENSMART. For each gene ontology term, we then tested
whether it is overrepresented in the annotation of the upper
600 genes using a multiple-testing corrected Fisher’s exact
test. This test is based on the hypergeometric distribution
and calculates a false-discovery rate for each P-value
threshold. We selected a maximum expectable false-
discovery rate of 0.05 to determine significantly overre-
informatics for the identification of regulatory elements.
Nat Rev Genet 5, 276–287.
6 Bulyk ML (2003) Computational prediction of tran-
scription-factor binding site locations. Genome Biol 5,
201, doi: 10.1186/gb-2003-5-1-201.
7 Wasserman WW, Palumbo M, Thompson W, Fickett
JW & Lawrence CE (2000) Human–mouse genome
comparisons to locate regulatory sites. Nat Genet 26,
225–228.
8 Dieterich C, Grossmann S, Tanzer A, Ropcke S, Arndt
PF, Stadler PF & Vingron M (2005) Comparative pro-
moter region analysis powered by CORG. BMC
Genomics 6, 24, doi: 10.1186/1471-2164-6-24.
9 Eisen MB, Spellman PT, Brown PO & Botstein D
(1998) Cluster analysis and display of genome-wide
expression patterns. Proc Natl Acad Sci USA 95,
14863–14868.
10 Quackenbush J (2001) Computational analysis of micro-
array data. Nat Rev Genet 2, 418–427.
11 Beissbarth T (2006) Interpreting experimental results
using gene ontologies. Methods Enzymol 411, 340–352.
12 Bluthgen N, Kielbasa SM & Herzel H (2005) Inferring
combinatorial regulation of transcription in silico.
Nucleic Acids Res 33, 272–279.
13 Qian Z, Lu L, Liu X, Cai YD & Li Y (2007) An
approach to predict transcription factor DNA binding
site specificity based upon gene and transcription factor
functional categorization. Bioinformatics 23, 2449–2454.
14 Walker MG, Volkmuth W & Klingler TM (1999)
Pharmaceutical target discovery using Guilt-by-Associa-
22 Sarkar FH & Li Y (2008) NF-kappaB: a potential
target for cancer chemoprevention and therapy. Front
Biosci 13, 2950–2959.
23 Carmody RJ & Chen YH (2007) Nuclear factor-kap-
paB: activation and regulation during toll-like receptor
signaling. Cell Mol Immunol 4, 31–41.
24 Hayden MS & Ghosh S (2004) Signaling to NF-kap-
paB. Genes Dev 18, 2195–2224.
25 Wu JT & Kral JG (2005) The NF-kappaB ⁄ IkappaB sig-
naling system: a molecular target in breast cancer ther-
apy. J Surg Res 123, 158–169.
26 Stuart JM, Segal E, Koller D & Kim SK (2003) A
gene-coexpression network for global discovery of con-
served genetic modules. Science 302, 249–255.
27 Demeter J, Beauheim C, Gollub J, Hernandez-Boussard
T, Jin H, Maier D, Matese JC, Nitzberg M, Wymore F,
Zachariah ZK et al. (2007) The Stanford Microarray
Database: implementation of new analysis tools and
open source release of software. Nucleic Acids Res 35,
D766–D770.
28 Dieterich C, Cusack B, Wang H, Rateitschak K, Kra-
use A & Vingron M (2002) Annotating regulatory
DNA based on man–mouse genomic comparison. Bio-
informatics 18(Suppl. 2), S84–S90.
29 Bracken AP, Ciro M, Cocito A & Helin K (2004) E2F
target genes: unraveling the biology. Trends Biochem
Sci 29, 409–417.
30 Xu X, Bieda M, Jin VX, Rabinovich A, Oberley MJ,
D et al. (2002) Adult-onset primary open-angle glau-
coma caused by mutations in optineurin. Science 295, 1077–1079.
39 Quigley HA (1996) Number of people with glaucoma
worldwide. Br J Ophthalmol 80, 389–393.
40 Quigley HA & Vitale S (1997) Models of open-angle
glaucoma prevalence and incidence in the United States.
Invest Ophthalmol Vis Sci 38, 83–91.
41 Li Y, Kang J & Horwitz MS (1998) Interaction of an
adenovirus E3 14.7-kilodalton protein with a novel
tumor necrosis factor alpha-inducible cellular protein
containing leucine zipper domains. Mol Cell Biol 18,
1601–1610.
42 De Marco N, Buono M, Troise F & Diez-Roux G
(2006) Optineurin increases cell survival and translo-
cates to the nucleus in a Rab8-dependent manner upon
an apoptotic stimulus. J Biol Chem 281, 16147–16156.
43 Beg AA & Baltimore D (1996) An essential role for
NF-kappaB in preventing TNF-alpha-induced cell
death. Science 274, 782–784.
44 Zou JY & Crews FT (2005) TNF alpha potentiates glu-
tamate neurotoxicity by inhibiting glutamate uptake in
organotypic brain slice cultures: neuroprotection by
NF kappa B inhibition. Brain Res 1034, 11–24.
45 Zhu G, Wu CJ, Zhao Y & Ashwell JD (2007) Optineu-
rin negatively regulates TNFalpha-induced NF-kappaB
activation by competing with NEMO for ubiquitinated
RIP. Curr Biol 17, 1438–1443.
46 Rudolph D, Yeh WC, Wakeham A, Rudolph B, Nallai-
54 Hubbard T, Barker D, Birney E, Cameron G, Chen Y,
Clark L, Cox T, Cuff J, Curwen V, Down T et al.
(2002) The Ensembl genome database project. Nucleic
Acids Res 30, 38–41.
55 Hammond MP & Birney E (2004) Genome information
resources – developments at Ensembl. Trends Genet 20,
268–272.
56 Wingender E, Dietze P, Karas H & Knuppel R (1996)
TRANSFAC: a database on transcription factors and
their DNA binding sites. Nucleic Acids Res 24, 238–241.
57 Bluthgen N, Brand K, Cajavec B, Swat M, Herzel H &
Beule D (2005) Biological profiling of gene groups uti-
lizing gene ontology. Genome Inform Ser Workshop
Genome Inform 16, 106–115.
58 Sun SC, Ganchi PA, Ballard DW & Greene WC (1993)
NF-kappa B controls expression of inhibitor I kappaB
alpha: evidence for an inducible autoregulatory path-
way. Science 259, 1912–1915.
59 Edbrooke MR, Burt DW, Cheshire JK & Woo P (1989)
Identification of cis-acting sequences responsible for
phorbol ester induction of human serum amyloid A
gene expression via a nuclear factor kappaB-like tran-
scription factor. Mol Cell Biol 9, 1908–1916.
60 O’Donnell SM, Holm GH, Pierce JM, Tian B, Watson
MJ, Chari RS, Ballard DW, Brasier AR & Dermody
TS (2006) Identification of an NF-kappaB-dependent
gene network in cells infected by mammalian reovirus.
J Virol 80, 1077–1086.
61 Guitart A, Riezu-Boj JI, Elizalde E, Larrea E, Berasain
C, Aldabe R, Civeira MP & Prieto J (2005) Hepatitis C
the different transcription factors described in this
article.
Fig. S1. Histograms of correlation coefficients from
expression data for three individual genes.
Fig. S2. Cumulative histograms of the cross-validation
analysis with different seed sizes.
Fig. S3. Cross-validation of the seed distribution method for six different transcription factors by means of the median-based ranking procedure as used in the article and a ranking procedure based on P-values of Mann–Whitney statistics.
Fig. S4. Histograms of correlation coefficients of
expression data for individual seed groups and all
possible pairs.
Table S1. Analysis of the overrepresented gene ontology classifications of the top 600 genes in the rank with a false discovery rate of less than 0.001.
Table S2. List of ensembl gene IDs used as seeds for
the seed distribution method of this article.
Table S3. Literature sources of the seed lists.
Table S4. Distribution of enrichment of putative tran-
scription factor-binding motifs (transfac) in the rank-
ing after applying the seed-distribution-distance
method.
Table S5. Sequences of the NF-κB consensi that have
been used in the analysis.
This material is available as part of the online article
from