QTLtools fenrich - Functional enrichment of molecular QTLs
QTLtools fenrich --qtl significant_genes.bed --tss gene_tss.bed --bed TFs.encode.bed.gz --out output.txt [OPTIONS]
This mode assesses whether a set of QTLs falls within functional annotations
more often than expected by chance. The method is detailed in
<https://www.nature.com/articles/ncomms15452>. Here, "by chance" means what is
expected given the non-uniform distributions of molQTLs and functional
annotations around the genomic positions of the molecular phenotypes.

To do so, we first enumerate all the functional annotations located near each
molecular phenotype. In practice, for X quantified phenotypes, we have X lists
of annotations. Then, for the subset Y of phenotypes having a significant
molQTL, we count how often the Y molQTLs overlap the annotations in the
corresponding lists: this gives the observed overlap frequency fobs(Y) between
molQTLs and functional annotations. Next, we permute the lists of functional
annotations across the phenotypes (e.g., phenotype A may be assigned the list
of annotations coming from phenotype B) and, for each permuted data set, we
count how often the Y molQTLs overlap the newly assigned functional
annotations: this gives the expected overlap frequency fexp(Y) between molQTLs
and functional annotations. This permutation scheme keeps the distribution of
functional annotations and molQTLs around molecular phenotypes unchanged.

Given the observed and expected overlap frequencies, we use a Fisher test to
assess how fobs(Y) and fexp(Y) differ. This gives an odds ratio estimate and a
two-sided p-value, which tell us first whether there is an enrichment or a
depletion, and second how significant it is.
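The permutation scheme described above can be illustrated with a small Python sketch. It is not the QTLtools implementation: the data layout (annotation intervals and QTL positions expressed as offsets relative to each phenotype position), the function names, and the empirical p-value formula are all assumptions made for illustration, and the Fisher test step is left out.

```python
import random
from statistics import mean, pstdev

def count_overlaps(qtl_offsets, annot_lists):
    """Count QTLs whose offset falls inside an annotation assigned to their phenotype."""
    hits = 0
    for pheno, offset in qtl_offsets.items():
        # Intervals are half-open [start, end), relative to the phenotype position (assumed layout).
        if any(start <= offset < end for start, end in annot_lists[pheno]):
            hits += 1
    return hits

def permutation_enrichment(qtl_offsets, annot_lists, n_perm=1000, seed=0):
    """Return (observed, mean expected, sd expected, empirical p-value)."""
    rng = random.Random(seed)
    phenos = list(annot_lists)
    observed = count_overlaps(qtl_offsets, annot_lists)
    expected = []
    for _ in range(n_perm):
        shuffled = phenos[:]
        rng.shuffle(shuffled)
        # Phenotype A may receive the annotation list of phenotype B.
        permuted = {p: annot_lists[q] for p, q in zip(phenos, shuffled)}
        expected.append(count_overlaps(qtl_offsets, permuted))
    mu = mean(expected)
    # Assumed two-sided empirical p-value: share of permutations at least as
    # far from the permutation mean as the observed count.
    extreme = sum(abs(e - mu) >= abs(observed - mu) for e in expected)
    p_emp = (extreme + 1) / (n_perm + 1)
    return observed, mu, pstdev(expected), p_emp

# Hypothetical toy data: four quantified phenotypes, two of them with a QTL.
annot_lists = {
    "geneA": [(-100, 100)],
    "geneB": [],
    "geneC": [(500, 600)],
    "geneD": [],
}
qtl_offsets = {"geneA": 10, "geneC": 550}

print(permutation_enrichment(qtl_offsets, annot_lists, n_perm=200, seed=1))
```

Working in offsets relative to the phenotype is what makes a swapped annotation list comparable: the list borrowed from phenotype B is interpreted around phenotype A, so the distance structure around phenotypes is preserved.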
- --qtl in.bed
  List of QTLs of interest in BED format. REQUIRED.
- --bed functional_annotation.bed.gz
  Functional annotations in BED format. REQUIRED.
- --tss genes.bed
  List of positions of all phenotypes you mapped QTLs for, in BED format.
  REQUIRED.
- --out output.txt
  Output file. REQUIRED.
- --permute integer
  Number of permutations to run. DEFAULT=1000
- --qtl file
  List of QTLs of interest. An example:
    1 15210 15211 1_15211 ENSG00000227232.4 -
    1 735984 735985 1_735985 ENSG00000177757.1 +
    1 735984 735985 1_735985 ENSG00000240453.1 -
    1 739527 739528 1_739528 ENSG00000237491.4 +
  The column definitions are:
    1 | The variant chromosome
    2 | The variant's start position (0-based)
    3 | The variant's end position (1-based)
    4 | The variant ID
    5 | The phenotype ID
    6 | The phenotype's strand (not used)
- --bed file
  List of annotations in BED format. An example:
    1 254874 265487
    1 730984 735985
    1 734984 736585
    1 739527 748528
  The column definitions are:
    1 | Chromosome
    2 | Start position (0-based)
    3 | End position (1-based)
- --tss file
  List of positions of all phenotypes you mapped QTLs for. An example:
    1 29369 29370 ENSG00000227232.4 1_15211 -
    1 135894 135895 ENSG00000268903.1 1_985446 -
    1 137964 137965 ENSG00000269981.1 1_1118728 -
    1 317719 317720 ENSG00000237094.7 1_15211 +
  The column definitions are:
    1 | The phenotype's chromosome
    2 | The start position of the phenotype (0-based)
    3 | The end position of the phenotype (1-based)
    4 | The phenotype ID
    5 | The top variant (not used)
    6 | The phenotype's strand
- --out file
  Space-separated results file detailing the enrichment, with the following
  columns:
    1 | The observed number of QTLs falling within the functional annotations
    2 | The total number of QTLs
    3 | The mean expected number of QTLs falling within the functional
        annotations (across multiple permutations)
    4 | The standard deviation of the expected number of QTLs falling within
        the functional annotations (across multiple permutations)
    5 | The empirical p-value
    6 | Lower bound of the 95% confidence interval of the odds ratio
    7 | The odds ratio
    8 | Upper bound of the 95% confidence interval of the odds ratio
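As an illustration of how the eight output columns might be consumed downstream, here is a minimal Python sketch. It assumes the output file holds one space-separated result line without a header, which this page does not state. The field names, the helper functions and the example values are hypothetical.

```python
from typing import NamedTuple

class FenrichResult(NamedTuple):
    """One result line; field order follows the column definitions above."""
    observed: int
    total: int
    expected_mean: float
    expected_sd: float
    empirical_pvalue: float
    or_lower: float
    odds_ratio: float
    or_upper: float

def parse_fenrich_line(line: str) -> FenrichResult:
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, found {len(fields)}")
    # Counts are parsed via float() first so that values such as "120.0" are accepted.
    counts = [int(float(f)) for f in fields[:2]]
    return FenrichResult(*counts, *map(float, fields[2:]))

def direction(result: FenrichResult) -> str:
    """Classify the result by whether the 95% CI of the odds ratio excludes 1."""
    if result.or_lower > 1:
        return "enrichment"
    if result.or_upper < 1:
        return "depletion"
    return "not significant at the 95% level"

# Made-up values, for illustration only.
example = parse_fenrich_line("120 1000 80.5 8.2 0.001 1.2 1.55 2.0")
print(example.odds_ratio, direction(example))
```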
1. Prepare a BED file containing the positions of the QTLs of interest. To do
   so, extract all significant hits at a given FDR threshold (e.g. 5%), and
   then transform the significant QTL list into a BED file:

     Rscript ./script/qtltools_runFDR_cis.R results.genes.full.txt.gz 0.05 results.genes
     cat results.genes.significant.txt | awk '{ print $9, $10-1, $11, $8, $1, $5 }' | tr ' ' '\t' | sort -k1,1V -k2,2g > results.genes.significant.bed

2. Prepare a BED file containing the positions of all phenotypes you mapped
   QTLs for:

     zcat results.genes.full.txt.gz | awk '{ print $2, $3-1, $4, $1, $8, $5 }' | tr ' ' '\t' | sort -k1,1V -k2,2g > results.genes.quantified.bed

3. Run the enrichment analysis:

     QTLtools fenrich --qtl results.genes.significant.bed --tss results.genes.quantified.bed --bed TFs.encode.bed.gz --out enrichment.QTL.in.TF.txt
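The `$10-1` and `$3-1` terms in the awk commands shift the start coordinate to the 0-based BED convention while leaving the end coordinate untouched. The same conversion can be sketched in Python, assuming the input positions are 1-based. The function name and its argument order are hypothetical and simply mirror the six BED columns of the --qtl file.

```python
def to_bed_line(chrom, start_1based, end_1based, variant_id, phenotype_id, strand):
    """Build a tab-separated BED line from 1-based, inclusive coordinates."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    # BED start is 0-based, so subtract 1; BED end stays as the 1-based end.
    fields = [chrom, start_1based - 1, end_1based, variant_id, phenotype_id, strand]
    return "\t".join(str(f) for f in fields)

# A single-base variant at position 15211 on chromosome 1.
print(to_bed_line("1", 15211, 15211, "1_15211", "ENSG00000227232.4", "-"))
```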
QTLtools(1)
QTLtools website: <https://qtltools.github.io/qtltools>
Please submit bugs to <https://github.com/qtltools/qtltools>
Delaneau, O., Ongen, H., Brown, A. et al. A complete tool set for molecular QTL
discovery and analysis. Nat Commun 8, 15452 (2017).
<https://doi.org/10.1038/ncomms15452>
Olivier Delaneau, Halit Ongen