QTLtools mbv - Match genotypes in a VCF to a BAM file
QTLtools mbv --bam [sample.bam|sample.sam|sample.cram] --vcf [in.vcf|in.bcf|in.vcf.gz] --out output_file [OPTIONS]
This mode checks whether the genotypes in the VCF are observed in the RNAseq
reads in the BAM file, in order to quickly resolve sample mislabeling and to
detect cross-sample contamination and PCR amplification bias. The method is
described in detail at <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044394/>.
In brief, we measure, for each individual in the VCF, the proportions of
heterozygous and homozygous genotypes for which both alleles are captured by
the sequencing reads in the BAM file. A 'match' has close to 100% concordance
for both measures, whereas a 'mismatch' has substantially lower concordance
for both. Increased cross-sample contamination leads to lower homozygous
concordance with no change in heterozygous concordance, while increased
amplification bias leads to lower heterozygous concordance with no change in
homozygous concordance. We recommend using uniquely mapping reads only, by
setting an appropriate --filter-mapping-quality threshold.
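As a rough illustration of how the two concordance values can be read together, the following Python sketch labels a (heterozygous, homozygous) concordance pair. The function name and the 90% cut-off are assumptions chosen for illustration; they are not part of QTLtools, and a suitable threshold depends on the dataset.

```python
def interpret_concordance(het_pct, hom_pct, high=90.0):
    """Label a pair of concordance percentages (each in the range 0-100).

    `high` is a hypothetical cut-off for "close to 100%". It is not a
    QTLtools default and should be tuned per dataset.
    """
    het_ok = het_pct >= high
    hom_ok = hom_pct >= high
    if het_ok and hom_ok:
        return "match"
    if het_ok:
        # Heterozygous concordance is intact, homozygous concordance dropped.
        return "possible cross-sample contamination"
    if hom_ok:
        # Homozygous concordance is intact, heterozygous concordance dropped.
        return "possible amplification bias"
    return "mismatch"
```

In practice the labels are only a starting point: inspecting the values for all samples in the VCF side by side gives a better sense of where the cut-off should sit.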
- --vcf [in.vcf|in.bcf|in.vcf.gz]
- Genotypes in VCF/BCF format. Should contain all the samples
in the dataset. REQUIRED.
- --bam [in.bam|in.sam|in.cram]
- Sequence data in BAM/SAM/CRAM format. REQUIRED.
- --out output
- Output file name. REQUIRED.
- --reg chr:start-end
- Genomic region to be processed. E.g.
chr4:12334456-16334456, or chr5
- --filter-mapping-quality integer
- Minimum mapping quality for a read or read pair to be
considered. Set this to only include uniquely mapped reads.
DEFAULT=10
- --filter-base-quality integer
- Minimum phred quality for a base to be considered.
DEFAULT=5
- --filter-binomial-pvalue float
- Binomial p-value threshold below which a heterozygous
genotype is considered as exhibiting allelic imbalance. DEFAULT=0.05
- --filter-minimal-coverage integer
- Minimum number of reads overlapping a genotype for it to be
considered. DEFAULT=10
- --filter-imputation-qual float
- Minimum imputation information score for a variant to be
considered. DEFAULT=0.9
- --filter-imputation-prob float
- Minimum posterior probability for a genotype to be
considered. DEFAULT=0.99
- --filter-keep-duplicates
- Keep reads designated as duplicate by the aligner.
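To make the --filter-binomial-pvalue threshold more concrete, the Python sketch below computes a two-sided exact binomial p-value for the reference and alternative read counts at a single heterozygous site, assuming both alleles are expected in equal proportion (p = 0.5). It illustrates the general test only and is not QTLtools' own implementation; the function names are hypothetical.

```python
from math import comb


def binomial_two_sided_pvalue(ref_reads, alt_reads):
    """Two-sided exact binomial p-value against an expected ratio of 0.5."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, so no evidence of imbalance
    k = min(ref_reads, alt_reads)
    # With p = 0.5 the distribution is symmetric, so the two-sided p-value
    # is twice the lower tail P(X <= k), capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1))
    return min(1.0, 2 * lower_tail / 2 ** n)


def shows_allelic_imbalance(ref_reads, alt_reads, threshold=0.05):
    """True when the p-value falls below the chosen threshold."""
    return binomial_two_sided_pvalue(ref_reads, alt_reads) < threshold
```

Under these assumptions, a site with 9 reference and 1 alternative reads gives a p-value of 22/1024 and would be flagged at the 0.05 threshold, whereas a 6 to 4 split would not.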
- --out filename
- This file has no header and contains the following columns:
1 | The sample ID in the VCF against which the sequence data has been matched |
2 | The number of missing genotypes for this sample |
3 | The total number of heterozygous genotypes examined |
4 | The total number of homozygous genotypes examined |
5 | The number of heterozygous genotypes considered for the matching, i.e. those covered by more than --filter-minimal-coverage reads |
6 | The number of homozygous genotypes considered for the matching, i.e. those covered by more than --filter-minimal-coverage reads |
7 | The number of heterozygous genotypes that match between this sample and the BAM file |
8 | The number of homozygous genotypes that match between this sample and the BAM file |
9 | The percentage of heterozygous genotypes that match between this sample and the BAM file |
10 | The percentage of homozygous genotypes that match between this sample and the BAM file |
11 | The number of heterozygous genotypes with significant allelic imbalance |
- Running mbv on an RNAseq sample mapped with GEM:
- QTLtools mbv --bam HG00381.chr22.bam --out HG00381.chr22.mbv.txt --vcf genotypes.chr22.vcf.gz --filter-mapping-quality 150
You can then plot column 9 against column 10 to identify the genotyped sample
in the VCF that best matches your sequence data.
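Besides plotting, the best match can also be picked out programmatically. The Python sketch below assumes the output file is whitespace-delimited, has no header, and carries the heterozygous and homozygous concordance in columns 9 and 10. The function name and the ranking rule (largest sum of the two concordance values) are illustrative choices, not part of QTLtools.

```python
from typing import Iterable, List, Tuple


def rank_mbv_matches(lines: Iterable[str]) -> List[Tuple[str, float, float]]:
    """Return (sample_id, het_concordance, hom_concordance), best first."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        try:
            het = float(fields[8])  # column 9
            hom = float(fields[9])  # column 10
        except ValueError:
            continue  # skip lines whose concordance fields are not numeric
        records.append((fields[0], het, hom))
    # Assumed ranking rule: samples with the highest combined concordance first.
    records.sort(key=lambda rec: rec[1] + rec[2], reverse=True)
    return records
```

For example, passing an open handle on HG00381.chr22.mbv.txt to rank_mbv_matches returns the samples ordered from best to worst; the first entry is the candidate match, which is worth confirming on the plot.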
QTLtools(1)
QTLtools website: <https://qtltools.github.io/qtltools>
Please submit bugs to <https://github.com/qtltools/qtltools>
Fort A., Panousis N. I., Garieri M. et al. MBV: a method to solve sample
mislabeling and detect technical bias in large combined genotype and
sequencing assay datasets. Bioinformatics 33(12), 1895, 2017.
<https://doi.org/10.1093/bioinformatics/btx074>
Olivier Delaneau ([email protected]), Halit Ongen ([email protected])