QTLtools pca - Conducts PCA
QTLtools pca --vcf [in.vcf|in.vcf.gz|in.bcf] | --bed in.bed.gz --out output_prefix [OPTIONS]
This mode performs a Principal Component Analysis (PCA) on either molecular
phenotype quantifications or genotype data. It is typically used to (i) detect
outliers in the data, (ii) detect stratification in the data, or (iii) build a
covariate matrix before QTL mapping. QTLtools' PCA implementation uses singular
value decomposition (SVD). When building a covariate matrix to account for
technical covariates, we recommend using --center and --scale.
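The following is a minimal sketch of PCA via SVD with optional centering and scaling, written with NumPy to illustrate what --center and --scale do before the decomposition. It is not QTLtools' code: the function name pca_svd, the variables-by-samples layout (mirroring BED/VCF rows), and the use of the sample standard deviation (ddof=1) are assumptions. Constant variables (standard deviation 0) are assumed to have been removed beforehand, since scaling them would divide by zero.

```python
import numpy as np


def pca_svd(x, center=True, scale=True):
    """Hypothetical PCA of a (variables x samples) matrix via SVD.

    Rows are variables (genotype sites or phenotypes), columns are samples.
    Returns PC scores (PCs x samples) and the singular values.
    Assumes no constant rows when scale=True.
    """
    x = np.asarray(x, dtype=float)
    if center:
        # Subtract each variable's mean across samples (cf. --center).
        x = x - x.mean(axis=1, keepdims=True)
    if scale:
        # Divide each variable by its standard deviation across samples
        # (cf. --scale); ddof=1 is an assumption, not taken from QTLtools.
        x = x / x.std(axis=1, ddof=1, keepdims=True)
    # x = U diag(s) Vt; each row of Vt holds the sample loadings of one PC.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Project samples onto the PCs: one row of scores per PC.
    scores = s[:, None] * vt
    return scores, s
```

Each row of scores is one PC evaluated on every sample, which is the shape a covariate matrix for QTL mapping needs. Note that the sign of each PC is arbitrary in any SVD-based PCA.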
- --vcf [in.vcf|in.bcf|in.vcf.gz|in.bed.gz]
- Genotypes in VCF/BCF/BED format. REQUIRED unless --bed is given.
- --bed quantifications.bed.gz
- Quantifications in BED format. REQUIRED unless --vcf is given.
- --out output_prefix
- Output file prefix. REQUIRED.
- --center
- Center the variables (genotypes or phenotypes) by subtracting each
variable's mean from its values.
- --scale
- Scale the variables (genotypes or phenotypes) by dividing each variable's
values by its standard deviation.
- --region chr:start-end
- Genomic region to be processed, e.g. chr4:12334456-16334456 or chr5.
- --exclude-chrs string
- Chromosomes to exclude, given as a space-separated list. Only applies to
--vcf. DEFAULT="X Y M MT XY chrX chrY chrM chrMT chrXY"
- --maf float
- Exclude sites with a minor allele frequency below this value.
Only applies to --vcf. DEFAULT=0.0
- --distance integer
- Only include sites separated by at least this many base pairs.
Only applies to --vcf. DEFAULT=0
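To make the three genotype filters above concrete, here is a sketch of how --exclude-chrs, --maf and --distance could combine. It is an illustration under stated assumptions, not QTLtools' implementation: the helper filter_sites and its input format are hypothetical, sites are assumed sorted by chromosome and position with per-sample alternate-allele dosages in [0, 2], the MAF filter is assumed to run before distance thinning, and thinning is assumed to be greedy (keep a site only if it lies at least `distance` bp after the last kept site on the same chromosome).

```python
DEFAULT_EXCLUDED = ("X", "Y", "M", "MT", "XY",
                    "chrX", "chrY", "chrM", "chrMT", "chrXY")


def filter_sites(sites, maf=0.0, distance=0, exclude_chrs=DEFAULT_EXCLUDED):
    """Hypothetical re-creation of --exclude-chrs, --maf and --distance.

    `sites` is a list of (chrom, pos, dosages) tuples sorted by chrom and pos.
    Returns the (chrom, pos) pairs that survive all filters.
    """
    excluded = set(exclude_chrs)
    kept = []
    last_pos = {}  # last kept position on each chromosome
    for chrom, pos, dosages in sites:
        if chrom in excluded:
            continue
        # Alternate-allele frequency from dosages; MAF is the smaller allele.
        alt_freq = sum(dosages) / (2 * len(dosages))
        if min(alt_freq, 1 - alt_freq) < maf:
            continue
        # Greedy thinning: skip sites too close to the previous kept site.
        prev = last_pos.get(chrom)
        if prev is not None and pos - prev < distance:
            continue
        kept.append((chrom, pos))
        last_pos[chrom] = pos
    return kept
```

With the defaults (maf=0.0, distance=0) only the chromosome exclusion has any effect, matching the DEFAULT values listed above.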
- .pca
- This file contains the calculated principal components. The names of the
principal components, given in the first column, are composed of the output
file prefix, whether the data was centered, whether the data was scaled, and
the principal component number.
- .pca_stats
- This file contains the standard deviation of each principal component,
along with the variance and cumulative variance explained by each PC.
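The quantities reported in .pca_stats can all be derived from the singular values of the (centered and/or scaled) data matrix. The sketch below shows the standard relationships, assuming n samples: the variance of PC k is s_k^2 / (n - 1), its standard deviation is the square root of that, and the proportion explained is its variance divided by the sum over all PCs. The helper pca_stats and its tuple output are hypothetical; the exact columns and scaling in QTLtools' .pca_stats file may differ.

```python
import math


def pca_stats(singular_values, n_samples):
    """Hypothetical per-PC summary: (name, sd, proportion, cumulative)."""
    # Variance of each PC's scores across samples: s_k^2 / (n - 1).
    variances = [s * s / (n_samples - 1) for s in singular_values]
    total = sum(variances)
    rows = []
    cumulative = 0.0
    for i, var in enumerate(variances, start=1):
        proportion = var / total  # share of total variance explained
        cumulative += proportion
        rows.append((f"PC{i}", math.sqrt(var), proportion, cumulative))
    return rows
```

When every component is kept, the last cumulative value is 1, which is a quick sanity check when choosing how many PCs to carry forward as covariates.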
- Running pca on RNA-seq quantifications to calculate technical covariates:
    QTLtools pca --bed genes.50percent.chr22.bed.gz --out genes.50percent.chr22 --center --scale
- Running pca on genotypes to detect population stratification:
    QTLtools pca --vcf genotypes.chr22.vcf.gz --out genotypes.chr22 --center --scale --maf 0.05 --distance 5000
QTLtools(1)
QTLtools website: <https://qtltools.github.io/qtltools>
- Versions up to and including 1.2 suffer from a bug in reading missing
genotypes in VCF/BCF files. The bug affects variants that have a DS field in
their genotype FORMAT and a missing genotype (DS field is .) in one of the
samples: in that case, the genotypes of all samples are set to missing,
effectively removing the variant from the analysis.
Please submit bugs to <https://github.com/qtltools/qtltools>
Halit Ongen, Olivier Delaneau