Beagle - Genotype calling, genotype phasing and imputation of ungenotyped markers
java -Xmx[GB]g -jar /usr/share/beagle/beagle.jar [options]
Beagle performs genotype calling, genotype phasing, imputation of ungenotyped
markers, and identity-by-descent segment detection. Genotype imputation is
performed on phased haplotypes using a Li and Stephens haplotype frequency
model. Beagle also implements the Refined IBD algorithm for detecting
homozygosity-by-descent (HBD) and identity-by-descent (IBD) segments.
-
gt=filename
- Optional
Specifies a VCF file containing a GT (genotype) format field for each
marker. If a genotype contains the phased allele separator, "|",
then Beagle will preserve the phase of the genotype during the analysis.
If you use the gt argument, all genotypes in the output file will be
phased and non-missing.
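The phase rule above depends only on the allele separator in the GT value. The following is a minimal Python sketch of that check, not Beagle's own parser. It assumes each GT value uses a single separator type, and the function name parse_gt is hypothetical.

```python
def parse_gt(gt):
    """Split a VCF GT value into allele indices and a phased flag.

    A genotype counts as phased only if it uses the '|' separator.
    Missing alleles ('.') are returned as None.
    """
    phased = "|" in gt
    sep = "|" if phased else "/"
    alleles = [None if a == "." else int(a) for a in gt.split(sep)]
    return alleles, phased


print(parse_gt("0|1"))  # ([0, 1], True): phase would be preserved
print(parse_gt("1/0"))  # ([1, 0], False): unphased, Beagle phases it
print(parse_gt("./."))  # ([None, None], False): missing genotype
```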
-
gl=filename
- Optional
Specifies a VCF file containing a GL or PL (genotype likelihood) format
field for each marker. Any data in the GT format field will be ignored. If
both GL and PL format fields are present for a marker, the GL field will
be used.
-
gtgl=filename
- Optional
Specifies a VCF file containing a GT, GL or PL format field for each marker.
If a genotype is non-missing, Beagle will ignore the genotype likelihood.
If both GL and PL format fields are present for a marker, the GL field
will be used.
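The precedence rules for gtgl (non-missing GT first, then GL, then PL) can be sketched in Python as below. This is an illustration under stated assumptions, not Beagle's code. The dict layout for a sample's FORMAT fields and all function names are hypothetical. Treating any GT containing "." as missing is also an assumption of this sketch. The conversions use the standard VCF scales: GL is log10 likelihood, and PL is -10 * log10 likelihood, normalized so the best genotype has PL 0.

```python
def pl_to_likelihoods(pl):
    """Convert phred-scaled PL values to relative likelihoods (best = 1.0)."""
    best = min(pl)
    return [10 ** (-(p - best) / 10) for p in pl]


def gl_to_likelihoods(gl):
    """Convert log10-scaled GL values to relative likelihoods (best = 1.0)."""
    best = max(gl)
    return [10 ** (g - best) for g in gl]


def genotype_source(sample):
    """Choose the data used for one sample, following the gtgl rules.

    `sample` is a hypothetical dict of FORMAT fields, e.g. {"GT": "./.", "PL": [0, 10, 100]}.
    """
    gt = sample.get("GT")
    if gt is not None and "." not in gt:
        return "GT", gt                               # non-missing GT: likelihoods ignored
    if "GL" in sample:                                # GL is preferred over PL
        return "GL", gl_to_likelihoods(sample["GL"])
    if "PL" in sample:
        return "PL", pl_to_likelihoods(sample["PL"])
    return "missing", None
```

For example, genotype_source({"GT": "./.", "PL": [0, 10, 100]}) returns the PL source with relative likelihoods of roughly 1.0, 0.1 and 1e-10.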
-
ref=filename
- Optional
Specifies a VCF file containing phased reference genotypes. See the impute
parameter.
-
out=prefix
- Required
Specifies the output filename prefix. The prefix may be an absolute or
relative filename, but it cannot be a directory name.
-
excludesamples=filename
- Optional
Specifies a file containing non-reference samples (one sample per line) to
be excluded from the analysis and output files.
-
excludemarkers=filename
- Optional
Specifies a file containing markers (one marker per line) to be excluded
from the analysis and the output files. An excluded marker identifier can
either be an identifier from the VCF record’s ID field or a genomic
coordinate in the format CHROM:POS.
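A small Python sketch of how an exclusion list containing both kinds of identifiers could be matched against VCF records. The helper names are hypothetical, and splitting a multi-valued ID field on ";" is an assumption of this sketch rather than documented Beagle behavior.

```python
def load_exclusions(lines):
    """Collect excluded marker identifiers (one per line), skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def is_excluded(chrom, pos, vcf_id, exclusions):
    """True if the record's ID or its CHROM:POS coordinate is listed.

    Splitting a multi-valued ID field on ';' is an assumption of this sketch.
    """
    ids = [] if vcf_id == "." else vcf_id.split(";")
    if any(i in exclusions for i in ids):
        return True
    return f"{chrom}:{pos}" in exclusions


excluded = load_exclusions(["rs123\n", "20:14370\n", "\n"])
print(is_excluded("20", 14370, ".", excluded))   # True (matched by CHROM:POS)
print(is_excluded("1", 5000, "rs123", excluded)) # True (matched by ID)
```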
-
map=filename
- Optional
Specifies a PLINK format genetic map on the cM scale. HapMap GRCh36 and
GRCh37 genetic maps in PLINK format are available for download from the
Beagle website. Use of a genetic map is recommended if you are imputing
ungenotyped markers. If no genetic map is specified, Beagle will assume a
constant recombination rate of 1 cM / Mb.
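To illustrate what a genetic map contributes, the Python sketch below reads PLINK map lines (chromosome, marker ID, genetic position in cM, base-pair position) and linearly interpolates a cM position, falling back to the 1 cM / Mb constant rate when no map is available. This is an illustration only; Beagle's own interpolation and its handling of positions outside the map may differ. Extrapolating from the two nearest map points is an assumption here, as is the requirement that map positions be distinct.

```python
import bisect


def read_plink_map(lines, chrom):
    """Return sorted (bp, cM) points for one chromosome from PLINK map lines."""
    points = []
    for line in lines:
        fields = line.split()  # chrom, marker ID, cM, bp
        if len(fields) == 4 and fields[0] == chrom:
            points.append((int(fields[3]), float(fields[2])))
    return sorted(points)


def genetic_position(bp, points=(), cm_per_mb=1.0):
    """Linearly interpolate the cM position of base-pair coordinate `bp`.

    With fewer than two map points, assume a constant rate (1 cM / Mb).
    Outside the map, extrapolate from the nearest two points (an assumption).
    """
    if len(points) < 2:
        return bp / 1e6 * cm_per_mb
    bps = [p[0] for p in points]
    i = min(max(bisect.bisect_left(bps, bp), 1), len(points) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (bp - x0) * (y1 - y0) / (x1 - x0)


pts = read_plink_map(["1 rs1 0.0 1000000", "1 rs2 2.0 2000000"], "1")
print(genetic_position(1_500_000, pts))  # 1.0 (from the map)
print(genetic_position(3_000_000))       # 3.0 (constant 1 cM / Mb)
```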
-
chrom=chrom:start-end
- Optional
Specifies a chromosome or chromosome interval using a chromosome identifier
in the VCF file and the starting and ending positions of the interval. The
entire chromosome, the beginning of the chromosome, and the end of the
chromosome can be specified by chrom=[chrom],
chrom=[chrom:-end], and chrom=[chrom:start-]
respectively.
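The four accepted forms of the chrom argument can be parsed as in this Python sketch. The function name is hypothetical, and the sketch assumes chromosome identifiers do not themselves contain ":"; it only illustrates the syntax described above.

```python
def parse_chrom(spec):
    """Parse 'chrom', 'chrom:start-end', 'chrom:-end' or 'chrom:start-'.

    Returns (chrom, start, end); an omitted bound is None.
    Assumes the chromosome identifier contains no ':'.
    """
    chrom, sep, interval = spec.rpartition(":")
    if not sep:                              # no colon: entire chromosome
        return spec, None, None
    start_str, dash, end_str = interval.partition("-")
    if not dash:
        raise ValueError(f"expected start-end after ':' in {spec!r}")
    start = int(start_str) if start_str else None
    end = int(end_str) if end_str else None
    if start is not None and end is not None and start > end:
        raise ValueError(f"start exceeds end in {spec!r}")
    return chrom, start, end


print(parse_chrom("20"))          # ('20', None, None)
print(parse_chrom("20:-500000"))  # ('20', None, 500000)
```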
-
maxlr=number ≥ 1
- Default = 5000
Specifies the maximum likelihood ratio at a genotype. If M is the maximum of
the likelihoods of each possible genotype, any likelihood that is less
than (M / maxlr) is set to 0.0 to improve computational efficiency.
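The maxlr rule is a simple threshold relative to the largest likelihood. A minimal Python sketch follows; apply_maxlr is a hypothetical name, not part of Beagle.

```python
def apply_maxlr(likelihoods, maxlr=5000.0):
    """Set any genotype likelihood below M / maxlr to 0.0, where M is the maximum."""
    if maxlr < 1:
        raise ValueError("maxlr must be >= 1")
    threshold = max(likelihoods) / maxlr
    return [0.0 if lik < threshold else lik for lik in likelihoods]


# Threshold is 1.0 / 5000 = 0.0002, so only the last value is zeroed.
print(apply_maxlr([1.0, 1e-3, 1e-5]))  # [1.0, 0.001, 0.0]
```

Larger maxlr values keep more low-likelihood genotypes (less pruning); maxlr=1 keeps only the most likely genotype(s).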
-
nthreads=positive integer
- Default: machine-dependent
Specifies the number of threads of execution. If no nthreads
parameter is specified, the nthreads parameter will be set equal to
the number of CPU cores on the host machine.
-
lowmem=true/false
- Default = false
Specifies whether a memory efficient algorithm should be used. The memory
efficient algorithm increases run-time by a factor less than 2.0.
-
window=positive integer
- Default = 50000
Specifies the number of markers to include in each sliding window. The
window parameter must be at least twice as large as the overlap
parameter. The window parameter controls the amount of memory used
in the analysis. For human data, it is recommended that the window
parameter be greater than or equal to the typical number of markers in 5
cM.
-
overlap=positive integer
- Default = 3000
Specifies the number of markers of overlap between sliding windows. For
human data, it is recommended that the overlap be set to the typical
number of markers in 0.5 cM (when ibd=false) or 2.0 cM (when
ibd=true).
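The window and overlap recommendations can be turned into numbers once you know the typical marker density of your data. The Python sketch below assumes a single average density in markers per cM (a simplification; real density varies along the chromosome), and both function names are hypothetical.

```python
import math


def recommended_window_overlap(markers_per_cm, ibd=False):
    """Suggest window/overlap values from the human-data guidance.

    overlap = markers in 0.5 cM (ibd=False) or 2.0 cM (ibd=True)
    window  = markers in 5 cM, raised if needed so window >= 2 * overlap
    """
    overlap = math.ceil(markers_per_cm * (2.0 if ibd else 0.5))
    window = max(math.ceil(markers_per_cm * 5.0), 2 * overlap)
    return window, overlap


def check_window_overlap(window, overlap):
    """Raise ValueError if window is less than twice the overlap."""
    if window < 2 * overlap:
        raise ValueError(f"window={window} must be at least 2 * overlap={overlap}")


print(recommended_window_overlap(1000))            # (5000, 500)
print(recommended_window_overlap(1000, ibd=True))  # (5000, 2000)
check_window_overlap(50000, 3000)                  # defaults satisfy the constraint
```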
-
seed=integer
- Default = -99999
Specifies the seed for the random number generator.
-
niterations=non-negative integer
- Default = 5
Specifies the number of phasing iterations. The phasing iterations are
preceded by 10 burn-in iterations which carry out the Beagle version 4.0
phasing algorithm. If you want to phase your data with the Beagle 4.0
phasing algorithm, use niterations=0. Accuracy and compute time
increase with the number of iterations.
-
impute=true/false
- Default = true
Specifies whether markers that are present in the reference panel but absent
in your data will be imputed. This option has no effect unless both the ref
and gt arguments are used.
-
gprobs=true/false
- Default = false
Specifies whether a GP (genotype probability) format field will be included
in the output VCF file when imputing ungenotyped markers. By default, a GP
field is not printed because a DS (alternate allele dose) format field is
always printed when imputing ungenotyped markers.
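For a biallelic diploid marker, the DS value summarizes the three GP probabilities as an expected count of alternate alleles, which is why GP is often not needed. A minimal Python sketch, assuming GP is ordered (0/0, 0/1, 1/1) as in the VCF specification; multiallelic markers are not handled, and the function name is hypothetical.

```python
def dosage_from_gp(gp):
    """Expected alternate-allele count for a biallelic diploid genotype.

    gp = (P(0/0), P(0/1), P(1/1));  DS = P(0/1) + 2 * P(1/1)
    """
    p_ref, p_het, p_alt = gp
    if abs(p_ref + p_het + p_alt - 1.0) > 1e-6:
        raise ValueError("GP values should sum to 1")
    return p_het + 2 * p_alt


print(dosage_from_gp((0.1, 0.6, 0.3)))  # approximately 1.2
```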
-
ne=integer
- Default = 1000000
Specifies the effective population size when imputing ungenotyped markers.
The default value is suitable for a large outbred human population.
Smaller values in the hundreds or thousands for the ne parameter
are suggested for inbred human and animal populations.
-
err=non-negative number
- Default = 0.0001
Specifies the allele miscall rate. The default value should give good
results for most sequence and SNP array data.
-
cluster=non-negative number
- Default = 0.005
Specifies the maximum cM distance between individual markers that are
combined into an aggregate marker when imputing ungenotyped markers.
-
ibd=true/false
- Default = false
Specifies whether IBD analysis will be performed when the gt argument
is used.
-
ibdlod=non-negative number
- Default = 3.0
Specifies the minimum LOD score for reported IBD.
-
ibdscale=non-negative number
- Default: data-dependent
Specifies the scale parameter used to build the haplotype frequency model
for IBD analysis. If no ibdscale parameter is specified, the scale
parameter for the IBD analysis will be set to
max{2, sqrt(sample size)/100}, which we have found to work well for
outbred populations.
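The default ibdscale formula is easy to evaluate directly. A short Python sketch (the function name is hypothetical; sample_size means the sample size referred to in the formula):

```python
import math


def default_ibdscale(sample_size):
    """Default scale parameter: max{2, sqrt(sample size) / 100}."""
    return max(2.0, math.sqrt(sample_size) / 100)


print(default_ibdscale(10_000))     # 2.0
print(default_ibdscale(1_000_000))  # 10.0
```

Because sqrt(40000)/100 = 2, the default stays at 2 for sample sizes up to 40,000 and grows only for larger samples.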
-
ibdtrim=non-negative integer
- Default = 40
Specifies the number of markers trimmed from the end of a shared haplotype
when testing for IBD. Note: The default ibdtrim parameter is
designed for European samples genotyped with a 1M SNP array (~ 1 marker
per 3 kb). For human SNP array data, it is recommended to set the
ibdtrim parameter to the typical number of markers in a 0.15 cM
region. Pilot studies of randomly selected genomic regions can be used to
fine-tune the value of the ibdtrim parameter.
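One way to estimate "the typical number of markers in a 0.15 cM region" is to take the median marker count over intervals of that length, as in this Python sketch. It assumes you already have sorted genetic positions (cM) for your markers, for example interpolated from a genetic map. The helper name is hypothetical and the median is one reasonable choice of "typical", not Beagle's definition.

```python
import bisect
import statistics


def typical_markers_in(cm_positions, length_cm=0.15):
    """Median number of markers in a `length_cm` interval starting at each marker.

    `cm_positions` must be sorted genetic positions (cM). Intervals that would
    run past the last marker are skipped to avoid edge effects.
    """
    last = cm_positions[-1]
    counts = [
        bisect.bisect_left(cm_positions, start + length_cm) - i
        for i, start in enumerate(cm_positions)
        if start + length_cm <= last
    ]
    if not counts:
        raise ValueError("marker positions span less than length_cm")
    return round(statistics.median(counts))


# Evenly spaced markers every 0.004 cM: 38 markers fall in each 0.15 cM interval.
positions = [i * 0.004 for i in range(1000)]
print(typical_markers_in(positions))  # 38
```

The same helper with length_cm=5.0, 0.5 or 2.0 gives rough starting values for the window and overlap parameters.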
https://faculty.washington.edu/browning/beagle/beagle.html
Beagle was written by Brian L. Browning.
This manual page was written by Dylan Aïssi,
for the Debian project (but may be used by others).