NAME

ClonalFrameML - Efficient Inference of Recombination in Whole Bacterial Genomes

SYNOPSIS

ClonalFrameML newick_file fasta_file output_file [OPTIONS]

DESCRIPTION

ClonalFrameML is a software package that performs efficient inference of recombination in bacterial genomes. ClonalFrameML was created by Xavier Didelot and Daniel Wilson. ClonalFrameML can be applied to any type of aligned sequence data, but is especially aimed at analysis of whole genome sequences. It is able to compare hundreds of whole genomes in a matter of hours on a standard desktop computer. There are three main outputs from a run of ClonalFrameML: a phylogeny with branch lengths corrected to account for recombination, estimates of the key parameters of the recombination process, and a genomic map of where recombination took place for each branch of the phylogeny.
ClonalFrameML is a maximum likelihood implementation of the Bayesian software ClonalFrame, which was previously described by Didelot and Falush (2007). The recombination model underpinning ClonalFrameML is exactly the same as for ClonalFrame, but this new implementation is much faster, is able to deal with much larger genomic datasets, and does not suffer from MCMC convergence issues.

OPTIONS

Options specifying the analysis type

-em
true (default) or false Estimate parameters by a Baum-Welch expectation maximization algorithm.
-embranch
true or false (default) Estimate parameters for each branch using the EM algorithm.
-rescale_no_recombination
true or false (default) Rescale branch lengths for given sites with no recombination model.
-imputation_only
true or false (default) Perform only ancestral state reconstruction and imputation.

Options affecting all analyses

-kappa
value > 0 (default 2.0) Relative rate of transitions vs transversions in substitution model.
-fasta_file_list
true or false (default) Take fasta_file to be a whitespace-separated file list.
-xmfa_file
true or false (default) Take fasta_file to be an XMFA file.
-ignore_user_sites
sites_file Ignore sites listed in whitespace-separated sites_file.
-ignore_incomplete_sites
true or false (default) Ignore sites with any ambiguous bases.
-use_incompatible_sites
true (default) or false Use homoplasious and multiallelic sites to correct branch lengths.
-show_progress
true or false (default) Output the progress of the maximum likelihood routines.
-chromosome_name
name, e.g. "chr" Output importation status file in BED format using given chromosome name.
-min_branch_length
value > 0 (default 1e-7) Minimum branch length.
-reconstruct_invariant_sites
true or false (default) Reconstruct the ancestral states at invariant sites.
-label_uncorrected_tree
true or false (default) Regurgitate the uncorrected Newick tree with internal nodes labelled.

Options affecting -em and -embranch

-prior_mean
default "0.1 0.001 0.1 0.0001" Prior mean for R/theta, 1/delta, nu and M.
-prior_sd
default "0.1 0.001 0.1 0.0001" Prior standard deviation for R/theta, 1/delta, nu and M.
-initial_values
default "0.1 0.001 0.05" Initial values for R/theta, 1/delta and nu.
-guess_initial_m
true (default) or false Initialize M and nu jointly in the EM algorithms.
-emsim
value >= 0 (default 0) Number of simulations to estimate uncertainty in the EM results.
-embranch_dispersion
value > 0 (default 0.01) Dispersion in parameters among branches in the -embranch model.

Options affecting -rescale_no_recombination

-brent_tolerance
tolerance (default 0.001) Set the tolerance of the Brent routine for -rescale_no_recombination.
-powell_tolerance
tolerance (default 0.001) Set the tolerance of the Powell routine for -rescale_no_recombination.
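
EXAMPLES

The SYNOPSIS and OPTIONS above can be combined into a full command line. The following is a minimal Python sketch, not part of ClonalFrameML itself, that assembles such a command. It assumes that each option is passed as a separate "-name value" pair after the three positional arguments, as the listing suggests. The helper build_command and the file names example.nwk, example.fasta and example_out are hypothetical and serve only as illustration. The sketch prints the command rather than running it.

```python
import shlex


def build_command(newick_file, fasta_file, output_file, **options):
    """Assemble an argument list in the order given in the SYNOPSIS.

    Hypothetical helper: assumes every option is written as "-name value".
    """
    argv = ["ClonalFrameML", newick_file, fasta_file, output_file]
    for name, value in options.items():
        # Boolean options are spelled "true"/"false" in the OPTIONS listing.
        if isinstance(value, bool):
            value = "true" if value else "false"
        argv += ["-" + name, str(value)]
    return argv


# Illustrative file names and option values only.
cmd = build_command(
    "example.nwk",
    "example.fasta",
    "example_out",
    kappa=2.0,
    emsim=100,
    show_progress=True,
    prior_mean="0.1 0.001 0.1 0.0001",
)

# shlex.join quotes arguments containing spaces, so the four prior means
# stay together as one shell argument.
print(shlex.join(cmd))
```

Multi-valued options such as -prior_mean, -prior_sd and -initial_values take several numbers in one quoted string, so they must be quoted when typed in a shell. Building the command as a list and passing it to a process launcher without a shell avoids that quoting concern.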

AUTHOR

This manpage was written by Andreas Tille for the Debian distribution and may be used for any other purpose in connection with the program.