ClonalFrameML

NAME

ClonalFrameML - Efficient Inference of Recombination in Whole Bacterial Genomes

SYNOPSIS

ClonalFrameML newick_file fasta_file output_file [OPTIONS]

DESCRIPTION

ClonalFrameML is a software package that performs efficient inference of recombination in bacterial genomes. It was created by Xavier Didelot and Daniel Wilson. ClonalFrameML can be applied to any type of aligned sequence data, but is aimed especially at the analysis of whole genome sequences. It can compare hundreds of whole genomes in a matter of hours on a standard desktop computer. A run of ClonalFrameML produces three main outputs: a phylogeny with branch lengths corrected to account for recombination, estimates of the key parameters of the recombination process, and a genomic map of where recombination took place on each branch of the phylogeny.

ClonalFrameML is a maximum likelihood implementation of the Bayesian software ClonalFrame, which was previously described by Didelot and Falush (2007). The recombination model underpinning ClonalFrameML is exactly the same as for ClonalFrame, but the new implementation is much faster, can handle much larger genomic datasets, and does not suffer from MCMC convergence issues.
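As a minimal sketch of how the SYNOPSIS maps onto an actual invocation, the Python helper below assembles an argument list: three positional arguments, then options. It assumes that each option is passed as a "-name value" pair; the file names and the helper build_command are hypothetical and only serve as illustration.

```python
import shlex


def build_command(newick_file, fasta_file, output_file, options=None,
                  executable="ClonalFrameML"):
    """Return an argv list: three positional arguments, then options.

    Assumption: every option is given as a '-name value' pair, with
    booleans spelled 'true' or 'false'.
    """
    argv = [executable, newick_file, fasta_file, output_file]
    for name, value in (options or {}).items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        argv += ["-" + name, str(value)]
    return argv


# Hypothetical input and output names, for illustration only.
cmd = build_command("tree.nwk", "alignment.fasta", "run1",
                    {"em": True, "kappa": 2.0, "emsim": 100})
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the program is installed; building a list rather than a single string avoids shell quoting problems with file names that contain spaces.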

OPTIONS

Options specifying the analysis type

-em: true (default) or false Estimate parameters by a Baum-Welch expectation maximization algorithm.

-embranch: true or false (default) Estimate parameters for each branch using the EM algorithm.

-rescale_no_recombination: true or false (default) Rescale branch lengths for given sites with no recombination model.

-imputation_only: true or false (default) Perform only ancestral state reconstruction and imputation.

Options affecting all analyses

-kappa: value > 0 (default 2.0) Relative rate of transitions vs transversions in the substitution model.
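To make the meaning of kappa concrete, the sketch below classifies a nucleotide change as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion, and weights it accordingly. It only illustrates what a transition/transversion rate ratio expresses; it is not the program's substitution model, and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a, b):
    """True for A<->G or C<->T changes, False for transversions."""
    a, b = a.upper(), b.upper()
    if a == b or not {a, b} <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    return {a, b} <= PURINES or {a, b} <= PYRIMIDINES


def relative_rate(a, b, kappa=2.0):
    """Relative rate of the change a -> b: kappa for transitions,
    1.0 for transversions (illustrative weighting only)."""
    return kappa if is_transition(a, b) else 1.0


print(relative_rate("A", "G"))  # transition
print(relative_rate("A", "C"))  # transversion
```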

-fasta_file_list: true or false (default) Take fasta_file to be a whitespace-separated file list.

-xmfa_file: true or false (default) Take fasta_file to be an XMFA file.

-ignore_user_sites: sites_file Ignore sites listed in whitespace-separated sites_file.

-ignore_incomplete_sites: true or false (default) Ignore sites with any ambiguous bases.
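The effect of ignoring incomplete sites can be sketched as a column filter over an alignment. The example assumes that "ambiguous" means any character other than A, C, G or T (for instance N or a gap); this manpage does not define the term, so the program's actual rule may differ. The function name is hypothetical.

```python
def complete_site_indices(sequences, alphabet="ACGT"):
    """Return 0-based indices of alignment columns in which every
    sequence carries an unambiguous base from `alphabet`."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must have equal length")
    allowed = set(alphabet)
    return [i for i in range(length)
            if all(seq[i].upper() in allowed for seq in sequences)]


# Toy alignment: column 2 contains a gap, column 4 contains an N.
alignment = ["ACGTN", "AC-TA", "ACGTA"]
print(complete_site_indices(alignment))
```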

-use_incompatible_sites: true (default) or false Use homoplasious and multiallelic sites to correct branch lengths.

-show_progress: true or false (default) Output the progress of the maximum likelihood routines.

-chromosome_name: name, e.g. "chr" Output importation status file in BED format using given chromosome name.
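For readers unfamiliar with BED, the sketch below formats one interval as a BED line: tab-separated chromosome name, 0-based start, exclusive end, and a feature name. It assumes that the input interval is given in 1-based inclusive coordinates and that the branch label is used as the feature name; the exact columns written by ClonalFrameML are not described here, so this is a generic illustration rather than a reproduction of the program's output.

```python
def to_bed_line(chromosome, begin, end, name):
    """Format a 1-based inclusive interval [begin, end] as a BED line.

    BED uses a 0-based start and an exclusive end, so only the start
    shifts by one.
    """
    if begin < 1 or end < begin:
        raise ValueError("expected 1 <= begin <= end")
    return "\t".join([chromosome, str(begin - 1), str(end), name])


# Hypothetical imported region on a hypothetical branch label.
print(to_bed_line("chr", 101, 200, "NODE_5"))
```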

-min_branch_length: value > 0 (default 1e-7) Minimum branch length.

-reconstruct_invariant_sites: true or false (default) Reconstruct the ancestral states at invariant sites.

-label_uncorrected_tree: true or false (default) Regurgitate the uncorrected Newick tree with internal nodes labelled.

Options affecting -em and -embranch:

-prior_mean: default "0.1 0.001 0.1 0.0001" Prior mean for R/theta, 1/delta, nu and M.

-prior_sd: default "0.1 0.001 0.1 0.0001" Prior standard deviation for R/theta, 1/delta, nu and M.

-initial_values: default "0.1 0.001 0.05" Initial values for R/theta, 1/delta and nu.
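The values above are easier to read once converted to more familiar quantities. The sketch below parses the -initial_values string and derives delta from 1/delta, together with the relative effect of recombination and mutation, r/m = (R/theta) * delta * nu. It assumes the usual definitions from the ClonalFrame literature, which this manpage does not spell out: R/theta is the ratio of recombination to mutation rates, delta the mean length of imported fragments, and nu the divergence of imported DNA. The function name is hypothetical.

```python
def summarize_parameters(values="0.1 0.001 0.05"):
    """Parse 'R/theta 1/delta nu' and derive delta and r/m.

    Assumption: r/m = (R/theta) * delta * nu, with the symbols defined
    as in the ClonalFrame literature.
    """
    r_over_theta, inv_delta, nu = (float(x) for x in values.split())
    delta = 1.0 / inv_delta  # mean import length, in sites
    return {
        "R/theta": r_over_theta,
        "delta": delta,
        "nu": nu,
        "r/m": r_over_theta * delta * nu,
    }


summary = summarize_parameters()
for key, value in summary.items():
    print(f"{key}: {value:g}")
```

With the default initial values this corresponds to a mean import length of about 1000 sites and an r/m of about 5.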

-guess_initial_m: true (default) or false Initialize M and nu jointly in the EM algorithms.

-emsim: value >= 0 (default 0) Number of simulations to estimate uncertainty in the EM results.

-embranch_dispersion: value > 0 (default 0.01) Dispersion in parameters among branches in the -embranch model.

Options affecting -rescale_no_recombination:

-brent_tolerance: tolerance (default 0.001) Set the tolerance of the Brent routine for -rescale_no_recombination.

-powell_tolerance: tolerance (default 0.001) Set the tolerance of the Powell routine for -rescale_no_recombination.

AUTHOR

This manpage was written by Andreas Tille for the Debian distribution and may be used by others.

September 2017

ClonalFrameML 1.11