NAME
DefineClones.py - Repertoire clonal assignment toolkit (Python 3)
DESCRIPTION
usage: DefineClones.py [--version] [-h] -d DB_FILES [DB_FILES ...]
       [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
       [--outname OUT_NAME] [--log LOG_FILE] [--failed]
       [--format {airr,changeo}] [--nproc NPROC] [--sf SEQ_FIELD]
       [--vf V_FIELD] [--jf J_FIELD]
       [--gf GROUP_FIELDS [GROUP_FIELDS ...]] [--mode {allele,gene}]
       [--act {first,set}]
       [--model {ham,aa,hh_s1f,hh_s5f,mk_rs1nf,mk_rs5nf,hs1f_compat,m1n_compat}]
       [--dist DISTANCE] [--norm {len,mut,none}] [--sym {avg,min}]
       [--link {single,average,complete}] [--maxmiss MAX_MISSING]
help:
- --version
- show program's version number and exit
- -h, --help
- show this help message and exit
standard arguments:
- -d DB_FILES [DB_FILES ...]
- A list of tab delimited database files. (default: None)
- -o OUT_FILES [OUT_FILES ...]
- Explicit output file name. Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s). (default: None)
- --outdir OUT_DIR
- Changes the output directory to the location specified. The input file directory is used if this is not specified. (default: None)
- --outname OUT_NAME
- Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files. (default: None)
- --log LOG_FILE
- Specify to write verbose logging to a file. May not be specified with multiple input files. (default: None)
- --failed
- If specified, create files containing records that fail processing. (default: False)
- --format {airr,changeo}
- Specify input and output format. (default: airr)
- --nproc NPROC
- The number of simultaneous computational processes to execute (CPU cores to utilize). (default: 8)
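The constraints among the standard arguments can be checked before the tool is launched. The following is a minimal Python sketch, not part of DefineClones.py: the helper name build_defineclones_cmd, the input file name, and the threshold value 0.16 are hypothetical and chosen for illustration only. It uses only flags listed in this page and enforces the documented rule that -o cannot be combined with --failed, --outdir, or --outname.

```python
import shlex


def build_defineclones_cmd(db_files, dist, out_files=None, outdir=None,
                           outname=None, failed=False, nproc=None, fmt="airr"):
    """Assemble an argument list for DefineClones.py (hypothetical helper).

    Only flags documented in this page are used. The command is built,
    not executed.
    """
    # Documented restriction: -o excludes --failed, --outdir and --outname.
    if out_files and (failed or outdir is not None or outname is not None):
        raise ValueError("-o cannot be used with --failed, --outdir, or --outname")

    cmd = ["DefineClones.py", "-d", *db_files, "--format", fmt, "--dist", str(dist)]
    if out_files:
        cmd += ["-o", *out_files]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    if outname is not None:
        cmd += ["--outname", outname]
    if nproc is not None:
        cmd += ["--nproc", str(nproc)]
    if failed:
        cmd.append("--failed")
    return cmd


# Illustrative values only: the file name and threshold are assumptions.
cmd = build_defineclones_cmd(["sample_db-pass.tsv"], dist=0.16,
                             outdir="clones", nproc=4, failed=True)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) on a system where DefineClones.py is installed and on the PATH.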
cloning arguments:
- --sf SEQ_FIELD
- Field to be used to calculate distance between records. Defaults to junction (airr) or JUNCTION (changeo). (default: None)
- --vf V_FIELD
- Field containing the germline V segment call. Defaults to v_call (airr) or V_CALL (changeo). (default: None)
- --jf J_FIELD
- Field containing the germline J segment call. Defaults to j_call (airr) or J_CALL (changeo). (default: None)
- --gf GROUP_FIELDS [GROUP_FIELDS ...]
- Additional fields to use for grouping clones aside from V, J and junction length. (default: None)
- --mode {allele,gene}
- Specifies whether to use the V(D)J allele or gene for initial grouping. (default: gene)
- --act {first,set}
- Specifies how to handle multiple V(D)J assignments for initial grouping. The "first" action will use only the first gene listed. The "set" action will use all gene assignments and construct a larger gene grouping composed of any sequences sharing an assignment or linked to another sequence by a common assignment (similar to single-linkage). (default: set)
- --model {ham,aa,hh_s1f,hh_s5f,mk_rs1nf,mk_rs5nf,hs1f_compat,m1n_compat}
- Specifies which substitution model to use for calculating distance between sequences. The "ham" model is nucleotide Hamming distance and "aa" is amino acid Hamming distance. The "hh_s1f" and "hh_s5f" models are human specific single nucleotide and 5-mer content models, respectively, from Yaari et al, 2013. The "mk_rs1nf" and "mk_rs5nf" models are mouse specific single nucleotide and 5-mer content models, respectively, from Cui et al, 2016. The "m1n_compat" and "hs1f_compat" models are deprecated models provided for backwards compatibility with the "m1n" and "hs1f" models in Change-O v0.3.3 and SHazaM v0.1.4. Both 5-mer models should be considered experimental. (default: ham)
- --dist DISTANCE
- The distance threshold for clonal grouping. (default: 0.0)
- --norm {len,mut,none}
- Specifies how to normalize distances. One of none (do not normalize), len (normalize by length), or mut (normalize by number of mutations between sequences). (default: len)
- --sym {avg,min}
- Specifies how to combine asymmetric distances. One of avg (average of A->B and B->A) or min (minimum of A->B and B->A). (default: avg)
- --link {single,average,complete}
- Type of linkage to use for hierarchical clustering. (default: single)
- --maxmiss MAX_MISSING
- The maximum number of non-ACGT characters (gaps or Ns) to permit in the junction sequence before excluding the record from clonal assignment. Note, under single linkage non-informative positions can create artifactual links between unrelated sequences. Use with caution. (default: 0)
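To illustrate how the default settings (--model ham, --norm len, --link single) interact with --dist, the following Python sketch groups junctions of equal length by length-normalized Hamming distance and single linkage. It is an illustration under stated assumptions, not the implementation used by DefineClones.py: the function names are hypothetical, the sequences are made up, the comparison with the threshold is assumed to be inclusive, and the records are assumed to already share V, J and junction length.

```python
from itertools import combinations


def hamming_len_norm(a, b):
    """Nucleotide Hamming distance divided by sequence length."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage(seqs, threshold):
    """Label sequences so that any pair within `threshold` shares a label.

    Single linkage is transitive: A-B and B-C links put A and C in the
    same group even if A and C are farther apart than the threshold.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        # Assumption: a distance equal to the threshold still links the pair.
        if hamming_len_norm(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    labels = {}
    # Number groups 1, 2, ... in order of first appearance.
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(len(seqs))]


# Made-up junctions, all 12 nt long.
junctions = ["TGTGCAAGAGAT", "TGTGCAAGAGAC", "TGTGCTAGAGAC", "TGTAAACCCGGG"]
print(single_linkage(junctions, threshold=0.1))  # [1, 1, 1, 2]
```

The first and third junctions differ at 2 of 12 positions (about 0.17), above the threshold of 0.1, yet they end up in the same group because each is within 1/12 of the second. This chaining is also why the --maxmiss note above advises caution under single linkage.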
output files:
- clone-pass
- database with assigned clonal group numbers.
- clone-fail
- database with records failing clonal grouping.
required fields:
- sequence_id, v_call, j_call, junction
output fields:
- clone_id
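Since the output is a tab-delimited file with an added clone_id field, clone sizes can be tallied with the Python standard library. The sketch below assumes the AIRR field names listed above; the function name clone_sizes and the in-memory records are hypothetical and stand in for a real clone-pass file.

```python
import csv
import io
from collections import Counter


def clone_sizes(handle):
    """Count records per clone_id in a tab-delimited clone-pass file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["clone_id"] for row in reader)


# Made-up records for illustration; a real file would be opened with
# open(path, newline="") instead.
example = io.StringIO(
    "sequence_id\tv_call\tj_call\tjunction\tclone_id\n"
    "seq1\tIGHV1-2*02\tIGHJ4*02\tTGTGCAAGAGAT\t1\n"
    "seq2\tIGHV1-2*02\tIGHJ4*02\tTGTGCAAGAGAC\t1\n"
    "seq3\tIGHV3-23*01\tIGHJ6*02\tTGTAAACCCGGG\t2\n"
)
sizes = clone_sizes(example)
for clone_id, n in sizes.most_common():
    print(clone_id, n)
```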
AUTHOR
This manpage was written by Nilesh Patra for the Debian distribution and
can be used for any other usage of the program.
October 2020 | DefineClones.py 1.0.1 |