NAME
BaitFilter-v1.0.6 - manual page for BaitFilter-v1.0.6
DESCRIPTION
Welcome to Bait-Filter, version 1.0.6.
USAGE:
- ./BaitFilter-v1.0.6
- -i <string> [-o <string>] [-c <string>] [-m <string>] [--blast-second-hit-evalue <floating point number>] [--blast-first-hit-evalue <floating point number>] [--blast-min-hit-coverage-of-baits-in-tiling-stack <floating point number>] [--ref-blast-db <string>] [--blast-extra-commandline <string>] [--blast-evalue-cutoff <floating point number>] [-B <string>] [-t <positive integer>] [--ID-prefix <string>] [-S] [--verbosity <unsigned integer>] [-b <string>] [--] [--version] [-h]
-i
<string>, --input-bait-file-name <string>
- (required)
- Name of the input bait locus file. This is the bait file
- obtained from the BaitFisher program or from a previous filter run with BaitFilter.
-o
<string>, --output-bait-file-name <string>
- Name of the output bait file. All modes, except the conversion mode, produce files in the BaitFisher format.
-c
<string>, --convert <string>
- Allows the user to produce the final output file that can be uploaded to a bait-producing company. In this mode, BaitFilter reads the input bait file and, instead of doing a filtering step, produces a custom bait file in the requested upload format. To avoid confusion, a filtering step cannot be done in the same run as the conversion. If you want to filter a bait file and convert the output, you need to call this program more than once: first to do the filtering and then to do the conversion. Allowed conversion parameters currently are: "four-column-upload".
- New output formats can be added upon request. Please contact the author: Christoph Mayer, Email: Mayer Christoph <[email protected]>
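- The two-run workflow described for the -c option can be sketched as follows. This is only an illustration: the file names and the prefix "projA_" are hypothetical, and only the option names are taken from this manual page. The commands are printed, not executed.

```python
import shlex

# Hypothetical file names; only the option names come from this manual page.
filter_cmd = ["./BaitFilter-v1.0.6", "-i", "baits.txt",
              "-m", "ab", "-o", "baits_best.txt"]

# The conversion run reads the output of the filter run as its input.
convert_cmd = ["./BaitFilter-v1.0.6", "-i", "baits_best.txt",
               "-c", "four-column-upload",
               "--ID-prefix", "projA_",  # hypothetical prefix
               "-o", "baits_upload.txt"]

for cmd in (filter_cmd, convert_cmd):
    print(shlex.join(cmd))
```

- Filtering (-m) and conversion (-c) stay in separate runs, as required above.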
-m
<string>, --mode <string>
- Apart from the input file option, the mode option is the most important option. This option specifies which filter mode BaitFilter uses. (See the user manual for more details):
- "ab":
- Retain only the best bait locus for each alignment file
- when using the optimality criterion
- to minimize the total
- number of required baits.
- "as":
- Retain only the best bait locus for each alignment file
- when using the optimality criterion
- to maximize the number
- of sequences the result is based on.
- "fb":
- Retain only the best bait locus for each feature (e.g. CDS)
- when using the optimality criterion
- to minimize the total
- number of required baits. Only applicable if alignment cutting has been used in BaitFisher.
- "fs":
- Retain only the best bait locus for each feature (e.g. CDS)
- when using the optimality criterion
- to maximize the number
- of sequences the result is based on. Only applicable if alignment cutting has been used in BaitFisher.
- "blast-a": Remove all bait regions of all ALIGNMENTs for which one or more baits have at least two good hits to a reference genome. (Not recommended.)
- "blast-f": Remove all bait regions of all FEATUREs for which one or more baits have at least two good hits to a reference genome. (Not recommended.)
- "blast-l": Remove only the bait REGIONs that contain a bait that has multiple good hits to a reference genome. (Recommended over blast-f and blast-a.)
- "blast-c": Conduct a coverage filter run without a search for multiple hits. Requires the blast-min-hit-coverage-of-baits-in-tiling-stack option to be specified.
- "thin-b":
- Thin out a bait file to every Nth bait region, by finding
- the start position that minimizes the number of baits.
- "thin-s":
- Thin out a bait file to every Nth bait region, by finding
- the start position that maximizes the number of sequences.
- "thin-b-old":
- Similar to thin-b, but treats all loci as if they come
- from one alignment file. Identical to behaviour of thin-b in version 1.0.5 or earlier.
- "thin-s-old":
- Similar to thin-s, but treats all loci as if they come
- from one alignment file. Identical to behaviour of thin-s in version 1.0.5 or earlier.
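- The difference between the blast-a, blast-f and blast-l modes is the scope of what gets removed once a bait with multiple good hits has been found. A minimal sketch of that scoping, assuming a hypothetical Region record (not BaitFilter's internal data structure) and assuming that a feature is identified by its alignment plus its feature name:

```python
from typing import NamedTuple

class Region(NamedTuple):  # hypothetical record, for illustration only
    alignment: str
    feature: str
    start: int
    ambiguous: bool  # True if at least one bait of the region has >= 2 good hits

def regions_to_remove(regions, mode):
    """Return the regions that the given blast mode would discard."""
    key = {
        "blast-a": lambda r: r.alignment,                       # whole alignment
        "blast-f": lambda r: (r.alignment, r.feature),          # whole feature
        "blast-l": lambda r: (r.alignment, r.feature, r.start), # single region
    }[mode]
    flagged = {key(r) for r in regions if r.ambiguous}
    return [r for r in regions if key(r) in flagged]

regions = [
    Region("aln1", "cds1", 1, False),
    Region("aln1", "cds1", 2, True),
    Region("aln1", "cds2", 1, False),
    Region("aln2", "cds1", 1, False),
]
for mode in ("blast-a", "blast-f", "blast-l"):
    print(mode, len(regions_to_remove(regions, mode)))
```

- In this toy input a single ambiguous region costs three regions under blast-a, two under blast-f and one under blast-l, which illustrates why blast-l is the recommended mode.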
--blast-second-hit-evalue
<floating point number>
- Maximum E-value for the second-best hit of the bait against the genome. A bait is considered to bind ambiguously if it has at least two good hits to different loci of the genome. This option is the E-value threshold for the second-best hit. Default: 0.000001
--blast-first-hit-evalue
<floating point number>
- Maximum E-value for the first or best hit of the bait against the genome. A bait is considered to bind ambiguously if it has at least two good hits to different loci of the genome. This option is the E-value threshold for the first/best hit. Default: 0.000001
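- How the two E-value thresholds interact can be sketched as below. This is an illustrative reading of the option descriptions, not BaitFilter's source code. It assumes that the E-values passed in belong to hits at different loci and that the thresholds are inclusive; neither detail is stated in this manual page.

```python
def binds_ambiguously(hit_evalues, first_max=1e-6, second_max=1e-6):
    """Return True if the two best hits both pass their E-value thresholds.

    hit_evalues: E-values of one bait's hits, assumed to be at different loci.
    """
    best_two = sorted(hit_evalues)[:2]
    if len(best_two) < 2:
        return False  # fewer than two hits: the bait cannot be ambiguous
    return best_two[0] <= first_max and best_two[1] <= second_max

print(binds_ambiguously([1e-30, 1e-10]))  # two good hits
print(binds_ambiguously([1e-30, 1e-3]))   # second hit too weak
```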
--blast-min-hit-coverage-of-baits-in-tiling-stack
<floating point number>
- Can be specified together with the following modes (-m option): blast-a, blast-f, blast-l, blast-c. In all these modes, a blast analysis of all baits against a reference genome is conducted. This option specifies a minimum query hit coverage which at least one bait has to reach in each tiling stack (i.e. each column of the tiling design). Otherwise the bait region is discarded. If not specified, no hit coverage is checked. The coverage is determined for each bait by dividing the length of the best hit of this bait against the specified genome by the length of this bait. Then the highest coverage is determined for each bait stack of the tiling design. If this option is used together with another filter, it is important to know the order in which the two are applied, since the order matters for the final result: for the modes blast-a, blast-f and blast-l, the hit coverage is checked after filtering for baits with multiple good hits to the reference genome.
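- The coverage rule described above can be written out as a short calculation. The sketch below assumes each tiling stack is given as a non-empty list of (best hit length, bait length) pairs and that a bait without any hit has a best hit length of 0; the data layout and function names are hypothetical.

```python
def stack_coverages(stacks):
    """Highest bait coverage per tiling stack.

    stacks: list of tiling stacks, each a non-empty list of
            (best_hit_length, bait_length) pairs.
    """
    return [max(hit / length for hit, length in stack) for stack in stacks]

def region_passes(stacks, min_coverage):
    """A bait region is kept only if every stack reaches min_coverage."""
    return all(cov >= min_coverage for cov in stack_coverages(stacks))

# Two stacks with two baits each; all baits are 120 nt long.
stacks = [[(120, 120), (60, 120)],
          [(90, 120), (30, 120)]]
print(stack_coverages(stacks))
print(region_passes(stacks, 0.8))
```

- Here the second stack reaches a coverage of only 0.75, so the region would be discarded at a threshold of 0.8 and retained at 0.75.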
--ref-blast-db
<string>
- Base name of the blast database files. This name is passed to the blast command. It is the name of the fasta file of your reference genome. IMPORTANT: The makeblastdb program has to be called before starting the Bait-Filter program. makeblastdb takes the fasta file and creates the database files from it. Cannot be specified together with the blast-result-file option.
--blast-extra-commandline
<string>
- When invoking the blast command, extra command line parameters can be passed to the blast program with the aid of this option. For example, this option allows you to specify the number of threads the blast program should use: --blast-extra-commandline "-num_threads 20" sets the number of threads to 20.
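- The extra parameters have to reach BaitFilter as one single argument, which is why the example quotes them. A small sketch of building such a command line; the file names are hypothetical and the command is only printed, not executed:

```python
import shlex

# Hypothetical file names; option names are taken from this manual page.
cmd = ["./BaitFilter-v1.0.6", "-i", "baits.txt", "-o", "baits_filtered.txt",
       "-m", "blast-l",
       "--ref-blast-db", "genome.fas",
       # One list element, so it is passed on as one argument.
       "--blast-extra-commandline", "-num_threads 20"]

# shlex.join adds the quoting a POSIX shell would need.
print(shlex.join(cmd))
```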
--blast-evalue-cutoff
<floating point number>
- When conducting a blast search, a maximum E-value can be specified when calling the blast program. The effect is that hits with a higher E-value are not reported. BaitFilter always specifies such an E-value when calling the blast program. The default E-value passed by BaitFilter to the blast program is twice the --blast-second-hit-evalue. If a coverage filter is requested, the default value is set to 0.001 if twice the value of --blast-second-hit-evalue is smaller than 0.001. This should guarantee that all hits necessary for the blast and/or coverage filter are found. If the user wants to set a different E-value threshold, this can be specified with this option. As of version 1.0.6 of this program, the value is automatically raised to at least 0.001 if the coverage filter is used. This makes the usage of this option unnecessary in most cases.
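- The default described above amounts to a small calculation. The following sketch models only the default value, not a user-supplied one; the function name is hypothetical.

```python
def default_blast_evalue_cutoff(second_hit_evalue=1e-6, coverage_filter=False):
    """Default E-value passed to blast, as described for this option."""
    cutoff = 2 * second_hit_evalue
    if coverage_filter:
        # With a coverage filter the cutoff is never smaller than 0.001.
        cutoff = max(cutoff, 0.001)
    return cutoff

print(default_blast_evalue_cutoff())
print(default_blast_evalue_cutoff(coverage_filter=True))
print(default_blast_evalue_cutoff(second_hit_evalue=0.01, coverage_filter=True))
```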
-B
<string>, --blast-executable <string>
- Name of or path+name to the blast executable. Default: blastn. Minimum blast version number: Blast+ 2.2.x. Cannot be specified together with the blast-result-file option.
-t
<positive integer>, --thinning-step-width <positive integer>
- Thin out the bait file by retaining only every Nth bait region. The integer after the option specifies the step width N. This option is required if one of the modes thin-b, thin-b-old, thin-s or thin-s-old is active; in all other modes it is not allowed.
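- The idea behind thin-b (keep every Nth bait region and choose the start offset that needs the fewest baits) can be sketched as follows. This is a simplified illustration under stated assumptions, not BaitFilter's implementation: regions are taken to be the consecutive bait regions of one alignment file, "every Nth" is taken to mean every Nth entry of that list, and the total bait count of the retained regions is the quantity being minimized.

```python
def thin_min_baits(bait_counts, step):
    """Pick the start offset that minimizes the baits of the retained regions.

    bait_counts: number of baits per consecutive bait region of one alignment.
    step:        thinning step width N (positive integer).
    Returns (offset, indices of the retained regions).
    """
    if not bait_counts:
        return 0, []
    offsets = range(min(step, len(bait_counts)))
    best = min(offsets, key=lambda off: sum(bait_counts[off::step]))
    return best, list(range(best, len(bait_counts), step))

counts = [5, 3, 4, 6, 2, 7]
print(thin_min_baits(counts, 3))
```

- For thin-s the same loop would maximize the number of sequences instead of minimizing the number of baits.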
--ID-prefix
<string>
- In the conversion mode to the four-column-upload file format, each converted file should get a unique ProbeID prefix, since even among multiple files, ProbeIDs are not allowed to be identical. With this option the user is able to specify a prefix string to all probe IDs in the four-column-upload file created by BaitFilter.
-S,
--stats
- Compute bait file characteristics for the input file and report these. This mode is automatically used for all modes specified with -m option or the conversion mode specified with -c option. The purpose of the -S option is to compute stats without having to filter or convert the input file. In particular, the -S mode does not require specifying an output file.
- This option has no effect if combined with the -m or -c modes.
--verbosity
<unsigned integer>
- The verbosity option controls the amount of information Bait-Filter writes to the console while running. 0: Print only welcome message and essential error messages that lead to exiting the program. 1: report also warnings, 2: report also progress, 3: report more detailed progress, >10: debug output. Maximum 10000: write all possible diagnostic output. A value of 2 is required if startup parameters should be reported.
-b
<string>, --blast-result-file <string>
- Conducting a blast analysis of all baits against a reference genome can take a long time. If different filtering parameters, e.g. different coverage thresholds are to be compared, the same blast has to be done multiple times. With this argument, the blast will be skipped and the specified blast result file will be used. This option has to be used with caution! No checks are done (so far) to ensure that the blast result file corresponds to the specified bait file. If a BaitFilter run was conducted which did a blast search, BaitFilter will not delete the blast result file after the run was completed. The result file with the name blast_result.txt will remain in the working directory. It can be moved or renamed and with this option it can be specified as the input file for further BaitFilter runs. If you have the slightest doubt whether you are using the correct blast result file, you should not use this option. This option is only allowed in modes that would normally do a blast search. This option cannot be specified together with the blast-executable, blast-evalue-cutoff, blast-extra-commandline, ref-blast-db options, since these are options specific to runs in which a blast search is conducted.
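- The option conflicts listed above can be checked before starting a run. The helper below is a hypothetical pre-flight check written for illustration; it is not part of BaitFilter, and the file names are made up.

```python
# Options that only make sense when a blast search is actually run.
BLAST_SEARCH_ONLY = {"-B", "--blast-executable", "--blast-evalue-cutoff",
                     "--blast-extra-commandline", "--ref-blast-db"}

def conflicts_with_result_file(argv):
    """Return the options in argv that clash with -b/--blast-result-file."""
    uses_result_file = "-b" in argv or "--blast-result-file" in argv
    if not uses_result_file:
        return []
    return sorted(BLAST_SEARCH_ONLY.intersection(argv))

# Re-run a coverage filter on the blast result kept from an earlier run.
rerun = ["./BaitFilter-v1.0.6", "-i", "baits.txt", "-o", "baits_cov90.txt",
         "-m", "blast-c", "-b", "blast_result.txt",
         "--blast-min-hit-coverage-of-baits-in-tiling-stack", "0.9"]
print(conflicts_with_result_file(rerun))
print(conflicts_with_result_file(rerun + ["--ref-blast-db", "genome.fas"]))
```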
--,
--ignore_rest
- Ignores the rest of the labeled arguments following this flag.
--version
- Displays version information and exits.
-h,
--help
- Displays usage information and exits.
- The Bait-Filter program has been designed to post-process the output of the BaitFisher program in order to select appropriate bait regions and to create the final bait set. BaitFilter offers several filtering and conversion modes. If multiple filtering steps and a final conversion are required, BaitFilter has to be started multiple times, with the output of each run used as the input of the next.
- The BaitFisher program designs baits for every locus for which a bait design is possible for a full bait region. A bait region can start at every nucleotide as long as the remaining sequence is long enough. This output has to be reduced, and the purpose of BaitFilter is to find for each feature, gene or alignment the optimal locus or the optimal loci for the bait regions. Before determining the locus with the smallest number of baits or the largest sequence coverage, one might want to determine which baits are expected to bind specifically in a given reference genome. This is achieved by conducting a Blast search of the baits against a genome. Baits which are highly similar to at least two loci of the genome can be determined and their bait regions can be removed. The blast search result can also be used to specify a minimum hit coverage of the baits in a bait region against the reference genome. After removing bait regions at inferior loci, the optimal bait region starting locus (start coordinate) can be inferred with the aid of different criteria in a subsequent run of BaitFilter. As input, BaitFilter requires a bait file generated by the BaitFisher program or a bait file generated by a previous filtering run of BaitFilter. This bait file is specified with the -i command line parameter. Furthermore, the user has to specify an output file name with the -o parameter and a filter mode with the -m parameter.
- To convert a file to the final, uploadable output format, see the -c option.
- To compute bait file statistics for an input file, see the -S option.
- The different filter modes provided by BaitFilter are the following:
- 1a) Retain only the best bait locus per alignment file. Criterion: Minimize number of required baits.
- 1b) Retain only the best bait locus per alignment file. Criterion: Maximize number of sequences.
- 2a) Retain only best bait locus per feature (requires that features were selected in BaitFisher). Criterion: Minimize number of required baits.
- 2b) Retain only best bait locus per feature (requires that features were selected in BaitFisher). Criterion: Maximize number of sequences.
- 3) Use a blast search of the bait sequences against a reference genome to detect putative non-unique target loci. Non-unique target sites will have multiple good hits against the reference genome. Furthermore, a minimum coverage of the best blast hit of a bait sequence against the genome can be specified. Note that all blast modes require additional command line parameters! These modes remove bait regions for which multiple good blast hits were found or for which baits have insufficiently long hits. Different versions of this mode are available:
- 3a) If a single bait is not unique, remove all bait regions from the current gene.
- 3b) If a single bait is not unique, remove all bait regions from the current feature (if applicable).
- 3c) If a single bait is not unique, remove only the bait region that contains this bait.
- 4) Thin out the given bait file: Retain only every Nth bait region, where N has to be specified by the user. Two submodes are available:
- 4a) Thin out bait regions by retaining only every Nth bait region in a bait file. The starting offset will be chosen such that the number of required baits is minimized.
- 4b) Thin out bait regions by retaining only every Nth bait region in a bait file. The starting offset will be chosen such that the number of sequences the result is based on is maximized.
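- Modes 1a and 1b in the list above (the "ab" and "as" values of -m) both reduce each alignment file to a single bait locus and differ only in the optimality criterion. A minimal sketch, assuming a hypothetical Locus record and assuming that ties are resolved in favour of the first locus encountered (the manual page does not say how ties are handled):

```python
from typing import NamedTuple

class Locus(NamedTuple):  # hypothetical record, for illustration only
    alignment: str
    start: int
    num_baits: int
    num_sequences: int

def best_loci(loci, mode):
    """Keep one locus per alignment: fewest baits ("ab") or most sequences ("as")."""
    score = {"ab": lambda l: l.num_baits,
             "as": lambda l: -l.num_sequences}[mode]
    best = {}
    for locus in loci:
        current = best.get(locus.alignment)
        if current is None or score(locus) < score(current):
            best[locus.alignment] = locus
    return best

loci = [Locus("aln1", 1, 12, 8), Locus("aln1", 2, 9, 6), Locus("aln1", 3, 10, 9)]
print(best_loci(loci, "ab")["aln1"].start)
print(best_loci(loci, "as")["aln1"].start)
```

- The per-feature modes 2a and 2b follow the same pattern with the feature, rather than the alignment file, as the grouping key.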
SEE ALSO
The full documentation for BaitFilter-v1.0.6 is maintained as a Texinfo manual. If the info and BaitFilter-v1.0.6 programs are properly installed at your site, the command
- info BaitFilter-v1.0.6
should give you access to the complete manual.
January 2022 | BaitFilter-v1.0.6 |