NAME

art_454 - Simulation of 454 Pyrosequencing

DESCRIPTION

ART is a set of simulation tools that generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing datasets.
art_454 is the ART tool for simulating Roche 454 pyrosequencing reads.

USAGE

SINGLE-END SIMULATION

art_454 [-s] [-a] [-t] [-r rand_seed] [-p read_profile] [-c num_flow_cycles] <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <FOLD_COVERAGE>

PAIRED-END SIMULATION

art_454 [-s] [-a] [-t] [-r rand_seed] [-p read_profile] [-c num_flow_cycles] <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <FOLD_COVERAGE> <MEAN_FRAG_LEN> <STD_DEV>

AMPLICON SEQUENCING SIMULATION

art_454 [-s] [-a] [-t] [-r rand_seed] [-p read_profile] [-c num_flow_cycles] <-A|-B> <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <#READS_PER_AMPLICON|#READ_PAIRS_PER_AMPLICON>

OPTIONS

MANDATORY OPTIONS

INPUT_SEQ_FILE - the file name of the DNA/RNA reference sequences, in FASTA format
OUTPUT_FILE_PREFIX - the prefix, optionally including a directory path, for the output read data file (*.fq) and the read alignment files (*.aln with -a, *.sam with -s)
FOLD_COVERAGE - the fold coverage of reads over the reference sequences (e.g. 20 for 20X)
MEAN_FRAG_LEN - the mean DNA fragment size for paired-end read simulation
STD_DEV - the standard deviation of the DNA fragment size for paired-end read simulation
#READS_PER_AMPLICON - the number of reads per amplicon (for single-end, 5'-end amplicon sequencing)
#READ_PAIRS_PER_AMPLICON - the number of read pairs per amplicon (for paired-end amplicon sequencing)
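To get a feel for what a given FOLD_COVERAGE means in terms of data volume, the standard relation coverage = number_of_reads x read_length / reference_length can be inverted. The Python sketch below is only an approximation under that assumption; it is not art_454's internal computation, and the mean read length and the helper name estimate_read_count are hypothetical inputs you supply yourself.

```python
def estimate_read_count(fold_coverage: float, reference_length: int,
                        mean_read_length: float, paired_end: bool = False) -> int:
    """Approximate number of reads (or read pairs) for a target fold coverage.

    Illustrative only: assumes coverage = reads * mean_read_length / reference_length.
    """
    if reference_length <= 0 or mean_read_length <= 0:
        raise ValueError("lengths must be positive")
    total_reads = fold_coverage * reference_length / mean_read_length
    # In paired-end mode each read pair contributes two reads to the coverage.
    return round(total_reads / 2) if paired_end else round(total_reads)


if __name__ == "__main__":
    # Hypothetical 1 Mbp reference with reads of roughly 400 bp.
    print(estimate_read_count(20, 1_000_000, 400))                   # single-end reads
    print(estimate_read_count(10, 1_000_000, 400, paired_end=True))  # read pairs
```

With these assumed numbers, 20X single-end coverage corresponds to about 50,000 reads and 10X paired-end coverage to about 12,500 read pairs.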

OPTIONAL PARAMETERS

-A perform single-end amplicon sequencing simulation
-B perform paired-end amplicon sequencing simulation
-M use CIGAR 'M' instead of '='/'X' for alignment matches/mismatches
-a output the ALN alignment file
-s output the SAM alignment file
-d print warning messages for debugging
-t simulate reads using the built-in GS FLX Titanium profile [default: GS FLX profile]
-r specify a fixed random seed for the simulation (so that two different runs generate identical datasets)
-c specify the number of flow cycles of the sequencer [default: 100 for GS FLX, 200 for GS FLX Titanium]
-p specify a user-supplied read profile for the simulation
NOTE: a read profile is named by the directory that contains its data files. Please read the README file for the format of the 454 read profile data files and the default file names of these data files.
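The -c option matters because, in 454 pyrosequencing, each flow cycle delivers the four nucleotides once in a fixed order, and a single flow extends the read across the whole homopolymer run of the flowed base. The number of flow cycles therefore bounds how far into a fragment a read can get, and the number of bases read per cycle depends on the sequence itself. The sketch below illustrates that relationship under stated assumptions: it uses the TACG flow order and ignores sequencing errors entirely. It is not art_454's simulation code, and the function names are hypothetical.

```python
FLOW_ORDER = "TACG"  # assumed flow order; one cycle = one flow of each nucleotide


def flowgram(seq: str, num_flow_cycles: int, flow_order: str = FLOW_ORDER) -> list[int]:
    """Ideal (error-free) flow signals: homopolymer length incorporated per flow."""
    seq = seq.upper()
    pos = 0
    signals = []
    for _ in range(num_flow_cycles):
        for base in flow_order:
            start = pos
            # A single flow incorporates the entire run of the flowed base.
            while pos < len(seq) and seq[pos] == base:
                pos += 1
            signals.append(pos - start)
    return signals


def bases_read(seq: str, num_flow_cycles: int) -> int:
    """Number of leading bases of seq that are read within num_flow_cycles cycles."""
    return sum(flowgram(seq, num_flow_cycles))


if __name__ == "__main__":
    print(flowgram("AAAT", 1))    # the A flow reads the whole AAA run at once
    print(bases_read("ACGT", 1))  # the final T must wait for the next cycle's T flow
```

For example, flowgram("AAAT", 1) gives [0, 3, 0, 0], and bases_read("ACGT", 1) is 3 because the T flow comes first in each cycle.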

EXAMPLES

1) single-end simulation with 20X coverage
art_454 -s seq_reference.fa ./outdir/single_dat 20
2) paired-end simulation with 10X coverage, a mean fragment size of 1500 and a standard deviation of 20, using the GS FLX Titanium profile
art_454 -s -t seq_reference.fa ./outdir/paired_dat 10 1500 20
3) paired-end simulation with a fixed random seed (777), 10X coverage, a mean fragment size of 2500 and a standard deviation of 50
art_454 -s -r 777 seq_reference.fa ./outdir/paired_fxSeed 10 2500 50
4) single-end amplicon sequencing with 10 reads per amplicon
art_454 -A -s amplicon_ref.fa ./outdir/amp_single 10
5) paired-end amplicon sequencing with 10 read pairs per amplicon
art_454 -B -s amplicon_ref.fa ./outdir/amp_paired 10
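When many datasets are needed (for example a series of coverages), it can help to generate the command lines programmatically. The Python sketch below builds argument lists using only the options documented in this manpage and reproduces the examples above; the helper names art454_args and run_art454 are hypothetical, and run_art454 assumes art_454 is installed and on PATH.

```python
import shlex
import subprocess


def art454_args(ref_fasta, out_prefix, count, mean_frag_len=None, std_dev=None,
                *, titanium=False, sam=True, seed=None, amplicon=None):
    """Build an art_454 argument list from the options documented in this manpage.

    count is FOLD_COVERAGE, or reads / read pairs per amplicon when amplicon is "A" or "B".
    """
    args = ["art_454"]
    if amplicon in ("A", "B"):
        args.append("-" + amplicon)  # -A single-end or -B paired-end amplicon mode
    elif amplicon is not None:
        raise ValueError("amplicon must be 'A', 'B' or None")
    if sam:
        args.append("-s")  # also write the SAM alignment file
    if titanium:
        args.append("-t")  # built-in GS FLX Titanium profile
    if seed is not None:
        args += ["-r", str(seed)]  # fixed random seed for reproducible output
    args += [ref_fasta, out_prefix, str(count)]
    if (mean_frag_len is None) != (std_dev is None):
        raise ValueError("paired-end mode needs both MEAN_FRAG_LEN and STD_DEV")
    if mean_frag_len is not None:
        if amplicon is not None:
            raise ValueError("fragment size options do not apply to amplicon mode")
        args += [str(mean_frag_len), str(std_dev)]
    return args


def run_art454(args):
    """Run art_454; raises subprocess.CalledProcessError on a non-zero exit status."""
    print("running:", shlex.join(args))
    subprocess.run(args, check=True)
```

A coverage series could then be produced with a loop such as: for cov in (5, 10, 20): run_art454(art454_args("seq_reference.fa", f"./outdir/single_{cov}x", cov)), where the output prefixes are illustrative.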

AUTHOR

This manpage was written by Andreas Tille for the Debian distribution and may be used by others.