NAME
parseblast - Filtering High-scoring Segment Pairs (HSPs) from WU/NCBI BLAST
SYNOPSIS
parseblast [options] <results.from.blast>
DESCRIPTION
This manual page briefly documents the parseblast command. Several output options are available. The most important are those that write HSPs in GFF format (GFFv1, GFFv2 or APLOT). Sequences can be included in the GFF records as a comment field. The script can also output the alignment for each HSP in ALN, MSF or tabular format.
OPTIONS
parseblast prints output in "HSP" format by default (see below). It reads input from <STDIN> or from one or more files and writes its output to <STDOUT>. The user can therefore redirect the output to a file or use the program as a filter within a pipe. The "-N", "-M", "-P", "-G", "-F", "-A" and "-X" options (and their long-name equivalents) are mutually exclusive, and their order of precedence is the order in which they are listed here.
- GFF OPTIONS:
- -G, --gff
- Prints output in GFFv1 format.
- -F, --fullgff
- Prints output in GFFv2 "alignment" format ("target").
- -A, --aplot
- Prints output in pseudo-GFF APLOT "alignment" format.
- -S, --subject
- Project GFF output by SUBJECT (default: by QUERY).
- -Q, --sequence
- Append query and subject sequences to GFF record.
- -b, --bit-score
- Set <score> field to Bit Score (default: Alignment Score).
- -i, --identity-score
- Set <score> field to Identities (default: Alignment Score).
- -s, --full-scores
- Include all scores for each HSP in each GFF record.
- -u, --no-frame
- Set all frames to "." (the GFF value for an unavailable frame).
- -t, --compact-tags
- Write target coordinates, strand and frame in short form (not valid GFFv2!).
- ALIGNMENT OPTIONS:
- -P, --pairwise
- Prints pairwise alignment for each HSP in TBL format.
- -M, --msf
- Prints pairwise alignment for each HSP in MSF format.
- -N, --aln
- Prints pairwise alignment for each HSP in ALN format.
- -W, --show-coords
- Adds start/end positions to alignment output.
- GENERAL OPTIONS:
- -X, --expanded
- Expanded output (producing multiline output records).
- -c, --comments
- Include parameters from blast program as comments.
- -n, --no-comments
- Do not print "#" lines (raw output without comments).
- -v, --verbose
- Send warnings to <STDERR>.
- --version
- Prints program version and exits.
- -h, --help
- Shows this help and exits.
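The precedence rule for the mutually exclusive output options can be illustrated with a small Python sketch. It assumes that options listed earlier in the OPTIONS paragraph ("-N" first, "-X" last) win over later ones, and that "HSP" is used when none of them is given. The helper `output_mode` and the mode labels are hypothetical and are not part of parseblast itself.

```python
# Mutually exclusive output options in the order the OPTIONS paragraph lists
# them. Assumption: earlier entries take precedence over later ones.
PRECEDENCE = ["-N", "-M", "-P", "-G", "-F", "-A", "-X"]

# Long-name equivalents, as documented in the option list.
LONG_NAMES = {
    "--aln": "-N", "--msf": "-M", "--pairwise": "-P", "--gff": "-G",
    "--fullgff": "-F", "--aplot": "-A", "--expanded": "-X",
}

# Illustrative labels for each mode, taken from the option descriptions.
MODE_LABELS = {
    "-N": "ALN", "-M": "MSF", "-P": "TBL", "-G": "GFF",
    "-F": "FULL GFF", "-A": "APLOT", "-X": "EXPANDED",
}


def output_mode(args):
    """Hypothetical helper: report which output mode wins for a list of flags."""
    given = {LONG_NAMES.get(a, a) for a in args}  # normalize long names
    for opt in PRECEDENCE:
        if opt in given:
            return MODE_LABELS[opt]
    return "HSP"  # default format when no exclusive option is present


print(output_mode(["--expanded", "-G"]))  # GFF
print(output_mode(["-v", "-c"]))          # HSP
```

Non-exclusive options such as "-v" or "-c" are simply ignored by this sketch, because they do not select an output format.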
OUTPUT FORMATS:
"S_" stands for "Subject_Sequence" and "Q_" for "Query_Sequence". The <Program> name is taken from the input BLAST file. <Strands> are calculated from the <start> and <end> positions in the original BLAST file. <Frame> is taken from the BLAST file if present; otherwise it is set to ".". <SCORE> is set to the Alignment Score by default, and you can change it with "-b" or "-i".
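How the strand, frame and score fields are derived can be sketched in Python under some stated assumptions. A minus-strand match is assumed to show up in the BLAST report as start > end. The `hsp` dictionary and its keys are hypothetical, and the score values below are made up for illustration.

```python
def strand(start, end):
    # Assumption: BLAST reports a minus-strand match with start > end.
    return "+" if start <= end else "-"


def frame(blast_frame=None):
    # Copy the frame when the BLAST report has one; otherwise use ".".
    return "." if blast_frame is None else str(blast_frame)


def score(hsp, field="alignment"):
    # field: "alignment" (default), "bits" (as with -b) or "identity" (as with -i).
    # hsp is a hypothetical dict holding the three scores under those keys.
    return hsp[field]


hsp = {"alignment": 250, "bits": 98.6, "identity": 90.0}  # made-up values
print(strand(1200, 1050), frame(), score(hsp), score(hsp, "bits"))
# - . 250 98.6
```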
If the "-S" or "--subject" option is given, QUERY fields refer to the SUBJECT and SUBJECT fields refer to the QUERY (this is only available for GFF output records).
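The effect of "-S" on a [GFF] record can be pictured as exchanging the query and subject roles before the fields are written. The following Python sketch assumes a hypothetical `record` dictionary with "query" and "subject" entries. It joins the fields with spaces, as in the templates on this page, and makes no claim about the exact column separator parseblast emits.

```python
def project(record, by_subject=False):
    """Return the record as seen from the query (default) or, as with -S,
    from the subject. `record` is a hypothetical dict with "query" and
    "subject" entries, each holding name/start/end/strand/frame."""
    if not by_subject:
        return record
    return {**record, "query": record["subject"], "subject": record["query"]}


def gff_fields(record, program, score):
    # Columns of the [GFF] template: the projected sequence's fields,
    # followed by the name of the other sequence.
    q, s = record["query"], record["subject"]
    return [q["name"], program, "hsp", str(q["start"]), str(q["end"]),
            str(score), q["strand"], q["frame"], s["name"]]


rec = {  # made-up HSP, for illustration only
    "query": {"name": "seqA", "start": 10, "end": 90, "strand": "+", "frame": "."},
    "subject": {"name": "seqB", "start": 300, "end": 380, "strand": "+", "frame": "."},
}
print(" ".join(gff_fields(project(rec, by_subject=True), "blastn", 250)))
# seqB blastn hsp 300 380 250 + . seqA
```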
Dots ("...") mean that the record description continues on the following line; parseblast nevertheless prints each such record as a single line.
[HSP] <- (This is the DEFAULT OUTPUT FORMAT)
<Program> <DataBase> : ...
... <IdentityMatches> <Min_Length> <IdentityScore> ...
... <AlignmentScore> <BitScore> <E_Value> <P_Sum> : ...
... <Q_Name> <Q_Start> <Q_End> <Q_Strand> <Q_Frame> : ...
... <S_Name> <S_Start> <S_End> <S_Strand> <S_Frame> : <S_FullDescription>
[GFF]
<Q_Name> <Program> hsp <Q_Start> <Q_End> <SCORE> <Q_Strand> <Q_Frame> <S_Name>
[FULL GFF] <- (GFF showing alignment data)
<Q_Name> <Program> hsp <Q_Start> <Q_End> <SCORE> <Q_Strand> <Q_Frame> ...
... Target "<S_Name>" <S_Start> <S_End> ...
... E_value <E_Value> Strand <S_Strand> Frame <S_Frame>
[APLOT] <- (GFF format enhanced for the APLOT program)
<Q_Name>:<S_Name> <Program> hsp <Q_Start>:<S_Start> <Q_End>:<S_End> <SCORE> ...
... <Q_Strand>:<S_Strand> <Q_Frame>:<S_Frame> <BitScore>:<HSP_Number> ...
... # E_value <E_Value>
[EXPANDED]
MATCH(<HSP_Number>): <Q_Name> x <S_Name>
SCORE(<HSP_Number>): <AlignmentScore>
BITSC(<HSP_Number>): <BitScore>
EXPEC(<HSP_Number>): <E_Value> Psum(<P_Sum>)
IDENT(<HSP_Number>): <IdentityMatches>/<Min_Length> : <IdentityScore> %
T_GAP(<HSP_Number>): <TotalGaps(BothSeqs)>
FRAME(<HSP_Number>): <Q_Frame>/<S_Frame>
STRND(<HSP_Number>): <Q_Strand>/<S_Strand>
MXLEN(<HSP_Number>): <Max_Length>
QUERY(<HSP_Number>): length <Q_Length> : gaps <Q_TotalGaps> : ...
... <Q_Start> <Q_End> : <Q_Strand> : <Q_Frame> : <Q_FullSequence>
SBJCT(<HSP_Number>): length <S_Length> : gaps <S_TotalGaps> : ...
... <S_Start> <S_End> : <S_Strand> : <S_Frame> : <S_FullSequence>
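Because every [EXPANDED] line follows the same "TAG(<HSP_Number>): value" pattern, the output can be regrouped per HSP with a regular expression. The Python sketch below assumes the five-character tags shown in the template above. The function names `parse_expanded` and `identity` are hypothetical, and the sample record is made up.

```python
import re
from collections import defaultdict

# Every [EXPANDED] line has the form TAG(<HSP_Number>): value, with 5-char tags
# such as MATCH, SCORE, BITSC, EXPEC, IDENT, T_GAP, FRAME, STRND, MXLEN, QUERY, SBJCT.
TAG_LINE = re.compile(r"^([A-Z_]{5})\((\d+)\):\s*(.*)$")
# IDENT value: <IdentityMatches>/<Min_Length> : <IdentityScore> %
IDENT_VALUE = re.compile(r"(\d+)/(\d+)\s*:\s*([\d.]+)\s*%")


def parse_expanded(lines):
    """Group expanded-output lines by HSP number: {n: {tag: raw value}}."""
    hsps = defaultdict(dict)
    for line in lines:
        m = TAG_LINE.match(line.strip())
        if m is None:
            continue  # skip comment lines and anything not in TAG(n): form
        tag, number, value = m.groups()
        hsps[int(number)][tag] = value
    return dict(hsps)


def identity(value):
    """Split an IDENT value into (matches, min_length, percent)."""
    m = IDENT_VALUE.search(value)
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


sample = [  # made-up record, for illustration only
    "MATCH(1): seqA x seqB",
    "SCORE(1): 250",
    "IDENT(1): 45/50 : 90 %",
]
hsps = parse_expanded(sample)
print(hsps[1]["MATCH"], identity(hsps[1]["IDENT"]))
# seqA x seqB (45, 50, 90.0)
```

Values are kept as raw strings, so fields such as SCORE or EXPEC still need to be converted by the caller.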
SEE ALSO
ali2gff(1), blat2gff(1), gff2aplot(1), sim2gff(1).
AUTHOR
parseblast was written by Josep F. Abril <[email protected]>. This manual page was written by Nelson A. de Oliveira <[email protected]>, for the Debian project (but may be used by others).
Mon, 21 Mar 2005 21:44:15 -0300