ptof - convert a protein profile into a frame-search profile
- ptof [ -hlr ] [ -B init/term_score ] [ -F frameshift ] [ -I insert_multiplier ] [ -X stop_codon ] [ -Y intron_open ] [ -Z intron_extend ] [ protein_profile | - ] [ parameters ]
ptof converts a protein profile (generated, for instance, by the pftools
programs pfmake(1), gtop(1) or htop(1)) into a so-called "frame-search
profile". A frame-search profile is used to search an "interleaved
frame-translated" DNA sequence (generated by the pftools program 2ft(1)) for
occurrences of a protein sequence motif. An "interleaved frame-translated" DNA
sequence is an amino acid sequence corresponding to the N-2 overlapping codons
of a DNA sequence of length N. Note that in such a sequence, the character 'O'
is used to represent stop codons.
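The interleaved frame translation described above can be illustrated with a minimal Python sketch. It is not the 2ft(1) implementation: it assumes the standard genetic code, translates a single strand only, and maps any codon containing a non-ACGT character to 'X' (an assumption, since the handling of ambiguous bases is not described here). The function name is hypothetical.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order;
# '*' marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def interleaved_frame_translate(dna, stop="O", unknown="X"):
    """Translate every overlapping codon of one strand.

    A sequence of length N yields N-2 residues; stop codons become 'O'.
    Mapping unrecognized codons to 'X' is an assumption of this sketch.
    """
    dna = dna.upper().replace("U", "T")
    residues = []
    for i in range(len(dna) - 2):
        aa = CODON_TABLE.get(dna[i:i + 3], unknown)
        residues.append(stop if aa == "*" else aa)
    return "".join(residues)


if __name__ == "__main__":
    # ATG GCC TAA read codon by codon with a step of one base.
    print(interleaved_frame_translate("ATGGCCTAA"))  # MWGAPLO
```

Every third character of the result belongs to the same reading frame: in the example, positions 1, 4 and 7 spell "MAO", the first-frame translation ending in a stop codon.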
The conversion procedure works as follows: The protein profile is expanded in
length by a factor of three to accommodate three translated codons per
original match position. Two dummy match positions are placed between two
consecutive significant match positions imported from the original profile.
The original insert positions are placed between pairs of adjacent dummy match
positions. The initiation, termination, and transition scores of the original
insert positions are left unchanged; the insert extension scores are
multiplied by 1/3, or by the value of the command-line option -I. The two
insert positions flanking the significant match positions serve to accommodate
frame-shift errors and introns, respectively. The frame-shift insert position
allows free insertion opening combined with a high insert extension penalty
(command-line option -F), whereas the intron insert position has a high
opening penalty but a low extension penalty (command-line options -Y and -Z).
The deletion opening and closing penalties next to the significant match
positions are set to values that ensure that the total cost of a single-base
deletion is the same as the cost of a single-base insertion at a frame-shift
insert position. Furthermore, the alphabet of the original profile is extended
by the stop codon symbol 'O', which is assigned a constant negative value
(command-line option -X) at significant match positions, and zero at dummy
match positions. At insert positions, it is set to the average of the other
insert extension scores.
- protein_profile
- Input protein profile.
The protein profile contained in this file will be converted into a
frame-search profile. If the filename is replaced by a '-',
ptof will read the profile from stdin.
- -h
- Display usage help text.
- -l
- Remove output line length limit. Individual lines of the
output profile can exceed a length of 132 characters, removing the need to
wrap them over several lines.
- -r
- Frame-search parameters are given in normalized score
units. This option will only be considered if a linear normalization
function with priority over all other normalization functions is specified
in the profile. In this case, the frame-search scores specified on the
command line will be divided by the slope (R2 parameter) of the
normalization function. This option is particularly useful for profiles
which are already scaled in units that can be interpreted as
−Log(P)-values, e.g. bits.
- -B init/term_score
- Minimal initiation and termination score.
All internal and external initiation and termination scores will be set to
this value if the corresponding value in the original profile is lower
than this value. This parameter is used to impose a more local alignment
behavior on the frame-search profile in order to deal with discontinuities
in DNA sequences (long introns, alternative splicing, chimeric clones,
etc.).
Default: -50 (-0.5 with option -r)
- -F frameshift
- Frame-shift error penalty.
Default: -100 (-1.0 with option -r)
- -I insert_multiplier
- Insert score multiplier.
The values of the original insert extension scores will be multiplied by
this factor in order to compensate for the fact that a single amino acid
corresponds to three overlapping codon positions in the target sequence.
Default: 1/3
- -X stop_codon
- Stop codon penalty.
Default: -100 (-1.0 with option -r)
- -Y intron_open
- Intron opening penalty.
Default: -300 (-3.0 with option -r)
- -Z intron_extend
- Intron extension penalty.
Default: -1 (-0.01 with option -r)
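The effect of options -r and -B on score values can be sketched in Python. The function names are hypothetical, the slope of 0.01 used in the example is an illustrative value for the R2 parameter rather than one read from a real profile, and any rounding that ptof applies is not modelled.

```python
def normalized_to_raw(normalized_score, slope_r2):
    """With -r, a score given in normalized units is divided by the slope
    (R2 parameter) of the linear normalization function."""
    return normalized_score / slope_r2


def apply_init_term_floor(scores, floor=-50):
    """With -B, any initiation or termination score lower than the floor
    is raised to the floor; higher scores are kept as they are."""
    return [max(score, floor) for score in scores]


if __name__ == "__main__":
    # Hypothetical slope of 0.01 normalized units per raw score unit.
    print(normalized_to_raw(-1.2, 0.01))
    print(apply_init_term_floor([-9999, -120, -50, 0]))  # [-50, -50, -50, 0]
```

Raising very low initiation and termination scores in this way is what lets an alignment start or end inside the profile at a bounded cost, which gives the more local alignment behavior described for -B.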
- Note:
- for backwards compatibility, release 2.3 of the
pftools package will parse the version 2.2 style parameters, but
these are deprecated and the corresponding option (refer to the
options section) should be used instead.
- B=#
- Minimal initiation and termination score.
Use option -B instead.
- F=#
- Frame-shift error penalty.
Use option -F instead.
- I=#
- Insert score multiplier.
Use option -I instead.
- X=#
- Stop codon penalty.
Use option -X instead.
- Y=#
- Intron opening penalty.
Use option -Y instead.
- Z=#
- Intron extension penalty.
Use option -Z instead.
- (1)
-
ptof -r -F -1.2 -I 0.6 -X -1.5 -B -0.5 sh3.prf > sh3.fsp
2ft - < R76849.seq | pfsearch -fy -C 5.0 sh3.fsp -
- The protein domain profile in 'sh3.prf' is first
converted into a frame-search profile 'sh3.fsp'. Then both strands
of the FASTA-formatted EST sequence in 'R76849.seq'
(GenBank/EMBL accession: R76849) are converted into interleaved
frame-translated protein sequences and searched for SH3 domains with the
frame-search profile generated in the preceding step.
- The output may be compared to the result of a more
conventional search strategy using a protein profile in conjunction with a
six-frame translation of the same DNA sequence:
-
6ft - < R76849.seq | pfsearch -fy -C 5.0 sh3.prf -
On successful completion of its task, ptof will return an exit code of 0.
If an error occurs, a diagnostic message will be output on standard error and
the exit code will be different from 0. When conflicting options were passed
to the program but the task could nevertheless be completed, warnings will be
issued on standard error.
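A calling script can rely on this convention to separate failures from warnings. The following Python sketch uses a hypothetical helper, run_tool, which is not part of pftools; it treats a non-zero exit code as an error and returns whatever was written to standard error on success so that warnings can be inspected.

```python
import subprocess


def run_tool(cmd):
    """Run a command given as a list of arguments.

    Returns (stdout, stderr) when the exit code is 0; stderr may then
    contain warnings. Raises RuntimeError on a non-zero exit code.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout, result.stderr


# Example call (requires pftools to be installed; file name taken from
# the examples in this page):
#   profile_text, warnings = run_tool(["ptof", "-r", "-F", "-1.2", "sh3.prf"])
#   if warnings:
#       print("ptof warnings:", warnings)
```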
pfscan(1), pfsearch(1), 2ft(1), 6ft(1)
The pftools package was developed by Philipp Bucher.
Any comments or suggestions should be addressed to <[email protected]>.