pfmake - generate a profile from a multiple sequence alignment
- pfmake
- [ -0123abcehlms ] [ -E gap_extend ] [ -F score_multiplier ]
[ -G gap_open ] [ -H high_init/term ] [ -I gap_increment ]
[ -L low_init/term ] [ -M gap_multiplier ] [ -S matrix_multiplier ]
[ -T gap_region ] [ -X gap_excision ] [ ms_file | - ]
score_matrix [ profile ] [ parameters ]
pfmake generates a PROSITE profile from a multiple sequence alignment using
methods described by Gribskov et al. (1990), Luethy et al. (1994), and
Thompson et al. (1994), with modifications to exploit the features of the new
profile format. The file containing the multiple sequence alignment (ms_file)
must be either in MSF format as generated by GCG programs or by readseq
(checksums are ignored) or in MSA format as created by psa2msa(1). If '-' is
specified instead of a filename, the multiple sequence alignment is read from
the standard input. The score_matrix file must also be in GCG format.
If an already existing profile is given as input via the third optional
argument, the parameters of the DISJOINT, NORMALIZATION and CUT_OFF blocks
will be read from input; all other profile parameters will be recalculated.
Header and footer lines outside the matrix block will also be transferred from
input to output.
If no input profile is given, the disjointness definition will be set to PROTECT
with borders leaving short unprotected tails (maximum 5 positions) at the
beginning and at the end of the profile. Furthermore, one normalization mode
(n_score = raw_score / F, where F is the output score multiplier, see below),
and two cut-off values (level 0: 8.5, level -1: 6.5) will be defined.
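
As a rough illustration of the default normalization and cut-off values
described above, the following Python sketch converts a raw score into a
normalized score and reports the highest cut-off level it reaches. The function
names are hypothetical, and the rule that a score reaches a level when it is
greater than or equal to the cut-off is an assumption made for this example; it
is not taken from pfmake itself.

```python
# Defaults stated in the description: F = 100, level 0: 8.5, level -1: 6.5.
DEFAULT_MULTIPLIER = 100
DEFAULT_CUTOFFS = {0: 8.5, -1: 6.5}


def normalized_score(raw_score, score_multiplier=DEFAULT_MULTIPLIER):
    """Apply the default normalization n_score = raw_score / F."""
    return raw_score / score_multiplier


def cutoff_level(raw_score, score_multiplier=DEFAULT_MULTIPLIER,
                 cutoffs=DEFAULT_CUTOFFS):
    """Return the highest level whose cut-off is reached, or None.

    Assumption: a level is reached when n_score >= cut-off value.
    """
    n_score = normalized_score(raw_score, score_multiplier)
    reached = [level for level, value in cutoffs.items() if n_score >= value]
    return max(reached) if reached else None
```

With the default multiplier of 100, the normalized cut-offs 8.5 and 6.5
correspond to raw scores of 850 and 650.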
- ms_file
- Input multiple sequence alignment.
The content of the file must be either in MSF or in MSA format. If the
filename is replaced by a '-', pfmake will read the input
alignment from stdin.
- score_matrix
- Residue score matrix file.
Contains the substitution scores for all pairs of residues of the sequence
alphabet. The file must be in GCG format.
- profile
- Optional profile file.
If a filename is specified, the profile will be parsed and those parameters
mentioned in the description section will be kept for the
computation of the output profile.
- -0
- Global alignment mode.
Initiation (termination) at low cost is possible only if the alignment
starts at the beginning (end) of the profile and at the beginning (end) of
the sequence.
- -1
- Domain global alignment mode.
Initiation (termination) at low cost is possible only at the beginning (end)
of the profile; it may start and end at any position within the
sequence.
- -2
- Semi-global alignment mode.
Initiation (termination) at low cost is possible if the alignment starts
either at the beginning (end) of the profile or at the beginning (end) of
the sequence.
This is the default alignment mode.
- -3
- Local alignment mode.
Initiation (termination) at low cost is possible anywhere. The high-cost
initiation/termination score (parameter H) is meaningless.
- -a
- Causes pfmake to weight gaps asymmetrically, as in
Gribskov et al. (1990).
- -b
- Block profile mode.
By imposing additional constraints on the placement of insertions and
deletions, this mode produces profiles that favor alignments with
insertions and deletions positioned symmetrically around a few positions.
For each gap region a gap center is defined which usually corresponds to
the place where gap excision has been applied (see parameter X). If
no gap excision has been applied, the position is chosen such as to
maximize the sum of deletion opening events before, and deletion closing
events after the gap center. Within a given gap region reduced deletion
opening penalties are offered only before, reduced deletion closing
penalties only after, and reduced insertion penalties only at the center.
This option is incompatible with options -a and -e and
automatically disables them.
- -c
- Circular profile.
The topology of the profile is declared as circular. The first and the last
insert positions are merged by retaining the higher value of each
parameter type.
- -e
- Enables endgap-weighting mode as implemented in the
GCG program ProfileMake. Endgaps in the multiple
sequence alignment will be interpreted as deletions relative to the other
sequences and thus be considered for the delineation of gap regions. The
default is no endgap weighting as introduced by Thompson et al.
(1994) in the program ProfileWeight.
- -h
- Display usage help text.
- -l
- Remove output line length limit. Individual lines of the
output profile can exceed a length of 132 characters, removing the need to
wrap them over several lines.
- -m
- Input multiple sequence alignment is in MSA format.
- -s
- Causes pfmake to weight gaps symmetrically (default mode). The initial gap
opening scores (MD, MI), computed from the maximal gap length and the
command-line parameters E, G, I, and M, are divided by two, and the resulting
value is assigned to both gap opening and gap closing scores
(MI, IM, MD, DM).
- -E gap_extend
- Gap extension penalty. See Gribskov et al. (1990).
Default: 0.2 (appropriate for 1/3 bit-scaled blosum45 matrix)
- -F score_multiplier
- Output score multiplier.
On output, all profile scores are multiplied by this factor and rounded to
the nearest integer.
Default: 100
- -G gap_open
- Gap opening penalty. See Gribskov et al. (1990).
Default: 2.1 (appropriate for 1/3 bit-scaled blosum45 matrix)
- -H high_init/term
- High-cost initiation/termination score.
This score will be applied to all external and internal initiation and
termination scores corresponding to path matrix positions where initiation
or termination at low cost is not possible according to the alignment mode
specified.
Default: * (low-value)
- -I gap_increment
- Gap penalty multiplier increment. See Gribskov et al. (1990).
Default: 0.1
- -L low_init/term
- Low-cost initiation/termination score.
This score will be applied to all external and internal initiation and
termination scores corresponding to path matrix positions where initiation
or termination at low cost is possible according to the alignment mode
specified.
Default: 0
- -M gap_multiplier
- Maximum gap penalty multiplier. See Gribskov et al. (1990).
Default: 0.333
- -S matrix_multiplier
- Score matrix multiplier.
On input, the numbers of the score matrix are multiplied by this factor.
Default: 0.1
- -T gap_region
- Gap region threshold.
This is the minimal fraction of gap characters a column of the multiple
sequence alignment must contain in order to be considered part of a gap
region.
Default: 0.01
- -X gap_excision
- Gap excision threshold.
This is the minimal fraction of non-gap characters a column of the multiple
sequence alignment must contain in order to be converted into a match
position. The IM and MI transition scores of insert
positions corresponding to excised columns are set to zero; the other
parameters remain unchanged.
Default: 0.5
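
The two column thresholds (-T and -X) can be illustrated with a short Python
sketch. It only shows how the fractions defined above would classify alignment
columns. It assumes that all sequences have the same length, that '-' and '.'
are the gap characters, and that every sequence counts equally (sequence
weights are ignored). The function name is hypothetical, and pfmake's actual
computation may differ.

```python
GAP_CHARS = {"-", "."}  # assumption: both characters denote a gap


def classify_columns(alignment, gap_region=0.01, gap_excision=0.5):
    """Classify each alignment column using the -T and -X defaults.

    Returns one dict per column with two flags:
      'gap_region': fraction of gap characters >= gap_region (-T)
      'match':      fraction of non-gap characters >= gap_excision (-X)
    """
    n_seqs = len(alignment)
    result = []
    for column in zip(*alignment):  # iterate over columns, not rows
        gaps = sum(1 for residue in column if residue in GAP_CHARS)
        non_gaps = n_seqs - gaps
        result.append({
            "gap_region": gaps / n_seqs >= gap_region,
            "match": non_gaps / n_seqs >= gap_excision,
        })
    return result


columns = classify_columns(["ACD-", "AC--", "A-D-", "ACDE"])
```

In this toy alignment the last column holds a single residue among four
sequences, so it falls below the default excision threshold of 0.5 and would
not become a match position.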
- Note:
- For backwards compatibility, release 2.3 of the pftools package still parses
the version 2.2 style parameters listed below. They are deprecated; use the
corresponding option (see the options section) instead.
- E=#
- Gap extension penalty.
Use option -E instead.
- F=#
- Output score multiplier.
Use option -F instead.
- G=#
- Gap opening penalty.
Use option -G instead.
- H=#
- High-cost initiation/termination score.
Use option -H instead.
- I=#
- Gap penalty multiplier increment.
Use option -I instead.
- L=#
- Low-cost initiation/termination score.
Use option -L instead.
- M=#
- Maximum gap penalty multiplier.
Use option -M instead.
- S=#
- Score matrix multiplier.
Use option -S instead.
- T=#
- Gap region threshold.
Use option -T instead.
- X=#
- Gap excision threshold.
Use option -X instead.
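
For scripts that still pass version 2.2 style parameters, the mapping above can
be applied mechanically. The following Python sketch rewrites an argument list
so that each KEY=value parameter becomes the corresponding option, placed
before the file arguments as in the synopsis. The helper name is hypothetical
and the sketch performs no validation of the values.

```python
import re

# One letter per deprecated parameter listed above (E, F, G, H, I, L, M, S, T, X).
_LEGACY_PARAM = re.compile(r"^([EFGHILMSTX])=(.+)$")


def modernize_args(args):
    """Convert 'E=0.2' style parameters to '-E 0.2' style options.

    Converted options are moved in front of the remaining arguments,
    because the synopsis lists options before the file names.
    """
    options = []
    others = []
    for arg in args:
        match = _LEGACY_PARAM.match(arg)
        if match:
            options.extend(["-" + match.group(1), match.group(2)])
        else:
            others.append(arg)
    return options + others


new_args = modernize_args(["sh3.msf", "blosum45.cmp", "E=0.2", "G=2.1"])
```

Here new_args holds the options -E 0.2 and -G 2.1 followed by the two file
names.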
- (1)
- pfmake -b1 -H 0.6 sh3.msf blosum45.cmp > sh3_block.prf
- Generates a domain-global block profile from a multiple
alignment of SH3 domains using the blosum45 matrix. The file
'sh3.msf' contains a multiple alignment of 20 SH3 domains from
SWISS-PROT release 32 including sequence weights. The file
'blosum45.cmp' contains a 1/3 bit-scaled blosum45 matrix in
GCG format.
Note that fragment matches (alignments to parts of the profile) are not
prohibited but penalized by the option -H 0.6.
On successful completion of its task, pfmake will return an exit code of 0.
If an error occurs, a diagnostic message will be output on standard error and
the exit code will be different from 0. When conflicting options were passed
to the program but the task could nevertheless be completed, warnings will be
issued on standard error.
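
A calling script can rely on this behaviour to tell success, warnings, and
errors apart. The Python sketch below is one possible wrapper, not part of
pftools: the function names and the three status strings are invented for the
example, and it assumes that pfmake is on the PATH and that any text on
standard error after a zero exit code is a warning.

```python
import subprocess


def build_command(ms_file, score_matrix, options=()):
    """Assemble the argument list: options first, then the two files."""
    return ["pfmake", *options, ms_file, score_matrix]


def interpret(returncode, stderr_text):
    """Map exit code and standard error output to a status string."""
    if returncode != 0:
        return "error"  # diagnostic message is in stderr_text
    if stderr_text.strip():
        return "ok_with_warnings"  # assumption: stderr text means warnings
    return "ok"


def run_pfmake(command, profile_path):
    """Run pfmake, writing the profile (standard output) to profile_path."""
    with open(profile_path, "w") as profile:
        proc = subprocess.run(command, stdout=profile,
                              stderr=subprocess.PIPE, text=True)
    return interpret(proc.returncode, proc.stderr), proc.stderr


# Same invocation as in the example above; nothing is executed here.
command = build_command("sh3.msf", "blosum45.cmp", ["-b1", "-H", "0.6"])
```

Calling run_pfmake(command, "sh3_block.prf") would then reproduce the example
command and return the status together with any messages.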
Bucher P, Karplus K, Moeri N & Hofmann K (1996). A flexible motif search
technique based on generalized profiles. Comput. Chem. 20:3-24.
Gribskov M, Luethy R & Eisenberg D (1990). Profile analysis. Meth. Enzymol.
183:146-159.
Luethy R, Xenarios I & Bucher P (1994). Improving the sensitivity of the
sequence profile method. Prot. Sci. 3:139-146.
Thompson JD, Higgins DG & Gibson TJ (1994). Improved sensitivity of profile
searches through the use of sequence weights and gap excision. Comput. Appl.
Biosci. 10:19-29.
pfsearch(1), pfscan(1), psa2msa(1), psa(5), xpsa(5)
The pftools package was developed by Philipp Bucher.
Any comments or suggestions should be addressed to <[email protected]>.