apertium-tagger - part-of-speech tagger and trainer for Apertium
apertium-tagger [options] -g serialized_tagger [input [output]]
apertium-tagger [options] -r iterations corpus serialized_tagger
apertium-tagger [options] -s iterations dictionary corpus tagger_spec serialized_tagger tagged_corpus untagged_corpus
apertium-tagger [options] -s 0 dictionary tagger_spec serialized_tagger tagged_corpus untagged_corpus
apertium-tagger [options] -s 0 -u model serialized_tagger tagged_corpus
apertium-tagger [options] -t iterations dictionary corpus tagger_spec serialized_tagger
apertium-tagger either trains the Apertium part-of-speech tagger or tags
text with it, depending on the options given. The command reads from
standard input only when -g (--tagger) is used.
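As an illustration of the tagging mode, the following Python sketch builds the argument list for the -g form of the synopsis and pipes text through the tagger on standard input. The helper names tagging_command and tag_text are hypothetical; only the argument order is taken from this page, and calling tag_text assumes that apertium-tagger is installed and on PATH.

```python
import subprocess


def tagging_command(serialized_tagger, input_path=None, output_path=None, options=()):
    """Build an argv list for: apertium-tagger [options] -g serialized_tagger [input [output]].

    Hypothetical helper: it only mirrors the argument order of the synopsis.
    """
    if output_path is not None and input_path is None:
        # In the synopsis, output is nested inside the optional input argument.
        raise ValueError("output can only be given together with input")
    argv = ["apertium-tagger", *options, "-g", serialized_tagger]
    if input_path is not None:
        argv.append(input_path)
        if output_path is not None:
            argv.append(output_path)
    return argv


def tag_text(serialized_tagger, text):
    """Send text to the tagger on stdin and return what it writes to stdout.

    Assumes apertium-tagger is on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        tagging_command(serialized_tagger),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With no input or output file given, the command built here leaves the tagger reading standard input and writing standard output, which is the case tag_text relies on.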
- -g, --tagger
- Tags input text by means of the Viterbi algorithm.
- -r n, --retrain n
- Retrains the model with n additional Baum-Welch iterations (unsupervised). This option is incompatible with -u (--unigram).
- -s n, --supervised n
- Initializes parameters against a hand-tagged text (supervised) through the maximum likelihood estimate method, then performs n iterations of the Baum-Welch training algorithm (unsupervised). The corpus argument (CRP) can be omitted only when n = 0.
- -t n, --train n
- Initializes parameters through Kupiec's method (unsupervised), then performs n iterations of the Baum-Welch training algorithm (unsupervised).
- -u MODEL, --unigram=MODEL
- Uses unigram algorithm MODEL from <https://coltekin.net/cagri/papers/trmorph-tools.pdf>.
- -w, --sliding-window
- Uses the Light Sliding Window algorithm.
- -x, --perceptron
- Uses the averaged perceptron algorithm.
- -d, --debug
- Prints error (if any) or debug messages while operating.
- -e, --skip-on-error
- Used with -xs to ignore certain types of errors in the training corpus.
- -f, --first
- Used in conjunction with -g (--tagger); makes the tagger give all lexical forms of each word, with the chosen one in first place (after the lemma).
- -m, --mark
- Marks disambiguated words.
- -p, --show-superficial
- Prints the superficial form of the word alongside the lexical form in the output stream.
- -z, --null-flush
- Used in conjunction with -g (--tagger) to flush the output after each null character.
- --help
- Displays a help message.
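The Viterbi decoding named under -g can be sketched for a plain first-order hidden Markov model as follows. This is a toy illustration in Python, not Apertium's implementation: the tag set, the probability tables, and the function name viterbi are all invented for the example, and the real tagger's model may be structured differently.

```python
import math


def _log(p):
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state (tag) sequence for the observations."""
    if not observations:
        return []
    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: _log(start_p.get(s, 0.0)) + _log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + _log(trans_p[p].get(s, 0.0))) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + _log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Invented toy model: three tags and hand-picked probabilities.
STATES = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS_P = {
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 1.0},
    "NOUN": {"can": 0.6, "rusts": 0.4},
    "VERB": {"can": 0.5, "rusts": 0.5},
}

print(viterbi(["the", "can", "rusts"], STATES, START_P, TRANS_P, EMIT_P))
# prints ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numeric underflow on long sentences; in this toy model the ambiguous word "can" is resolved as a noun because of the determiner that precedes it.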
These are the kinds of files used with each option:
- dictionary
- Full expanded dictionary file
- corpus
- Training text corpus file
- tagger_spec
- Tagger specification file, in XML format
- serialized_tagger
- Tagger data file, built during training and used while tagging
- tagged_corpus
- Hand-tagged text corpus
- untagged_corpus
- Untagged text corpus: the morphological analysis of the hand-tagged corpus, used jointly with it by the -s option
- input
- Input file, stdin by default
- output
- Output file, stdout by default
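To show how these files line up with the supervised forms of the synopsis, here is a small Python sketch that assembles the -s command line. The function name supervised_training_command and its parameter names are hypothetical; the argument order and the rule that corpus may be left out only for 0 iterations are taken from this page. The sketch only builds the argument list and does not run the tagger.

```python
def supervised_training_command(iterations, dictionary, tagger_spec, serialized_tagger,
                                tagged_corpus, untagged_corpus, corpus=None):
    """Build an argv list for the -s forms of the synopsis:

    apertium-tagger -s iterations dictionary corpus tagger_spec
                    serialized_tagger tagged_corpus untagged_corpus
    apertium-tagger -s 0 dictionary tagger_spec
                    serialized_tagger tagged_corpus untagged_corpus
    """
    if iterations < 0:
        raise ValueError("iterations must not be negative")
    if corpus is None and iterations != 0:
        # The page allows omitting the corpus only when n = 0.
        raise ValueError("corpus may be omitted only when iterations is 0")
    argv = ["apertium-tagger", "-s", str(iterations), dictionary]
    if corpus is not None:
        argv.append(corpus)
    argv += [tagger_spec, serialized_tagger, tagged_corpus, untagged_corpus]
    return argv
```

The resulting list could be passed to subprocess.run on a system where apertium-tagger is installed; any file names supplied by the caller are placeholders for real training data.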
apertium(1), lt-comp(1), lt-expand(1), lt-proc(1)
Copyright © 2005, 2006 Universitat d'Alacant / Universidad de Alicante.
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License.
Many... lurking in the dark and waiting for you!