lt-proc - lexical processor for Apertium
lt-proc [-a | -b | -o | -c | -d | -e | -g | -n | -p | -s | -t | -v | -h | -z | -w]
[-W] [-N N] [-L N] [-i icx_file] fst_file [input_file [output_file]]
lt-proc is the application responsible for
providing the four lexical processing functionalities:
- morphological analyser (option
-a)
- lexical transfer (option
-b)
- morphological generator (option
-g)
- post-generator (option
-p)
It accomplishes these tasks by reading binary files containing a compact and
efficient representation of dictionaries (a class of finite-state transducers
called augmented letter transducers). These files are generated by
lt-comp(1).
Some characters (‘[’, ‘]’, ‘$’, ‘^’, ‘/’, ‘+’) are special: they are used
for format and encapsulation, and they must be escaped when they are to be
used literally. For instance, text enclosed in ‘[’...‘]’ is ignored, and a
lexical unit has the form ‘^...$’.
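As an illustration of the escaping and encapsulation rules above, the following Python sketch escapes the reserved characters and splits one ‘^...$’ unit into its surface form and its analyses. It is not part of lt-proc: the function names are hypothetical, and the use of a backslash as the escape character is an assumption that this page does not state.

```python
# Characters this page lists as reserved for format and encapsulation.
RESERVED = "[]$^/+"
# ASSUMPTION: a backslash is the escape character (not stated in this page).
ESCAPE = "\\"


def escape(text):
    """Prefix every reserved character (and the escape itself) with ESCAPE."""
    return "".join(
        ESCAPE + ch if ch in RESERVED or ch == ESCAPE else ch for ch in text
    )


def split_unescaped(text, sep):
    """Split text on sep, skipping occurrences that are preceded by ESCAPE."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == ESCAPE and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escaped pair intact
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def parse_lexical_unit(unit):
    """Split '^surface/analysis1/analysis2$' into (surface, [analyses])."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit of the form ^...$")
    surface, *analyses = split_unescaped(unit[1:-1], "/")
    return surface, analyses
```

For example, parse_lexical_unit applied to the analysis of “daba” shown further down this page yields the surface form "daba" and two analyses of "dar".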
-
-a,
--analysis
- Tokenizes the text in surface forms (lexical units as they
appear in texts) and delivers, for each surface form, one or more lexical
forms consisting of lemma, lexical category and morphological inflection
information. Tokenization is not straightforward due to the existence, on
the one hand, of contractions, and, on the other hand, of multi-word
lexical units. For contractions, the system reads in a single surface form
and delivers the corresponding sequence of lexical forms. Multi-word
surface forms are analysed in a left-to-right, longest-match fashion.
Multi-word surface forms may be invariable (such as a multi-word
preposition or conjunction) or inflected (for example, in Spanish,
“echaban de menos”, “they missed”, is a form
of the imperfect indicative tense of the verb “echar de
menos”, “to miss”). Limited support for some kinds of
discontinuous multi-word units is also available. The analysis of
single-word surface forms produces output such as
“cantar” → “^cantar/cantar<vblex><inf>$” or
“daba” → “^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$”.
-
-b,
--bilingual
- Performs lexical transfer, attaching the queues (trailing sequences) of
morphological symbols not specified in the dictionaries. As in analysis
mode, it supports multiple target-language lexical forms for a given
source-language lexical form. It typically works on the output of
apertium-pretransfer(1).
-
-o,
--surf-bilingual
- Like -b, but takes its input from apertium-tagger(1)
-p, which includes surface forms; if the lexical
form is not found in the bilingual dictionary, it outputs the surface form
of the word.
-
-c,
--case-sensitive
- Use the literal case of the incoming characters.
-
-d,
--debugged-gen
- Morphological generation with all diagnostic information kept in the
output (intended for debugging).
-
-e,
--decompose-compounds
- Try to treat unknown words as compounds, and decompose
them.
-
-w,
--dictionary-case
- Use the case information contained in the lexicon, instead
of the surface case (only applied in analysis mode).
-
-g,
--generation
- Delivers a target-language surface form for each
target-language lexical form, by suitably inflecting it.
-
-n,
--non-marked-gen
- Morphological generation (like
-g) but without unknown-word marks (asterisk ‘*’).
-
-l,
--tagged-gen
- Morphological generation (like
-g) but retaining part-of-speech tags.
-
-p,
--post-generation
- Performs orthographical operations such as contractions and
apostrophations. The post-generator is usually
dormant (just copies the input to the output)
until a special alarm symbol contained in
some target-language surface forms wakes it
up to perform a particular string transformation if necessary; then it
goes back to sleep.
-
-s,
--sao
- Processes input in the orthoepikon (previously sao) annotation system
format: https://orthoepikon.sf.net.
-
-t,
--transliteration
- Apply a transliteration dictionary.
-
-i
icx_file,
--ignored-chars
icx_file
- Ignores the characters specified in the file icx_file.
-
-z,
--null-flush
- Flush output on the null character.
-
-C,
--careful-case
- Use the dictionary case if present; otherwise use the surface case.
-
-N
N,
--analyses
N
- Output no more than N analyses (if the transducer is
weighted, the N best analyses)
-
-L
N,
--weight-classes
N
- Output no more than N best weight classes (where analyses
with equal weight constitute a class)
-
-W,
--show-weights
- Print final analysis weights (if any)
-
-v,
--version
- Display the version number.
-
-h,
--help
- Display this help.
- fst_file
- The input compiled dictionary.
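The difference between the -N and -L limits described above can be sketched in Python as follows. This is an illustration only, not the lt-proc implementation: the helper names, the example weights and the third analysis are hypothetical, and the sketch assumes that a lower weight means a better analysis, which this page does not specify.

```python
from itertools import groupby

# Hypothetical (analysis, weight) pairs; the weights are invented.
EXAMPLE = [
    ("dar<vblex><pii><p1><sg>", 1.0),
    ("dar<vblex><pii><p3><sg>", 1.0),
    ("daba<n><f><sg>", 2.5),
]


def n_best_analyses(weighted, n):
    """Keep at most n analyses, best first (ASSUMPTION: lowest weight wins)."""
    return sorted(weighted, key=lambda pair: pair[1])[:n]


def n_best_weight_classes(weighted, n):
    """Keep every analysis that belongs to one of the n best weight classes.

    Analyses with equal weight form one class, so a single class may
    contribute several analyses.
    """
    ranked = sorted(weighted, key=lambda pair: pair[1])
    classes = [list(group) for _, group in groupby(ranked, key=lambda pair: pair[1])]
    return [pair for cls in classes[:n] for pair in cls]
```

With these example values, limiting the output to one analysis keeps a single reading, whereas limiting it to one weight class keeps both readings that share the best weight.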
apertium(1),
apertium-tagger(1),
lt-comp(1),
lt-expand(1)
Copyright © 2005, 2006 Universitat d'Alacant / Universidad de Alicante.
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License.
Many... lurking in the dark and waiting for you!