states - awk alike text processing tool
states [
-hvV] [
-D var=val] [
-f
file] [
-o outputfile] [
-p path] [
-s startstate] [
-W level] [
filename ...]
States is an awk-alike text processing tool with some state machine
extensions. It is designed for program source code highlighting and to similar
tasks where state information helps input processing.
At a single point of time,
States is in one state, each quite similar to
awk's work environment, they have regular expressions which are matched from
the input and actions which are executed when a match is found. From the
action blocks,
states can perform state transitions; it can move to
another state from which the processing is continued. State transitions are
recorded so
states can return to the calling state once the current
state has finished.
The biggest difference between
states and awk, besides state machine
extensions, is that
states is not line-oriented. It matches regular
expression tokens from the input and once a match is processed, it continues
processing from the current position, not from the beginning of the next input
line.
- -D var=val,
--define=var =val
- Define variable var to have string value val.
Command line definitions overwrite variable definitions found from the
config file.
- -f file,
--file=file
- Read state definitions from file file. As a default,
states tries to read state definitions from file states.st
in the current working directory.
- -h, --help
- Print short help message and exit.
- -o file,
--output=file
- Save output to file file instead of printing it to
stdout.
- -p path,
--path=path
- Set the load path to path. The load path defaults to
the directory, from which the state definitions file is loaded.
- -s state,
--state=state
- Start execution from state state. This definition
overwrites start state resolved from the start block.
- -v, --verbose
- Increase the program verbosity.
- -V, --version
- Print states version and exit.
- -W level,
--warning=level
- Set the warning level to level. Possible values for
level are:
- light
- light warnings (default)
- all
- all warnings
States program files can contain on
start block,
startrules
and
namerules blocks to specify the initial state,
state
definitions and
expressions.
The
start block is the main() of the
states program, it is
executed on script startup for each input file and it can perform any
initialization the script needs. It normally also calls the
check_startrules() and
check_namerules() primitives which
resolve the initial state from the input file name or the data found from the
beginning of the input file. Here is a sample start block which initializes
two variables and does the standard start state resolving:
start
{
a = 1;
msg = "Hello, world!";
check_startrules ();
check_namerules ();
}
Once the start block is processed, the input processing is continued from the
initial state.
The initial state is resolved by the information found from the
startrules and
namerules blocks. Both blocks contain regular
expression - symbol pairs, when the regular expression is matched from the
name of from the beginning of the input file, the initial state is named by
the corresponding symbol. For example, the following start and name rules can
distinguish C and Fortran files:
namerules
{
/\.(c|h)$/ c;
/\.[fF]$/ fortran;
}
startrules
{
/-\*- [cC] -\*-/ c;
/-\*- fortran -\*-/ fortran;
}
If these rules are used with the previously shown start block,
states
first check the beginning of input file. If it has string
-*- c -*-,
the file is assumed to contain C code and the processing is started from state
called
c. If the beginning of the input file has string
-*- fortran
-*-, the initial state is
fortran. If none of the start rules
matched, the name of the input file is matched with the namerules. If the name
ends to suffix
c or
C, we go to state
c. If the suffix is
f or
F, the initial state is fortran.
If both start and name rules failed to resolve the start state,
states
just copies its input to output unmodified.
The start state can also be specified from the command line with option
-s,
--state.
State definitions have the following syntax:
state { expr {statements} ... }
where
expr is: a regular expression, special expression or symbol and
statements is a list of statements. When the expression
expr is
matched from the input, the statement block is executed. The statement block
can call
states' primitives, user-defined subroutines, call other
states, etc. Once the block is executed, the input processing is continued
from the current intput position (which might have been changed if the
statement block called other states).
Special expressions
BEGIN and
END can be used in the place of
expr. Expression
BEGIN matches the beginning of the state, its
block is called when the state is entered. Expression
END matches the
end of the state, its block is executed when
states leaves the state.
If
expr is a symbol, its value is looked up from the global environment
and if it is a regular expression, it is matched to the input, otherwise that
rule is ignored.
The
states program file can also have top-level expressions, they are
evaluated after the program file is parsed but before any input files are
processed or the
start block is evaluated.
- call (symbol)
- Move to state symbol and continue input file
processing from that state. Function returns whatever the symbol
state's terminating return statement returned.
- calln (name)
- Like call but the argument name is evaluated
and its value must be string. For example, this function can be used to
call a state which name is stored to a variable.
- check_namerules ()
- Try to resolve start state from namerules rules.
Function returns 1 if start state was resolved or 0
otherwise.
- check_startrules ()
- Try to resolve start state from startrules rules.
Function returns 1 if start state was resolved or 0
otherwise.
- concat (str, ...)
- Concanate argument strings and return result as a new
string.
- float (any)
- Convert argument to a floating point number.
- getenv (str)
- Get value of environment variable str. Returns an
empty string if variable var is undefined.
- int (any)
- Convert argument to an integer number.
- length (item, ...)
- Count the length of argument strings or lists.
- list (any, ...)
- Create a new list which contains items any, ...
- panic (any, ...)
- Report a non-recoverable error and exit with status
1. Function never returns.
- print (any, ...)
- Convert arguments to strings and print them to the
output.
- range (source, start,
end )
- Return a sub-range of source starting from position
start (inclusively) to end (exclusively). Argument
source can be string or list.
- regexp (string)
- Convert string string to a new regular
expression.
- regexp_syntax (char,
syntax)
- Modify regular expression character syntaxes by assigning
new syntax syntax for character char. Possible values for
syntax are:
- 'w'
- character is a word constituent
- ' '
- character isn't a word constituent
- regmatch (string,
regexp)
- Check if string string matches regular expression
regexp. Functions returns a boolean success status and sets
sub-expression registers $n.
- regsub (string, regexp,
subst )
- Search regular expression regexp from string
string and replace the matching substring with string subst.
Returns the resulting string. The substitution string subst can
contain $n references to the n:th parenthesized
sup-expression.
- regsuball (string, regexp,
subst )
- Like regsub but replace all matches of regular
expression regexp from string string with string
subst.
- require_state (symbol)
- Check that the state symbol is defined. If the
required state is undefined, the function tries to autoload it. If the
loading fails, the program will terminate with an error message.
- split (regexp,
string)
- Split string string to list considering matches of
regular rexpression regexp as item separator.
- sprintf (fmt, ...)
- Format arguments according to fmt and return result
as a string.
- strcmp (str1, str2)
- Perform a case-sensitive comparision for strings
str1 and str2. Function returns a value that is:
- -1
- string str1 is less than str2
- 0
- strings are equal
- 1
- string str1 is greater than str2
- string (any)
- Convert argument to string.
- strncmp (str1, str2,
num )
- Perform a case-sensitive comparision for strings
str1 and str2 comparing at maximum num
characters.
- substring (str, start,
end )
- Return a substring of string str starting from
position start (inclusively) to end (exclusively).
- $.
- current input line number
- $n
- the n:th parenthesized regular expression
sub-expression from the latest state regular expression or from the
regmatch primitive
- $`
- everything before the matched regular rexpression. This is
usable when used with the regmatch primitive; the contents of this
variable is undefined when used in action blocks to refer the data before
the block's regular expression.
- $B
- an alias for $`
- argv
- list of input file names
- filename
- name of the current input file
- program
- name of the program (usually states)
- version
- program version string
/usr/share/enscript/hl/*.st enscript's states definitions
awk(1),
enscript(1)
Markku Rossi <
[email protected]> <
http://www.iki.fi/~mtr/>
GNU Enscript WWW home page: <
http://www.iki.fi/~mtr/genscript/>