regexp - Plan 9 regular expression notation
This manual page describes the regular expression syntax used by the Plan 9
regular expression library
regexp(3). It is the form used by
egrep(1) before
egrep got complicated.
A
regular expression specifies a set of strings of characters. A member
of this set of strings is said to be
matched by the regular expression.
In many applications a delimiter character, commonly /, bounds a regular
expression. In the following specification for regular expressions the word
`character' means any character (rune) but newline.
The syntax for a regular expression
e0 is
    e3:  literal | charclass | '.' | '^' | '$' | '(' e0 ')'
    e2:  e3
      |  e2 REP
    REP: '*' | '+' | '?'
    e1:  e2
      |  e1 e2
    e0:  e1
      |  e0 '|' e1
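The grammar maps naturally onto a recursive-descent parser, one function per nonterminal. The following is a minimal sketch in Python, not the Plan 9 implementation: the function name parse and the tuple-shaped tree are hypothetical, the body of a character class is kept as raw text, and no delimiter character is handled.

```python
# Illustrative parser for the e0..e3 grammar. All names are hypothetical.
META = set(".*+?[]()|\\^$")

def parse(pattern):
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def take():
        nonlocal pos
        ch = pattern[pos]
        pos += 1
        return ch

    def e0():                       # e0: e1 | e0 '|' e1
        node = e1()
        while peek() == "|":
            take()
            node = ("alt", node, e1())
        return node

    def e1():                       # e1: e2 | e1 e2
        node = e2()
        while peek() is not None and peek() not in "|)":
            node = ("cat", node, e2())
        return node

    def e2():                       # e2: e3 | e2 REP
        node = e3()
        while peek() is not None and peek() in "*+?":
            node = ("rep", take(), node)
        return node

    def e3():                       # e3: literal | charclass | . ^ $ | ( e0 )
        ch = peek()
        if ch is None:
            raise ValueError("unexpected end of pattern")
        if ch == "(":
            take()
            node = e0()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return ("group", node)
        if ch == "[":
            take()
            body = ""
            while peek() is not None and peek() != "]":
                if peek() == "\\":
                    body += take()  # keep the backslash in the raw body
                    if peek() is None:
                        break
                body += take()
            if peek() != "]":
                raise ValueError("missing ']'")
            take()
            if body in ("", "^"):
                raise ValueError("empty character class")
            return ("class", body)
        if ch in ".^$":
            return ("op", take())
        if ch == "\\":              # escaped metacharacter is a literal
            take()
            if peek() is None:
                raise ValueError("trailing backslash")
            return ("lit", take())
        if ch in META:
            raise ValueError(f"unexpected {ch!r} at {pos}")
        return ("lit", take())

    tree = e0()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at {pos}")
    return tree
```

With this sketch, parse("ab|c") yields ("alt", ("cat", ("lit", "a"), ("lit", "b")), ("lit", "c")), which shows that concatenation (e1) binds tighter than alternation (e0).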
A
literal is any non-metacharacter, or a metacharacter (one of
.*+?[]()|\^$), or the delimiter preceded by \.
A charclass is a nonempty string s bracketed [s] (or [^s]); it matches any
character in (or not in) s. A negated character class never matches newline.
A substring a-b, with a and b in ascending order, stands for the inclusive
range of characters between a and b. In s, the metacharacters -, ], an
initial ^, and the regular expression delimiter must be preceded by a \;
other metacharacters have no special meaning and may appear unescaped.
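The character class rules can be tried out with Python's re module, used here only as a stand-in (an assumption: it is a different engine from the Plan 9 library and agrees with it on these simple cases, with the one exception noted in the comments). The helper in_class is hypothetical.

```python
import re

def in_class(charclass, ch):
    """True if the single character ch is matched by the class pattern."""
    return re.fullmatch(charclass, ch) is not None

# A range a-b covers every character between its endpoints, inclusive.
range_hit = in_class(r"[a-f]", "c")        # True
range_miss = in_class(r"[a-f]", "g")       # False

# Inside a class, -, ] and an initial ^ need a backslash to be literal.
dash = in_class(r"[\-\]]", "-")            # True
bracket = in_class(r"[\-\]]", "]")         # True
caret = in_class(r"[\^a]", "^")            # True

# Other metacharacters have no special meaning inside a class.
star = in_class(r"[.*+?]", "*")            # True
not_wild = in_class(r"[.*+?]", "x")        # False

# Difference from the notation described here: Python's negated class DOES
# match newline, so newline must be excluded explicitly to get the same
# behaviour.
python_default = in_class(r"[^a]", "\n")   # True in Python
plan9_like = in_class(r"[^a\n]", "\n")     # False
```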
A . matches any character.
A ^ matches the beginning of a line; $ matches the end of the line.
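A short illustration of the wildcard and the two line anchors, again using Python's re module as a stand-in rather than the Plan 9 library. One assumption to note: Python needs the re.MULTILINE flag before ^ and $ match at every line boundary instead of only at the ends of the whole string.

```python
import re

# '.' matches any single character; as stated earlier, 'character'
# excludes newline, and Python's default behaviour is the same.
dot_any = re.fullmatch(r"a.c", "abc") is not None        # True
dot_newline = re.fullmatch(r"a.c", "a\nc") is not None   # False

text = "alpha\nbeta\ngamma"

# With re.MULTILINE, ^ matches at the start of each line and $ at its end.
line_starts = re.findall(r"^.", text, re.MULTILINE)      # ['a', 'b', 'g']
line_ends = re.findall(r".$", text, re.MULTILINE)        # ['a', 'a', 'a']
b_lines = re.findall(r"^b.*$", text, re.MULTILINE)       # ['beta']
```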
The REP operators match zero or more (*), one or more (+), or zero or one (?)
instances, respectively, of the preceding regular expression e2.
A concatenated regular expression,
e1e2, matches a match to
e1
followed by a match to
e2.
An alternative regular expression,
e0|e1, matches either a match to
e0 or a match to
e1.
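The relative binding of repetition, concatenation, and alternation follows from the grammar: REP applies to a single e3, concatenation joins e2 terms, and | separates whole e1 sequences. A small sketch using Python's re module as a stand-in (an assumption; the helper name matches is hypothetical) shows the effect:

```python
import re

def matches(pattern, s):
    """True if pattern matches the whole of s."""
    return re.fullmatch(pattern, s) is not None

# REP binds tightest: ab* means a(b*), not (ab)*.
rep_tight = matches("ab*", "abbb")        # True
rep_not_group = matches("ab*", "abab")    # False
rep_group = matches("(ab)*", "abab")      # True

# + needs at least one instance; ? allows at most one.
plus_zero = matches("ab+", "a")           # False
opt_zero = matches("ab?", "a")            # True
opt_two = matches("ab?", "abb")           # False

# Concatenation binds tighter than |: ab|cd means (ab)|(cd).
alt_left = matches("ab|cd", "ab")         # True
alt_right = matches("ab|cd", "cd")        # True
alt_mixed = matches("ab|cd", "abd")       # False
alt_paren = matches("a(b|c)d", "abd")     # True
```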
A match to any part of a regular expression extends as far as possible without
preventing a match to the remainder of the regular expression.
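The longest-match rule is not universal among regular expression engines. Python's re module, for example, takes the first alternative that succeeds, so a|ab applied to "abc" yields "a", whereas the rule above calls for "ab". The sketch below approximates the longest-match behaviour on top of Python's engine by testing ever shorter prefixes. The function longest_prefix_match is hypothetical, it only considers matches starting at the first character, and it assumes the pattern contains no ^ or $ (slicing the input would change what $ means).

```python
import re

def longest_prefix_match(pattern, s):
    """Return the longest prefix of s matched by pattern, or None."""
    for end in range(len(s), -1, -1):
        # fullmatch backtracks over every alternative, so it succeeds
        # whenever some way of matching exactly s[:end] exists.
        if re.fullmatch(pattern, s[:end]):
            return s[:end]
    return None

first_alternative = re.match("a|ab", "abc").group()   # 'a' (Python's rule)
longest = longest_prefix_match("a|ab", "abc")          # 'ab' (rule above)
```

The loop costs one match attempt per candidate length, which is acceptable for illustration but not how a production matcher would be written.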
regexp(3)