Chapter 4. Patterns

The patterns in the input are written using regular expressions in the style of lex, with a more Caml-like syntax. These are:

The regular expressions listed above are grouped according to precedence, from highest precedence at the top to lowest at the bottom: '*' and '+' have the highest precedence, followed by '?', concatenation, '|', and then 'as'. For example,


	"foo" | "bar"*

is the same as


	("foo")|("bar"*)

since the '*' operator has higher precedence than alternation ('|'). This pattern therefore matches either the string "foo" or zero or more repetitions of the string "bar".

To match zero or more occurrences of either "foo" or "bar", parenthesize the alternation:


	("foo"|"bar")*
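The difference between the two groupings can be checked with a small, self-contained OCaml sketch. It does not use the lexer generator itself: the type re, its constructors, and the function matches are hypothetical names introduced only for this illustration, and the matcher is a naive backtracking one that handles just literals, alternation, and '*'.

```ocaml
(* A toy regular-expression AST. These names are illustrative only and
   are not part of any lexer generator. *)
type re =
  | Lit of string      (* a string literal such as "foo" *)
  | Alt of re * re     (* r1 | r2 *)
  | Star of re         (* r* *)

(* [matches r s] is true when [r] matches the whole string [s].
   Backtracking matcher in continuation-passing style: [k] receives the
   position reached after a successful sub-match. *)
let matches (r : re) (s : string) : bool =
  let n = String.length s in
  let rec go r i k =
    match r with
    | Lit l ->
      let m = String.length l in
      i + m <= n && String.sub s i m = l && k (i + m)
    | Alt (a, b) -> go a i k || go b i k
    | Star a ->
      (* Either stop here, or match [a] once (consuming input) and repeat. *)
      k i || go a i (fun j -> j > i && go r j k)
  in
  go r 0 (fun j -> j = n)

(* "foo" | "bar"*   which groups as   ("foo") | ("bar"*) *)
let foo_or_bars = Alt (Lit "foo", Star (Lit "bar"))

(* ("foo" | "bar")* *)
let foos_and_bars = Star (Alt (Lit "foo", Lit "bar"))
```

Under these assumptions, matches foo_or_bars "barbar" holds but matches foo_or_bars "foobar" does not, whereas foos_and_bars accepts any mixture such as "foobarfoo".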

A negated character set such as the example "[^ 'A'-'Z']" above will match a newline unless "\n" (or an equivalent escape sequence) is one of the characters explicitly present in the negated character set (e.g., "[^ 'A'-'Z' '\n']"). This is unlike how many other regular expression tools treat negated character sets, but unfortunately the inconsistency is historically entrenched. Because newlines are matched, a pattern like [^ '"']* can run past the end of the line and match all the way to the end of the input unless there is another quote in the input.
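The effect of leaving "\n" out of a negated set can be simulated in plain OCaml. The helper span_not_in below is a hypothetical name used only for this sketch; it mimics what a starred negated character set would consume and is not how a generated scanner is implemented.

```ocaml
(* [span_not_in excluded s pos] is the number of consecutive characters of
   [s], starting at [pos], that are NOT in [excluded]. This mirrors the
   longest match of a starred negated character set. *)
let span_not_in (excluded : char list) (s : string) (pos : int) : int =
  let n = String.length s in
  let rec loop i =
    if i < n && not (List.mem s.[i] excluded) then loop (i + 1) else i
  in
  loop pos - pos

(* An input with an opening quote at index 4 and no closing quote. *)
let input = "say \"hi\nthere\nall"

(* Excluding only the quote: the match crosses both newlines and runs
   to the end of the input. *)
let without_newline = span_not_in ['"'] input 5

(* Excluding the quote and the newline: the match stops at the end of
   the first line. *)
let with_newline = span_not_in ['"'; '\n'] input 5
```

Here without_newline covers the whole remainder of the input, while with_newline covers only "hi". Listing '\n' explicitly in the negated set is the usual way to keep such a pattern on one line.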