| Ocamllex Tutorial | ||
|---|---|---|
| <<< Previous | Next >>> | |
Each pattern in a rule has a corresponding action, which can be any OCaml expression. For example, here is the specification for a program that deletes all occurrences of "zap me" from its input:
{}
rule token = parse
  | "zap me" { token lexbuf } (* ignore this token: no processing, just continue *)
  | _ as c { print_char c; token lexbuf }
  | eof { () } (* stop cleanly at the end of the input *)
Here is a program that compresses runs of blanks and tabs down to a single blank and throws away whitespace found at the end of a line:
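{}
rule token = parse
  | [' ' '\t']+ { print_char ' '; token lexbuf }
  | [' ' '\t']+ '\n' { print_char '\n'; token lexbuf } (* drop the trailing whitespace, keep the newline *)
  | _ as c { print_char c; token lexbuf } (* copy every other character *)
  | eof { () }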
Actions can include arbitrary OCaml code that returns a value. Each time the lexical analyzer function is called, it continues processing input from where it last left off until it either reaches the end of the file or an action returns a value instead of calling the lexer again.
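The difference between an action that returns and one that keeps scanning can be sketched with a small specification. This is an illustrative example, not part of the original tutorial: the token type, its constructors, and the sample input are assumptions, and the file must be processed with ocamllex before it can be compiled.

```ocaml
{
  (* Hypothetical token type returned to the caller. *)
  type token = INT of int | WORD of string | EOF
}

rule token = parse
  | [' ' '\t' '\n']+  { token lexbuf }           (* skip and keep scanning *)
  | ['0'-'9']+ as n   { INT (int_of_string n) }  (* return to the caller *)
  | ['a'-'z']+ as w   { WORD w }
  | eof               { EOF }
  (* Any other character raises Failure, since no pattern matches it. *)

{
  let () =
    let lexbuf = Lexing.from_string "abc 42 de" in
    (* Each call to token resumes where the previous call stopped. *)
    let rec loop () =
      match token lexbuf with
      | EOF -> ()
      | INT n -> Printf.printf "INT %d\n" n; loop ()
      | WORD w -> Printf.printf "WORD %s\n" w; loop ()
    in
    loop ()
}
```

The whitespace rule never hands control back to the caller, while the other rules return one token per call, which is the shape a parser typically expects.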
Actions are evaluated after lexbuf is bound to the current lexer buffer and the identifier following the keyword as is bound to the matched string. Functions for working with lexbuf are provided by the Lexing standard library module:
Lexing.lexeme lexbuf
Return the matched string.
Lexing.lexeme_char lexbuf n
Return the nth character of the matched string. The first character has index 0.
Lexing.lexeme_start lexbuf
Lexing.lexeme_end lexbuf
Return the absolute position in the input text of the beginning/end of the matched string. The position of the first character is 0.
Lexing.lexeme_start_p lexbuf
Lexing.lexeme_end_p lexbuf
(Since OCaml 3.08) Return the beginning/end of the matched string as a value of type position (see Position).
entrypoint [exp1... expn] lexbuf
Call another lexer at the given entry point. Note that lexbuf is the last argument.
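The functions listed above can be seen together in one small specification. This is a sketch under stated assumptions: the helper report, the entry point comment, and its depth argument are names invented for illustration, and the file has to be run through ocamllex before compiling.

```ocaml
{
  (* Print a lexeme with its absolute start and end offsets. *)
  let report kind lexbuf =
    Printf.printf "%s %S at [%d, %d)\n" kind
      (Lexing.lexeme lexbuf)
      (Lexing.lexeme_start lexbuf)
      (Lexing.lexeme_end lexbuf)
}

rule token = parse
  | ['0'-'9']+               { report "int" lexbuf; token lexbuf }
  | ['a'-'z' 'A'-'Z']+ as w  { Printf.printf "word %S starts with %c\n"
                                 w (Lexing.lexeme_char lexbuf 0);
                               token lexbuf }
  | "(*"                     { comment 1 lexbuf }  (* call the other entry point *)
  | _                        { token lexbuf }
  | eof                      { () }

(* Entry point with one extra argument; lexbuf is still passed last. *)
and comment depth = parse
  | "(*"  { comment (depth + 1) lexbuf }
  | "*)"  { if depth = 1 then token lexbuf else comment (depth - 1) lexbuf }
  | _     { comment depth lexbuf }
  | eof   { failwith "unterminated comment" }

{
  let () = token (Lexing.from_channel stdin)
}
```

Passing the nesting depth as an argument keeps the state inside the lexer calls themselves, so no global counter is needed.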
Position (since OCaml 3.08)
Position information about the input being scanned is recorded in lexbuf, which has a field lex_curr_p of type position:
type position = {
  pos_fname : string; (* file name *)
  pos_lnum : int; (* line number *)
  pos_bol : int; (* the offset of the beginning of the line *)
  pos_cnum : int; (* the offset of the position *)
}
The pos_bol field holds the number of characters between the beginning of the file and the beginning of the line, while the pos_cnum field holds the number of characters between the beginning of the file and the position itself.
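Because both offsets are counted from the beginning of the file, their difference gives the column of a position within its line. A minimal sketch, assuming 0-based columns; the helpers column and describe, the output format, and the sample values are illustrative and not part of the Lexing module:

```ocaml
(* Column of a position within its line, counted from 0. *)
let column (p : Lexing.position) : int =
  p.Lexing.pos_cnum - p.Lexing.pos_bol

(* Render a position as "file:line:column" (a hypothetical format). *)
let describe (p : Lexing.position) : string =
  Printf.sprintf "%s:%d:%d" p.Lexing.pos_fname p.Lexing.pos_lnum (column p)

let () =
  (* Hand-built position: line 3 begins at offset 40 and we are at offset 47. *)
  let p =
    { Lexing.pos_fname = "input.txt"; pos_lnum = 3; pos_bol = 40; pos_cnum = 47 }
  in
  print_endline (describe p)  (* prints input.txt:3:7 *)
```

Error messages usually want exactly this triple, which is why pos_bol has to be kept up to date alongside pos_lnum.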
The lexing engine maintains only the pos_cnum field of lexbuf.lex_curr_p, setting it to the number of characters read from the start of lexbuf. You are responsible for keeping the other fields accurate. Typically, whenever the lexer meets a newline character, the action calls the following function:
let incr_linenum lexbuf =
  let pos = lexbuf.Lexing.lex_curr_p in
  lexbuf.Lexing.lex_curr_p <- { pos with
    Lexing.pos_lnum = pos.Lexing.pos_lnum + 1;
    Lexing.pos_bol = pos.Lexing.pos_cnum;
  }
;;
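More recent versions of the standard library (OCaml 3.11 and later) provide Lexing.new_line, which performs the same update. The sketch below checks its effect without running a generated lexer: the helper position_after_newline and the sample input are assumptions made for illustration, and pos_cnum is set by hand to imitate what the engine would have done after consuming "ab\n".

```ocaml
(* Return the current position after a newline ending at offset [consumed]
   has been processed with Lexing.new_line. *)
let position_after_newline (consumed : int) : Lexing.position =
  let lexbuf = Lexing.from_string "ab\ncd" in
  (* Imitate the engine: only pos_cnum has been advanced so far. *)
  lexbuf.Lexing.lex_curr_p <-
    { lexbuf.Lexing.lex_curr_p with Lexing.pos_cnum = consumed };
  (* Bump pos_lnum and set pos_bol to the current offset. *)
  Lexing.new_line lexbuf;
  lexbuf.Lexing.lex_curr_p

let () =
  let p = position_after_newline 3 in
  Printf.printf "line %d, line starts at offset %d\n"
    p.Lexing.pos_lnum p.Lexing.pos_bol
  (* prints: line 2, line starts at offset 3 *)
```

In a real specification the call would sit in the action of the rule that matches '\n', in the same place where incr_linenum would otherwise be called.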