Chapter 6. Actions

Each pattern in a rule has a corresponding action, which can be any OCaml expression. For example, here is the specification for a program that deletes all occurrences of "zap me" from its input:


{}
rule token = parse
  | "zap me"	{ token lexbuf }	(* ignore this token: no processing and continue *)
  | _ as c	{ print_char c; token lexbuf }
  | eof		{ () }			(* stop at the end of the input *)

Here is a program that compresses runs of blanks and tabs down to a single blank and throws away blanks and tabs found at the end of a line:


{}
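rule token = parse
  | [' ' '\t']+		{ print_char ' '; token lexbuf }
  | [' ' '\t']+ '\n'	{ print_char '\n'; token lexbuf }  (* drop trailing blanks, keep the newline *)
  | _ as c		{ print_char c; token lexbuf }     (* copy everything else unchanged *)
  | eof			{ () }                             (* stop at the end of the input *)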

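Actions can include arbitrary OCaml code, and the value of the action is the value returned by the lexical analyzer function, so all actions of a rule must have the same type. Each time the lexical analyzer function is called, it resumes scanning where the previous call left off, matches one token, and returns the value of the corresponding action. An action that calls the rule recursively, such as token lexbuf, keeps processing tokens until some action returns without recursing, typically when the end of the file is reached.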

Actions are evaluated after lexbuf is bound to the current lexer buffer and the identifier following the keyword as is bound to the matched string. The functions that operate on lexbuf are provided by the Lexing standard library module.

6.1. Position

Since OCaml 3.08

While scanning the input text, the lexer records position information in lexbuf, whose field lex_curr_p has the type position:


  type position = {
     pos_fname : string;		(* file name *)
     pos_lnum : int;		(* line number *)
     pos_bol : int;		(* the offset of the beginning of the line *)
     pos_cnum : int;		(* the offset of the position *)
  } 

The value of the pos_bol field is the number of characters between the beginning of the file and the beginning of the line, while the value of the pos_cnum field is the number of characters between the beginning of the file and the position.
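
Since both offsets are measured from the beginning of the file, their difference gives the column of a position within its line. The following is a minimal sketch of that arithmetic; the helper names column_of and string_of_position are hypothetical (they are not part of the Lexing module), and the column is assumed to be counted from 0:

```ocaml
(* Hypothetical helpers for illustration; not part of the Lexing module. *)

(* 0-based column: number of characters between the beginning of the
   line (pos_bol) and the position itself (pos_cnum). *)
let column_of (pos : Lexing.position) : int =
  pos.Lexing.pos_cnum - pos.Lexing.pos_bol

(* Render a position as "file:line:column", e.g. for error messages. *)
let string_of_position (pos : Lexing.position) : string =
  Printf.sprintf "%s:%d:%d"
    pos.Lexing.pos_fname pos.Lexing.pos_lnum (column_of pos)
```

Inside an action, such a helper would typically be applied to lexbuf.Lexing.lex_curr_p. The result is only meaningful if pos_lnum and pos_bol are kept up to date.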

The lexing engine updates only the pos_cnum field of lexbuf.lex_curr_p, setting it to the number of characters read from the start of lexbuf. You are responsible for keeping the other fields accurate. Typically, whenever the lexer meets a newline character, the action contains a call to the following function:


  let incr_linenum lexbuf =
    let pos = lexbuf.Lexing.lex_curr_p in
    lexbuf.Lexing.lex_curr_p <- { pos with
      Lexing.pos_lnum = pos.Lexing.pos_lnum + 1;
      Lexing.pos_bol = pos.Lexing.pos_cnum;
    }
  ;;
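
The file name is another field the engine leaves alone. The sketch below shows one way it might be set by hand, using a hypothetical helper set_file_name and a made-up file name "input.txt". It also uses Lexing.new_line, which the standard library provides since OCaml 3.11 and which performs the same line update as the function above. The demo function is for illustration only: it simulates meeting a newline on a fresh buffer rather than running a generated lexer.

```ocaml
(* Hypothetical helper: record the file name in the current position.
   The lexing engine never touches pos_fname, so it is set by hand. *)
let set_file_name (lexbuf : Lexing.lexbuf) (fname : string) : unit =
  lexbuf.Lexing.lex_curr_p <-
    { lexbuf.Lexing.lex_curr_p with Lexing.pos_fname = fname }

(* Illustration only: build a buffer over a string, name it, and simulate
   meeting one newline. No input has been consumed yet, so pos_cnum is
   still 0 and pos_bol therefore stays 0 after the update. *)
let demo () : Lexing.position =
  let lexbuf = Lexing.from_string "first line\nsecond line\n" in
  set_file_name lexbuf "input.txt";   (* "input.txt" is a made-up name *)
  Lexing.new_line lexbuf;             (* same effect as the hand-written update *)
  lexbuf.Lexing.lex_curr_p
```

In a real lexer, set_file_name would be called once after creating the buffer, and the line update would be called from the action of the rule that matches '\n'.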