| Ocamlyacc Tutorial | ||
|---|---|---|
| <<< Previous | Next >>> | |
As Ocamlyacc reads tokens, it pushes them onto a stack along with their semantic values. The stack is called the parser stack. Pushing a token is traditionally called shifting.
For example, suppose the infix calculator has read 1 + 5 *, with a 3 to come. The stack will have four elements, one for each token that was shifted.
But the stack does not always have an element for each token read. When the last n tokens and groupings shifted match the components of a grammar rule, they can be combined according to that rule. This is called reduction. Those tokens and groupings are replaced on the stack by a single grouping whose symbol is the result (left hand side) of that rule. Running the rule's action is part of the process of reduction, because this is what computes the semantic value of the resulting grouping.
For example, if the infix calculator's parser stack contains this:
1 + 5 * 3 |
and the next input token is a newline character, then the last three elements can be reduced to 15 via the rule:
expr: expr MULTIPLY expr; |
Then the stack contains just these three elements:
1 + 15 |
At this point, another reduction can be made, resulting in the single value 16. Then the newline token can be shifted.
The parser tries, by shifts and reductions, to reduce the entire input down to a single grouping whose symbol is the grammar's start-symbol (see Languages and Context-Free Grammars).
This kind of parser is known in the literature as a bottom-up parser.
The Ocamlyacc parser does not always reduce immediately as soon as the last n tokens and groupings match a rule. This is because such a simple strategy is inadequate to handle most languages. Instead, when a reduction is possible, the parser sometimes ``looks ahead'' at the next token in order to decide what to do.
When a token is read, it is not immediately shifted; first it becomes the look-ahead token, which is not on the stack. Now the parser can perform one or more reductions of tokens and groupings on the stack, while the look-ahead token remains off to the side. When no more reductions should take place, the look-ahead token is shifted onto the stack. This does not mean that all possible reductions have been done; depending on the token type of the look-ahead token, some rules may choose to delay their application.
Here is a simple case where look-ahead is needed. These three rules define expressions which contain binary addition operators and postfix unary factorial operators (FACTORIAL for '!'), and allow parentheses for grouping.
expr: term PLUS expr
| term
;
term: LPAREN expr RPAREN
| term FACTORIAL
| NUMBER
;
|
Suppose that the tokens 1 + 2 have been read and shifted; what should be done? If the following token is RPAREN, then the first three tokens must be reduced to form an expr. This is the only valid course, because shifting the RPAREN would produce a sequence of symbols term RPAREN, and no rule allows this.
If the following token is FACTORIAL, that is '!', then it must be shifted immediately so that 2 ! can be reduced to make a term. If instead the parser were to reduce before shifting, 1 + 2 would become an expr. It would then be impossible to shift the ! because doing so would produce on the stack the sequence of symbols expr FACTORIAL. No rule allows that sequence.
| <<< Previous | Home | Next >>> |
| The Error Reporting Function | Shift/Reduce Conflicts |