4.4. TOP-DOWN PARSING -3 217
a) The grammar of Exercise 4.2.1.
b) The grammar of Exercise 4.2.2(a).
c) The grammar of Exercise 4.2.2(c).
d) The grammar of Exercise 4.2.2(e).
e) The grammar of Exercise 4.2.2(g).
! Exercise 4.3.3: The following grammar is proposed to remove the \dangling-
else ambiguity" discussed in Section 4.3.2:
stmt ! if expr then stmt
j matchedStmt
matchedStmt ! if expr then matchedStmt else stmt
j other
Show that this grammar is still ambiguous.
4.4 Top-Down Parsing
Top-down parsing can be viewed as the problem of constructing a parse tree for
the input string, starting from the root and creating the nodes of the parse tree
in preorder (depth- rst, as discussed in Section 2.3.4). Equivalently, top-down
parsing can be viewed as nding a leftmost derivation for an input string.
Example 4.27: The sequence of parse trees in Fig. 4.12 for the input id+idid
is a top-down parse according to grammar (4.2), repeated here:
E ! T E0
E0 ! + T E0 j
T ! F T0 (4.28)
T0 ! F T0 j
F ! ( E ) j id
This sequence of trees corresponds to a leftmost derivation of the input. 2
At each step of a top-down parse, the key problem is that of determining
the production to be applied for a nonterminal, say A. Once an A-production
is chosen, the rest of the parsing process consists of \matching" the terminal
symbols in the production body with the input string.
The section begins with a general form of top-down parsing, called recursive-
descent parsing, which may require backtracking to nd the correct A-produc-
tion to be applied. Section 2.4.2 introduced predictive parsing, a special case of
recursive-descent parsing, where no backtracking is required. Predictive parsing
chooses the correct A-production by looking ahead at the input a xed number
of symbols, typically we may look only at one (that is, the next input symbol).
,218 CHAPTER 4. SYNTAX ANALYSIS
E ) E ) E ) E ) E ) E
lm lm lm lm lm
T E0 T E0 T E0 T E0 T E0
F T0 F T0 F T0 F T 0 + T E0
id id id
)
lm
E )
lm
E )
lm
E
T E0 T E0 T E0
F T0 + T E0 F T0 + T E0 F T0 + T E0
id F T0 id F T0 id F T0
id id F T 0
)
lm
E )
lm
E )
lm
E
T E0 T E0 T E0
F T0 + T E0 F T0 + T E0 F T0 + T E0
id F T0 id F T0 id F T0
id F T 0 id F T 0 id F T 0
id id id
Figure 4.12: Top-down parse for id + id id
For example, consider the top-down parse in Fig. 4.12, which constructs
a tree with two nodes labeled E 0 . At the rst E 0 node (in preorder), the
production E 0 ! +TE 0 is chosen; at the second E 0 node, the production E 0 !
is chosen. A predictive parser can choose between E 0 -productions by looking
at the next input symbol.
The class of grammars for which we can construct predictive parsers looking
k symbols ahead in the input is sometimes called the LL(k) class. We discuss the
LL(1) class in Section 4.4.3, but introduce certain computations, called FIRST
and FOLLOW, in a preliminary Section 4.4.2. From the FIRST and FOLLOW
sets for a grammar, we shall construct \predictive parsing tables," which make
explicit the choice of production during top-down parsing. These sets are also
useful during bottom-up parsing, as we shall see.
In Section 4.4.4 we give a nonrecursive parsing algorithm that maintains
a stack explicitly, rather than implicitly via recursive calls. Finally, in Sec-
tion 4.4.5 we discuss error recovery during top-down parsing.
, 4.4. TOP-DOWN PARSING 219
4.4.1 Recursive-Descent Parsing
void A() f
1) Choose an A-production, A ! X X Xk ;
1 2
2) for ( i = 1 to k ) f
3) if ( Xi is a nonterminal )
4) call procedure Xi ();
5) else if ( Xi equals the current input symbol a )
6) advance the input to the next symbol;
7) else /* an error has occurred */;
g
g
Figure 4.13: A typical procedure for a nonterminal in a top-down parser
A recursive-descent parsing program consists of a set of procedures, one for each
nonterminal. Execution begins with the procedure for the start symbol, which
halts and announces success if its procedure body scans the entire input string.
Pseudocode for a typical nonterminal appears in Fig. 4.13. Note that this
pseudocode is nondeterministic, since it begins by choosing the A-production
to apply in a manner that is not speci ed.
General recursive-descent may require backtracking; that is, it may require
repeated scans over the input. However, backtracking is rarely needed to parse
programming language constructs, so backtracking parsers are not seen fre-
quently. Even for situations like natural language parsing, backtracking is not
very ecient, and tabular methods such as the dynamic programming algo-
rithm of Exercise 4.4.9 or the method of Earley (see the bibliographic notes)
are preferred.
To allow backtracking, the code of Fig. 4.13 needs to be modi ed. First, we
cannot choose a unique A-production at line (1), so we must try each of several
productions in some order. Then, failure at line (7) is not ultimate failure, but
suggests only that we need to return to line (1) and try another A-production.
Only if there are no more A-productions to try do we declare that an input
error has been found. In order to try another A-production, we need to be able
to reset the input pointer to where it was when we rst reached line (1). Thus,
a local variable is needed to store this input pointer for future use.
Example 4.29: Consider the grammar
S ! cAd
A ! ab j a
To construct a parse tree top-down for the input string w = cad, begin with a
tree consisting of a single node labeled S , and the input pointer pointing to c,
the rst symbol of w. S has only one production, so we use it to expand S and