Chapter 3
Lexical Analysis
In this chapter we show how to construct a lexical analyzer. To implement a
lexical analyzer by hand, it helps to start with a diagram or other description for
the lexemes of each token. We can then write code to identify each occurrence of
each lexeme on the input and to return information about the token identified.
We can also produce a lexical analyzer automatically by specifying the lex-
eme patterns to a lexical-analyzer generator and compiling those patterns into
code that functions as a lexical analyzer. This approach makes it easier to mod-
ify a lexical analyzer, since we have only to rewrite the affected patterns, not
the entire program. It also speeds up the process of implementing the lexical
analyzer, since the programmer specifies the software at the very high level of
patterns and relies on the generator to produce the detailed code. We shall
introduce in Section 3.5 a lexical-analyzer generator called Lex (or Flex in a
more recent embodiment).
We begin the study of lexical-analyzer generators by introducing regular
expressions, a convenient notation for specifying lexeme patterns. We show
how this notation can be transformed, first into nondeterministic automata
and then into deterministic automata. The latter two notations can be used as
input to a "driver," that is, code which simulates these automata and uses them
as a guide to determining the next token. This driver and the specification of
the automaton form the nucleus of the lexical analyzer.
3.1 The Role of the Lexical Analyzer
As the first phase of a compiler, the main task of the lexical analyzer is to
read the input characters of the source program, group them into lexemes, and
produce as output a sequence of tokens for each lexeme in the source program.
The stream of tokens is sent to the parser for syntax analysis. It is common
for the lexical analyzer to interact with the symbol table as well. When the
lexical analyzer discovers a lexeme constituting an identifier, it needs to enter
that lexeme into the symbol table. In some cases, information regarding the
kind of identifier may be read from the symbol table by the lexical analyzer to
assist it in determining the proper token it must pass to the parser.
These interactions are suggested in Fig. 3.1. Commonly, the interaction is
implemented by having the parser call the lexical analyzer. The call, suggested
by the getNextToken command, causes the lexical analyzer to read characters
from its input until it can identify the next lexeme and produce for it the next
token, which it returns to the parser.
[Diagram: the source program enters the Lexical Analyzer, which passes tokens to the Parser and on to semantic analysis; the Parser issues getNextToken calls back to the Lexical Analyzer; both components consult the Symbol Table.]
Figure 3.1: Interactions between the lexical analyzer and the parser
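The calling pattern of Fig. 3.1 can be sketched as follows. This is a minimal illustration, not a full implementation: the two token classes (identifiers and single-character punctuation) and the method names are assumptions made for the example.

```python
# Sketch of the parser/lexer interaction of Fig. 3.1: the lexer
# exposes get_next_token(), and the parser pulls one token at a
# time until the input is exhausted.

class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def get_next_token(self):
        # Skip whitespace between lexemes.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.text):
            return ("eof", None)
        ch = self.text[self.pos]
        if ch.isalpha():
            # An identifier lexeme: letters and digits.
            start = self.pos
            while self.pos < len(self.text) and self.text[self.pos].isalnum():
                self.pos += 1
            return ("id", self.text[start:self.pos])  # lexeme as attribute
        self.pos += 1
        return ("punct", ch)

def tokenize(text):
    # Stand-in for the parser's loop: call get_next_token repeatedly.
    lexer, tokens = Lexer(text), []
    while True:
        tok = lexer.get_next_token()
        if tok[0] == "eof":
            return tokens
        tokens.append(tok)
```

For instance, `tokenize("x = y")` yields an id token for each identifier and a punctuation token for the `=`.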
Since the lexical analyzer is the part of the compiler that reads the source
text, it may perform certain other tasks besides identification of lexemes. One
such task is stripping out comments and whitespace (blank, newline, tab, and
perhaps other characters that are used to separate tokens in the input). Another
task is correlating error messages generated by the compiler with the source
program. For instance, the lexical analyzer may keep track of the number
of newline characters seen, so it can associate a line number with each error
message. In some compilers, the lexical analyzer makes a copy of the source
program with the error messages inserted at the appropriate positions. If the
source program uses a macro-preprocessor, the expansion of macros may also
be performed by the lexical analyzer.
Sometimes, lexical analyzers are divided into a cascade of two processes:
a) Scanning consists of the simple processes that do not require tokenization
of the input, such as deletion of comments and compaction of consecutive
whitespace characters into one.
b) Lexical analysis proper is the more complex portion, which produces to-
kens from the output of the scanner.
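The first stage of such a cascade can be sketched as below. The comment syntax (`//` to end of line) is an assumption for illustration; the sketch also tracks newlines, as suggested earlier, so line numbers survive into error messages.

```python
def scan(source):
    # First-stage "scanner" of the two-process cascade: delete
    # comments and compact each run of whitespace into one blank,
    # counting newlines along the way for error reporting.
    out, i, line = [], 0, 1
    while i < len(source):
        if source.startswith("//", i):        # delete a comment
            while i < len(source) and source[i] != "\n":
                i += 1
        elif source[i].isspace():             # compact whitespace
            while i < len(source) and source[i].isspace():
                if source[i] == "\n":
                    line += 1                 # track line numbers
                i += 1
            out.append(" ")
        else:
            out.append(source[i])
            i += 1
    return "".join(out), line
```

The lexical analyzer proper would then produce tokens from the returned string.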
3.1.1 Lexical Analysis Versus Parsing
There are a number of reasons why the analysis portion of a compiler is normally
separated into lexical analysis and parsing (syntax analysis) phases.
1. Simplicity of design is the most important consideration. The separation
of lexical and syntactic analysis often allows us to simplify at least one
of these tasks. For example, a parser that had to deal with comments
and whitespace as syntactic units would be considerably more complex
than one that can assume comments and whitespace have already been
removed by the lexical analyzer. If we are designing a new language,
separating lexical and syntactic concerns can lead to a cleaner overall
language design.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to
apply specialized techniques that serve only the lexical task, not the job
of parsing. In addition, specialized buffering techniques for reading input
characters can speed up the compiler significantly.
3. Compiler portability is enhanced. Input-device-specific peculiarities can
be restricted to the lexical analyzer.
3.1.2 Tokens, Patterns, and Lexemes
When discussing lexical analysis, we use three related but distinct terms:
• A token is a pair consisting of a token name and an optional attribute
value. The token name is an abstract symbol representing a kind of
lexical unit, e.g., a particular keyword, or a sequence of input characters
denoting an identifier. The token names are the input symbols that the
parser processes. In what follows, we shall generally write the name of a
token in boldface. We will often refer to a token by its token name.
• A pattern is a description of the form that the lexemes of a token may take.
In the case of a keyword as a token, the pattern is just the sequence of
characters that form the keyword. For identifiers and some other tokens,
the pattern is a more complex structure that is matched by many strings.
• A lexeme is a sequence of characters in the source program that matches
the pattern for a token and is identified by the lexical analyzer as an
instance of that token.
Example 3.1: Figure 3.2 gives some typical tokens, their informally described
patterns, and some sample lexemes. To see how these concepts are used in
practice, in the C statement
printf("Total = %d\n", score);
both printf and score are lexemes matching the pattern for token id, and
"Total = %d\n" is a lexeme matching literal. □
In many programming languages, the following classes cover most or all of
the tokens:
TOKEN         INFORMAL DESCRIPTION                     SAMPLE LEXEMES
if            characters i, f                          if
else          characters e, l, s, e                    else
comparison    < or > or <= or >= or == or !=           <=, !=
id            letter followed by letters and digits    pi, score, D2
number        any numeric constant                     3.14159, 0, 6.02e23
literal       anything but ", surrounded by "'s        "core dumped"
Figure 3.2: Examples of tokens
1. One token for each keyword. The pattern for a keyword is the same as
the keyword itself.
2. Tokens for the operators, either individually or in classes such as the token
comparison mentioned in Fig. 3.2.
3. One token representing all identifiers.
4. One or more tokens representing constants, such as numbers and literal
strings.
5. Tokens for each punctuation symbol, such as left and right parentheses,
comma, and semicolon.
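The five classes can be sketched as a small specification table. The patterns below are illustrative stand-ins written with Python's re module, not the regular-expression notation introduced later in this chapter; the keyword and punctuation sets are assumptions for the example.

```python
import re

# One pattern per token class: keywords, an operator class,
# identifiers, numeric constants, and punctuation, mirroring
# the token names of Fig. 3.2.
TOKEN_SPEC = [
    ("if",         r"if"),
    ("else",       r"else"),
    ("comparison", r"<=|>=|==|!=|<|>"),
    ("number",     r"\d+(\.\d+)?([eE][+-]?\d+)?"),
    ("id",         r"[A-Za-z][A-Za-z0-9]*"),
    ("punct",      r"[(),;]"),
]

def classify(lexeme):
    # Return the first token name whose pattern matches the
    # whole lexeme; keywords are listed before id so that "if"
    # is a keyword, not an identifier.
    for name, pattern in TOKEN_SPEC:
        if re.fullmatch(pattern, lexeme):
            return name
    return None
```

For example, `classify("pi")` falls through the keyword patterns and matches id, while `classify("6.02e23")` matches number.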
3.1.3 Attributes for Tokens
When more than one lexeme can match a pattern, the lexical analyzer must
provide the subsequent compiler phases additional information about the par-
ticular lexeme that matched. For example, the pattern for token number
matches both 0 and 1, but it is extremely important for the code generator to
know which lexeme was found in the source program. Thus, in many cases the
lexical analyzer returns to the parser not only a token name, but an attribute
value that describes the lexeme represented by the token; the token name in-
fluences parsing decisions, while the attribute value influences translation of
tokens after the parse.
We shall assume that tokens have at most one associated attribute, although
this attribute may have a structure that combines several pieces of information.
The most important example is the token id, where we need to associate with
the token a great deal of information. Normally, information about an identi-
fier (e.g., its lexeme, its type, and the location at which it is first found, in
case an error message about that identifier must be issued) is kept in the
symbol table. Thus, the appropriate attribute value for an identifier is a pointer
to the symbol-table entry for that identifier.
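This convention can be sketched as follows: the lexer returns ⟨id, pointer⟩ pairs, where the "pointer" is an index into a symbol table. The entry layout (lexeme plus the line where it was first found) is a minimal assumption; a real entry would also record the type and other information.

```python
class SymbolTable:
    # Minimal symbol table: each distinct identifier gets one
    # entry, and the token's attribute value is the index of
    # that entry.
    def __init__(self):
        self.entries = []   # one dict per identifier
        self.index = {}     # lexeme -> entry position

    def enter(self, lexeme, line):
        if lexeme not in self.index:
            self.index[lexeme] = len(self.entries)
            # Record the lexeme and where it was first found,
            # for use in later error messages.
            self.entries.append({"lexeme": lexeme, "first_line": line})
        return self.index[lexeme]

def id_token(table, lexeme, line):
    # Token for an identifier: name "id", attribute = pointer
    # (index) to its symbol-table entry.
    return ("id", table.enter(lexeme, line))
```

Repeated occurrences of the same identifier yield tokens carrying the same pointer, so later phases see one shared entry per name.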