APJ Abdul Kalam Technological University (KTU), India
Course Name: Compiler Design
Course Code: CST 302
Course Year: B.Tech Semester 6 (3rd year)
Summary notes of modules for exam preparation
Module 1: Introduction to Compilers and Lexical Analysis
This module introduces the foundational concepts of compilers and the first phase of the compilation
process, lexical analysis.
Analysis of the Source Program: This involves understanding the structure of the source code written in a
high-level programming language. The compiler breaks down the code into manageable parts to process it
further.
Analysis and Synthesis Phases: Compilers work in two major phases:
Analysis Phase: Breaks the source program into constituent pieces (tokens, syntax, semantics) and creates an
intermediate representation.
Synthesis Phase: Takes the intermediate representation and generates the target code (e.g., machine code or
bytecode).
Phases of a Compiler: A compiler operates in several phases:
Lexical Analysis: Converts the source code into tokens.
Syntax Analysis: Checks the structure of the code using grammar rules.
Semantic Analysis: Ensures the code makes sense logically (e.g., type checking).
Intermediate Code Generation: Produces an intermediate representation.
Code Optimization: Improves the intermediate code for efficiency.
Code Generation: Produces the final target code.
Compiler Writing Tools: Tools like Lex (for lexical analysis) and Yacc (for syntax analysis) help automate the process of building compilers.
Bootstrapping: The process of writing a compiler for a language using the same language (e.g., writing a C
compiler in C). This involves creating a minimal compiler first, then using it to build a more complex one.
Lexical Analysis: This is the first phase of a compiler:
Role of Lexical Analyzer: It scans the source code character by character and groups them into tokens (e.g.,
keywords, identifiers, operators).
Input Buffering: To efficiently read and process the source code, the lexical analyzer uses buffering
techniques (e.g., two-buffer scheme) to minimize I/O operations.
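The two-buffer idea can be sketched as follows: the input is read in fixed-size halves, each terminated by a sentinel character, so the scanner's inner loop needs only a single test to detect the end of a half. This is an illustrative Python sketch (the buffer size and sentinel are hypothetical choices; real scanners do this over raw file I/O, and the sketch assumes the source contains no literal NUL character):

```python
BUF_SIZE = 8          # hypothetical half-buffer size
SENTINEL = "\0"       # end-of-buffer sentinel, as in the classic eof marker

def read_with_two_buffers(source):
    """Simulate the two-buffer scheme: refill one half-buffer at a time,
    each terminated by a sentinel, and collect characters one by one."""
    chars = []
    pos = 0
    while pos < len(source):
        # one simulated I/O operation fills half a buffer, plus the sentinel
        half = source[pos:pos + BUF_SIZE] + SENTINEL
        i = 0
        while half[i] != SENTINEL:   # a single test covers end-of-half
            chars.append(half[i])
            i += 1
        pos += BUF_SIZE
    return "".join(chars)
```

The point of the sentinel is that the common case (an ordinary character) costs one comparison; only when the sentinel is hit does the scanner do the rarer check of whether it is at a buffer boundary or true end of input.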
Specification of Tokens: Tokens are defined using regular expressions (e.g., an identifier might be defined as
[a-zA-Z][a-zA-Z0-9]*).
Recognition of Tokens: The lexical analyzer uses finite automata (DFAs or NFAs) constructed from the regular expressions to recognize tokens in the input.
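The two points above can be combined in a minimal tokenizer sketch: each token class is specified by a regular expression (identifiers follow the [a-zA-Z][a-zA-Z0-9]* pattern from the notes), and the scanner matches the input left to right. The token names and the small set of operators here are hypothetical choices for illustration:

```python
import re

# Hypothetical token specification: (token name, regular expression)
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[a-zA-Z][a-zA-Z0-9]*"),   # identifier pattern from the notes
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),                     # whitespace, discarded
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Scan the source left to right, emitting (token name, lexeme) pairs."""
    tokens = []
    for m in MASTER_RE.finditer(source):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

For example, tokenize("x1 = y + 42") yields the pairs (ID, x1), (OP, =), (ID, y), (OP, +), (NUMBER, 42). Under the hood, a regex engine is doing exactly what the notes describe: simulating a finite automaton built from the token patterns.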
Module 2: Introduction to Syntax Analysis
This module focuses on the second phase of compilation: syntax analysis, which ensures the code follows the
grammatical rules of the programming language.
Role of the Syntax Analyzer: The syntax analyzer (or parser) takes tokens from the lexical analyzer and
checks if they form valid statements according to the language’s grammar.
Syntax Error Handling: If the code violates grammar rules (e.g., missing semicolon), the parser detects and
reports errors. It may also attempt error recovery to continue parsing.
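One common recovery strategy is panic mode: after an error, the parser discards tokens until it reaches a synchronizing token (such as a semicolon or closing brace) and then resumes. A minimal sketch, with a hypothetical set of synchronizing tokens:

```python
SYNC_TOKENS = {";", "}"}  # hypothetical synchronizing tokens

def panic_mode_recover(tokens, pos):
    """Skip tokens from position pos until just past a synchronizing
    token, so parsing can resume at the next statement."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return pos + 1 if pos < len(tokens) else pos
```

Panic mode is simple and never loops forever, at the cost of possibly skipping real errors inside the discarded region.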
Review of Context-Free Grammars: These are formal grammars used to define the syntax of programming
languages.
A context-free grammar consists of:
Terminals (tokens).
Non-terminals (syntactic categories like "statement").
Productions (rules like stmt → if ( expr ) stmt).
Derivation and Parse Trees: A derivation shows how a string is generated from the grammar’s start symbol. A
parse tree visually represents this derivation, showing the hierarchical structure of the code.
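A grammar like the one above can be recognized directly by a recursive-descent parser, where each non-terminal becomes a function; a successful parse traces out exactly the derivation the notes describe. This is an illustrative sketch for a hypothetical toy grammar (stmt → if ( expr ) stmt | ID, and expr → ID), with tokens given as a pre-split list of strings:

```python
# Toy grammar, echoing the production style in the notes:
#   stmt -> "if" "(" expr ")" stmt  |  ID
#   expr -> ID

def parse_stmt(tokens, pos=0):
    """Recognize one stmt starting at pos; return the position just
    after it, or raise SyntaxError on a grammar violation."""
    if pos < len(tokens) and tokens[pos] == "if":
        if pos + 1 >= len(tokens) or tokens[pos + 1] != "(":
            raise SyntaxError("expected '(' after 'if'")
        pos = parse_expr(tokens, pos + 2)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return parse_stmt(tokens, pos + 1)   # the nested stmt
    if pos < len(tokens) and tokens[pos].isidentifier():
        return pos + 1
    raise SyntaxError("expected statement")

def parse_expr(tokens, pos):
    """expr -> ID"""
    if pos < len(tokens) and tokens[pos].isidentifier():
        return pos + 1
    raise SyntaxError("expected expression")
```

Parsing ["if", "(", "x", ")", "y"] succeeds and consumes all five tokens; the sequence of function calls (parse_stmt calling parse_expr, then parse_stmt again) mirrors the parse tree, with one subtree per non-terminal expanded.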