A compiler is system software that converts a program written in a high-level language, which is suitable for
programmers, into the low-level language required by computers. During this process, the compiler also
attempts to spot and report obvious programmer mistakes. This is illustrated below:
[Figure: Source Program -> Compiler -> Target Program; the compiler also outputs Error Messages and Warnings]
There are two parts of compilation:
i). Analysis part – breaks up the source program into its constituent pieces and creates an
intermediate representation of the source program.
ii). Synthesis part – constructs the desired target program from the intermediate representation.
Phases of a Compiler
To ease the process of development and understanding, a compiler can be conceptually divided into the
following phases:
i) Lexical analysis
ii) Syntax analysis
iii) Semantic analysis
iv) Intermediate code generation
v) Target code generation
vi) Code optimization
vii) Symbol table management
viii) Error handling and recovery
The following diagram shows the relationship between these modules:
Compiler Construction Notes ~ Wainaina Page 1 of 12
[Figure: phases of a compiler]
Source Program
-> Lexical Analysis -> Tokens
-> Syntax Analysis -> Parse Tree
-> Semantic Analysis -> Semantic Correctness
-> Intermediate Code Generation -> Intermediate Code
-> Target Code Generation -> Target Code
-> Code Optimization -> Optimized Target Code
Symbol table management and Error Handling and Recovery are drawn alongside these phases.
i). Lexical Analysis
This is the initial part of reading and analysing the program text; the text is read and divided into tokens,
each of which corresponds to a symbol in the programming language e.g. a variable name, number,
keyword, etc. It is basically used to identify valid words in the input source program.
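The division of text into tokens can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular compiler does it: the token classes (NUMBER, KEYWORD, IDENT, OP), the three keywords and the function name tokenize are all assumptions chosen for the example.

```python
import re

# Hypothetical token classes for a tiny illustrative language.
# Order matters: KEYWORD is tried before IDENT so that "while" is not
# classified as an ordinary identifier.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=<>()]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split text into (token class, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # No token class matches here: report a lexical error.
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("while x1 = 42"))
```

Each pair names the class of the token and the exact text that was matched; a character that fits no class is reported as an error, which is the "identify valid words" role described above.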
ii). Syntax Analysis
This phase takes the list of tokens produced by the lexical analysis and arranges them into a tree structure
(syntax tree) that reflects the structure of the program. It is generally used to establish if the program is
grammatically correct. This phase is often called parsing.
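As a rough sketch of parsing, the following Python code builds a syntax tree for arithmetic expressions using recursive descent, one common parsing technique among several. The grammar (only + and *, with parentheses), the nested-tuple tree format and the function names are assumptions made for this illustration.

```python
import re

def tokenize(text):
    # Simplified lexer for the example: numbers, identifiers and a few
    # operators. Any other character is silently ignored here.
    return re.findall(r"\d+|[A-Za-z_]\w*|[+*()]", text)

def parse(tokens):
    """Return a nested-tuple syntax tree, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():      # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():      # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():    # factor -> NUMBER | IDENT | '(' expr ')'
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse(tokenize("a + b * 2")))
```

The tree places the multiplication below the addition, so the structure of the program (here, operator precedence) is captured by the shape of the tree. A grammatically incorrect input such as "a + * b" raises SyntaxError instead of producing a tree.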
iii). Semantic Analysis
This phase analyses the syntax tree to determine whether the program violates certain consistency requirements,
e.g. if a variable is used but not declared, or if it is used in a context that does not make sense given its
type, such as trying to assign it a value that is too large for that type.
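The two kinds of check mentioned above (use without declaration, and use that conflicts with the declared type) can be sketched as follows. The statement format, the type names and the function check are hypothetical and exist only for this example; a real compiler would walk the syntax tree rather than a flat list.

```python
def check(statements):
    """Return a list of semantic error messages for a tiny statement list."""
    symbols = {}   # variable name -> declared type name (a minimal symbol table)
    errors = []
    for stmt in statements:
        kind = stmt[0]
        if kind == "declare":              # ("declare", name, type name)
            _, name, typ = stmt
            if name in symbols:
                errors.append(f"'{name}' is already declared")
            else:
                symbols[name] = typ
        elif kind == "assign":             # ("assign", name, literal value)
            _, name, value = stmt
            value_type = type(value).__name__
            if name not in symbols:
                errors.append(f"'{name}' is used but not declared")
            elif value_type != symbols[name]:
                errors.append(
                    f"cannot assign {value_type} to '{name}' of type {symbols[name]}"
                )
    return errors

program = [
    ("declare", "x", "int"),
    ("assign", "x", 5),         # consistent with the declaration
    ("assign", "y", 1),         # y was never declared
    ("assign", "x", "hello"),   # str assigned to an int variable
]
for message in check(program):
    print(message)
```

Both faulty statements are syntactically valid; they are rejected only because they are inconsistent with the declarations recorded in the symbol table.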
iv). Intermediate Code Generation
This phase translates the program into a simple, machine-independent intermediate language. This
makes it easier to retarget the compiler so that it generates code for a different processor.
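One widely used form of intermediate language is three-address code, in which each instruction has at most one operator. The notes do not say which intermediate form is intended, so the sketch below is only an assumed illustration; the nested-tuple tree format, the temporary names t1, t2, ... and the function to_three_address are hypothetical.

```python
import itertools

def to_three_address(tree):
    """Flatten a nested-tuple expression tree into three-address instructions."""
    code = []
    temps = itertools.count(1)   # numbers for fresh temporaries: 1, 2, 3, ...

    def emit(node):
        if isinstance(node, str):          # leaf: a variable name or a constant
            return node
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    return code, result

# Tree for the expression a + b * 2
code, result = to_three_address(("+", "a", ("*", "b", "2")))
for line in code:
    print(line)
```

Nothing in the generated instructions refers to registers or to a particular instruction set, which is what makes such a representation machine independent.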
Generally, generating machine code directly from source code entails two problems:
- With m languages and n target machines, we need to write m × n compilers.
- The code optimizer, which is one of the largest and most difficult components of any compiler to write,
cannot be reused.
By converting source code to an intermediate code, a machine-independent code optimizer may be written