compiler design
introduction
-
compilers for c
- gcc
- llvm
-
lexical parsers
- bison
- uses BNF notation (a context free language)
- yacc - yet another compiler compiler
- lex - used for lexical analysis
- flex - another version of the lex tool, which is open source
- bison
compile vs interpreters
compiler
- scans the entire program and translates the entire program into target code
- take time to analyze the source code - overall execution is faster than interpreters
- generate the intermediate object code which require further linking - hence more memory
- debugging is hard
- e.g. - c cpp c#
interpreters
- line by line execution
- less time to analyze the source code - overall execution time is more than compilers
- no intermediate code is generates - so less memory needed
- better error diagnostics
- e.g. - javascript, python,
compiler generate three types of code
- pure machine code -
- for embedded applications
-
doesn't assume existence of any machine code
phases of compiler
Lexical analysis
Passes of Compiler
Passes of GCC
Compiler Construction Tools
Parsing
Types of Parsing
Recursive Descent Parsing
Predictive Parsing
Calculation of First
Calculation of Follow
compiler
- read a program in one language, source language
- and translate it into another language, target language
interpreter
- directly executers the operations specified in the source file and on given data
structure of compiler
- lexical analyzer
- syntax analyzer
- semantic analyzer
- intermediate code generation
- machine independent code optimizer
- code generator
- machine dependent code optimizer
flowchart TB
_("Character Stream") -->A["Lexical Analyzer"]
A -- "Token Stream" -->B["Syntax Analyzer"]
B -- "Syntax Tree" -->C["Semantic Analyzer"]
C -- "Syntax Tree" --> D["Intermediate Code Generation"]
D -- "Intermediate Representation" -->E["Machine Independent Code Generation"]
E -- "Intermediate Representation" -->F["Code Generator"]
F -- "Target Machine Code" -->G["Machine Independent Code Optimizer"]
G --> H("Target Machine Code")
Lexical analyzer
Parser
top down parser
- generates parse tree for given input string with help of grammar productions by expanding non-terminals
- it start at start symbol and end at terminals
-
uses left most derivation
-
recursive decent parser - use backtracking
- non recursive decent parser -
LL(1)
- use parsing table (first()
,follow()
)
bottom up parser
- generates the parse tree for given input string with help of grammar productions by compressing non-terminals
- starts on non-terminals and end at start symbol
- uses reverse of right most derivation
- LR - generates parse tree by using some unambiguous grammar.
- operator precedence parser - two consecutive non-terminals and epsilon never appear on right side of production
parsing
first()
\(\text{first}(A)\) contains all terminals present in first place of every string derived by \(A\)
- \(X \rightarrow abc\)
- \(\text{first}(X) = \text{first}(a) = \{a\}\)
- \(X \rightarrow \epsilon\)
- \(\text{first}(X) = \{\epsilon\}\)
- \(X \rightarrow A\), \(A \rightarrow m\)
- \(\text{first}(X) = \text{first}(A) = \{m\}\)
follow()
\(\text{follow}(A)\) contains set of all terminals present immediate to right of \(A\)
- \(\text{follow}(A)\) never contains \(\epsilon\)
- \(\text{follow}(S) = \{\$\}\), follow of start symbol
- \(X \rightarrow ABC\),
- \(\text{follow}(B) = \text{first}(C)\)
- \(\text{follow}(C) = \text{follow}(X)\)
- \(\text{follow}(S) = \{\$\}\), follow of start symbol
- \(X \rightarrow \alpha B \beta\)
- \(\text{follow}(B) = \text{first}(\beta)\)
- \(\text{follow}(B) = \text{first}(\beta) - \{\epsilon\} \cup \text{follow}(\beta)\), if \(\epsilon \in \text{first}(\beta)\)
- \(X \rightarrow \alpha B\)
- \(\text{follow}(B) = \text{follow}(X)\)
left recursion
- if grammar contains production of form \(\(A \rightarrow A \alpha | \beta\)\)
- it it left recursive grammar
- to remove left recursion
- Convert \(\(A \rightarrow A \alpha | \beta\)\) to $$ A \rightarrow \beta A' \ A' \rightarrow \alpha A' | \epsilon $$
left factoring
- if grammar contains production in form \(\(A \rightarrow \alpha \beta_1 | \alpha \beta_2 | ... | \alpha \beta_n\)\)
- to eliminate it, write it in the form $$ A \rightarrow A' \ A' \rightarrow \beta_1 | \beta_2 | ... | \beta_n $$
LL(1)
- find first and follow sets of the grammar given
- create paring table, add dollar with terminal
- now for production
- \(A \rightarrow \text{RHS}\), fill it in \((A, \text{first}(\text{RHS}))\)
- \(A \rightarrow abc\), fill it in \((A, \text{first}(abc))\)
- if \(A \rightarrow \epsilon\), fill it in \((A, \text{follow}(A))\)
Grammar
- \(1\) \(S \rightarrow (L)\)
- \(2\) \(S \rightarrow a\)
- \(3\) \(L \rightarrow SL'\)
- \(4\) \(L' \rightarrow \epsilon\)
- \(5\) \(L' \rightarrow ,SL'\)
_ | first |
follow |
---|---|---|
\(S\) | \((\) \(a\) | $ \()\) |
\(L\) | \((\) \(a\) | \()\) |
\(L'\) | \(,\) \(\epsilon\) | \()\) |
- \(S \rightarrow (L)\)
- \(S\) and \((\)
- \(S \rightarrow a\)
- \(S\) and \(a\)
- \(L \rightarrow SL'\)
- \(L\) and first\((S)\)
- \(L\) and \((\)
- \(L\) and \(a\)
- \(L' \rightarrow \epsilon\)
- \(L'\) and follow\((L')\), because of \(\epsilon\)
- \(L'\) and \()\)
- \(L' \rightarrow ,SL'\)
- \(L'\) and \(,\)
_ | ( | ) | a | , | $ |
---|---|---|---|---|---|
\(S\) | 1 | 2 | |||
\(L\) | 3 | 3 | |||
\(L'\) | 4 | 5 |
how to parse string in LL(1)
bool ll_1_parser(string s, string[][] parsing_table){
stack<char> st;
st.push('$'); // default
int i = 0; // look ahead symbol
while(i < s.length()){
char tos = st.top(); // top os stack
st.pop();
stack.push(reverse(parsing_table[tos][s[i]]));
if(isTerminal(st.top())){
if(st.top() == s[i]){
i++; // move look ahead on matching
}
}
}
if(st.empty()) return true; // string accepted
else return false; // string not accepted
}