compiler design

introduction

compilers for c
- gcc
- llvm
lexical parsers
- bison
  - uses BNF notation (a context free language)
- yacc - yet another compiler compiler
- lex - used for lexical analysis
- flex - another version of the lex tool, which is open source

compile vs interpreters

compiler

scans the entire program and translates the entire program into target code
take time to analyze the source code - overall execution is faster than interpreters
generate the intermediate object code which require further linking - hence more memory
debugging is hard
e.g. - c cpp c#

interpreters

line by line execution
less time to analyze the source code - overall execution time is more than compilers
no intermediate code is generates - so less memory needed
better error diagnostics
e.g. - javascript, python,

compiler generate three types of code

pure machine code -
- for embedded applications
- doesn't assume existence of any machine code

phases of compiler

Lexical analysis

Passes of Compiler

Passes of GCC

Compiler Construction Tools

Parsing

Types of Parsing

Recursive Descent Parsing

Predictive Parsing

Calculation of First

Calculation of Follow

compiler

read a program in one language, source language
and translate it into another language, target language

interpreter

directly executers the operations specified in the source file and on given data

structure of compiler

lexical analyzer
syntax analyzer
semantic analyzer
intermediate code generation
machine independent code optimizer
code generator
machine dependent code optimizer

flowchart TB
    _("Character Stream") -->A["Lexical Analyzer"]
    A -- "Token Stream" -->B["Syntax Analyzer"]
    B -- "Syntax Tree" -->C["Semantic Analyzer"]
    C -- "Syntax Tree" --> D["Intermediate Code Generation"]
    D -- "Intermediate Representation" -->E["Machine Independent Code Generation"]
    E -- "Intermediate Representation" -->F["Code Generator"]
    F -- "Target Machine Code" -->G["Machine Independent Code Optimizer"]
    G --> H("Target Machine Code")

Lexical analyzer

Parser

top down parser

generates parse tree for given input string with help of grammar productions by expanding non-terminals
it start at start symbol and end at terminals
uses left most derivation
recursive decent parser - use backtracking
non recursive decent parser - LL(1) - use parsing table (first(), follow())

bottom up parser

generates the parse tree for given input string with help of grammar productions by compressing non-terminals
starts on non-terminals and end at start symbol
uses reverse of right most derivation

LR - generates parse tree by using some unambiguous grammar.
operator precedence parser - two consecutive non-terminals and epsilon never appear on right side of production

parsing

`first()`

$\text{first}(A)$ contains all terminals present in first place of every string derived by $A$

$X \rightarrow abc$
- $\text{first}(X) = \text{first}(a) = \{a\}$
$X \rightarrow \epsilon$
- $\text{first}(X) = \{\epsilon\}$
$X \rightarrow A$, $A \rightarrow m$
- $\text{first}(X) = \text{first}(A) = \{m\}$

`follow()`

$\text{follow}(A)$ contains set of all terminals present immediate to right of $A$

$\text{follow}(A)$ never contains $\epsilon$
$\text{follow}(S) = \{\$\}$, follow of start symbol
$X \rightarrow ABC$,
- $\text{follow}(B) = \text{first}(C)$
- $\text{follow}(C) = \text{follow}(X)$

$\text{follow}(S) = \{\$\}$, follow of start symbol
$X \rightarrow \alpha B \beta$
- $\text{follow}(B) = \text{first}(\beta)$
- $\text{follow}(B) = \text{first}(\beta) - \{\epsilon\} \cup \text{follow}(\beta)$, if $\epsilon \in \text{first}(\beta)$
$X \rightarrow \alpha B$
- $\text{follow}(B) = \text{follow}(X)$

left recursion

if grammar contains production of form $\(A \rightarrow A \alpha | \beta$\)
it it left recursive grammar
to remove left recursion
Convert $\(A \rightarrow A \alpha | \beta$\) to $$ A \rightarrow \beta A' \ A' \rightarrow \alpha A' | \epsilon $$

left factoring

if grammar contains production in form $\(A \rightarrow \alpha \beta_1 | \alpha \beta_2 | ... | \alpha \beta_n$\)
to eliminate it, write it in the form $$ A \rightarrow A' \ A' \rightarrow \beta_1 | \beta_2 | ... | \beta_n $$

LL(1)

find first and follow sets of the grammar given
create paring table, add dollar with terminal
now for production
- $A \rightarrow \text{RHS}$, fill it in $(A, \text{first}(\text{RHS}))$
- $A \rightarrow abc$, fill it in $(A, \text{first}(abc))$
- if $A \rightarrow \epsilon$, fill it in $(A, \text{follow}(A))$

Grammar

$1$ $S \rightarrow (L)$
$2$ $S \rightarrow a$
$3$ $L \rightarrow SL'$
$4$ $L' \rightarrow \epsilon$
$5$ $L' \rightarrow ,SL'$

_	`first`	`follow`
$S$	$($ $a$	$ $)$
$L$	$($ $a$	$)$
$L'$	$,$ $\epsilon$	$)$

$S \rightarrow (L)$
- $S$ and $($
$S \rightarrow a$
- $S$ and $a$
$L \rightarrow SL'$
- $L$ and first$(S)$
- $L$ and $($
- $L$ and $a$
$L' \rightarrow \epsilon$
- $L'$ and follow$(L')$, because of $\epsilon$
- $L'$ and $)$
$L' \rightarrow ,SL'$
- $L'$ and $,$

_	(	)	a	,
$S$	1		2
$L$	3		3
$L'$		4		5

how to parse string in `LL(1)`

bool ll_1_parser(string s, string[][] parsing_table){
  stack<char> st;
  st.push('$'); // default

  int i = 0; // look ahead symbol
  while(i < s.length()){
    char tos = st.top(); // top os stack
    st.pop();

    stack.push(reverse(parsing_table[tos][s[i]]));
    if(isTerminal(st.top())){
      if(st.top() == s[i]){
        i++; // move look ahead on matching
      }
    }
  }

  if(st.empty()) return true; // string accepted
  else return false; // string not accepted
}

_	`first`	`follow`
\(S\)	\((\) \(a\)	$ \()\)
\(L\)	\((\) \(a\)	\()\)
\(L'\)	\(,\) \(\epsilon\)	\()\)

compiler design

introduction

compile vs interpreters

compiler

interpreters

compiler generate three types of code

doesn't assume existence of any machine code

phases of compiler

Lexical analysis

Passes of Compiler

Passes of GCC

Compiler Construction Tools

Parsing

Types of Parsing

Recursive Descent Parsing

Predictive Parsing

Calculation of First

Calculation of Follow

compiler

interpreter

structure of compiler

Lexical analyzer

Parser

top down parser

bottom up parser

parsing

first()

follow()

left recursion

left factoring

LL(1)

how to parse string in LL(1)

`first()`

`follow()`

how to parse string in `LL(1)`