Context-Free Grammars
Exam Importance: Very High
CFG to CNF conversion appears in almost every paper (100% frequency). Derivations, parse trees, and ambiguity proofs are also frequently tested. This module typically carries 10-15 marks.
Introduction to Context-Free Grammars
Definition: Context-Free Grammar (CFG)
A Context-Free Grammar is a 4-tuple: G = (V, T, P, S) where:
- V = Finite set of variables (non-terminals)
- T = Finite set of terminals (alphabet Σ)
- P = Finite set of production rules of the form A → α, where A ∈ V and α ∈ (V ∪ T)*
- S = Start symbol (S ∈ V)
Why "Context-Free"?
In a CFG, production rules have the form A → α where A is a single variable. The replacement of A with α can happen regardless of the context (symbols around A). This is in contrast to context-sensitive grammars.
CFG Examples
Example 1: Language L = {aⁿbⁿ | n ≥ 1}
Derivation of "aaabbb":
Example 2: Language L = {0ⁿ1²ⁿ | n ≥ 0}
Each 0 generates two 1s.
Example 3: Balanced Parentheses
Generates: ε, (), (()), ()(), (())(), ...
CFG vs Regular Grammar
| Aspect | Regular Grammar (Type 3) | Context-Free Grammar (Type 2) |
|---|---|---|
| Production form | A → aB, A → a, A → ε | A → any string of V ∪ T |
| Power | Less powerful | More powerful |
| Can express | Regular languages | Context-free languages |
| Automaton | Finite Automaton | Push Down Automaton |
Derivations
Definition: Derivation
A derivation is a sequence of production rule applications that transforms the start symbol into a string of terminals.
If α ⇒ β in one step, we write α ⇒ β. For multiple steps: α ⇒* β
Types of Derivations
Leftmost Derivation
Always replace the leftmost variable first.
At each step, find the leftmost variable and apply a production to it.
Rightmost Derivation
Always replace the rightmost variable first.
At each step, find the rightmost variable and apply a production to it.
Derivations for Grammar: S → aB | bA, A → aS | bAA | a, B → bS | aBB | b
Derive: "aaabbabbba"
Leftmost Derivation:
Rightmost Derivation:
Parse Trees (Derivation Trees)
Definition: Parse Tree
A parse tree is a graphical representation of a derivation that shows how the start symbol derives a string.
- Root is labeled with the start symbol
- Interior nodes are labeled with variables
- Leaf nodes are labeled with terminals or ε
- Children of a node represent the right-hand side of a production
Parse Tree for "aabb" using S → aSb | ab
Parse tree for "aabb": S → aSb → aabb
Reading leaves left to right: a - a - b - b = "aabb" ✓
Key Properties
- Different derivations (leftmost vs rightmost) of the same string produce the same parse tree
- A parse tree uniquely represents the structure of derivation
- The yield of a parse tree (reading leaves left to right) gives the derived string
Ambiguity
Definition: Ambiguous Grammar
A grammar G is ambiguous if there exists at least one string w ∈ L(G) that has more than one:
- Leftmost derivation, OR
- Rightmost derivation, OR
- Parse tree
All three conditions are equivalent.
Exam Strategy
To prove a grammar is ambiguous:
- Find a specific string in the language
- Show TWO DIFFERENT parse trees for that string
- Or show two different leftmost derivations
Prove that S → aS | aSbS | ε is ambiguous
String: "aab"
Leftmost Derivation 1:
Leftmost Derivation 2:
Parse Tree 1:
Tree 1: S → aSbS first
Parse Tree 2:
Tree 2: S → aS first, then S → aSbS
Two different parse trees for the same string → Grammar is AMBIGUOUS
Simplification of CFG
Before converting to normal forms, we need to simplify the grammar by removing:
Remove Useless Symbols
A symbol is useless if it's either:
- Non-generating: Cannot derive any terminal string
- Non-reachable: Cannot be reached from the start symbol
Order matters: First remove non-generating, then non-reachable.
Remove ε-Productions (Null Productions)
Productions of the form A → ε (except if S → ε and S doesn't appear on RHS)
Method:
- Find all nullable variables (can derive ε)
- For each production with nullable variables, add all combinations with/without them
- Remove all ε-productions
Remove Unit Productions
Productions of the form A → B (single variable)
Method:
- Find all pairs (A, B) where A ⇒* B
- For each non-unit production B → α, add A → α
- Remove all unit productions
Simplify: S → aAa, A → bBb, B → cCc, C → dDd, D → eF
Step 1: F is non-generating (no productions for F).
Step 2: Remove D → eF, making D non-generating.
Step 3: Remove C → dDd, making C non-generating.
Continue... All symbols become useless!
Result: Empty grammar (L = ∅)
Chomsky Normal Form (CNF)
Definition: Chomsky Normal Form
A CFG is in Chomsky Normal Form if every production is of one of these forms:
- A → BC (two variables)
- A → a (single terminal)
- S → ε (only if ε ∈ L and S doesn't appear on RHS)
Exam Importance
CNF conversion is one of the most frequently asked questions (appears in Dec 2022, Dec 2023, Dec 2024, May 2025). Master this algorithm!
CNF Conversion Algorithm
Step 1: Remove ε-Productions
Find nullable variables and expand productions.
Step 2: Remove Unit Productions
Replace A → B chains with direct productions.
Step 3: Remove Useless Symbols
Remove non-generating and non-reachable symbols.
Step 4: Handle Terminals in Long Productions
For productions like A → aBc, replace each terminal with a new variable:
- Create Cₐ → a, Cc → c
- Replace: A → CₐBCc
Step 5: Break Long Productions
For productions with more than 2 variables, break into binary chains:
- A → BCDE becomes:
- A → BX₁, X₁ → CX₂, X₂ → DE
Convert to CNF: S → ABA, A → aA | ε, B → bB | ε
Step 1: Find Nullable Variables
A → ε, so A is nullable
B → ε, so B is nullable
S → ABA, all nullable, so S is nullable
Nullable = {A, B, S}
Step 2: Remove ε-Productions
Original: S → ABA
Expand with nullable combinations:
- S → ABA | AB | BA | AA | A | B | ε
A → aA | a (remove ε, keep aA and add a)
B → bB | b (remove ε, keep bB and add b)
Keep S → ε separately if needed
Step 3: Remove Unit Productions
S → A becomes S → aA | a
S → B becomes S → bB | b
Updated:
Step 4: Handle Terminals
For A → aA: Create Cₐ → a, then A → CₐA
For B → bB: Create Cᵦ → b, then B → CᵦB
Step 5: Break Long Productions
S → ABA becomes S → AX₁, X₁ → BA
Final CNF:
Greibach Normal Form (GNF)
Definition: Greibach Normal Form
A CFG is in Greibach Normal Form if every production is of the form:
- A → aα where a is a terminal and α is a string of variables (possibly empty)
- S → ε (only if ε ∈ L)
Every production starts with exactly one terminal!
GNF Conversion Steps
Start from CNF
First convert grammar to CNF.
Order Variables
Assign order: A₁, A₂, ..., Aₙ with S = A₁
Eliminate Left Recursion
For direct left recursion A → Aα | β, replace with:
- A → βZ
- Z → αZ | α
Substitute to Get Terminal First
Replace leading variables with their productions until each starts with a terminal.
Convert to GNF: S → XY, X → 0X | 1Y | 1, Y → 1
Step 1: Already simplified.
Step 2: S → XY starts with variable X, substitute:
Result is in GNF! Each production starts with a terminal.
Chomsky Hierarchy
The Four Types of Grammars
Noam Chomsky classified grammars into four types based on the restrictions on their productions.
Chomsky Hierarchy: Each type is a proper subset of the outer types
Detailed Comparison
| Type | Grammar Name | Production Form | Automaton | Example Language |
|---|---|---|---|---|
| Type 3 | Regular Grammar | A → aB, A → a, A → ε | Finite Automaton (DFA/NFA) | a*b*, (ab)* |
| Type 2 | Context-Free Grammar | A → α (any string) | Push Down Automaton (PDA) | aⁿbⁿ, palindromes |
| Type 1 | Context-Sensitive Grammar | αAβ → αγβ (|αAβ| ≤ |αγβ|) | Linear Bounded Automaton | aⁿbⁿcⁿ |
| Type 0 | Unrestricted Grammar | α → β (no restrictions) | Turing Machine | All computable languages |
Key Points for Exam
- Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0 (proper subsets)
- Every Regular Language is Context-Free, but NOT vice versa
- aⁿbⁿ is Context-Free but NOT Regular
- aⁿbⁿcⁿ is Context-Sensitive but NOT Context-Free
Practice Questions
5 Mark Questions
Definition: A grammar is ambiguous if there exists at least one string in the language that has more than one:
- Leftmost derivation
- Rightmost derivation
- Parse tree
Example: Consider G: S → a | Sa | bSS | SSb | SbS
For string "baa":
Derivation 1: S ⇒ bSS ⇒ baS ⇒ baa
Derivation 2: S ⇒ bSS ⇒ bSa ⇒ baa
Two different leftmost derivations → Grammar is ambiguous.
Implications:
- Parsing becomes non-deterministic
- Multiple interpretations of the same program
- Some languages are inherently ambiguous (no unambiguous grammar exists)
Chomsky Hierarchy classifies formal grammars into four types:
| Type | Name | Production Restriction | Automaton |
|---|---|---|---|
| 3 | Regular | A → aB or A → a | FA |
| 2 | Context-Free | A → γ (single variable LHS) | PDA |
| 1 | Context-Sensitive | αAβ → αγβ, |LHS| ≤ |RHS| | LBA |
| 0 | Unrestricted | No restriction | TM |
Diagram: Concentric circles with Type 3 (Regular) innermost and Type 0 (Unrestricted) outermost.
Relationship: Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0
Greibach Normal Form (GNF):
A CFG is in GNF if every production has the form:
- A → aα where a is terminal, α is string of variables
- S → ε (if ε ∈ L)
Properties:
- Each production starts with exactly one terminal
- Useful for building top-down parsers
- Derivation of string of length n takes exactly n steps
Example:
10 Mark Questions
Step 1: Find Nullable Variables
- A → ε, so A is nullable
- S → A and A is nullable, so S is nullable
- S → ε confirms S is nullable
Nullable = {S, A}
Step 2: Remove ε-Productions
(Keep S → ε as S₀ → S | ε if needed)
Step 3: Remove Unit Productions
S → A becomes S → a | b (substituting A's productions)
Step 4: Handle Terminals in Long Productions
Create: Cₐ → a, Cᵦ → b
Step 5: Break Long Productions
Final CNF (with start ε if needed):
String to derive: "aaabbabbba"
Leftmost Derivation:
Rightmost Derivation:
Parse Tree Structure:
- Root: S
- S → aB (a + B subtree)
- B → aBB (a + B + B)
- Continue expanding...
Key Productions Used:
- S → aB (3 times via recursion)
- B → aBB, B → bS, B → b
- A → a (for the final 'a')
Analysis:
- For each 0, we need exactly two 1s
- 0s come first, then 1s
- n = 0 gives ε (empty string)
Grammar:
Derivations:
- n=0: S ⇒ ε ✓
- n=1: S ⇒ 0S11 ⇒ 011 ✓
- n=2: S ⇒ 0S11 ⇒ 00S1111 ⇒ 001111 ✓
- n=3: S ⇒ 0S11 ⇒ 00S1111 ⇒ 000S111111 ⇒ 000111111 ✓
Alternative (for n ≥ 1):