Context-Free Grammars

Exam Importance: Very High

CFG to CNF conversion appears in almost every paper (100% frequency). Derivations, parse trees, and ambiguity proofs are also frequently tested. This module typically carries 10-15 marks.

Introduction to Context-Free Grammars

Definition: Context-Free Grammar (CFG)

A Context-Free Grammar is a 4-tuple: G = (V, T, P, S) where:

  • V = Finite set of variables (non-terminals)
  • T = Finite set of terminals (alphabet Σ)
  • P = Finite set of production rules of the form A → α, where A ∈ V and α ∈ (V ∪ T)*
  • S = Start symbol (S ∈ V)

Why "Context-Free"?

In a CFG, production rules have the form A → α where A is a single variable. The replacement of A with α can happen regardless of the context (symbols around A). This is in contrast to context-sensitive grammars.

CFG Examples

Example 1: Language L = {aⁿbⁿ | n ≥ 1}

S aSb | ab

Derivation of "aaabbb":

S aSb aaSbb aaabbb

Example 2: Language L = {0ⁿ1²ⁿ | n ≥ 0}

S 0S11 | ε

Each 0 generates two 1s.

Example 3: Balanced Parentheses

S (S) | SS | ε

Generates: ε, (), (()), ()(), (())(), ...

CFG vs Regular Grammar

Aspect Regular Grammar (Type 3) Context-Free Grammar (Type 2)
Production form A → aB, A → a, A → ε A → any string of V ∪ T
Power Less powerful More powerful
Can express Regular languages Context-free languages
Automaton Finite Automaton Push Down Automaton

Derivations

Definition: Derivation

A derivation is a sequence of production rule applications that transforms the start symbol into a string of terminals.

If α ⇒ β in one step, we write α ⇒ β. For multiple steps: α ⇒* β

Types of Derivations

Leftmost Derivation

Always replace the leftmost variable first.

At each step, find the leftmost variable and apply a production to it.

Rightmost Derivation

Always replace the rightmost variable first.

At each step, find the rightmost variable and apply a production to it.

Example

Derivations for Grammar: S → aB | bA, A → aS | bAA | a, B → bS | aBB | b

Derive: "aaabbabbba"

Leftmost Derivation:
S ⇒ aB ⇒ aaBB ⇒ aaaBBB ⇒ aaabBB ⇒ aaabbSB ⇒ aaabbaBB ⇒ aaabbabB ⇒ aaabbabbS ⇒ aaabbabbbA ⇒ aaabbabbba
Rightmost Derivation:
S ⇒ aB ⇒ aaBB ⇒ aaBbS ⇒ aaBbbA ⇒ aaBbba ⇒ aaaBBbba ⇒ aaaBbbba ⇒ aaabSbbba ⇒ aaabbAbbba ⇒ aaabbabbba

Parse Trees (Derivation Trees)

Definition: Parse Tree

A parse tree is a graphical representation of a derivation that shows how the start symbol derives a string.

  • Root is labeled with the start symbol
  • Interior nodes are labeled with variables
  • Leaf nodes are labeled with terminals or ε
  • Children of a node represent the right-hand side of a production
Example

Parse Tree for "aabb" using S → aSb | ab

S a S b a b

Parse tree for "aabb": S → aSb → aabb

Reading leaves left to right: a - a - b - b = "aabb" ✓

Key Properties

  • Different derivations (leftmost vs rightmost) of the same string produce the same parse tree
  • A parse tree uniquely represents the structure of derivation
  • The yield of a parse tree (reading leaves left to right) gives the derived string

Ambiguity

Definition: Ambiguous Grammar

A grammar G is ambiguous if there exists at least one string w ∈ L(G) that has more than one:

  • Leftmost derivation, OR
  • Rightmost derivation, OR
  • Parse tree

All three conditions are equivalent.

Exam Strategy

To prove a grammar is ambiguous:

  1. Find a specific string in the language
  2. Show TWO DIFFERENT parse trees for that string
  3. Or show two different leftmost derivations
Classic Example

Prove that S → aS | aSbS | ε is ambiguous

String: "aab"

Leftmost Derivation 1:
S ⇒ aSbS ⇒ aaSbS ⇒ aabS ⇒ aab
Leftmost Derivation 2:
S ⇒ aS ⇒ aaSbS ⇒ aabS ⇒ aab

Parse Tree 1:

S a S b S aS ε

Tree 1: S → aSbS first

Parse Tree 2:

S a S a ε b ε

Tree 2: S → aS first, then S → aSbS

Two different parse trees for the same string → Grammar is AMBIGUOUS

Simplification of CFG

Before converting to normal forms, we need to simplify the grammar by removing:

Remove Useless Symbols

A symbol is useless if it's either:

  • Non-generating: Cannot derive any terminal string
  • Non-reachable: Cannot be reached from the start symbol

Order matters: First remove non-generating, then non-reachable.

Remove ε-Productions (Null Productions)

Productions of the form A → ε (except if S → ε and S doesn't appear on RHS)

Method:

  1. Find all nullable variables (can derive ε)
  2. For each production with nullable variables, add all combinations with/without them
  3. Remove all ε-productions

Remove Unit Productions

Productions of the form A → B (single variable)

Method:

  1. Find all pairs (A, B) where A ⇒* B
  2. For each non-unit production B → α, add A → α
  3. Remove all unit productions
Example

Simplify: S → aAa, A → bBb, B → cCc, C → dDd, D → eF

Step 1: F is non-generating (no productions for F).

Step 2: Remove D → eF, making D non-generating.

Step 3: Remove C → dDd, making C non-generating.

Continue... All symbols become useless!

Result: Empty grammar (L = ∅)

Chomsky Normal Form (CNF)

Definition: Chomsky Normal Form

A CFG is in Chomsky Normal Form if every production is of one of these forms:

  • A → BC (two variables)
  • A → a (single terminal)
  • S → ε (only if ε ∈ L and S doesn't appear on RHS)

Exam Importance

CNF conversion is one of the most frequently asked questions (appears in Dec 2022, Dec 2023, Dec 2024, May 2025). Master this algorithm!

CNF Conversion Algorithm

Step 1: Remove ε-Productions

Find nullable variables and expand productions.

Step 2: Remove Unit Productions

Replace A → B chains with direct productions.

Step 3: Remove Useless Symbols

Remove non-generating and non-reachable symbols.

Step 4: Handle Terminals in Long Productions

For productions like A → aBc, replace each terminal with a new variable:

  • Create Cₐ → a, Cc → c
  • Replace: A → CₐBCc

Step 5: Break Long Productions

For productions with more than 2 variables, break into binary chains:

  • A → BCDE becomes:
  • A → BX₁, X₁ → CX₂, X₂ → DE
Complete Example

Convert to CNF: S → ABA, A → aA | ε, B → bB | ε

Step 1: Find Nullable Variables

A → ε, so A is nullable

B → ε, so B is nullable

S → ABA, all nullable, so S is nullable

Nullable = {A, B, S}

Step 2: Remove ε-Productions

Original: S → ABA

Expand with nullable combinations:

  • S → ABA | AB | BA | AA | A | B | ε

A → aA | a (remove ε, keep aA and add a)

B → bB | b (remove ε, keep bB and add b)

Keep S → ε separately if needed

Step 3: Remove Unit Productions

S → A becomes S → aA | a

S → B becomes S → bB | b

Updated:

S → ABA | AB | BA | AA | aA | a | bB | b | ε A → aA | a B → bB | b
Step 4: Handle Terminals

For A → aA: Create Cₐ → a, then A → CₐA

For B → bB: Create Cᵦ → b, then B → CᵦB

Step 5: Break Long Productions

S → ABA becomes S → AX₁, X₁ → BA

Final CNF:
S → AX₁ | AB | BA | AA | CₐA | a | CᵦB | b S → ε (if ε ∈ L) X₁ → BA A → CₐA | a B → CᵦB | b Cₐ → a Cᵦ → b

Greibach Normal Form (GNF)

Definition: Greibach Normal Form

A CFG is in Greibach Normal Form if every production is of the form:

  • A → aα where a is a terminal and α is a string of variables (possibly empty)
  • S → ε (only if ε ∈ L)

Every production starts with exactly one terminal!

GNF Conversion Steps

Start from CNF

First convert grammar to CNF.

Order Variables

Assign order: A₁, A₂, ..., Aₙ with S = A₁

Eliminate Left Recursion

For direct left recursion A → Aα | β, replace with:

  • A → βZ
  • Z → αZ | α

Substitute to Get Terminal First

Replace leading variables with their productions until each starts with a terminal.

Example

Convert to GNF: S → XY, X → 0X | 1Y | 1, Y → 1

Step 1: Already simplified.

Step 2: S → XY starts with variable X, substitute:

S → 0XY | 1YY | 1Y X → 0X | 1Y | 1 Y → 1

Result is in GNF! Each production starts with a terminal.

Chomsky Hierarchy

The Four Types of Grammars

Noam Chomsky classified grammars into four types based on the restrictions on their productions.

Type 0: Recursively Enumerable Type 1: Context-Sensitive Type 2: Context-Free Type 3: Regular

Chomsky Hierarchy: Each type is a proper subset of the outer types

Detailed Comparison

Type Grammar Name Production Form Automaton Example Language
Type 3 Regular Grammar A → aB, A → a, A → ε Finite Automaton (DFA/NFA) a*b*, (ab)*
Type 2 Context-Free Grammar A → α (any string) Push Down Automaton (PDA) aⁿbⁿ, palindromes
Type 1 Context-Sensitive Grammar αAβ → αγβ (|αAβ| ≤ |αγβ|) Linear Bounded Automaton aⁿbⁿcⁿ
Type 0 Unrestricted Grammar α → β (no restrictions) Turing Machine All computable languages

Key Points for Exam

  • Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0 (proper subsets)
  • Every Regular Language is Context-Free, but NOT vice versa
  • aⁿbⁿ is Context-Free but NOT Regular
  • aⁿbⁿcⁿ is Context-Sensitive but NOT Context-Free

Practice Questions

5 Mark Questions

What is Ambiguous Grammar? Explain with example. 5 Marks

Definition: A grammar is ambiguous if there exists at least one string in the language that has more than one:

  • Leftmost derivation
  • Rightmost derivation
  • Parse tree

Example: Consider G: S → a | Sa | bSS | SSb | SbS

For string "baa":

Derivation 1: S ⇒ bSS ⇒ baS ⇒ baa

Derivation 2: S ⇒ bSS ⇒ bSa ⇒ baa

Two different leftmost derivations → Grammar is ambiguous.

Implications:

  • Parsing becomes non-deterministic
  • Multiple interpretations of the same program
  • Some languages are inherently ambiguous (no unambiguous grammar exists)
Explain Chomsky Hierarchy with diagram 5 Marks

Chomsky Hierarchy classifies formal grammars into four types:

Type Name Production Restriction Automaton
3 Regular A → aB or A → a FA
2 Context-Free A → γ (single variable LHS) PDA
1 Context-Sensitive αAβ → αγβ, |LHS| ≤ |RHS| LBA
0 Unrestricted No restriction TM

Diagram: Concentric circles with Type 3 (Regular) innermost and Type 0 (Unrestricted) outermost.

Relationship: Type 3 ⊂ Type 2 ⊂ Type 1 ⊂ Type 0

Explain GNF with example 5 Marks

Greibach Normal Form (GNF):

A CFG is in GNF if every production has the form:

  • A → aα where a is terminal, α is string of variables
  • S → ε (if ε ∈ L)

Properties:

  • Each production starts with exactly one terminal
  • Useful for building top-down parsers
  • Derivation of string of length n takes exactly n steps

Example:

Given: S → AB, A → aA | a, B → bB | b Already in GNF for A and B. For S → AB, substitute A: S → aAB | aB Final GNF: S → aAB | aB A → aA | a B → bB | b

10 Mark Questions

Convert to CNF: S → aSa | bSb | A | ε, A → a | b | ε 10 Marks

Step 1: Find Nullable Variables

  • A → ε, so A is nullable
  • S → A and A is nullable, so S is nullable
  • S → ε confirms S is nullable

Nullable = {S, A}

Step 2: Remove ε-Productions

S → aSa | aa | bSb | bb | A | a | b A → a | b

(Keep S → ε as S₀ → S | ε if needed)

Step 3: Remove Unit Productions

S → A becomes S → a | b (substituting A's productions)

S → aSa | aa | bSb | bb | a | b A → a | b

Step 4: Handle Terminals in Long Productions

Create: Cₐ → a, Cᵦ → b

S → CₐSCₐ | CₐCₐ | CᵦSCᵦ | CᵦCᵦ | a | b Cₐ → a Cᵦ → b

Step 5: Break Long Productions

S → CₐX₁ | CₐCₐ | CᵦX₂ | CᵦCᵦ | a | b X₁ → SCₐ X₂ → SCᵦ Cₐ → a Cᵦ → b

Final CNF (with start ε if needed):

S₀ → S | ε S → CₐX₁ | CₐCₐ | CᵦX₂ | CᵦCᵦ | a | b X₁ → SCₐ X₂ → SCᵦ Cₐ → a Cᵦ → b
Find leftmost and rightmost derivations with parse trees for "aaabbabbba" using S → aB|bA, A → aS|bAA|a, B → bS|aBB|b 10 Marks

String to derive: "aaabbabbba"

Leftmost Derivation:

S ⇒ aB ⇒ aaBB ⇒ aaaBBB ⇒ aaabBB ⇒ aaabbSB ⇒ aaabbaBB ⇒ aaabbabB ⇒ aaabbabbS ⇒ aaabbabbbA ⇒ aaabbabbba

Rightmost Derivation:

S ⇒ aB ⇒ aaBB ⇒ aaBbS ⇒ aaBbbA ⇒ aaBbba ⇒ aaaBBbba ⇒ aaaBbbba ⇒ aaabSbbba ⇒ aaabbAbbba ⇒ aaabbabbba

Parse Tree Structure:

  • Root: S
  • S → aB (a + B subtree)
  • B → aBB (a + B + B)
  • Continue expanding...

Key Productions Used:

  • S → aB (3 times via recursion)
  • B → aBB, B → bS, B → b
  • A → a (for the final 'a')
Design CFG for L = {0ⁿ1²ⁿ | n ≥ 0} 5 Marks

Analysis:

  • For each 0, we need exactly two 1s
  • 0s come first, then 1s
  • n = 0 gives ε (empty string)

Grammar:

S → 0S11 | ε

Derivations:

  • n=0: S ⇒ ε ✓
  • n=1: S ⇒ 0S11 ⇒ 011 ✓
  • n=2: S ⇒ 0S11 ⇒ 00S1111 ⇒ 001111 ✓
  • n=3: S ⇒ 0S11 ⇒ 00S1111 ⇒ 000S111111 ⇒ 000111111 ✓

Alternative (for n ≥ 1):

S → 0S11 | 011