Introduction & Regular Languages

Introduction to Automata Theory

What is Automata Theory?

Automata Theory is the study of abstract machines (automata) and the computational problems they can solve. It forms the theoretical foundation of computer science, helping us understand what can and cannot be computed.

The word "automaton" (plural: automata) comes from Greek, meaning "self-acting." An automaton is an abstract self-propelled computing device that follows a predetermined sequence of operations automatically.

Why Study Automata Theory?

  • Compiler Design: Lexical analyzers use finite automata to recognize tokens
  • Text Processing: Regular expressions power search and replace operations
  • Network Protocols: State machines model communication protocols
  • Hardware Design: Digital circuits are modeled as finite state machines
  • Understanding Limits: Helps us know what problems are unsolvable

The Hierarchy of Computational Power

As we progress through this course, we'll study machines with increasing power:

Finite AutomataPush Down AutomataTuring Machine

Each subsequent machine can recognize more complex languages than the previous one.

Alphabets & Symbols

Definition: Alphabet (Σ)

An alphabet is a finite, non-empty set of symbols. It is typically denoted by the Greek letter Σ (sigma).

Examples of Alphabets

Alphabet Notation Description
Binary Alphabet Σ = {0, 1} Used in digital systems
Unary Alphabet Σ = {1} Single symbol (used in TM arithmetic)
Two-letter Alphabet Σ = {a, b} Common in examples
English Lowercase Σ = {a, b, c, ..., z} 26 letters
Digits Σ = {0, 1, 2, ..., 9} Decimal digits

Definition: Symbol

A symbol (or character) is a single element of an alphabet. Each symbol is indivisible and atomic.

Example

Identifying Symbols

For Σ = {0, 1}:

  • 0 is a symbol
  • 1 is a symbol
  • 01 is NOT a symbol (it's a string of two symbols)

Strings & Languages

Definition: String

A string (or word) is a finite sequence of symbols from an alphabet. The length of a string w is denoted as |w| and represents the number of symbols in the string.

Important String Concepts

Empty String (ε or λ)

The empty string is a special string with no symbols. It is denoted by ε (epsilon) or λ (lambda).

  • Length: |ε| = 0
  • It is part of every Σ*
  • ε concatenated with any string w gives w: ε · w = w · ε = w

String Operations

Operation Notation Description Example
Concatenation w₁ · w₂ or w₁w₂ Join two strings end-to-end ab · cd = abcd
Reversal wR Reverse the string (abc)R = cba
Length |w| Count of symbols |abba| = 4
Power wn Concatenate w with itself n times (ab)3 = ababab

Definition: Σ* (Kleene Star of Alphabet)

Σ* represents the set of ALL possible strings (including ε) that can be formed using symbols from Σ.

Example

Σ* for Binary Alphabet

If Σ = {0, 1}, then:

Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, ...}

This is an infinite set containing all binary strings of all lengths.

Definition: Σ⁺ (Positive Closure)

Σ⁺ = Σ* - {ε}. It contains all strings of length 1 or more (excludes empty string).

Σ⁺ = Σ · Σ* = Σ* · Σ

Definition: Language

A language L over alphabet Σ is any subset of Σ*.

L ⊆ Σ*

Examples

Different Languages over Σ = {a, b}

  • L₁ = ∅ (Empty language - contains no strings)
  • L₂ = {ε} (Language containing only empty string)
  • L₃ = {a, ab, abb} (Finite language with 3 strings)
  • L₄ = {aⁿbⁿ | n ≥ 0} (Infinite language)

Common Confusion

∅ ≠ {ε}

  • ∅ is the empty set (has 0 elements)
  • {ε} is a set containing the empty string (has 1 element)

Regular Expressions

Definition: Regular Expression (RE)

A Regular Expression is a notation for describing a regular language. It uses symbols from the alphabet combined with special operators to define patterns.

RE Building Blocks

Expression Language Denoted Description
{} Empty language
ε {ε} Language with only empty string
a (any symbol) {a} Language with single symbol

RE Operators (in order of precedence)

Kleene Star (*) - Highest Precedence

R* = Zero or more repetitions of R

L(R*) = {ε} ∪ L(R) ∪ L(RR) ∪ L(RRR) ∪ ...

a* = {ε, a, aa, aaa, aaaa, ...}

(ab)* = {ε, ab, abab, ababab, ...}

Concatenation (·) - Medium Precedence

R₁R₂ = R₁ followed by R₂

L(R₁R₂) = {xy | x ∈ L(R₁) and y ∈ L(R₂)}

ab = {ab}

a*b = {b, ab, aab, aaab, ...}

Union (+) - Lowest Precedence

R₁ + R₂ = Either R₁ or R₂

L(R₁ + R₂) = L(R₁) ∪ L(R₂)

a + b = {a, b}

a + b* = {a, ε, b, bb, bbb, ...}

Additional RE Notation

Notation Meaning Equivalent
R⁺ One or more of R RR* = R*R
R? Zero or one of R R + ε
Rⁿ Exactly n occurrences RRR...R (n times)

RE Identity Rules

ε + R* = R* (ε is already in R*)

∅R = R∅ = ∅ (concatenation with empty language)

εR = Rε = R (ε is identity for concatenation)

∅ + R = R + ∅ = R (∅ is identity for union)

R + R = R (idempotent law)

(R*)* = R* (star is idempotent)

ε* = ε

∅* = ε

Exam-Style Example

Write RE for: All binary strings containing "011"

Answer: (0+1)*011(0+1)*

Explanation:

  • (0+1)* - any number of 0s and 1s before "011"
  • 011 - the required substring
  • (0+1)* - any number of 0s and 1s after "011"
Exam-Style Example

Write RE for: Strings over {0,1} with odd number of 1s

Answer: 0*(10*10*)*10*

Alternative: (0*10*)(0*10*0*10*)*

Explanation: We need at least one 1 (making it odd), then pairs of 1s can be added (2, 4, 6... keeping total odd).

Exam-Style Example

Write RE for: Strings with no consecutive 1s

Answer: (0 + 10)*(1 + ε)

Or equivalently: (ε + 1)(0 + 01)*

Regular Languages

Definition: Regular Language

A language is called regular if it can be:

  1. Described by a Regular Expression, OR
  2. Recognized by a Finite Automaton (DFA or NFA), OR
  3. Generated by a Regular Grammar (Type 3)

All three representations are equivalent in expressive power.

Characterization of Regular Languages

Regular Expressions Finite Automata Regular Grammars

The three equivalent representations of Regular Languages

Examples of Regular Languages

  • All strings ending with "01" over {0, 1}
  • All strings with even number of 'a's over {a, b}
  • All binary strings divisible by 3
  • Set of valid identifiers (start with letter, followed by letters/digits)
  • Keywords in a programming language

Examples of NON-Regular Languages

  • L = {aⁿbⁿ | n ≥ 0} - requires counting
  • L = {ww | w ∈ Σ*} - requires memory of first half
  • Balanced parentheses
  • Palindromes of arbitrary length

Pumping Lemma (for proving non-regularity)

If L is regular, then there exists a pumping length p such that every string s in L with |s| ≥ p can be split into s = xyz where:

  1. |xy| ≤ p
  2. |y| > 0
  3. For all i ≥ 0, xyⁱz ∈ L

To prove L is NOT regular, find a string where pumping fails.

Regular Grammars

Definition: Grammar

A Grammar G is a 4-tuple: G = (V, T, P, S) where:

  • V = Set of variables (non-terminals)
  • T = Set of terminals (alphabet Σ)
  • P = Set of production rules
  • S = Start symbol (S ∈ V)

Definition: Regular Grammar (Type 3)

A grammar is regular (Type 3) if all productions have the form:

  • A → aB (Right Linear) or A → Ba (Left Linear)
  • A → a
  • A → ε (only if A is start symbol and doesn't appear on RHS)

where A, B ∈ V and a ∈ T.

Types of Regular Grammars

Right Linear Grammar (RLG)

Productions of the form: A → wB or A → w

where w ∈ T* and A, B ∈ V

Example: Grammar for a*b

S → aS | b

This generates: b, ab, aab, aaab, ...

Left Linear Grammar (LLG)

Productions of the form: A → Bw or A → w

where w ∈ T* and A, B ∈ V

Example: Grammar for a*b

S → Sa | b

This generates: b, ab, aab, aaab, ...

Important Rule

A grammar cannot mix left linear and right linear productions. It must be entirely one type to be regular!

A → aB and A → Ba in the same grammar is NOT regular (it's Type 2).

Exam-Style Example

Construct Right Linear Grammar for: RE = 00*(01+0)*

Solution:

S → 0A A → 0A | B | ε B → 0C | 0 | B C → 1B

Simplified:

S → 0A A → 0A | 0B | 0 | ε B → 1A | 0B | 0 | ε

Right Linear & Left Linear Grammars

Comparison Table

Aspect Right Linear (RLG) Left Linear (LLG)
Production Form A → aB, A → a, A → ε A → Ba, A → a, A → ε
Non-terminal Position Right end Left end
Derivation Direction Left to right Right to left
FA Correspondence Matches FA transition direction Reverse FA transitions

Converting RLG to LLG

Step 1: Construct the FA

Build a Finite Automaton from the Right Linear Grammar.

Step 2: Reverse the FA

Reverse all transitions, swap initial and final states.

Step 3: Convert to LLG

Write the Left Linear Grammar from the reversed FA.

Example

Convert RLG to LLG

Given RLG:

S → aA | bB A → aA | a B → bB | b

Language: Strings of only a's or only b's (a⁺ + b⁺)

Equivalent LLG:

S → Aa | Bb | a | b A → Aa | a B → Bb | b

Closure Properties of Regular Languages

What are Closure Properties?

A class of languages is closed under an operation if applying that operation to languages in the class always produces a language that is also in the class.

Regular languages are closed under many operations!

Main Closure Properties

1. Union

If L₁ and L₂ are regular, then L₁ ∪ L₂ is regular.

Proof Sketch: If R₁ and R₂ are REs for L₁ and L₂, then R₁ + R₂ is RE for L₁ ∪ L₂.

2. Concatenation

If L₁ and L₂ are regular, then L₁ · L₂ is regular.

Proof Sketch: If R₁ and R₂ are REs, then R₁R₂ is RE for concatenation.

3. Kleene Star

If L is regular, then L* is regular.

Proof Sketch: If R is RE for L, then R* is RE for L*.

4. Complement

If L is regular over Σ, then Σ* - L (complement of L) is regular.

Proof Sketch: In the DFA for L, swap accepting and non-accepting states.

5. Intersection

If L₁ and L₂ are regular, then L₁ ∩ L₂ is regular.

Proof: L₁ ∩ L₂ = complement(complement(L₁) ∪ complement(L₂)) (De Morgan's Law)

Or use the product construction of DFAs.

6. Difference

If L₁ and L₂ are regular, then L₁ - L₂ is regular.

Proof: L₁ - L₂ = L₁ ∩ complement(L₂)

7. Reversal

If L is regular, then LR = {wR | w ∈ L} is regular.

Proof Sketch: Reverse all transitions in FA, swap initial and final states.

8. Homomorphism

If L is regular and h is a homomorphism, then h(L) is regular.

Summary Table

Operation Regular Languages Method of Proof
Union (∪) Closed RE: R₁ + R₂
Intersection (∩) Closed Product DFA construction
Complement Closed Swap accept/reject states in DFA
Concatenation (·) Closed RE: R₁R₂
Kleene Star (*) Closed RE: R*
Difference (-) Closed L₁ ∩ L₂'
Reversal Closed Reverse FA

Exam Tip

When asked to "List 5 closure properties of Regular Languages", list: Union, Intersection, Complement, Concatenation, Kleene Star. These are the most frequently tested ones.

Practice Questions

Based on Mumbai University exam patterns (2022-2025):

2-3 Mark Questions (Definitions)

Define: Alphabet, String, and Language 2-3 Marks

Alphabet (Σ): A finite, non-empty set of symbols.

Example: Σ = {0, 1} is the binary alphabet.

String: A finite sequence of symbols from an alphabet. The empty string is denoted by ε.

Example: "abba" is a string over Σ = {a, b} with length 4.

Language: Any subset of Σ* (set of all possible strings over alphabet Σ).

Example: L = {aⁿbⁿ | n ≥ 0} is a language over {a, b}.

Define Regular Expression with basic operators 3 Marks

Regular Expression: A notation for describing regular languages using:

  • Basic elements: ∅ (empty language), ε (empty string), a (single symbol)
  • Union (+): R₁ + R₂ means either R₁ or R₂
  • Concatenation (·): R₁R₂ means R₁ followed by R₂
  • Kleene Star (*): R* means zero or more repetitions of R

Precedence: * > Concatenation > Union

Example: (a+b)*abb represents all strings over {a,b} ending with "abb".

What is the difference between Σ* and Σ⁺? 2 Marks
Σ* (Kleene Star) Σ⁺ (Positive Closure)
All strings including empty string All strings excluding empty string
Σ* = {ε, a, b, aa, ab, ...} Σ⁺ = {a, b, aa, ab, ...}
Σ* = Σ⁺ ∪ {ε} Σ⁺ = Σ* - {ε}

5 Mark Questions

Explain any 5 closure properties of Regular Languages 5 Marks

Regular languages are closed under the following operations:

1. Union: If L₁ and L₂ are regular, then L₁ ∪ L₂ is regular.

Proof: If R₁ and R₂ are REs, then R₁ + R₂ is RE for the union.

2. Concatenation: If L₁ and L₂ are regular, then L₁ · L₂ is regular.

Proof: RE R₁R₂ describes the concatenation.

3. Kleene Star: If L is regular, then L* is regular.

Proof: RE R* describes L*.

4. Complement: If L is regular, then L' = Σ* - L is regular.

Proof: In DFA for L, swap accepting and non-accepting states.

5. Intersection: If L₁ and L₂ are regular, then L₁ ∩ L₂ is regular.

Proof: Use De Morgan's Law: L₁ ∩ L₂ = (L₁' ∪ L₂')', or construct product DFA.

Write Regular Expressions for the following languages over Σ = {0, 1} 5 Marks

(i) Strings with odd number of 1s:

0*(10*10*)*10*

Explanation: Start with any 0s, then one 1 (making it odd), then pairs of 1s (staying odd).

(ii) Strings having substring "10" or "11":

(0+1)*(10+11)(0+1)*

Explanation: Any prefix, then 10 or 11, then any suffix.

(iii) Strings with no consecutive 1s:

(0+10)*(1+ε)

Explanation: Either 0s or "10" blocks, optionally ending with single 1.

(iv) Strings containing sequence "011":

(0+1)*011(0+1)*

(v) Strings whose length is a multiple of 3:

((0+1)(0+1)(0+1))* or ((0+1)³)*

Describe the language for given Regular Expressions 5 Marks

(i) 1(0+1)*0:

All binary strings that start with 1 and end with 0.

Examples: 10, 100, 110, 1010, 1110, ...

(ii) (aa)*(bb)*(b):

Strings with even number of a's followed by odd number of b's.

Examples: b, aab, aabbb, aaaab, ...

(iii) (ab+ba)*:

Strings made by concatenating "ab" or "ba" any number of times (including empty string).

Examples: ε, ab, ba, abab, abba, baba, ...

(iv) (A-Z)(a-z)*(a+e+i+o+u):

Strings starting with uppercase letter, followed by any lowercase letters, ending with a vowel.

Examples: Aa, He, Hello, Indonesia, ...

Explain Pumping Lemma for Regular Languages 5 Marks

Statement:

If L is a regular language, then there exists a constant p (pumping length) such that for every string s ∈ L with |s| ≥ p, s can be written as s = xyz where:

  1. |xy| ≤ p
  2. |y| > 0 (y is not empty)
  3. For all i ≥ 0, xyⁱz ∈ L

Usage:

Used to prove a language is NOT regular by contradiction:

  1. Assume L is regular
  2. Choose a string s ∈ L with |s| ≥ p
  3. For ALL ways to split s = xyz (satisfying conditions 1 & 2)
  4. Show that some xyⁱz ∉ L
  5. Contradiction → L is not regular

Example: Prove L = {aⁿbⁿ | n ≥ 0} is not regular.

Choose s = aᵖbᵖ. Since |xy| ≤ p, y consists only of a's.

Pumping: xy²z = aᵖ⁺|ʸ|bᵖ ∉ L (unequal a's and b's). Contradiction!

10 Mark Questions

Construct Right Linear Grammar for RE: 00*(01+0)* 10 Marks

Step 1: Analyze the RE

  • Must start with 0
  • Followed by zero or more 0s
  • Then zero or more (01 or 0)

Step 2: Break into components

  • S generates initial "0"
  • A handles "0*"
  • B handles "(01+0)*"

Step 3: Write the grammar

G = (V, T, P, S) V = {S, A, B, C} T = {0, 1} S = S P: S → 0A A → 0A | B | ε B → 0C | 0B | ε C → 1B

Step 4: Simplify (combine B and C)

P: S → 0A A → 0A | 0B | 0 | ε B → 1A | 0B | 0 | ε

Step 5: Verify with examples

  • "0" → S ⇒ 0A ⇒ 0
  • "00" → S ⇒ 0A ⇒ 00A ⇒ 00
  • "001" → S ⇒ 0A ⇒ 00A ⇒ 00B ⇒ 001A ⇒ 001
Convert Right Linear Grammar to Left Linear Grammar 10 Marks

Given RLG:

S → aA | bB A → aA | a B → bB | b

Step 1: Construct FA from RLG

States: S (start), A, B, F (final)

Transitions:

  • S --a--> A, S --b--> B
  • A --a--> A, A --a--> F
  • B --b--> B, B --b--> F

Step 2: Reverse the FA

New start: F, New final: S

Reversed transitions:

  • A --a--> S, B --b--> S
  • A --a--> A, F --a--> A
  • B --b--> B, F --b--> B

Step 3: Write LLG from reversed FA

S' → Aa | Bb | a | b A → Aa | a B → Bb | b

Language: a⁺ + b⁺ (one or more a's, OR one or more b's)