Introduction & Regular Languages

Introduction to Automata Theory

What is Automata Theory?

Automata Theory is the study of abstract machines (automata) and the computational problems they can solve. It forms the theoretical foundation of computer science, helping us understand what can and cannot be computed.

The word "automaton" (plural: automata) comes from Greek, meaning "self-acting." An automaton is an abstract self-propelled computing device that follows a predetermined sequence of operations automatically.

Why Study Automata Theory?

Compiler Design: Lexical analyzers use finite automata to recognize tokens
Text Processing: Regular expressions power search and replace operations
Network Protocols: State machines model communication protocols
Hardware Design: Digital circuits are modeled as finite state machines
Understanding Limits: Helps us know what problems are unsolvable

The Hierarchy of Computational Power

As we progress through this course, we'll study machines with increasing power:

Finite Automata → Push Down Automata → Turing Machine

Each subsequent machine can recognize more complex languages than the previous one.

Alphabets & Symbols

Definition: Alphabet (Σ)

An alphabet is a finite, non-empty set of symbols. It is typically denoted by the Greek letter Σ (sigma).

Examples of Alphabets

Alphabet	Notation	Description
Binary Alphabet	Σ = {0, 1}	Used in digital systems
Unary Alphabet	Σ = {1}	Single symbol (used in TM arithmetic)
Two-letter Alphabet	Σ = {a, b}	Common in examples
English Lowercase	Σ = {a, b, c, ..., z}	26 letters
Digits	Σ = {0, 1, 2, ..., 9}	Decimal digits

Definition: Symbol

A symbol (or character) is a single element of an alphabet. Each symbol is indivisible and atomic.

Example

Identifying Symbols

For Σ = {0, 1}:

0 is a symbol
1 is a symbol
01 is NOT a symbol (it's a string of two symbols)

Strings & Languages

Definition: String

A string (or word) is a finite sequence of symbols from an alphabet. The length of a string w is denoted as |w| and represents the number of symbols in the string.

Important String Concepts

Empty String (ε or λ)

The empty string is a special string with no symbols. It is denoted by ε (epsilon) or λ (lambda).

Length: |ε| = 0
It is part of every Σ*
ε concatenated with any string w gives w: ε · w = w · ε = w

String Operations

Operation	Notation	Description	Example
Concatenation	w₁ · w₂ or w₁w₂	Join two strings end-to-end	ab · cd = abcd
Reversal	w^R	Reverse the string	(abc)^R = cba
Length	\|w\|	Count of symbols	\|abba\| = 4
Power	wⁿ	Concatenate w with itself n times	(ab)³ = ababab

Definition: Σ* (Kleene Star of Alphabet)

Σ* represents the set of ALL possible strings (including ε) that can be formed using symbols from Σ.

Example

Σ* for Binary Alphabet

If Σ = {0, 1}, then:

Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, ...}

This is an infinite set containing all binary strings of all lengths.

Definition: Σ⁺ (Positive Closure)

Σ⁺ = Σ* - {ε}. It contains all strings of length 1 or more (excludes empty string).

Σ⁺ = Σ · Σ* = Σ* · Σ

Definition: Language

A language L over alphabet Σ is any subset of Σ*.

L ⊆ Σ*

Examples

Different Languages over Σ = {a, b}

L₁ = ∅ (Empty language - contains no strings)
L₂ = {ε} (Language containing only empty string)
L₃ = {a, ab, abb} (Finite language with 3 strings)
L₄ = {aⁿbⁿ | n ≥ 0} (Infinite language)

Common Confusion

∅ ≠ {ε}

∅ is the empty set (has 0 elements)
{ε} is a set containing the empty string (has 1 element)

Regular Expressions

Definition: Regular Expression (RE)

A Regular Expression is a notation for describing a regular language. It uses symbols from the alphabet combined with special operators to define patterns.

RE Building Blocks

Expression	Language Denoted	Description
∅	{}	Empty language
ε	{ε}	Language with only empty string
a (any symbol)	{a}	Language with single symbol

RE Operators (in order of precedence)

Kleene Star (*) - Highest Precedence

R* = Zero or more repetitions of R

L(R*) = {ε} ∪ L(R) ∪ L(RR) ∪ L(RRR) ∪ ...

a* = {ε, a, aa, aaa, aaaa, ...}

(ab)* = {ε, ab, abab, ababab, ...}

Concatenation (·) - Medium Precedence

R₁R₂ = R₁ followed by R₂

L(R₁R₂) = {xy | x ∈ L(R₁) and y ∈ L(R₂)}

ab = {ab}

a*b = {b, ab, aab, aaab, ...}

Union (+) - Lowest Precedence

R₁ + R₂ = Either R₁ or R₂

L(R₁ + R₂) = L(R₁) ∪ L(R₂)

a + b = {a, b}

a + b* = {a, ε, b, bb, bbb, ...}

Additional RE Notation

Notation	Meaning	Equivalent
R⁺	One or more of R	RR* = R*R
R?	Zero or one of R	R + ε
Rⁿ	Exactly n occurrences	RRR...R (n times)

RE Identity Rules

ε + R* = R* (ε is already in R*)

∅R = R∅ = ∅ (concatenation with empty language)

εR = Rε = R (ε is identity for concatenation)

∅ + R = R + ∅ = R (∅ is identity for union)

R + R = R (idempotent law)

(R*)* = R* (star is idempotent)

ε* = ε

∅* = ε

Exam-Style Example

Write RE for: All binary strings containing "011"

Answer: (0+1)*011(0+1)*

Explanation:

(0+1)* - any number of 0s and 1s before "011"
011 - the required substring
(0+1)* - any number of 0s and 1s after "011"

Exam-Style Example

Write RE for: Strings over {0,1} with odd number of 1s

Answer: 0*(10*10*)*10*

Alternative: (0*10*)(0*10*0*10*)*

Explanation: We need at least one 1 (making it odd), then pairs of 1s can be added (2, 4, 6... keeping total odd).

Exam-Style Example

Write RE for: Strings with no consecutive 1s

Answer: (0 + 10)*(1 + ε)

Or equivalently: (ε + 1)(0 + 01)*

Regular Languages

Definition: Regular Language

A language is called regular if it can be:

Described by a Regular Expression, OR
Recognized by a Finite Automaton (DFA or NFA), OR
Generated by a Regular Grammar (Type 3)

All three representations are equivalent in expressive power.

Characterization of Regular Languages

The three equivalent representations of Regular Languages

Examples of Regular Languages

All strings ending with "01" over {0, 1}
All strings with even number of 'a's over {a, b}
All binary strings divisible by 3
Set of valid identifiers (start with letter, followed by letters/digits)
Keywords in a programming language

Examples of NON-Regular Languages

L = {aⁿbⁿ | n ≥ 0} - requires counting
L = {ww | w ∈ Σ*} - requires memory of first half
Balanced parentheses
Palindromes of arbitrary length

Pumping Lemma (for proving non-regularity)

If L is regular, then there exists a pumping length p such that every string s in L with |s| ≥ p can be split into s = xyz where:

|xy| ≤ p
|y| > 0
For all i ≥ 0, xyⁱz ∈ L

To prove L is NOT regular, find a string where pumping fails.

Regular Grammars

Definition: Grammar

A Grammar G is a 4-tuple: G = (V, T, P, S) where:

V = Set of variables (non-terminals)
T = Set of terminals (alphabet Σ)
P = Set of production rules
S = Start symbol (S ∈ V)

Definition: Regular Grammar (Type 3)

A grammar is regular (Type 3) if all productions have the form:

A → aB (Right Linear) or A → Ba (Left Linear)
A → a
A → ε (only if A is start symbol and doesn't appear on RHS)

where A, B ∈ V and a ∈ T.

Types of Regular Grammars

Right Linear Grammar (RLG)

Productions of the form: A → wB or A → w

where w ∈ T* and A, B ∈ V

Example: Grammar for a*b

S → aS | b

This generates: b, ab, aab, aaab, ...

Left Linear Grammar (LLG)

Productions of the form: A → Bw or A → w

where w ∈ T* and A, B ∈ V

Example: Grammar for a*b

S → Sa | b

This generates: b, ab, aab, aaab, ...

Important Rule

A grammar cannot mix left linear and right linear productions. It must be entirely one type to be regular!

A → aB and A → Ba in the same grammar is NOT regular (it's Type 2).

Exam-Style Example

Construct Right Linear Grammar for: RE = 00(01+0)

Solution:

S → 0A
A → 0A | B | ε
B → 0C | 0 | B
C → 1B
                            

Simplified:

S → 0A
A → 0A | 0B | 0 | ε
B → 1A | 0B | 0 | ε
                            

Right Linear & Left Linear Grammars

Comparison Table

Aspect	Right Linear (RLG)	Left Linear (LLG)
Production Form	A → aB, A → a, A → ε	A → Ba, A → a, A → ε
Non-terminal Position	Right end	Left end
Derivation Direction	Left to right	Right to left
FA Correspondence	Matches FA transition direction	Reverse FA transitions

Converting RLG to LLG

Step 1: Construct the FA

Build a Finite Automaton from the Right Linear Grammar.

Step 2: Reverse the FA

Reverse all transitions, swap initial and final states.

Step 3: Convert to LLG

Write the Left Linear Grammar from the reversed FA.

Example

Convert RLG to LLG

Given RLG:

S → aA | bB
A → aA | a
B → bB | b
                            

Language: Strings of only a's or only b's (a⁺ + b⁺)

Equivalent LLG:

S → Aa | Bb | a | b
A → Aa | a
B → Bb | b
                            

Closure Properties of Regular Languages

What are Closure Properties?

A class of languages is closed under an operation if applying that operation to languages in the class always produces a language that is also in the class.

Regular languages are closed under many operations!

Main Closure Properties

1. Union

If L₁ and L₂ are regular, then L₁ ∪ L₂ is regular.

Proof Sketch: If R₁ and R₂ are REs for L₁ and L₂, then R₁ + R₂ is RE for L₁ ∪ L₂.

2. Concatenation

If L₁ and L₂ are regular, then L₁ · L₂ is regular.

Proof Sketch: If R₁ and R₂ are REs, then R₁R₂ is RE for concatenation.

3. Kleene Star

If L is regular, then L* is regular.

Proof Sketch: If R is RE for L, then R* is RE for L*.

4. Complement

If L is regular over Σ, then Σ* - L (complement of L) is regular.

Proof Sketch: In the DFA for L, swap accepting and non-accepting states.

5. Intersection

If L₁ and L₂ are regular, then L₁ ∩ L₂ is regular.

Proof: L₁ ∩ L₂ = complement(complement(L₁) ∪ complement(L₂)) (De Morgan's Law)

Or use the product construction of DFAs.

6. Difference

If L₁ and L₂ are regular, then L₁ - L₂ is regular.

Proof: L₁ - L₂ = L₁ ∩ complement(L₂)

7. Reversal

If L is regular, then L^R = {w^R | w ∈ L} is regular.

Proof Sketch: Reverse all transitions in FA, swap initial and final states.

8. Homomorphism

If L is regular and h is a homomorphism, then h(L) is regular.

Summary Table

Operation	Regular Languages	Method of Proof
Union (∪)	Closed	RE: R₁ + R₂
Intersection (∩)	Closed	Product DFA construction
Complement	Closed	Swap accept/reject states in DFA
Concatenation (·)	Closed	RE: R₁R₂
Kleene Star (*)	Closed	RE: R*
Difference (-)	Closed	L₁ ∩ L₂'
Reversal	Closed	Reverse FA

Exam Tip

When asked to "List 5 closure properties of Regular Languages", list: Union, Intersection, Complement, Concatenation, Kleene Star. These are the most frequently tested ones.

Practice Questions

Based on Mumbai University exam patterns (2022-2025):

2-3 Mark Questions (Definitions)

Define: Alphabet, String, and Language 2-3 Marks ▼

Alphabet (Σ): A finite, non-empty set of symbols.

Example: Σ = {0, 1} is the binary alphabet.

String: A finite sequence of symbols from an alphabet. The empty string is denoted by ε.

Example: "abba" is a string over Σ = {a, b} with length 4.

Language: Any subset of Σ* (set of all possible strings over alphabet Σ).

Example: L = {aⁿbⁿ | n ≥ 0} is a language over {a, b}.

Define Regular Expression with basic operators 3 Marks ▼

Regular Expression: A notation for describing regular languages using:

Basic elements: ∅ (empty language), ε (empty string), a (single symbol)
Union (+): R₁ + R₂ means either R₁ or R₂
Concatenation (·): R₁R₂ means R₁ followed by R₂
Kleene Star (*): R* means zero or more repetitions of R

Precedence: * > Concatenation > Union

Example: (a+b)*abb represents all strings over {a,b} ending with "abb".

What is the difference between Σ* and Σ⁺? 2 Marks ▼

Σ* (Kleene Star)	Σ⁺ (Positive Closure)
All strings including empty string	All strings excluding empty string
Σ* = {ε, a, b, aa, ab, ...}	Σ⁺ = {a, b, aa, ab, ...}
Σ* = Σ⁺ ∪ {ε}	Σ⁺ = Σ* - {ε}

5 Mark Questions

Explain any 5 closure properties of Regular Languages 5 Marks ▼

Regular languages are closed under the following operations:

1. Union: If L₁ and L₂ are regular, then L₁ ∪ L₂ is regular.

Proof: If R₁ and R₂ are REs, then R₁ + R₂ is RE for the union.

2. Concatenation: If L₁ and L₂ are regular, then L₁ · L₂ is regular.

Proof: RE R₁R₂ describes the concatenation.

3. Kleene Star: If L is regular, then L* is regular.

Proof: RE R* describes L*.

4. Complement: If L is regular, then L' = Σ* - L is regular.

Proof: In DFA for L, swap accepting and non-accepting states.

5. Intersection: If L₁ and L₂ are regular, then L₁ ∩ L₂ is regular.

Proof: Use De Morgan's Law: L₁ ∩ L₂ = (L₁' ∪ L₂')', or construct product DFA.

Write Regular Expressions for the following languages over Σ = {0, 1} 5 Marks ▼

(i) Strings with odd number of 1s:

0*(10*10*)*10*

Explanation: Start with any 0s, then one 1 (making it odd), then pairs of 1s (staying odd).

(ii) Strings having substring "10" or "11":

(0+1)*(10+11)(0+1)*

Explanation: Any prefix, then 10 or 11, then any suffix.

(iii) Strings with no consecutive 1s:

(0+10)*(1+ε)

Explanation: Either 0s or "10" blocks, optionally ending with single 1.

(iv) Strings containing sequence "011":

(0+1)*011(0+1)*

(v) Strings whose length is a multiple of 3:

((0+1)(0+1)(0+1))* or ((0+1)³)*

Describe the language for given Regular Expressions 5 Marks ▼

(i) 1(0+1)*0:

All binary strings that start with 1 and end with 0.

Examples: 10, 100, 110, 1010, 1110, ...

(ii) (aa)*(bb)*(b):

Strings with even number of a's followed by odd number of b's.

Examples: b, aab, aabbb, aaaab, ...

(iii) (ab+ba)*:

Strings made by concatenating "ab" or "ba" any number of times (including empty string).

Examples: ε, ab, ba, abab, abba, baba, ...

(iv) (A-Z)(a-z)*(a+e+i+o+u):

Strings starting with uppercase letter, followed by any lowercase letters, ending with a vowel.

Examples: Aa, He, Hello, Indonesia, ...

Explain Pumping Lemma for Regular Languages 5 Marks ▼

Statement:

If L is a regular language, then there exists a constant p (pumping length) such that for every string s ∈ L with |s| ≥ p, s can be written as s = xyz where:

|xy| ≤ p
|y| > 0 (y is not empty)
For all i ≥ 0, xyⁱz ∈ L

Usage:

Used to prove a language is NOT regular by contradiction:

Assume L is regular
Choose a string s ∈ L with |s| ≥ p
For ALL ways to split s = xyz (satisfying conditions 1 & 2)
Show that some xyⁱz ∉ L
Contradiction → L is not regular

Example: Prove L = {aⁿbⁿ | n ≥ 0} is not regular.

Choose s = aᵖbᵖ. Since |xy| ≤ p, y consists only of a's.

Pumping: xy²z = aᵖ⁺|ʸ|bᵖ ∉ L (unequal a's and b's). Contradiction!

10 Mark Questions

Construct Right Linear Grammar for RE: 00*(01+0)* 10 Marks ▼

Step 1: Analyze the RE

Must start with 0
Followed by zero or more 0s
Then zero or more (01 or 0)

Step 2: Break into components

S generates initial "0"
A handles "0*"
B handles "(01+0)*"

Step 3: Write the grammar

G = (V, T, P, S)
V = {S, A, B, C}
T = {0, 1}
S = S
P:
    S → 0A
    A → 0A | B | ε
    B → 0C | 0B | ε
    C → 1B
                                    

Step 4: Simplify (combine B and C)

P:
    S → 0A
    A → 0A | 0B | 0 | ε
    B → 1A | 0B | 0 | ε
                                    

Step 5: Verify with examples

"0" → S ⇒ 0A ⇒ 0
"00" → S ⇒ 0A ⇒ 00A ⇒ 00
"001" → S ⇒ 0A ⇒ 00A ⇒ 00B ⇒ 001A ⇒ 001

Convert Right Linear Grammar to Left Linear Grammar 10 Marks ▼

Given RLG:

S → aA | bB
A → aA | a
B → bB | b
                                    

Step 1: Construct FA from RLG

States: S (start), A, B, F (final)

Transitions:

S --a--> A, S --b--> B
A --a--> A, A --a--> F
B --b--> B, B --b--> F

Step 2: Reverse the FA

New start: F, New final: S

Reversed transitions:

A --a--> S, B --b--> S
A --a--> A, F --a--> A
B --b--> B, F --b--> B

Step 3: Write LLG from reversed FA

S' → Aa | Bb | a | b
A → Aa | a
B → Bb | b
                                    

Language: a⁺ + b⁺ (one or more a's, OR one or more b's)

Introduction & Regular Languages

Introduction to Automata Theory

What is Automata Theory?

Why Study Automata Theory?

The Hierarchy of Computational Power

Alphabets & Symbols

Definition: Alphabet (Σ)

Examples of Alphabets

Definition: Symbol

Identifying Symbols

Strings & Languages

Definition: String

Important String Concepts

Empty String (ε or λ)

String Operations

Definition: Σ* (Kleene Star of Alphabet)

Σ* for Binary Alphabet

Definition: Σ⁺ (Positive Closure)

Definition: Language

Different Languages over Σ = {a, b}

Common Confusion

Regular Expressions

Definition: Regular Expression (RE)

RE Building Blocks

RE Operators (in order of precedence)

Kleene Star (*) - Highest Precedence

Concatenation (·) - Medium Precedence

Union (+) - Lowest Precedence

Additional RE Notation

RE Identity Rules

Write RE for: All binary strings containing "011"

Write RE for: Strings over {0,1} with odd number of 1s

Write RE for: Strings with no consecutive 1s

Regular Languages

Definition: Regular Language

Characterization of Regular Languages

Examples of Regular Languages

Examples of NON-Regular Languages

Pumping Lemma (for proving non-regularity)

Regular Grammars

Definition: Grammar

Definition: Regular Grammar (Type 3)

Types of Regular Grammars

Right Linear Grammar (RLG)

Left Linear Grammar (LLG)

Important Rule

Construct Right Linear Grammar for: RE = 00*(01+0)*

Right Linear & Left Linear Grammars

Comparison Table

Converting RLG to LLG

Step 1: Construct the FA

Step 2: Reverse the FA

Step 3: Convert to LLG

Convert RLG to LLG

Closure Properties of Regular Languages

What are Closure Properties?

Main Closure Properties

1. Union

2. Concatenation

3. Kleene Star

4. Complement

5. Intersection

6. Difference

7. Reversal

8. Homomorphism

Summary Table

Exam Tip

Practice Questions

2-3 Mark Questions (Definitions)

5 Mark Questions

10 Mark Questions

Construct Right Linear Grammar for: RE = 00(01+0)