Introduction & Regular Languages
Introduction to Automata Theory
What is Automata Theory?
Automata Theory is the study of abstract machines (automata) and the computational problems they can solve. It forms the theoretical foundation of computer science, helping us understand what can and cannot be computed.
The word "automaton" (plural: automata) comes from Greek, meaning "self-acting." An automaton is an abstract self-propelled computing device that follows a predetermined sequence of operations automatically.
Why Study Automata Theory?
- Compiler Design: Lexical analyzers use finite automata to recognize tokens
- Text Processing: Regular expressions power search and replace operations
- Network Protocols: State machines model communication protocols
- Hardware Design: Digital circuits are modeled as finite state machines
- Understanding Limits: Helps us know what problems are unsolvable
The Hierarchy of Computational Power
As we progress through this course, we'll study machines with increasing power:
Finite Automata → Push Down Automata → Turing Machine
Each subsequent machine can recognize more complex languages than the previous one.
Alphabets & Symbols
Definition: Alphabet (Σ)
An alphabet is a finite, non-empty set of symbols. It is typically denoted by the Greek letter Σ (sigma).
Examples of Alphabets
| Alphabet | Notation | Description |
|---|---|---|
| Binary Alphabet | Σ = {0, 1} | Used in digital systems |
| Unary Alphabet | Σ = {1} | Single symbol (used in TM arithmetic) |
| Two-letter Alphabet | Σ = {a, b} | Common in examples |
| English Lowercase | Σ = {a, b, c, ..., z} | 26 letters |
| Digits | Σ = {0, 1, 2, ..., 9} | Decimal digits |
Definition: Symbol
A symbol (or character) is a single element of an alphabet. Each symbol is indivisible and atomic.
Identifying Symbols
For Σ = {0, 1}:
0is a symbol1is a symbol01is NOT a symbol (it's a string of two symbols)
Strings & Languages
Definition: String
A string (or word) is a finite sequence of symbols from an alphabet. The length of a string w is denoted as |w| and represents the number of symbols in the string.
Important String Concepts
Empty String (ε or λ)
The empty string is a special string with no symbols. It is denoted by ε (epsilon) or λ (lambda).
- Length: |ε| = 0
- It is part of every Σ*
- ε concatenated with any string w gives w: ε · w = w · ε = w
String Operations
| Operation | Notation | Description | Example |
|---|---|---|---|
| Concatenation | w₁ · w₂ or w₁w₂ | Join two strings end-to-end | ab · cd = abcd |
| Reversal | wR | Reverse the string | (abc)R = cba |
| Length | |w| | Count of symbols | |abba| = 4 |
| Power | wn | Concatenate w with itself n times | (ab)3 = ababab |
Definition: Σ* (Kleene Star of Alphabet)
Σ* represents the set of ALL possible strings (including ε) that can be formed using symbols from Σ.
Σ* for Binary Alphabet
If Σ = {0, 1}, then:
Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, ...}
This is an infinite set containing all binary strings of all lengths.
Definition: Σ⁺ (Positive Closure)
Σ⁺ = Σ* - {ε}. It contains all strings of length 1 or more (excludes empty string).
Σ⁺ = Σ · Σ* = Σ* · Σ
Definition: Language
A language L over alphabet Σ is any subset of Σ*.
L ⊆ Σ*
Different Languages over Σ = {a, b}
- L₁ = ∅ (Empty language - contains no strings)
- L₂ = {ε} (Language containing only empty string)
- L₃ = {a, ab, abb} (Finite language with 3 strings)
- L₄ = {aⁿbⁿ | n ≥ 0} (Infinite language)
Common Confusion
∅ ≠ {ε}
- ∅ is the empty set (has 0 elements)
- {ε} is a set containing the empty string (has 1 element)
Regular Expressions
Definition: Regular Expression (RE)
A Regular Expression is a notation for describing a regular language. It uses symbols from the alphabet combined with special operators to define patterns.
RE Building Blocks
| Expression | Language Denoted | Description |
|---|---|---|
| ∅ | {} | Empty language |
| ε | {ε} | Language with only empty string |
| a (any symbol) | {a} | Language with single symbol |
RE Operators (in order of precedence)
Kleene Star (*) - Highest Precedence
R* = Zero or more repetitions of R
L(R*) = {ε} ∪ L(R) ∪ L(RR) ∪ L(RRR) ∪ ...
a* = {ε, a, aa, aaa, aaaa, ...}
(ab)* = {ε, ab, abab, ababab, ...}
Concatenation (·) - Medium Precedence
R₁R₂ = R₁ followed by R₂
L(R₁R₂) = {xy | x ∈ L(R₁) and y ∈ L(R₂)}
ab = {ab}
a*b = {b, ab, aab, aaab, ...}
Union (+) - Lowest Precedence
R₁ + R₂ = Either R₁ or R₂
L(R₁ + R₂) = L(R₁) ∪ L(R₂)
a + b = {a, b}
a + b* = {a, ε, b, bb, bbb, ...}
Additional RE Notation
| Notation | Meaning | Equivalent |
|---|---|---|
| R⁺ | One or more of R | RR* = R*R |
| R? | Zero or one of R | R + ε |
| Rⁿ | Exactly n occurrences | RRR...R (n times) |
RE Identity Rules
ε + R* = R* (ε is already in R*)
∅R = R∅ = ∅ (concatenation with empty language)
εR = Rε = R (ε is identity for concatenation)
∅ + R = R + ∅ = R (∅ is identity for union)
R + R = R (idempotent law)
(R*)* = R* (star is idempotent)
ε* = ε
∅* = ε
Write RE for: All binary strings containing "011"
Answer: (0+1)*011(0+1)*
Explanation:
(0+1)*- any number of 0s and 1s before "011"011- the required substring(0+1)*- any number of 0s and 1s after "011"
Write RE for: Strings over {0,1} with odd number of 1s
Answer: 0*(10*10*)*10*
Alternative: (0*10*)(0*10*0*10*)*
Explanation: We need at least one 1 (making it odd), then pairs of 1s can be added (2, 4, 6... keeping total odd).
Write RE for: Strings with no consecutive 1s
Answer: (0 + 10)*(1 + ε)
Or equivalently: (ε + 1)(0 + 01)*
Regular Languages
Definition: Regular Language
A language is called regular if it can be:
- Described by a Regular Expression, OR
- Recognized by a Finite Automaton (DFA or NFA), OR
- Generated by a Regular Grammar (Type 3)
All three representations are equivalent in expressive power.
Characterization of Regular Languages
The three equivalent representations of Regular Languages
Examples of Regular Languages
- All strings ending with "01" over {0, 1}
- All strings with even number of 'a's over {a, b}
- All binary strings divisible by 3
- Set of valid identifiers (start with letter, followed by letters/digits)
- Keywords in a programming language
Examples of NON-Regular Languages
- L = {aⁿbⁿ | n ≥ 0} - requires counting
- L = {ww | w ∈ Σ*} - requires memory of first half
- Balanced parentheses
- Palindromes of arbitrary length
Pumping Lemma (for proving non-regularity)
If L is regular, then there exists a pumping length p such that every string s in L with |s| ≥ p can be split into s = xyz where:
- |xy| ≤ p
- |y| > 0
- For all i ≥ 0, xyⁱz ∈ L
To prove L is NOT regular, find a string where pumping fails.
Regular Grammars
Definition: Grammar
A Grammar G is a 4-tuple: G = (V, T, P, S) where:
- V = Set of variables (non-terminals)
- T = Set of terminals (alphabet Σ)
- P = Set of production rules
- S = Start symbol (S ∈ V)
Definition: Regular Grammar (Type 3)
A grammar is regular (Type 3) if all productions have the form:
- A → aB (Right Linear) or A → Ba (Left Linear)
- A → a
- A → ε (only if A is start symbol and doesn't appear on RHS)
where A, B ∈ V and a ∈ T.
Types of Regular Grammars
Right Linear Grammar (RLG)
Productions of the form: A → wB or A → w
where w ∈ T* and A, B ∈ V
Example: Grammar for a*b
S → aS | b
This generates: b, ab, aab, aaab, ...
Left Linear Grammar (LLG)
Productions of the form: A → Bw or A → w
where w ∈ T* and A, B ∈ V
Example: Grammar for a*b
S → Sa | b
This generates: b, ab, aab, aaab, ...
Important Rule
A grammar cannot mix left linear and right linear productions. It must be entirely one type to be regular!
A → aB and A → Ba in the same grammar is NOT regular (it's Type 2).
Construct Right Linear Grammar for: RE = 00*(01+0)*
Solution:
Simplified:
Right Linear & Left Linear Grammars
Comparison Table
| Aspect | Right Linear (RLG) | Left Linear (LLG) |
|---|---|---|
| Production Form | A → aB, A → a, A → ε | A → Ba, A → a, A → ε |
| Non-terminal Position | Right end | Left end |
| Derivation Direction | Left to right | Right to left |
| FA Correspondence | Matches FA transition direction | Reverse FA transitions |
Converting RLG to LLG
Step 1: Construct the FA
Build a Finite Automaton from the Right Linear Grammar.
Step 2: Reverse the FA
Reverse all transitions, swap initial and final states.
Step 3: Convert to LLG
Write the Left Linear Grammar from the reversed FA.
Convert RLG to LLG
Given RLG:
Language: Strings of only a's or only b's (a⁺ + b⁺)
Equivalent LLG:
Closure Properties of Regular Languages
What are Closure Properties?
A class of languages is closed under an operation if applying that operation to languages in the class always produces a language that is also in the class.
Regular languages are closed under many operations!
Main Closure Properties
1. Union
If L₁ and L₂ are regular, then L₁ ∪ L₂ is regular.
Proof Sketch: If R₁ and R₂ are REs for L₁ and L₂, then R₁ + R₂ is RE for L₁ ∪ L₂.
2. Concatenation
If L₁ and L₂ are regular, then L₁ · L₂ is regular.
Proof Sketch: If R₁ and R₂ are REs, then R₁R₂ is RE for concatenation.
3. Kleene Star
If L is regular, then L* is regular.
Proof Sketch: If R is RE for L, then R* is RE for L*.
4. Complement
If L is regular over Σ, then Σ* - L (complement of L) is regular.
Proof Sketch: In the DFA for L, swap accepting and non-accepting states.
5. Intersection
If L₁ and L₂ are regular, then L₁ ∩ L₂ is regular.
Proof: L₁ ∩ L₂ = complement(complement(L₁) ∪ complement(L₂)) (De Morgan's Law)
Or use the product construction of DFAs.
6. Difference
If L₁ and L₂ are regular, then L₁ - L₂ is regular.
Proof: L₁ - L₂ = L₁ ∩ complement(L₂)
7. Reversal
If L is regular, then LR = {wR | w ∈ L} is regular.
Proof Sketch: Reverse all transitions in FA, swap initial and final states.
8. Homomorphism
If L is regular and h is a homomorphism, then h(L) is regular.
Summary Table
| Operation | Regular Languages | Method of Proof |
|---|---|---|
| Union (∪) | Closed | RE: R₁ + R₂ |
| Intersection (∩) | Closed | Product DFA construction |
| Complement | Closed | Swap accept/reject states in DFA |
| Concatenation (·) | Closed | RE: R₁R₂ |
| Kleene Star (*) | Closed | RE: R* |
| Difference (-) | Closed | L₁ ∩ L₂' |
| Reversal | Closed | Reverse FA |
Exam Tip
When asked to "List 5 closure properties of Regular Languages", list: Union, Intersection, Complement, Concatenation, Kleene Star. These are the most frequently tested ones.
Practice Questions
Based on Mumbai University exam patterns (2022-2025):
2-3 Mark Questions (Definitions)
Alphabet (Σ): A finite, non-empty set of symbols.
Example: Σ = {0, 1} is the binary alphabet.
String: A finite sequence of symbols from an alphabet. The empty string is denoted by ε.
Example: "abba" is a string over Σ = {a, b} with length 4.
Language: Any subset of Σ* (set of all possible strings over alphabet Σ).
Example: L = {aⁿbⁿ | n ≥ 0} is a language over {a, b}.
Regular Expression: A notation for describing regular languages using:
- Basic elements: ∅ (empty language), ε (empty string), a (single symbol)
- Union (+): R₁ + R₂ means either R₁ or R₂
- Concatenation (·): R₁R₂ means R₁ followed by R₂
- Kleene Star (*): R* means zero or more repetitions of R
Precedence: * > Concatenation > Union
Example: (a+b)*abb represents all strings over {a,b} ending with "abb".
| Σ* (Kleene Star) | Σ⁺ (Positive Closure) |
|---|---|
| All strings including empty string | All strings excluding empty string |
| Σ* = {ε, a, b, aa, ab, ...} | Σ⁺ = {a, b, aa, ab, ...} |
| Σ* = Σ⁺ ∪ {ε} | Σ⁺ = Σ* - {ε} |
5 Mark Questions
Regular languages are closed under the following operations:
1. Union: If L₁ and L₂ are regular, then L₁ ∪ L₂ is regular.
Proof: If R₁ and R₂ are REs, then R₁ + R₂ is RE for the union.
2. Concatenation: If L₁ and L₂ are regular, then L₁ · L₂ is regular.
Proof: RE R₁R₂ describes the concatenation.
3. Kleene Star: If L is regular, then L* is regular.
Proof: RE R* describes L*.
4. Complement: If L is regular, then L' = Σ* - L is regular.
Proof: In DFA for L, swap accepting and non-accepting states.
5. Intersection: If L₁ and L₂ are regular, then L₁ ∩ L₂ is regular.
Proof: Use De Morgan's Law: L₁ ∩ L₂ = (L₁' ∪ L₂')', or construct product DFA.
(i) Strings with odd number of 1s:
0*(10*10*)*10*
Explanation: Start with any 0s, then one 1 (making it odd), then pairs of 1s (staying odd).
(ii) Strings having substring "10" or "11":
(0+1)*(10+11)(0+1)*
Explanation: Any prefix, then 10 or 11, then any suffix.
(iii) Strings with no consecutive 1s:
(0+10)*(1+ε)
Explanation: Either 0s or "10" blocks, optionally ending with single 1.
(iv) Strings containing sequence "011":
(0+1)*011(0+1)*
(v) Strings whose length is a multiple of 3:
((0+1)(0+1)(0+1))* or ((0+1)³)*
(i) 1(0+1)*0:
All binary strings that start with 1 and end with 0.
Examples: 10, 100, 110, 1010, 1110, ...
(ii) (aa)*(bb)*(b):
Strings with even number of a's followed by odd number of b's.
Examples: b, aab, aabbb, aaaab, ...
(iii) (ab+ba)*:
Strings made by concatenating "ab" or "ba" any number of times (including empty string).
Examples: ε, ab, ba, abab, abba, baba, ...
(iv) (A-Z)(a-z)*(a+e+i+o+u):
Strings starting with uppercase letter, followed by any lowercase letters, ending with a vowel.
Examples: Aa, He, Hello, Indonesia, ...
Statement:
If L is a regular language, then there exists a constant p (pumping length) such that for every string s ∈ L with |s| ≥ p, s can be written as s = xyz where:
- |xy| ≤ p
- |y| > 0 (y is not empty)
- For all i ≥ 0, xyⁱz ∈ L
Usage:
Used to prove a language is NOT regular by contradiction:
- Assume L is regular
- Choose a string s ∈ L with |s| ≥ p
- For ALL ways to split s = xyz (satisfying conditions 1 & 2)
- Show that some xyⁱz ∉ L
- Contradiction → L is not regular
Example: Prove L = {aⁿbⁿ | n ≥ 0} is not regular.
Choose s = aᵖbᵖ. Since |xy| ≤ p, y consists only of a's.
Pumping: xy²z = aᵖ⁺|ʸ|bᵖ ∉ L (unequal a's and b's). Contradiction!
10 Mark Questions
Step 1: Analyze the RE
- Must start with 0
- Followed by zero or more 0s
- Then zero or more (01 or 0)
Step 2: Break into components
- S generates initial "0"
- A handles "0*"
- B handles "(01+0)*"
Step 3: Write the grammar
Step 4: Simplify (combine B and C)
Step 5: Verify with examples
- "0" → S ⇒ 0A ⇒ 0
- "00" → S ⇒ 0A ⇒ 00A ⇒ 00
- "001" → S ⇒ 0A ⇒ 00A ⇒ 00B ⇒ 001A ⇒ 001
Given RLG:
Step 1: Construct FA from RLG
States: S (start), A, B, F (final)
Transitions:
- S --a--> A, S --b--> B
- A --a--> A, A --a--> F
- B --b--> B, B --b--> F
Step 2: Reverse the FA
New start: F, New final: S
Reversed transitions:
- A --a--> S, B --b--> S
- A --a--> A, F --a--> A
- B --b--> B, F --b--> B
Step 3: Write LLG from reversed FA
Language: a⁺ + b⁺ (one or more a's, OR one or more b's)