Study Notes on Context-Free Grammars and Push-Down Automata

By BYJU'S Exam Prep

Updated on: September 25th, 2023

Context-Free Language

The languages which are generated by context-free grammars are called Context-Free Languages (CFLs).
CFLs are accepted by Pushdown Automata.
CFLs are also called non-deterministic CFL.

Table of content

1. Deterministic Context-Free Language (DCFL)
2. Pushdown Automata (PDA)
3. Detailed GATE CSE 2021 Champion Study Plan
4. Click Here to Avail GATE CSE Test Series! (100+ Mock Tests)
5. Click here to avail Online Classroom Program for Computer Science Engineering

Definition

If v in one-step derivable from u, then written as u ⇒ v. If v is derivable from u, written u if there is a chain of one derivation of the form u ⇒ u₁ ⇒ u₂ ⇒ … ⇒ v

Example: Consider the context free grammar G = ({s}, {0, 1}, P, S) where Productions are:

(i) S → 0S1

(ii) S →ε

Derivations are:

Derivations:

A derivation of a string is a sequence of rule applications. The language known a context-free grammar is the set of strings derivable from the start symbol S (for sentence).

A string can be derived with any of the given below derivations.

Left Most Derivation:

In each sentential form, the left-most non-terminal is substituted first to derive a string from the starting symbol.
A derivation is left most if at each step in the derivation a production is applied to the leftmost non-terminal in the sentential form.

Right Most Derivation:

In each sentential form, the rightmost non-terminal is substituted first to derive a string from the starting symbol.
A derivation is left most if at each step in the derivation a production is applied to the leftmost non-terminal in the sentential form.

Example:

Every derivation corresponds to one derivation tree.

S ⇒ AB

⇒ aAAB

⇒ aaAB

⇒ aaaB

⇒ aaab

Every derivation tree corresponds to one or more derivations.

S ⇒ AB S ⇒ AB S ⇒ AB

⇒ aAAB ⇒ Ab ⇒ AB

⇒ aaAB ⇒ aAAb ⇒ aAAb

⇒ aaaB ⇒ aAab ⇒ aaAb

⇒ aaab ⇒ aaab ⇒ aaab

Derivation Tree (Parse Tree)

A derivation tree (or parse tree) can be defined with any non-terminal as the root, internal nodes are non-terminals and leaf nodes are terminals.
Every derivation corresponds to one derivation tree.
If a vertex A has k children with labels A₁, A₂, A₃,…A_k, then A → A₁ A₂ A₃…A_k will be a production in context-free grammar G.

Example:

S → AB, A → aAA, A → aA, B → bB, B → b

Ambiguous Grammar

A context-free grammar G is ambiguous if there is at least one string in L(G) having two or more distinct derivation trees (or equivalently, two or more distinct left-most derivations or two or more distinct right-most derivations).

e.g., consider the context-free grammar G having productions E → E + E/a. The string a + a + a has two leftmost derivations.

Let’s see the derivations

E ⇒ E + E ⇒ E + E + E ⇒ a + E + E ⇒ a + a + E ⇒ a + a + a

E ⇒ E + E ⇒ a + E ⇒ a ⇒ E + E ⇒ a + a + E ⇒ a + a + a

and the derivation trees are

CFG Simplification

The four main steps will be followed in CFG simplification

Eliminate ambiguity.
Eliminate useless symbols productions.
Eliminate ∧ productions: A → ∧
• Eliminate unit productions: A → B

Eliminate the Ambiguity

We can remove the ambiguity by removing the left recursing and left factoring.

Left Recursion

A production of the context free grammar G = (V_N, E, P, S) is said to be left recursive if it is of the form

A → Aα

Where A is a nonterminal and

α∈(V_N ∪ E)^*

Removal of Left Recursion

Let the variable A has left recursive productions as follows

(i) A → Aα₁ |Aα₂|Aα₃|…|Aα_n|β₁|β₂|β₃|…|β_n

Where β₁, β₂ ….. β_n do not begin with A. Then we replace A production in the form of

(ii) A → β₁A¹ |β₂A¹|…|β_mA¹ where

A¹ → α₁A¹|α₂A¹|α₃A¹|…,|α_nA¹|∧

Left Factoring

Two or more productions of a variable A of the grammar G = (V_N, E, S, P) are said to have left factoring if the productions are of the form

A → αβ₁|αβ₂|…αβ_n where β₁,…β_n(V_N ∪Σ)

Removal of Left Factoring

Let the variable A has left factoring productions as follows

A → αβ₁|αβ₂|…|αβ_n|y₁|y₂|y₃|…|y_m

where, β₁, β₂…..,β_n have a common factor

α and y₁, y₂,….y_m does not contain a as a prefix, then we replace the production into the form as follows

A → αA¹|Y₁Y₂|…..|Y_M, where

A¹ → β₁|β₂|…..|β_n

Eliminate the Useless Productions/Symbols

The symbols that cannot be used in any productions due to their unavailability in the productions or inability in deriving the terminals are known as useless symbols.

e,g., consider the grammar G with the following production rules

S → aS |A| C

A → a

B → aa

C → ab

Step 1 Generate the list of variables that produce terminal symbols

U = {A, B, S}

Because C does not produce terminal symbols so this product will be deleted. Now the modified productions are

S → aS |A

A → a

B → aa

Step 2 Identify the variables dependency graph

S → AB

In this graph, B variable is unreachable from S so it will be deleted too and Now the productions are as-

S → aS |A

A → a

Eliminate Null Productions

If any variable goes to ∧ then that is called a nullable variable.

e.g., A → ∧, then variable A is called a nullable variable

Step 1 -Firstly, Scan the nullable variables in the given production list.

Step 2 – Then find all productions which do not include null productions.

(Null productions)

e.g., consider the CFG has the following productions

S → ABaC

A → BC

B → b|∧

C → D|∧

D → d

solve step find the nullable variables Firstly the set is empty

N = {}

N = {B, C}

N = {A, B, C}

Due to B, C variables, A will also be a nullable variable.

Step 3 {Null productions}

S → BaC | AaC | ABa | aC | Ba | Aa | a

A → B | C

B → b

C → D

D → d

The above grammar is every possible combination except ∧ Now put this new grammar with original grammar with null.

S → ABaC | BaC | AaC | ABa | aC | Ba | Aa | a

Eliminate the Unit-Productions

Production of type A → B, where A, B are variables is called unit productions.

Step 1 Using productions, we create a dependency graph

S ⇒ B

∵ S B & B A

∴ S A

Step 2 Now the production without unit productions

S → Aa S → bb | a | bc

B → bb + A → bb

A → a | bc B → a | bc

Now the final grammar is

S → Aa | bb | a | bc

B → bb | a | bc

A → a | bc | bb

Normal Forms of CFGs

Ambiguity is the undesirable property of context-free grammar that we may wish to eliminate. To convert a context-free grammar into normal form, we start by trying to eliminate null productions of form A → ∧ and the unit productions of form B → C.

There are two normal forms

Chomsky Normal Form (CNF)
Greibach Normal Form (GNF)

Chomsky Normal Form (CNF)

A context-free grammar G is said to be in Chomsky Normal Form if every product is of the form either A → a, (exactly a single terminal on the right-hand side of the production) or A → BC (exactly two variables on the right-hand side of the production).

e.g., context-free grammar G with productions like S → AB, A → a, B → b is in Chomsky normal form(CnF)form.

Chomsky Normal Form Properties

The number of steps in the derivation of any string ω of length n is 2n – 1, where the grammar should be in CNF.
The minimum height of the derivation tree of any ω of length n is [log₂ n] + 1.
The maximum height of the derivation tree of any ω of length n = n.

Greibach Normal Form (GNF)

A context-free grammar is said to be in Greibach Normal Form if every product is of the form

A → aα

where, a∈Σ, A∈V_N and α∈

Deterministic Context-Free Language (DCFL)

The set of deterministic context-free languages(DCFL) is a proper subset of the set of context-free languages that possess/bears an unambiguous context-free grammar.

Key Points

The problem of whether a given context-free language is deterministic is undividable.
Deterministic context-free languages can be recognized by a deterministic turning machine in polynomial time and O (log² n) space.
The language of this class has great practical importance in computer science as it can be passed much more efficiently than non-deterministic context-free languages.

Pushdown Automata (PDA)

A Pushdown Automata (PDA) is generally an NFA with a stack. A PDA is inherently non-deterministic. To handle a language like {aⁿ bⁿ |n ≥ 0}, the machine needs to remember the number of a’s and b’s. To do this, we use a stack. So, a PDA is said a finite automaton with a stack. A stack is a data structure that can contain a number of elements but for which only the top element may be accessed.

Definition of PDA

A Pushdown Automaton (PDA) is defined as 7-tuple.

M = (Q, Σ, Γ, δ, q₀, Z,F)

where Q is a finite set of states

Σ is the input alphabet

Γ is the stack alphabet

δ is the transition function that maps

(Q × (Σ ∪ {ε}) × (Γ ∪{ε}) = (Q × (Γ ∪ {ε}))

q₀ is the start state and ε denotes the empty string.

q₀∈Q is the start state

Z ∈Γ is the initial stack symbol

F ⊆ Q is the set of final or accepting states.

Acceptance of PDA

Tape is divided into finitely many cells. Each cell contains a symbol in an alphabet L. The stack head always scans the top symbol of the stack as shown in the figure.

It performs two basic operations

Push add a new symbol at the top.

Pop read and remove the top symbol.

δ (q, a, v) = (p, u)

It means that if the tape head reads specific input a, the stack head read v and the finite control is in state q, then one of the possible moves might be that the next state is p, v is replaced by u at the stack, Thus tape head moves one cell to the right.

δ (q, ε, v) = (p, u)

It thus means that this is a ε -move (null move)

δ (q, a, ε) = (p, u)

It means that a push operation performs on a stack.

δ (q, a, v) = (p, ε)

It means that a pop operation performs on a stack.

PDA can accept a string in three ways:

PDA acceptance by Empty Stack: If the stack is empty after reading the entire input string then PDA accepted the given string, otherwise rejected.
PDA acceptance by Final State: If the stack reaches the final state after reading the input string then PDA accepted the given string, otherwise rejected.
PDA acceptance by Final State and Empty Stack: If the stack reaches the final state and also stack is empty after reading the entire input string then PDA accepted the given string, otherwise rejected.

Non-deterministic PDA: Like NFA, Non-deterministic PDA (NPDA) has a number of choices for its inputs. An NPDA accepts input if the sequence of choices leads to some final state or causes PDA to empty its stack.

Deterministic PDA

Deterministic PDA (DPDA) is a pushdown automaton whose action is a situation is fully determined rather than facing a choice between multiple alternative actions. DPDAs cannot handle languages or grammar with ambiguity. A deterministic context-free language is a language recognised by some deterministic pushdown automata.

Following languages are DCFLs

L = {aⁿbⁿ : n ≥ 0}
L = {aⁿcb²ⁿ : n ≥ 0}
L = {ωcω^R : ω∈(a + b) * but not L = {ωω^R : (a + b)*}
For every regular set, there exists a CFG e such that L = L (G).
Every regular language is a CFL.
Let G₁ and G₂ be context-free grammars. Then, G₁ and G₂are equivalent if and only if L (G₁) = L (G₂).

The intersection of a context-free language and a regular language is a context-free language.
The reverse of a context-free language is context-free.
A DFA can remember only a finite amount of information whereas a PDA can remember an infinite amount of information.
For every PDA, there is context-free grammar and for every context-free grammar, there is a PDA.
If L₁ is a DCFL and L₂ is regular then, L₁ ∪ L₂ is also DCFL.
If L₁ is a DCFL and L₂ is a regular language, then L₁ ∩ L₂ is also DCFL.
Every regular language is DCFL.
The power of nondeterministic pushdown automata and deterministic pushdown automata is not the same. But the power of nondeterministic pushdown automata and deterministic pushdown is the same.
An FSM (Finite State Machine) with one stack is more powerful than FSM without a stack.
If left recursion or left factoring is present, it is not sure that the grammar is ambiguous but there may be a chance of ambiguity.

Closure Properties of CFLs

CFLs are closed under the following properties:

Union
Concatenation
Kleene closure
Positive closure
Substitution
Homomorphism
Inverse homomorphism
Reversal
Intersection with regular
Union with regular
Difference with regular

CFLs are not closed under the following properties :

Intersection
Complementation
Difference

Decision Properties of CFL’s

Decidable Problems:

Emptiness Problem: Is a given L(G) accepts empty?
Membership problem: Is a string w accepted by a given PDA?
Finiteness Problem: Is a given L(G) accepts finite language?

Undecidable Problems:

Equivalence problem: Are two CFLs the same?
Ambiguity problem: Is a given CFG ambiguous?
Is a given CFL inherently ambiguous?
Is the intersection of two CFLs empty?
Totality problem: Is a given L(G) equal to ∑*?

You can follow the detailed champion study plan for GATE CS 2021 from the following link:

Download BYJU’S Exam Prep app for GATE CS Preparation here

Study Notes on Context-Free Grammars and Push-Down Automata

Context-Free Language

Definition

Derivations:

Left Most Derivation:

Right Most Derivation:

Derivation Tree (Parse Tree)

Ambiguous Grammar

CFG Simplification

Eliminate the Ambiguity

Left Recursion

Removal of Left Recursion

Left Factoring

Removal of Left Factoring

Eliminate the Useless Productions/Symbols

Eliminate Null Productions

Eliminate the Unit-Productions

Normal Forms of CFGs

Chomsky Normal Form (CNF)

Chomsky Normal Form Properties

Deterministic Context-Free Language (DCFL)

Key Points

Pushdown Automata (PDA)

Definition of PDA

Acceptance of PDA

Deterministic PDA

Closure Properties of CFLs

Decision Properties of CFL’s

Decidable Problems:

Undecidable Problems:

Detailed GATE CSE 2021 Champion Study Plan

Click Here to Avail GATE CSE Test Series! (100+ Mock Tests)

Click here to avail Online Classroom Program for Computer Science Engineering

Recent Posts