# Study Notes on Context-Free Grammars and Push-Down Automata

By BYJU'S Exam Prep

Updated on: September 25th, 2023

**Context-Free Language**

- The languages which are generated by context-free grammars are called Context-Free Languages (CFLs).
- CFLs are accepted by Pushdown Automata.
- CFLs are also called non-deterministic CFL.

Table of content

**Definition**

**Definition**

If v in one-step derivable from u, then written as u ⇒ v. If v is derivable from u, written u if there is a chain of one derivation of the form u ⇒ u_{1} ⇒ u_{2} ⇒ … ⇒ v

Example: Consider the context free grammar G = ({s}, {0, 1}, P, S) where Productions are:

(i) S → 0S1

(ii) S →ε

Derivations are:

**Derivations:**

A derivation of a string is a sequence of rule applications. The language known a context-free grammar is the set of strings derivable from the start symbol S (for sentence).

A string can be derived with any of the given below derivations.

**Left Most Derivation:**

**Left Most Derivation:**

- In each sentential form, the left-most non-terminal is substituted first to derive a string from the starting symbol.
- A derivation is left most if at each step in the derivation a production is applied to the leftmost non-terminal in the sentential form.

**Right Most Derivation: **

**Right Most Derivation:**

- In each sentential form, the rightmost non-terminal is substituted first to derive a string from the starting symbol.
- A derivation is left most if at each step in the derivation a production is applied to the leftmost non-terminal in the sentential form.

Example:

- Every derivation corresponds to one derivation tree.

S ⇒ AB

⇒ aAAB

⇒ aaAB

⇒ aaaB

⇒ aaab

- Every derivation tree corresponds to one or more derivations.

S ⇒ AB S ⇒ AB S ⇒ AB

⇒ aAAB ⇒ Ab ⇒ AB

⇒ aaAB ⇒ aAAb ⇒ aAAb

⇒ aaaB ⇒ aAab ⇒ aaAb

⇒ aaab ⇒ aaab ⇒ aaab

**Derivation Tree (Parse Tree)**

**Derivation Tree (Parse Tree)**

- A derivation tree (or parse tree) can be defined with any non-terminal as the root, internal nodes are non-terminals and leaf nodes are terminals.
- Every derivation corresponds to one derivation tree.
- If a vertex A has k children with labels A
_{1}, A_{2}, A_{3},…A_{k}, then A → A_{1}A_{2}A_{3}…A_{k}will be a production in context-free grammar G.

**Example: **

S → AB, A → aAA, A → aA, B → bB, B → b

**Ambiguous Grammar**

A context-free grammar G is ambiguous if there is at least one string in L(G) having two or more distinct derivation trees (or equivalently, two or more distinct left-most derivations or two or more distinct right-most derivations).

e.g., consider the context-free grammar G having productions E → E + E/a. The string a + a + a has two leftmost derivations.

Let’s see the derivations

E ⇒ E + E ⇒ E + E + E ⇒ a + E + E ⇒ a + a + E ⇒ a + a + a

E ⇒ E + E ⇒ a + E ⇒ a ⇒ E + E ⇒ a + a + E ⇒ a + a + a

and the derivation trees are

**CFG Simplification**

The four main steps will be followed in CFG simplification

- Eliminate ambiguity.
- Eliminate useless symbols productions.
- Eliminate ∧ productions: A → ∧
- • Eliminate unit productions: A → B

**Eliminate the Ambiguity**

We can remove the ambiguity by removing the left recursing and left factoring.

**Left Recursion**

A production of the context free grammar G = (V_{N}, E, P, S) is said to be left recursive if it is of the form

A → Aα

Where A is a nonterminal and

α∈(V_{N} ∪ E)^{*}

**Removal of Left Recursion**

Let the variable A has left recursive productions as follows

(i) A → Aα_{1} |Aα_{2}|Aα_{3}|…|Aα_{n}|β_{1}|β_{2}|β_{3}|…|β_{n}

Where β_{1}, β_{2} ….. β_{n} do not begin with A. Then we replace A production in the form of

(ii) A → β_{1}A^{1} |β_{2}A^{1}|…|β_{m}A^{1} where

A^{1} → α_{1}A^{1}|α_{2}A^{1}|α_{3}A^{1}|…,|α_{n}A^{1}|∧

**Left Factoring**

Two or more productions of a variable A of the grammar G = (V_{N}, E, S, P) are said to have left factoring if the productions are of the form

A → αβ_{1}|αβ_{2}|…αβ_{n} where β_{1},…β_{n}(V_{N} ∪Σ)

**Removal of Left Factoring**

Let the variable A has left factoring productions as follows

A → αβ_{1}|αβ_{2}|…|αβ_{n}|y_{1}|y_{2}|y_{3}|…|y_{m}

where, β_{1}, β_{2}…..,β_{n} have a common factor

α and y_{1}, y_{2},….y_{m} does not contain a as a prefix, then we replace the production into the form as follows

A → αA^{1}|Y_{1}Y_{2}|…..|Y_{M}, where

A^{1} → β_{1}|β_{2}|…..|β_{n}

**Eliminate the Useless Productions/Symbols**

The symbols that cannot be used in any productions due to their unavailability in the productions or inability in deriving the terminals are known as **useless symbols**.

e,g., consider the grammar G with the following production rules

S → aS |A| C

A → a

B → aa

C → ab

**Step 1** Generate the list of variables that produce terminal symbols

U = {A, B, S}

Because C does not produce terminal symbols so this product will be deleted. Now the modified productions are

S → aS |A

A → a

B → aa

**Step 2 **Identify the variables dependency graph

S → AB

In this graph, B variable is unreachable from S so it will be deleted too and Now the productions are as-

S → aS |A

A → a

**Eliminate Null Productions**

If any variable goes to ∧ then that is called a **nullable variable.**

e.g., A → ∧, then variable A is called a nullable variable

**Step 1 -Firstly, **Scan the nullable variables in the given production list.

**Step 2 – Then **find all productions which do not include null productions.

(Null productions)

e.g., consider the CFG has the following productions

S → ABaC

A → BC

B → b|∧

C → D|∧

D → d

solve step find the nullable variables Firstly the set is empty

N = {}

N = {B, C}

N = {A, B, C}

Due to B, C variables, A will also be a nullable variable.

**Step 3 **{Null productions}

S → BaC | AaC | ABa | aC | Ba | Aa | a

A → B | C

B → b

C → D

D → d

The above grammar is every possible combination except ∧ Now put this new grammar with original grammar with null.

S → ABaC | BaC | AaC | ABa | aC | Ba | Aa | a

**Eliminate the Unit-Productions**

Production of type A → B, where A, B are variables is called **unit productions**.

**Step 1 **Using productions, we create a dependency graph

S ⇒ B

∵ S B & B A

∴ S A

**Step 2** Now the production without unit productions

S → Aa S → bb | a | bc

B → bb + A → bb

A → a | bc B → a | bc

Now the final grammar is

S → Aa | bb | a | bc

B → bb | a | bc

A → a | bc | bb

**Normal Forms of CFGs**

Ambiguity is the undesirable property of context-free grammar that we may wish to eliminate. To convert a context-free grammar into normal form, we start by trying to eliminate null productions of form A → ∧ and the unit productions of form B → C.

There are two normal forms

- Chomsky Normal Form (CNF)
- Greibach Normal Form (GNF)

**Chomsky Normal Form (CNF)**

A context-free grammar G is said to be in Chomsky Normal Form if every product is of the form either A → a, (exactly a single terminal on the right-hand side of the production) or A → BC (exactly two variables on the right-hand side of the production).

e.g., context-free grammar G with productions like S → AB, A → a, B → b is in Chomsky normal form(CnF)form.

**Chomsky Normal Form Properties**

- The number of steps in the derivation of any string ω of length n is 2n – 1, where the grammar should be in CNF.
- The minimum height of the derivation tree of any ω of length n is [log
_{2}n] + 1. - The maximum height of the derivation tree of any ω of length n = n.

**Greibach Normal Form (GNF)**

A context-free grammar is said to be in Greibach Normal Form if every product is of the form

A → aα

where, a∈Σ, A∈V_{N} and α∈

**Deterministic Context-Free Language **(DCFL)

The set of deterministic context-free languages(DCFL) is a proper subset of the set of context-free languages that possess/bears an unambiguous context-free grammar.

**Key Points**

- The problem of whether a given context-free language is deterministic is undividable.
- Deterministic context-free languages can be recognized by a deterministic turning machine in polynomial time and O (log
^{2}n) space. - The language of this class has great practical importance in computer science as it can be passed much more efficiently than non-deterministic context-free languages.

**Pushdown Automata **(PDA)

A Pushdown Automata (PDA) is generally an NFA with a **stack**. A PDA is inherently non-deterministic. To handle a language like {a^{n} b^{n} |n ≥ 0}, the machine needs to remember the number of a’s and b’s. To do this, we use a stack. So, a PDA is said a finite automaton with a stack. A stack is a data structure that can contain a number of elements but for which only the top element may be accessed.

**Definition of PDA**

A Pushdown Automaton (PDA) is defined as 7-tuple.

M = (Q, Σ, Γ, δ, q_{0}, Z,F)

where Q is a finite set of states

Σ is the input alphabet

Γ is the stack alphabet

δ is the transition function that maps

(Q × (Σ ∪ {ε}) × (Γ ∪{ε}) = (Q × (Γ ∪ {ε}))

q_{0} is the start state and ε denotes the empty string.

q_{0}∈Q is the start state

Z ∈Γ is the initial stack symbol

F ⊆ Q is the set of final or accepting states.

**Acceptance of PDA**

Tape is divided into finitely many cells. Each cell contains a symbol in an alphabet L. The stack head always scans the top symbol of the stack as shown in the figure.

It performs two basic operations

**Push** add a new symbol at the top.

**Pop** read and remove the top symbol.

δ (q, a, v) = (p, u)

It means that if the tape head reads specific input a, the stack head read v and the finite control is in state q, then one of the possible moves might be that the next state is p, v is replaced by u at the stack, Thus tape head moves one cell to the right.

δ (q, ε, v) = (p, u)

It thus means that this is a ε -move (null move)

δ (q, a, ε) = (p, u)

It means that a push operation performs on a stack.

δ (q, a, v) = (p, ε)

It means that a pop operation performs on a stack.

PDA can accept a string in three ways:

**PDA acceptance by Empty Stack:**If the stack is empty after reading the entire input string then PDA accepted the given string, otherwise rejected.**PDA acceptance by Final State:**If the stack reaches the final state after reading the input string then PDA accepted the given string, otherwise rejected.**PDA acceptance by Final State and Empty Stack:**If the stack reaches the final state and also stack is empty after reading the entire input string then PDA accepted the given string, otherwise rejected.

**Non-deterministic PDA: **Like NFA, Non-deterministic PDA (NPDA) has a number of choices for its inputs. An NPDA accepts input if the sequence of choices leads to some final state or causes PDA to empty its stack.

**Deterministic PDA**

Deterministic PDA (DPDA) is a pushdown automaton whose action is a situation is fully determined rather than facing a choice between multiple alternative actions. DPDAs cannot handle languages or grammar with ambiguity. A deterministic context-free language is a language recognised by some deterministic pushdown automata.

Following languages are DCFLs

- L = {a
^{n}b^{n}: n ≥ 0} - L = {a
^{n}cb^{2n}: n ≥ 0} - L = {ωcω
^{R}: ω∈(a + b) * but not L = {ωω^{R}: (a + b)*} - For every regular set, there exists a CFG e such that L = L (G).
- Every regular language is a CFL.
- Let G
_{1}and G_{2}be context-free grammars. Then, G_{1}and G_{2 }are equivalent if and only if L (G_{1}) = L (G_{2}).

- The intersection of a context-free language and a regular language is a context-free language.
- The reverse of a context-free language is context-free.
- A DFA can remember only a finite amount of information whereas a PDA can remember an infinite amount of information.
- For every PDA, there is context-free grammar and for every context-free grammar, there is a PDA.
- If L
_{1}is a DCFL and L_{2}is regular then, L_{1}∪ L_{2}is also DCFL. - If L
_{1}is a DCFL and L_{2}is a regular language, then L_{1}∩ L_{2}is also DCFL. - Every regular language is DCFL.
- The power of nondeterministic pushdown automata and deterministic pushdown automata is not the same. But the power of nondeterministic pushdown automata and deterministic pushdown is the same.
- An FSM (Finite State Machine) with one stack is more powerful than FSM without a stack.
- If left recursion or left factoring is present, it is not sure that the grammar is ambiguous but there may be a chance of ambiguity.

**Closure Properties of CFLs**

CFLs are closed under the following properties:

- Union
- Concatenation
- Kleene closure
- Positive closure
- Substitution
- Homomorphism
- Inverse homomorphism
- Reversal
- Intersection with regular
- Union with regular
- Difference with regular

CFLs are not closed under the following properties :

- Intersection
- Complementation
- Difference

**Decision Properties of CFL’s**

**Decidable Problems:**

- Emptiness Problem: Is a given L(G) accepts empty?
- Membership problem: Is a string w accepted by a given PDA?
- Finiteness Problem: Is a given L(G) accepts finite language?

**Undecidable Problems:**

- Equivalence problem: Are two CFLs the same?
- Ambiguity problem: Is a given CFG ambiguous?
- Is a given CFL inherently ambiguous?
- Is the intersection of two CFLs empty?
- Totality problem: Is a given L(G) equal to ∑*?

You can follow the detailed champion study plan for GATE CS 2021 from the following link:

**Detailed GATE CSE 2021 Champion Study Plan**

Candidates can also practice 110+ Mock tests for exams like GATE, NIELIT with BYJU’S Exam Prep Test Series check the following link:

## Click Here to Avail GATE CSE Test Series! (100+ Mock Tests)

Get unlimited access to 21+ structured Live Courses all 112+ mock tests with Online Classroom Program for GATE CS & PSU Exams:

## Click here to avail Online Classroom Program for Computer Science Engineering

**Thanks**

**#DreamStriveSucceed**

*Download BYJU’S Exam Prep app for*

**GATE CS Preparation**here