Study Notes on Lexical Analysis

By Mukesh Kumar | Updated: August 12th, 2021

The lexical analyzer reads the source program character by character and returns the tokens of the source program. It also stores information about identifiers in the symbol table.

The Role of Lexical Analyzer:

  • It is the first phase of a compiler.
  • It reads the input characters and produces a sequence of tokens that the parser uses for syntax analysis.
  • It can work either as a separate pass or as a subroutine that the parser calls whenever it needs the next token.
  • It removes comments and white space from the source program.
  • It reports lexical errors, such as characters that cannot begin any token.
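The responsibilities listed above can be illustrated with a minimal hand-written scanner that reads one character at a time. This is only a sketch under assumptions: the token set (identifiers, integer constants and a few single-character symbols), the `//` comment syntax, and the names `scan` and `LexicalError` are hypothetical choices for illustration, not taken from any particular compiler.

```python
class LexicalError(Exception):
    """Raised when no token can start at the current character."""


def scan(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, lexeme, line) triples, skipping white space and // comments."""
    tokens = []
    i, line = 0, 1
    while i < len(source):
        ch = source[i]
        if ch == "\n":                      # count lines for error messages
            line += 1
            i += 1
        elif ch in " \t":                   # white space only separates tokens
            i += 1
        elif source.startswith("//", i):    # comment: skip to end of line
            while i < len(source) and source[i] != "\n":
                i += 1
        elif ch.isalpha() or ch == "_":     # identifier: letter, then letters/digits
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(("ID", source[start:i], line))
        elif ch.isdigit():                  # integer constant
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("NUM", source[start:i], line))
        elif ch in "=+-*/;()":              # single-character special symbols
            tokens.append((ch, ch, line))
            i += 1
        else:                               # nothing can start here: lexical error
            raise LexicalError(f"line {line}: unexpected character {ch!r}")
    tokens.append(("EOF", "", line))
    return tokens
```

Calling `scan("x = 42; // set x\ny = x + 1;")` yields 11 tokens, the last being `EOF`; the comment and the blanks produce no tokens, and `y` is reported on line 2.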

Tokens, Lexemes and Patterns

  • A token represents a class of character sequences that have the same meaning in the source program, such as identifiers, operators, keywords, numbers and delimiters. A token may carry a single attribute that holds the information needed for that token. For identifiers, this attribute is a pointer to the identifier's symbol table entry, and the symbol table holds the actual attributes.
  • The token type together with its attribute uniquely identifies a lexeme.
  • Regular expressions are widely used to specify patterns.
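The token/attribute pair can be modelled as a small record in which, for identifiers, the attribute is an index into the symbol table (standing in for the pointer described above). The names `Token`, `SymbolTable` and `make_id_token` are hypothetical, and a real symbol table would store more per entry (type, scope, storage location and so on).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    kind: str                   # token type, e.g. "ID", "NUM", "IF"
    attr: Optional[int] = None  # attribute; for "ID" an index into the symbol table


class SymbolTable:
    """Hypothetical symbol table: one entry per distinct identifier."""

    def __init__(self) -> None:
        self.entries: list[dict] = []
        self.index: dict[str, int] = {}

    def lookup_or_insert(self, name: str) -> int:
        # Reuse an existing entry so every occurrence of a name shares one slot.
        if name not in self.index:
            self.index[name] = len(self.entries)
            self.entries.append({"name": name})  # later phases can add more attributes
        return self.index[name]


def make_id_token(name: str, table: SymbolTable) -> Token:
    """Build an ID token whose attribute points at the name's symbol table entry."""
    return Token("ID", table.lookup_or_insert(name))
```

Two occurrences of `sum` produce equal tokens (`Token("ID", 0)`), while `total` gets a different attribute, so the type and attribute together identify the lexeme.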
Tokens, Patterns and Lexemes
Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.
Pattern: A pattern is the rule that describes the set of strings that can form a given token. A lexeme is matched against the patterns to determine its token.
Token: A token is the category that describes a lexeme in the source program. It is produced when a lexeme matches a pattern.
Example:
Lexeme: A1, Sum, Total
  • Pattern: Starts with a letter, followed by any number of letters or digits, and is not a keyword.
  • Token: ID
Lexeme: If | Then | Else
  • Pattern: If | Then | Else
  • Token: IF | THEN | ELSE
Lexeme: 123.45
  • Pattern: One or more digits, followed by an optional fraction and an optional exponent
  • Token: NUM
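The three example patterns can be written as regular expressions. In this sketch the identifier and number expressions follow the patterns stated above, while the ASCII-only character set, case-insensitive keyword matching, and the name `classify` are assumptions made for illustration.

```python
import re

# Patterns as regular expressions (assumes ASCII-only source text).
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")             # letter (letter | digit)*
NUM_PATTERN = re.compile(r"\d+(?:\.\d+)?(?:[Ee][+-]?\d+)?")  # digits, optional fraction, optional exponent

# Keywords are matched case-insensitively here (an assumption for this sketch).
KEYWORDS = {"if": "IF", "then": "THEN", "else": "ELSE"}


def classify(lexeme: str) -> str:
    """Return the token name for a single lexeme, or raise ValueError."""
    if ID_PATTERN.fullmatch(lexeme):
        # A word fitting the identifier pattern is a keyword if it is reserved.
        return KEYWORDS.get(lexeme.lower(), "ID")
    if NUM_PATTERN.fullmatch(lexeme):
        return "NUM"
    raise ValueError(f"no pattern matches {lexeme!r}")
```

With these patterns, `A1`, `Sum` and `Total` classify as `ID`; `If`, `Then` and `Else` as `IF`, `THEN` and `ELSE`; and `123.45` as `NUM`. Looking the word up in a keyword table after it matches the identifier pattern is one common way to implement the "not a keyword" condition.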

Counting the Number of Tokens:

A token is usually described by an integer representing the kind of token, possibly together with an attribute representing the value of the token. For example, most programming languages have the following kinds of tokens:

  • Identifiers (x, y, average, etc.)
  • Reserved or keywords (if, else, while, etc.)
  • Integer constants (42, 0xFF, 0177 etc.)
  • Floating point constants (5.6, 3.6e8, etc.)
  • String constants ("hello there\n", etc.)
  • Character constants ('a', 'b', etc.)
  • Special symbols (( ) : := + - etc.)
  • Comments (To be ignored.)
  • Compiler directives (Directives to include files, define macros, etc.)
  • Line information (We might need to detect newline characters as tokens, if they are syntactically important. We must also increment the line count, so that we can indicate the line number for error messages.)
  • White space (Blanks and tabs that are used to separate tokens, but otherwise are not important).
  • End of file

As far as the parser is concerned, each reserved word and each special symbol is a distinct kind of token, represented by its own integer code.
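As a sketch of counting tokens under these rules, the scanner below gives each keyword and each special symbol its own integer code, while all identifiers share one code and each class of constant shares one code. White space and comments are matched but never counted. The keyword and symbol lists, the code numbering, and the name `tokenize` are hypothetical choices for a small C-like subset; a real language defines many more of each.

```python
import re

# Hypothetical code assignment: every keyword and every special symbol is its
# own kind; identifiers and each class of constant share one code per class.
KEYWORDS = ["if", "else", "while", "int", "return"]
SYMBOLS = ["==", "<=", ">=", "!=", "++", "--", "+", "-", "*", "/", "=", "<", ">",
           "(", ")", "{", "}", "[", "]", ";", ","]  # longer symbols listed first
CODES = {"ID": 1, "INT_CONST": 2, "FLOAT_CONST": 3, "STRING": 4, "CHAR": 5}
for word in KEYWORDS + SYMBOLS:
    CODES[word] = len(CODES) + 1

TOKEN_RE = re.compile(
    r"""
      (?P<skip>\s+|//[^\n]*|/\*.*?\*/)                  # white space and comments
    | (?P<FLOAT_CONST>\d+\.\d*(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+)
    | (?P<INT_CONST>0[xX][0-9a-fA-F]+|\d+)
    | (?P<STRING>"(?:\\.|[^"\\\n])*")
    | (?P<CHAR>'(?:\\.|[^'\\\n])')
    | (?P<word>[A-Za-z_]\w*)
    | (?P<symbol>"""
    + "|".join(re.escape(s) for s in SYMBOLS)
    + ")",
    re.VERBOSE | re.DOTALL,
)


def tokenize(source: str) -> list[tuple[int, str]]:
    """Return (code, lexeme) pairs; white space and comments are dropped."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"lexical error at offset {pos}: {source[pos]!r}")
        pos = m.end()
        kind = m.lastgroup          # name of the alternative that matched
        if kind == "skip":
            continue
        lexeme = m.group()
        if kind == "word":          # reserved words get their own code
            kind = lexeme if lexeme in KEYWORDS else "ID"
        elif kind == "symbol":      # each special symbol gets its own code
            kind = lexeme
        tokens.append((CODES[kind], lexeme))
    return tokens
```

For example, `printf("i = %d", i);` yields 7 tokens: `printf`, `(`, `"i = %d"`, `,`, `i`, `)` and `;`. A string literal counts as a single token no matter how many spaces it contains, and a comment contributes none.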

