Specification and Recognition of Tokens in Compiler Design, with Regular Expressions and Finite Automata

Tokens are the most basic elements of the lexical analysis phase of a compiler. The process by which the lexical analyzer identifies the meaningful units of a program (identifiers, keywords, operators, constants, delimiters, and so on) is called token recognition. The structure of each token is defined by a regular expression, and this definition is called the token specification.

📘 What Is a Token?

A token is the smallest meaningful unit of a program. Tokens divide the source code into logical pieces. For example:

a = b + 10;

The tokens here are:

1️⃣ Identifier (a)
2️⃣ Assignment Operator (=)
3️⃣ Identifier (b)
4️⃣ Arithmetic Operator (+)
5️⃣ Constant (10)
6️⃣ Delimiter (;)

🧩 Main Parts of a Token:

Token Name: the category of the token (such as identifier, number, or keyword).
Pattern: the rule, usually a regular expression, that describes which strings belong to the token.
Lexeme: the actual substring of the source code that matches the pattern.

Example:

Token Name	Pattern (Regular Expression)	Lexeme
ID	Letter(Letter|Digit)*	sum, total
NUM	Digit+	5, 120
ASSIGN	=	=
PLUS	+	+
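
The three parts in the table can be made concrete with a minimal Python sketch. The `Token` class, the `PATTERNS` dictionary, and the `classify` helper are illustrative names, not part of any particular compiler, and Letter and Digit are assumed to mean ASCII letters and digits.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    name: str    # token name (category), e.g. "ID"
    lexeme: str  # the actual substring taken from the source


# One pattern per token name, written as Python regular expressions.
PATTERNS = {
    "ID": r"[A-Za-z][A-Za-z0-9]*",
    "NUM": r"[0-9]+",
    "ASSIGN": r"=",
    "PLUS": r"\+",  # '+' is a regex metacharacter, so it is escaped
}


def classify(lexeme):
    """Return a Token for the first pattern matching the whole lexeme, else None."""
    for name, pattern in PATTERNS.items():
        if re.fullmatch(pattern, lexeme):
            return Token(name, lexeme)
    return None
```

For instance, `classify("total")` yields a token named `ID` whose lexeme is `total`, while `classify("@")` yields `None` because no pattern matches.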

⚙️ Token Specification

The purpose of token specification is to define a pattern (or rule) for each token. Regular expressions are used for this. These patterns tell the lexical analyzer which substring matches which token.

Common Token Specifications:

Identifiers: Letter(Letter|Digit)*
Numbers: Digit+
Operators: + | - | * | / | = | ==
Keywords: if | else | while | for | return
Delimiters: ( ) { } ; ,

Regular Expressions Example:

Identifier → [a-zA-Z][a-zA-Z0-9]*
Number     → [0-9]+
Relop      → (< | > | <= | >= | == | !=)
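
As a rough check of these three patterns, they can be written with Python's standard `re` module. This is only a sketch: the spaces in the Relop notation are assumed to be for readability and are dropped, and the helper names `matches` and `first_relop` are hypothetical.

```python
import re

IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")
NUMBER = re.compile(r"[0-9]+")
# Two-character operators are listed first: Python's `re` tries alternatives
# left to right and takes the first one that matches, not the longest one.
RELOP = re.compile(r"<=|>=|==|!=|<|>")


def matches(pattern, text):
    """True if the whole of `text` fits `pattern`."""
    return pattern.fullmatch(text) is not None


def first_relop(text):
    """Return the relational operator at the start of `text`, or None."""
    m = RELOP.match(text)
    return m.group() if m else None
```

With this ordering, `first_relop("<= 5")` gives `<=` rather than stopping at `<`.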

🧠 Token Recognition

The lexical analyzer uses finite automata to recognize these patterns. Typically, each regular expression is first converted into an NFA (nondeterministic finite automaton), which is then converted into an equivalent DFA (deterministic finite automaton).

Token Recognition Process:

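Convert the regular expression into an NFA.
Convert the NFA into a DFA.
Scan the input characters by running them through the DFA.
When the DFA reaches an accepting state, a candidate token has been found. The scanner keeps reading while a longer match is still possible, then emits the token for the last accepting state it reached.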
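
The NFA-to-DFA step is usually done with the subset construction. The sketch below applies it to a small hand-written NFA for Letter(Letter|Digit)*. The NFA itself, the symbol classes "L" and "D", and all function names are assumptions made for illustration, and building the NFA from the regular expression is not shown.

```python
from collections import deque

EPS = None  # label used for epsilon transitions

# Hand-written NFA for Letter(Letter|Digit)* over symbol classes "L" and "D".
NFA = {
    (0, "L"): {1},
    (1, EPS): {2},
    (2, "L"): {1},
    (2, "D"): {1},
}
NFA_START, NFA_ACCEPT = 0, {2}
ALPHABET = ("L", "D")


def eps_closure(states):
    """All NFA states reachable from `states` using only epsilon moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """NFA states reachable from `states` on one `symbol`."""
    out = set()
    for s in states:
        out |= NFA.get((s, symbol), set())
    return out


def subset_construction():
    """Build a DFA whose states are sets of NFA states."""
    start = eps_closure({NFA_START})
    dfa, accepting = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        current = queue.popleft()
        if current & NFA_ACCEPT:
            accepting.add(current)
        for symbol in ALPHABET:
            target = eps_closure(move(current, symbol))
            if not target:
                continue  # no transition: the DFA rejects here
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return start, dfa, accepting


def symbol_class(ch):
    if ch.isascii() and ch.isalpha():
        return "L"
    if ch.isascii() and ch.isdigit():
        return "D"
    return None


def accepts(text):
    """Run the constructed DFA over `text` and report acceptance."""
    start, dfa, accepting = subset_construction()
    state = start
    for ch in text:
        state = dfa.get((state, symbol_class(ch)))
        if state is None:
            return False
    return state in accepting
```

For this NFA the construction produces two DFA states, {0} and {1, 2}, and only the second is accepting.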

📊 Diagram (Token Recognition using DFA):

Start → (Letter) → ID State → Accept
       ↘ (Digit) → NUM State → Accept
       ↘ (+,-,*,/) → Operator State → Accept
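
The diagram can be read as a transition table. The following sketch encodes it that way and adds the self-loops implied by the patterns (ID stays in ID on letters and digits, NUM stays in NUM on digits), which the diagram does not draw. The state names, `char_class`, and `next_token` are hypothetical names chosen for this example.

```python
def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch in "+-*/":
        return "op"
    return "other"


# Transition table: (state, character class) -> next state.
TRANSITIONS = {
    ("START", "letter"): "ID",
    ("START", "digit"): "NUM",
    ("START", "op"): "OP",
    ("ID", "letter"): "ID",
    ("ID", "digit"): "ID",
    ("NUM", "digit"): "NUM",
}
# Accepting states and the token name each one reports.
ACCEPTING = {"ID": "ID", "NUM": "NUM", "OP": "OPERATOR"}


def next_token(text, pos=0):
    """Run the DFA from `pos`; return (token_name, lexeme) for the longest match, or None."""
    state, last_accept = "START", None
    i = pos
    while i < len(text):
        state = TRANSITIONS.get((state, char_class(text[i])))
        if state is None:
            break  # dead end: fall back to the last accepting state seen
        i += 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], text[pos:i])
    return last_accept
```

Remembering the last accepting state, instead of stopping at the first one, is what lets `sum1` come out as a single identifier.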

📗 Example (Step-by-Step Token Recognition):

Source: a = b + 25;

Step 1️⃣ → 'a' matches pattern [a-zA-Z][a-zA-Z0-9]* → Token(ID)

Step 2️⃣ → '=' matches pattern '=' → Token(ASSIGN)

Step 3️⃣ → 'b' → Token(ID)

Step 4️⃣ → '+' → Token(PLUS)

Step 5️⃣ → '25' matches pattern [0-9]+ → Token(NUM)

Step 6️⃣ → ';' → Token(DELIMITER)
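
The same walk-through can be reproduced with a small regex-driven tokenizer. This is a sketch in the style of the tokenizer recipe in Python's `re` documentation. The token names follow the steps above, while `TOKEN_SPEC`, `MASTER`, `tokenize`, and the whitespace-skipping `SKIP` rule are assumptions added for the example.

```python
import re

# (token name, pattern) pairs. SKIP consumes whitespace and produces no token.
TOKEN_SPEC = [
    ("NUM", r"[0-9]+"),
    ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("DELIMITER", r";"),
    ("SKIP", r"\s+"),
]
# One master pattern with a named group per token.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token_name, lexeme) pairs for `source`."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Calling `tokenize("a = b + 25;")` produces the six tokens in the order of the steps: ID, ASSIGN, ID, PLUS, NUM, DELIMITER.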

⚙️ Ambiguity in Token Recognition:

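Ambiguity arises when patterns overlap. For example, “=” is a prefix of “==”, so both patterns match at the same position. The lexical analyzer resolves such cases with the longest-match rule (prefer the longest lexeme) and rule priority (when two patterns match the same lexeme, the rule listed first wins).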
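
Both rules fit in a few lines. In this sketch every rule is tried at the current position, the longest lexeme wins, and a tie goes to the rule listed first. The `RULES` list, the token names `KEYWORD`, `EQ`, and `ASSIGN`, and the `best_match` function are illustrative assumptions.

```python
import re

# Order encodes priority: earlier rules win ties.
RULES = [
    ("KEYWORD", re.compile(r"if|else|while|for|return")),
    ("ID", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
    ("EQ", re.compile(r"==")),
    ("ASSIGN", re.compile(r"=")),
]


def best_match(text, pos=0):
    """Return (token_name, lexeme) chosen by longest match, then rule priority."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strict '>' keeps the earlier rule when two lexemes have equal length.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best
```

Here `==` is recognized as one `EQ` token rather than two `ASSIGN` tokens (longest match), and `if` is a `KEYWORD` rather than an `ID` (rule priority), while `iffy` is still an `ID` because the identifier lexeme is longer.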

📘 Lexical Errors in Token Recognition:

Unknown character → “@”, “#”
Invalid identifier → “1abc”
Unterminated string → “Hello
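
One common way to report such errors is to give each error class its own pattern, so the scanner can name the problem instead of simply failing. The sketch below does this for the three cases listed. The rule names, the string syntax (double quotes, no escapes, single line), and the `find_lexical_errors` function are assumptions for illustration only.

```python
import re

# Error rules sit next to the valid rules; order matters because the first
# alternative that matches at a position is the one that is used.
TOKEN_SPEC = [
    ("STRING", r'"[^"\n]*"'),
    ("UNTERMINATED_STRING", r'"[^"\n]*'),
    ("INVALID_ID", r"[0-9]+[a-zA-Z][a-zA-Z0-9]*"),
    ("NUM", r"[0-9]+"),
    ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("PUNCT", r"==|[-+*/=;(){},]"),
    ("SKIP", r"\s+"),
    ("UNKNOWN", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

ERROR_KINDS = {
    "UNKNOWN": "unknown character",
    "INVALID_ID": "invalid identifier",
    "UNTERMINATED_STRING": "unterminated string",
}


def find_lexical_errors(source):
    """Return (description, lexeme, position) for every lexical error found."""
    errors = []
    for m in MASTER.finditer(source):
        if m.lastgroup in ERROR_KINDS:
            errors.append((ERROR_KINDS[m.lastgroup], m.group(), m.start()))
    return errors
```

Because the scan continues after each error, a single pass over `x = 1abc @ "Hello` reports all three problems together with their positions.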

🚀 Token Recognition in Modern Lexical Analyzers (2025):

🔹 Machine-learning-based pattern recognition.
🔹 Parallel DFA traversal for speed.
🔹 Error-tolerant tokenizers (auto-correct suggestions).
🔹 Incremental scanning in real-time compilers.

📙 Conclusion:

Token specification and recognition form the backbone of lexical analysis in compiler design. Regular expressions specify tokens, and finite automata recognize them. As of 2025, AI-assisted lexical analyzers aim to make token recognition faster, more accurate, and more error-tolerant.
