badge icon

This article was automatically translated from the original Turkish version.

Article

Parse Tree is one of the fundamental concepts in formal language theory and is a graphical representation that shows, in a hierarchical structure, how a string is generated according to the production rules (grammar) of a language. Parse trees are widely used in computer science, especially in compiler design, natural language processing (NLP), analysis of mathematical expressions, and sentence parsing.


Essentially, a parse tree visually represents whether a given input string can be generated by a particular context-free grammar (CFG), and if so, through which sequence of production steps. This structure also includes the order of application of production rules and intermediate stages.

Parse Tree in the Context of Formal Language Theory

The foundation of parse trees lies in formal language theory. A grammar G = (V, Σ, R, S) consists of four components:


  • V: Contains all symbols (terminal and non-terminal).
  • Σ: The set of terminal symbols.
  • R: The set of production rules.
  • S: The start symbol.


A parse tree:


  • Has the start symbol as its root node.
  • Has internal nodes (non-terminal nodes) that represent the left-hand side of production rules.
  • Has leaves that contain terminal symbols (or the empty symbol ϵ).
  • Has each internal node’s children represent the symbols on the right-hand side of the corresponding production rule.

Structure of a Parse Tree

A parse tree reaches terminal symbols through branches stemming from a root node. Nodes are divided into two types:


  • Internal nodes (non-terminal): Indicate that a production rule has been applied.
  • Leaf nodes (terminal): Indicate that the derivation has ended and the symbol is now terminal.


Each node carries a label. For example, in the expression (x + y) * z, the root node is E. The tree branches depending on the order in which production rules are applied.


Example of parse tree structure (generated by artificial intelligence.)

Derivation Types: Leftmost and Rightmost

A parse tree can arise from multiple different derivations. During derivation:


  • Leftmost derivation: In each step, the leftmost non-terminal is replaced.
  • Rightmost derivation: In each step, the rightmost non-terminal is replaced.


Example: x + y * z

  • Leftmost: E⇒E+E⇒x+E⇒x+E∗E⇒x+y∗E⇒x+y∗z
  • Rightmost: E⇒E∗E⇒E+E∗E⇒x+E∗E⇒x+y∗E⇒x+y∗z


Here, different derivations can produce the same expression. If different derivations correspond to different parse trees, the grammar is ambiguous.

Ambiguity

A grammar is ambiguous if it can generate more than one distinct parse tree for a given string. For example, the expression a + b * c can be parsed differently depending on whether multiplication or addition has higher precedence. This is undesirable in compilers. Therefore, most programming language grammars are unambiguous and explicitly define precedence rules.


Example:

Ambiguous grammar: E→E+E, E→E∗E, E→a∣b∣c


This grammar can parse the expression a + b * c as either (a+b)*c or a+(b*c).

Parse Trees and BNF Notation

Backus-Naur Form (BNF) is commonly used to define parse trees. In BNF, production rules are typically written as:



Example: Parse tree for the expression 4 + 8 * (2 - 3);

Constructing a Parse Tree: Algorithm

Constructing a parse tree is one of the fundamental operations in formal language theory. In particular, parsing fully parenthesized mathematical expressions provides a concrete application of theoretical concepts. During parsing, each symbol is treated as a token, and the parsing process proceeds step by step over this sequence of tokens.


Four basic rules define the construction of a parse tree:


  1. Left Parenthesis ((): Indicates the start of a new sub-expression. A new node is added as the left child of the current node, and the algorithm descends into this new node.
  2. Operator (+, -, *, /, etc.): The operator is assigned to the current node. A new node is then created as the right child of the current node, and the algorithm descends into this right child.
  3. Operand (number): The numeric value is assigned to the current node. The algorithm then returns to the parent node.
  4. Right Parenthesis ()): The sub-expression is complete; the algorithm ascends to the parent node.


This process continues recursively until the entire tree is built. A stack is essential to preserve parent-child relationships. Each time the algorithm descends to a child node, the current node is pushed onto the stack; each time it ascends, the node is popped from the stack.


Application Example:

For the expression ( 3 + ( 4 * 5 ) ):

  • The expression is tokenized as ['(', '3', '+', '(', '4', '*', '5', ')', ')'].
  • The first '(' creates the root node.
  • '3' is added as a leaf.
  • '+' is placed as the operator.
  • A new '(' initiates a sub-expression.
  • '4' is added, '*' is assigned as the operator, and '5' is added.
  • The closing ')' causes the algorithm to ascend to the parent node.
  • The final ')' completes the entire structure.


At the end of this process, a complete parse tree is formed. Each branch corresponds to a parenthesized sub-expression.

Evaluating a Parse Tree

Constructing a parse tree not only provides a structural analysis of a string but also enables the computation of its numerical value. In this context, evaluating a parse tree is a form of tree-based computation algorithm.


The evaluation algorithm is recursive:


  • Base Case: If the node is a leaf (operand), return its value directly.
  • Recursive Case: If the node contains an operator, recursively evaluate its left and right child nodes, then apply the operator to the returned values.


For example, for (3 + (4 * 5)):


  1. 4 * 5 = 20 is computed.
  2. 3 + 20 = 23 is obtained.


This method automatically preserves the correct order of operations even in complex expressions, because the parse tree inherently encodes the precedence relationships between operations.

LL(1) and LR(1) Parsers

LL(1) Parser

LL(1) parsing is a top-down parsing method.

  • First L (Left-to-right): Reads the input from left to right.
  • Second L (Leftmost derivation): Applies leftmost derivation.
  • (1): One token of lookahead is sufficient.


In LL(1) grammars, the production rule to apply at each step can be determined solely by examining the next input token. This property enables the design of grammars that are easily understandable by humans. As in grammar G2, operator precedence in arithmetic expressions is explicitly defined, eliminating ambiguity.

LR(1) Parser

LR(1) parsing is a bottom-up parsing method.


  • First L (Left-to-right): Reads the input from left to right.
  • R (Rightmost derivation): Reconstructs rightmost derivation steps in reverse order.
  • (1): One token of lookahead is sufficient.


LR(1) grammars can define more complex language structures than LL(1) grammars. As in grammar G3, determining which rule to apply in certain language constructs requires considering not only the position of the symbol but also operator precedence. Therefore, LR(1) parsers use a shift/reduce algorithm to process the input and determine which production rule to apply.


For example, in the expression (x + y) * z, the LR(1) algorithm:


  • Starts with the cursor at the leftmost position.
  • Uses shift operations to accumulate symbols.
  • Uses reduce operations to recognize defined patterns and replace them with higher-level symbols.
  • These steps proceed step by step in reverse order of a rightmost derivation.

Author Information

Avatar
AuthorBeyza Nur TürküDecember 3, 2025 at 8:38 AM

Tags

Discussions

No Discussion Added Yet

Start discussion for "Decision Tree" article

View Discussions

Contents

  • Parse Tree in the Context of Formal Language Theory

  • Structure of a Parse Tree

  • Derivation Types: Leftmost and Rightmost

  • Ambiguity

  • Parse Trees and BNF Notation

  • Constructing a Parse Tree: Algorithm

  • Evaluating a Parse Tree

  • LL(1) and LR(1) Parsers

    • LL(1) Parser

    • LR(1) Parser

Ask to Küre