# XPath cheatsheet for Antlr parse trees

Antlr4 includes an XPath engine in the runtimes of some targets (Cpp, CSharp, Java, Python2, Python3; there are no implementations for Dart, Go, JavaScript, PHP, or Swift), which is documented here. While this XPath engine is a good start, it is not quite XPath version 1, and it lacks many more features of XPath version 2, which I feel a compiler writer needs in order to explore and debug a grammar.

Thus, I have written an XPath2 engine for the CSharp target, ported from the Eclipse XPath2 code. This engine offers a more realistic set of operations that a compiler writer would need.

This document is a quick “cheatsheet” for this engine.

## Terminology

### Step

- Steps are separated by ‘/’.
- Absolute location path: /step/step/…
- Relative location path: step/step/…

Each step is evaluated against the nodes in the current node-set.

A step consists of:

- an axis (defines the tree-relationship between the selected nodes and the current node)
- a node-test (identifies a node within an axis)
- zero or more predicates (to further refine the selected node-set)

Syntax: axisname::nodetest[predicate]
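
As a rough illustration of that syntax, the sketch below splits a single step into its axis, node-test, and predicates. `parse_step` is a hypothetical helper written for this cheatsheet, not part of Antlr4 or the XPath2 engine. It assumes the standard XPath abbreviations (an omitted axis means `child`, a leading `@` means `attribute`) and does not handle nested brackets.

```python
import re

# axis (optional), node-test, then zero or more [predicate] groups.
STEP_RE = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\[\]]*\])*)$"
)


def parse_step(step):
    """Split one XPath step into (axis, node_test, [predicates])."""
    match = STEP_RE.match(step)
    if match is None:
        raise ValueError("not a simple step: " + step)
    axis = match.group("axis")
    test = match.group("test")
    if axis is None:
        if test.startswith("@"):
            # '@name' abbreviates 'attribute::name'.
            axis, test = "attribute", test[1:]
        else:
            # An omitted axis defaults to 'child'.
            axis = "child"
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("preds"))
    return axis, test, predicates


if __name__ == "__main__":
    for step in ("child::expression[2]", "expression", "@Text"):
        print(step, "->", parse_step(step))
```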

## Selectors

### Descendant selectors

| Expr | Grammar | Result |
| --- | --- | --- |
| `//expression` | Arithmetic.g4 | select parser rules named “expression” |
| `/` | Arithmetic.g4 | select the document (not the IParseTree root of grammar) |
| `/file_` | Arithmetic.g4 | select parser rules named “file_” (root of the parse tree) |
| `file_` | Arithmetic.g4 | select parser rules named “file_” (root of the parse tree) |
| `//expression/expression` | Arithmetic.g4 | select parser rules named “expression” that must have a parent named “expression” |
| `//.` | Arithmetic.g4 | select all attributes of all nodes |
| `//*` | Arithmetic.g4 | select all children of all nodes (non-attributes) |
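
To get a feel for these selectors without the CSharp engine, the sketch below runs their closest equivalents through Python's standard `xml.etree.ElementTree`, which supports only a small XPath subset. The XML is a hand-written, simplified stand-in for the parse tree of `1 + 2` followed by `3`. It is an assumption for illustration, not output from Antlr4 or Trash. ElementTree paths are evaluated relative to the root element, hence the leading `.`.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a parse tree (not produced by Antlr4).
TREE = """
<file_>
  <expression>
    <expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
    <PLUS>+</PLUS>
    <expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
  <expression><SCIENTIFIC_NUMBER>3</SCIENTIFIC_NUMBER></expression>
  <EOF/>
</file_>
"""
root = ET.fromstring(TREE)

# //expression : every "expression" node below the root.
all_expressions = root.findall(".//expression")

# /file_/expression : only the direct "expression" children of the root.
top_level = root.findall("./expression")

# //expression/expression : "expression" nodes whose parent is an "expression".
nested = root.findall(".//expression/expression")

# //* : every element below the root.
all_elements = root.findall(".//*")

print(len(all_expressions), len(top_level), len(nested), len(all_elements))
```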

### Attribute selectors

Trash provides attributes for Antlr4’s CommonParserRule and TerminalNodeImpl: ChildCount, SourceInterval, Start, End, Text.

| Expr | Grammar | Result |
| --- | --- | --- |
| `//expression/@Text` | Arithmetic.g4 | select text attribute of “expression” |
| `//expression/@SI` | Arithmetic.g4 | select “SourceInterval” attribute of “expression” |
| `//expression/@ChildCount` | Arithmetic.g4 | select number of children attribute of “expression” |
| `//expression/@Start` | Arithmetic.g4 | select start token index of “expression” |
| `//expression/@End` | Arithmetic.g4 | select end token index of “expression” |
| `//SCIENTIFIC_NUMBER/@Text` | Arithmetic.g4 | select text attribute of “SCIENTIFIC_NUMBER” |
| `//SCIENTIFIC_NUMBER/@SI` | Arithmetic.g4 | select “SourceInterval” attribute of “SCIENTIFIC_NUMBER” |
| `//SCIENTIFIC_NUMBER/@ChildCount` | Arithmetic.g4 | select number of children attribute of “SCIENTIFIC_NUMBER” |
| `//SCIENTIFIC_NUMBER/@Start` | Arithmetic.g4 | select start token index of “SCIENTIFIC_NUMBER” |
| `//SCIENTIFIC_NUMBER/@End` | Arithmetic.g4 | select end token index of “SCIENTIFIC_NUMBER” |
| `//SCIENTIFIC_NUMBER/@*` | Arithmetic.g4 | select all attributes of “SCIENTIFIC_NUMBER” |

### Order selection

| Expr | Grammar | Result |
| --- | --- | --- |
| `/file_/expression[2]` | Arithmetic.g4 | select second “expression” child of “file_” (root) |
| `/file_/expression[last()]` | Arithmetic.g4 | select last “expression” child of “file_” (root) |
| `/file_/*[name()="expression"][last()]` | Arithmetic.g4 | select last “expression” child of “file_” (root) (uses predicate) |
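
Positional selection can be tried out the same way. The sketch below uses Python's standard `xml.etree.ElementTree`, whose limited XPath subset includes `[n]` and `[last()]`. The XML is a hypothetical stand-in for a parse tree with three top-level expressions, and `text_of` is a helper invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a parse tree with three expressions.
TREE = (
    "<file_>"
    "<expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>"
    "<expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>"
    "<expression><SCIENTIFIC_NUMBER>3</SCIENTIFIC_NUMBER></expression>"
    "<EOF/>"
    "</file_>"
)
root = ET.fromstring(TREE)


def text_of(node):
    """Concatenate the text of a node and all of its descendants."""
    return "".join(node.itertext())


# /file_/expression[2] : positions are 1-based.
second = root.find("./expression[2]")

# /file_/expression[last()] : the last "expression" child.
last = root.find("./expression[last()]")

print(text_of(second), text_of(last))
```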

### Predicates

| Expr | Grammar | Result |
| --- | --- | --- |
| `//SCIENTIFIC_NUMBER[text()='1']` | Arithmetic.g4 | select all SCIENTIFIC_NUMBER that have text ‘1’ |
| `//*[not(name()="expression")]` | Arithmetic.g4 | select all but “expression” |
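
The two predicates above can be approximated with Python's standard `xml.etree.ElementTree`. This is only a sketch on a hypothetical XML stand-in for the parse tree. ElementTree has no `text()`, `name()`, or `not()`, so the text test is written as `[.='1']` (available since Python 3.7) and the negated name test is done in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a parse tree (not produced by Antlr4).
TREE = """
<file_>
  <expression>
    <expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
    <PLUS>+</PLUS>
    <expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
  <expression><SCIENTIFIC_NUMBER>3</SCIENTIFIC_NUMBER></expression>
  <EOF/>
</file_>
"""
root = ET.fromstring(TREE)

# //SCIENTIFIC_NUMBER[text()='1'] : filter on the node's text.
ones = root.findall(".//SCIENTIFIC_NUMBER[.='1']")

# //*[not(name()="expression")] : emulated with a Python filter on the tag.
not_expressions = [node for node in root.findall(".//*") if node.tag != "expression"]

print(len(ones), sorted(node.tag for node in not_expressions))
```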

### Operators

#### Comparison

| Expr | Grammar | Result |
| --- | --- | --- |
| `//.[name() = "expression"]` | Arithmetic.g4 | select all “expression” |
| `//.[name() != "expression"]` | Arithmetic.g4 | select all but “expression” |
| `//.[@ChildCount > 1]` | Arithmetic.g4 | select all nodes that have more than one child (attributes and text are not children) |

#### Logic (and/or)

| Expr | Grammar | Result |
| --- | --- | --- |
| `//expression[@Start="0" and @End="0"]` | Arithmetic.g4 | select all “expression” that have Start=0 and End=0 |
| `//expression[@Start="0" or @Start>3]` | Arithmetic.g4 | select all “expression” that have Start=0 or Start>3 |
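
The sketch below mimics both rows with Python's standard `xml.etree.ElementTree`. That library has no `and`, `or`, or numeric comparison, so `and` is approximated by chaining two predicates and `or` by a Python filter. The XML, including its `Start` and `End` values, is a hypothetical stand-in and not real output from Trash.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse tree with made-up Start/End token indices as attributes.
TREE = """
<file_>
  <expression Start="0" End="2">
    <expression Start="0" End="0"><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
    <PLUS>+</PLUS>
    <expression Start="2" End="2"><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
  <expression Start="4" End="4"><SCIENTIFIC_NUMBER>3</SCIENTIFIC_NUMBER></expression>
</file_>
"""
root = ET.fromstring(TREE)

# @Start="0" and @End="0" : two chained predicates must both hold.
both = root.findall(".//expression[@Start='0'][@End='0']")

# @Start="0" or @Start>3 : emulated in Python with a numeric comparison.
either = [
    node
    for node in root.findall(".//expression")
    if node.get("Start") == "0" or int(node.get("Start")) > 3
]

print(len(both), [node.get("Start") for node in either])
```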

### Union

| Expr | Grammar | Result |
| --- | --- | --- |
| `//expression \| //SCIENTIFIC_NUMBER` | Arithmetic.g4 | select all “expression” and SCIENTIFIC_NUMBER |

### Using node sets in predicates

#### Use them inside functions

| Expr | Grammar | Result |
| --- | --- | --- |
| `//expression[count(expression) > 1]` | Arithmetic.g4 | select all “expression” nodes that have more than one “expression” child |
| `//expression[expression]` | Arithmetic.g4 | select all “expression” nodes that have an “expression” child |
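
A node set used as a predicate is true when it is non-empty. The sketch below shows the idea with Python's standard `xml.etree.ElementTree`, which supports the `[tag]` form directly. `count()` is not available there, so it is emulated by counting children in Python. The XML is a hypothetical stand-in for the parse tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a parse tree (not produced by Antlr4).
TREE = """
<file_>
  <expression>
    <expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
    <PLUS>+</PLUS>
    <expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
  <expression><SCIENTIFIC_NUMBER>3</SCIENTIFIC_NUMBER></expression>
  <EOF/>
</file_>
"""
root = ET.fromstring(TREE)

# //expression[expression] : keep nodes with at least one "expression" child.
with_child = root.findall(".//expression[expression]")

# //expression[count(expression) > 1] : emulated by counting children.
with_many = [
    node
    for node in root.findall(".//expression")
    if len(node.findall("expression")) > 1
]

print(len(with_child), len(with_many))
```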

#### Functions

| Function | Expr | Result |
| --- | --- | --- |
| `name()` | `//*/name()` | select the name of the node |
| `text()` | `//*/text()` | select the nodes with text |
| `count(x)` | `count(//*)` | count the number of nodes |
| `position()` | `//*/position()` | select the position of the node in the list |

## Examples

Extracting relationships between classes

## Online XPath

The best XPath engine online is XPather.com. There are others (FreeFormatter.com, w3schools.com) but they are not as good.

A good reference for XPath is on Mozilla. You can also check devhints.io, Wikipedia, w3.org, and the Mulberry Tech slides on XPath2.