XPath (1, 2, 3) is a language for finding nodes in an XML tree, and it has a long history in AST search. Maletic et al. (4) is probably the first paper on applying XPath to ASTs, using Antlr. That work was researched further and is now part of the OSS world (5). In 2014, Parr added an XPath API to the Antlr releases for searching Antlr-generated ASTs (6). Src-d uses XPath and an engine for “universal” ASTs (7).
Piggy patterns are similar to XPath expressions, and the Piggy tool includes a simple grep function. Beyond the superficial difference in syntax, Piggy patterns differ from XPath patterns in two ways.
First, the Piggy and XPath search engines “select” different things. XPath patterns select a list of nodes or attributes in the frontier of a tree. Piggy patterns find a partial subtree of nodes, selecting all nodes found in the pattern. Second, Piggy extends the notion of an expression into a pass, which is an ordered list of expressions to match in the AST. After matching and selecting nodes, further nodes in the tree are considered. However, unlike XPath, the pattern matching engine eliminates further matches from the root of the matching subtree. In this regard, Piggy patterns are more like “visitor patterns”, but should be extended for “listener patterns” (8), so that they can be used in symbol table construction.
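The second difference can be illustrated with a toy model. The Python sketch below is only my reading of the paragraph above, not Piggy's engine: `Node`, `run_pass`, and `select_independently` are hypothetical names, and a "pattern" is reduced to a labelled predicate on a single node, whereas real Piggy patterns describe partial subtrees. It assumes that "eliminates further matches" means the first pattern in the pass to match at a node wins, while the walk still continues into that node's children.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical AST node: a name plus ordered children."""
    name: str
    children: List["Node"] = field(default_factory=list)


# A toy pattern: a label and a predicate on one node (an assumption of this sketch).
Pattern = Tuple[str, Callable[[Node], bool]]


def preorder(node: Node) -> Iterator[Node]:
    """Yield the node, then its descendants, in document order."""
    yield node
    for child in node.children:
        yield from preorder(child)


def run_pass(root: Node, patterns: List[Pattern]) -> List[Tuple[str, str]]:
    """Pass-style matching: at each node, the first matching pattern wins."""
    matches = []
    for node in preorder(root):
        for label, matches_node in patterns:
            if matches_node(node):
                matches.append((label, node.name))
                break  # later patterns are not tried at this root
    return matches


def select_independently(root: Node, patterns: List[Pattern]) -> List[Tuple[str, str]]:
    """XPath-style contrast: every expression is evaluated over the whole tree."""
    return [
        (label, node.name)
        for label, matches_node in patterns
        for node in preorder(root)
        if matches_node(node)
    ]


tree = Node("bookstore", [Node("book", [Node("title"), Node("price")])])
patterns: List[Pattern] = [
    ("book_or_title", lambda n: n.name in {"book", "title"}),
    ("title_only", lambda n: n.name == "title"),
]

pass_result = run_pass(tree, patterns)
independent_result = select_independently(tree, patterns)
print(pass_result)
print(independent_result)
```

In this sketch the `title` node is reported once by `run_pass`, under the first pattern that matches it, but twice by `select_independently`, once per expression. The `book` match does not stop the walk from reaching `title` beneath it.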
XPath and Piggy pattern syntax comparison
|XPath expression||Piggy pattern||Description|
|bookstore||( bookstore )||Selects all nodes with the name “bookstore”|
|/bookstore||no equivalent; you must use an explicit top-level node name with Kleene star||Selects the root element bookstore|
|//book||(* book *)||Selects all book elements no matter where they are in the document|
|bookstore//book||( bookstore (* book *) )||Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element|
|//@lang||(* lang=* *) Note: Piggy cannot select attributes of an AST, only the nodes themselves. However, it is possible to find nodes with specific attribute values (see below), or nodes missing a particular attribute.||XPath: selects all attributes that are named lang. Piggy: selects the nodes of the AST that have an attribute lang with any value.|
|//title[@lang]||(* title lang=* *)||Selects all the title elements that have an attribute named lang.|
|//title[@lang='en']||(* title lang="en" *)||Selects all the title elements that have a “lang” attribute with a value of “en”|
|/bookstore/book[price>35.00]/title||No numeric comparison (>) in Piggy, everything is a string. Expressions are RegEx patterns||Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00|
|//book/title | //book/price||(% (* book (title) *) | (* book (price) *) %)||Selects all the title AND price elements of all book elements|
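To make the XPath column concrete, here is a small Python sketch that runs most of the rows above against a made-up bookstore document using the standard library's `xml.etree.ElementTree`. The sample XML and variable names are my own assumptions. ElementTree implements only a subset of XPath: it has no attribute selection (`//@lang`), no numeric predicates, and no union operator, so those three rows are approximated in plain Python and marked as such in the comments.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, for illustration only.
XML = """
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="fr">Le Petit Prince</title>
    <price>39.95</price>
  </book>
  <book>
    <title>Untitled Draft</title>
    <price>35.00</price>
  </book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# //book : every book element below the root
books = root.findall(".//book")

# //title[@lang] : title elements that carry a lang attribute
titles_with_lang = [t.text for t in root.findall(".//title[@lang]")]

# //title[@lang='en'] : title elements whose lang attribute equals "en"
english_titles = [t.text for t in root.findall(".//title[@lang='en']")]

# //@lang : not supported by ElementTree, so visit the elements that
# have the attribute and read its value from each one.
lang_values = [e.get("lang") for e in root.iter() if "lang" in e.attrib]

# /bookstore/book[price>35.00]/title : numeric comparison is not supported
# by ElementTree, so the price test is done in Python.
expensive_titles = [
    b.findtext("title")
    for b in root.findall("book")
    if float(b.findtext("price")) > 35.00
]

# //book/title | //book/price : no union operator in ElementTree, so the two
# result lists are concatenated (titles first, not document order).
titles_and_prices = root.findall(".//book/title") + root.findall(".//book/price")

print(len(books), titles_with_lang, english_titles)
print(lang_values, expensive_titles, len(titles_and_prices))
```

Note that ElementTree, like Piggy, hands back the elements that carry an attribute rather than the attribute itself; a full XPath engine would return the attribute nodes for `//@lang`.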
A few other notes. I’ve found the language used to describe “nodes” in XPath confusing. According to the XPath spec (9), tutorials, and the Wikipedia page on XPath, XPath is a notation for selecting “nodes”, including “attribute nodes”. But to be clear: the XML spec (10) never uses the word “node” for elements, let alone attributes. In the XPath data model, elements, attributes, and runs of character content are each a kind of node: an attribute node corresponds to an XML attribute, and an element node corresponds to an XML element, which can contain other elements and text in its content. In Piggy, attributes are not “nodes”. A “node” is an aggregate that can contain “attributes”, which are more akin to XML attributes.
XPath has a notation for directly addressing parent and sibling “axes”. Piggy does not. The reason is that Piggy ties the output and code content to the AST structure, in order, as a tree. Introducing parent accessor functions would complicate what it would mean to insert code or text during the traversal of the AST for code generation.
- XPath, https://en.wikipedia.org/wiki/XPath, accessed Jan 12, 2019
- XPath Syntax, https://www.w3schools.com/xml/xpath_syntax.asp, accessed Jan 12, 2019
- https://www.srcml.org/, accessed Jan 12, 2019
- Parse Tree Matching and XPath, https://github.com/antlr/antlr4/blob/master/doc/tree-matching.md, accessed Jan 12, 2019
- Src-d, https://github.com/src-d/engine/blob/master/README.md, accessed Jan 12, 2019
- Antlr4 – Visitor vs Listener Pattern, https://saumitra.me/blog/antlr4-visitor-vs-listener-pattern/, accessed Jan 12, 2019.
- https://www.w3.org/TR/1999/REC-xpath-19991116/, accessed Jan 12, 2019
- https://www.w3.org/TR/REC-xml/#attdecls, accessed Jan 12, 2019