Testing of Antlr language targets

After a few weeks of work, I finally have a driver generator that can test any grammar with any target.

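alias make='mingw32-make.exe'
# Generate, build, run, and clean a driver for each target in turn.
for i in CSharp Java JavaScript Dart Python3
do
    echo $i
    rm -rf Generated
    dotnet-antlr -t $i
    pushd Generated
    mingw32-make.exe run RUNARGS="-input 1+2 -tree"
    mingw32-make.exe clean
    # Return to the starting directory before the next target.
    popd
done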
Posted in Tip

Forking a Git repo more than once

GitHub does not support forking a repo more than once. If you try to do that, it’ll just redirect you to your original fork. That can be a problem if you have an outstanding pull request on a branch and you want to modify that branch for an unrelated change. Rather than muck around with the forked repo, create a new fork this way.

  • git clone https://github.com/antlr/antlr4.git antlr4-pr2
  • cd antlr4-pr2
  • In GitHub, create a new repo antlr4-pr2.
  • git remote rename origin upstream
  • git remote add origin https://github.com/yourgitid/antlr4-pr2.git
  • git remote -v
  • git push -v origin master

The main problem with this approach is that GitHub imposes a lot of restrictions on what you can do. GitHub “pull requests” cannot be opened from repositories that haven’t been explicitly “forked” through the UI. It might be best to follow the normal workflow: fork the repo, clone it, “git branch foobar”, “git checkout foobar”, then make your changes, check them in, and publish from the GitHub Desktop app.


System Service Exception BSOD

I’ve been seeing BSODs, so I ran DISM and then sfc. It looks like there were some problems: sfc found corrupt files and repaired them.

C:\WINDOWS\system32>dism /online /cleanup-image /restorehealth

Deployment Image Servicing and Management tool
Version: 10.0.19041.572

Image Version: 10.0.19042.685

[==========================100.0%==========================] The restore operation completed successfully.
The operation completed successfully.

C:\WINDOWS\system32>sfc /scannow

Beginning system scan. This process will take some time.

Beginning verification phase of system scan.
Verification 100% complete.

Windows Resource Protection found corrupt files and successfully repaired them.
For online repairs, details are included in the CBS log file located at
windir\Logs\CBS\CBS.log. For example C:\Windows\Logs\CBS\CBS.log. For offline
repairs, details are included in the log file provided by the /OFFLOGFILE flag.



Release 8.3 of Antlrvsix

After two months of work, I’m a few days away from making release 8.3 for Antlrvsix. Most of the changes actually pertain to Trash, the command-line interpreter shell contained within Antlrvsix, but there are a few other important changes to the extension itself.

Release 8.3 features an all-new input/output pipe implementation between commands in Trash. Commands now exchange a JSON serialization of parse trees. The JSON also contains the parser, lexer, token stream, file contents, and file location: basically everything a command needs to manipulate a parse tree. The point of the JSON format is that each command can now be implemented “out-of-process”, meaning that Trash can be replaced by Bash and each command written as an independent program in any language of choice. Nobody wants yet another monolithic program to implement programming language tools, and this release works towards an open system. In this release, Trash and all the commands are still implemented in one program, but that will be switched over in a month or two.
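
To make the out-of-process idea concrete, here is a minimal sketch in Python of two “commands” that talk only through a JSON string, the way two programs joined by a pipe would. The field names (text, tree, rule, token, children) are invented for this example; they are not the actual Trash serialization format.

```python
import json

def serialize(source_text, tree):
    """Bundle the input text and its parse tree into one JSON document."""
    return json.dumps({"text": source_text, "tree": tree})

def count_nodes(node):
    """Count every node in a tree of dicts with an optional 'children' list."""
    return 1 + sum(count_nodes(c) for c in node.get("children", []))

def wc_command(json_text):
    """Stand-in for a downstream command: read the JSON, report the node count."""
    doc = json.loads(json_text)
    return count_nodes(doc["tree"])

# Hypothetical parse tree for the input "1+2".
tree = {"rule": "expr", "children": [
    {"token": "1"}, {"token": "+"}, {"token": "2"}]}

piped = serialize("1+2", tree)   # what an upstream command would write to stdout
print(wc_command(piped))         # what a downstream command would compute
```

Because the only contract between the two functions is the JSON text, either side could be rewritten in another language without the other noticing.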

I’ve also included a few new goodies:

  • Added a grammar diff program.
  • Added an ISO 14977 parser.
  • Added AGL (Automatic Graph Layout), XML, JSON output.
  • Added Cat, Echo, Wc commands.
  • Added BNFC’s LBNF and other associated grammars. NB: This work is not yet complete and only works for the simplest of grammars.

I decided to implement ISO 14977 a while ago. But I didn’t know what I was getting into, and I really should have read what D. Wheeler wrote about the spec first. While the parser is now almost done, I learned along the way that the spec has several problems. One error is that, by its own rule, a meta identifier cannot contain spaces (meta identifier = letter, {letter | decimal digit};), yet throughout the spec, and in the name “meta identifier” itself, meta identifiers are written with spaces! And, as Wheeler pointed out, there are many other problems. Yes, grammars in ISO 14977 are very verbose: “a sea of commas”. But the notation does have some interesting features, so it is worth adding a parser for it.
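
The meta identifier problem is easy to see with a quick check. The sketch below, in Python, encodes the rule as “a letter followed by letters or decimal digits” (assuming ASCII letters, which is my simplification) and tests a few names, including ones the spec itself uses.

```python
import re

# A letter followed by any number of letters or decimal digits; no spaces.
META_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_meta_identifier(name):
    """True if the whole name matches the rule as literally written."""
    return META_IDENTIFIER.fullmatch(name) is not None

for name in ["letter", "metaidentifier", "meta identifier", "decimal digit"]:
    print(f"{name!r}: {is_meta_identifier(name)}")
```

The two names containing a space are rejected, even though the spec writes its own rule names that way.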

The “diff” program I implemented for this release is interesting. I used the Zhang-Shasha tree-edit distance algorithm, extending it to record the actual tree edits that correspond to the minimum tree-edit distance. This algorithm is, unfortunately, for ordered trees, so it works best for small differences. I will be trying to implement other algorithms in the next month or two. There is certainly a lot that could be done here. One important extension is to include not only simple single-node inserts and deletes, but also more complex operations like fold and unfold.
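
For readers who haven’t met tree-edit distance, here is a small Python sketch of the ordered-tree recurrence that Zhang-Shasha is built on. It is the plain recursive formulation with memoization, not the keyroot-based dynamic program of the real algorithm, and it returns only the distance, not the edit script. Unit cost for insert, delete, and relabel is an assumption of this sketch, as are the helper names.

```python
from functools import lru_cache

def tree(label, *children):
    """A tree is an immutable (label, children) pair so it can be cached."""
    return (label, tuple(children))

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Minimum number of inserts, deletes, and relabels turning forest f1 into f2."""
    if not f1 and not f2:
        return 0
    if not f1:
        # Insert the rightmost root of f2; its children take its place.
        _, kids = f2[-1]
        return forest_distance(f1, f2[:-1] + kids) + 1
    if not f2:
        # Delete the rightmost root of f1; its children take its place.
        _, kids = f1[-1]
        return forest_distance(f1[:-1] + kids, f2) + 1
    (l1, k1), (l2, k2) = f1[-1], f2[-1]
    delete = forest_distance(f1[:-1] + k1, f2) + 1
    insert = forest_distance(f1, f2[:-1] + k2) + 1
    # Match the two rightmost roots (relabel if needed), then solve
    # their children and the remaining forests independently.
    match = (forest_distance(k1, k2)
             + forest_distance(f1[:-1], f2[:-1])
             + (l1 != l2))
    return min(delete, insert, match)

def tree_distance(t1, t2):
    return forest_distance((t1,), (t2,))

a = tree("expr", tree("1"), tree("+"), tree("2"))
b = tree("expr", tree("1"), tree("*"), tree("2"))
c = tree("expr", tree("atom", tree("1")), tree("+"), tree("2"))

print(tree_distance(a, b))  # one relabel: "+" becomes "*"
print(tree_distance(a, c))  # one insert: the "atom" node
```

Because children keep their left-to-right order, moving a subtree shows up as a delete plus an insert, which is one reason the ordered formulation suits small differences best.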

In addition, with this release, I’m disabling semantic highlighting for VS2019. This is because it’s buggy and slow, and despite my warning people, they complain about it being buggy and slow! Use the extension for VSCode. It’s really very good. In the next release, I will try to fix Antlrvsix for VS2019, but you never know: Microsoft needs to implement semantic highlighting in its LSP client for VS2019.



Antlrvsix and Trash release v8.2

Version 8.2 of Antlrvsix and Trash was released yesterday. This release enhances the Trash shell further, making it look more and more like a full-fledged analog of the Bash shell. However, Trash uses the lingua franca of parse trees and XPath. In this release, one can now pipe output (parse trees) between commands. The find command has been renamed to xgrep to further the analogy with grep. Various output commands have been added, such as st and text. File globbing for ls, cd, parse, and other commands has been rewritten to be more like what you would see in Bash. I’ve added a run command to generate, build, and run parsers.

All in all, I think Trash is finally becoming the tool I’ve envisioned for language development.


Updates to Antlrvsix

Many months ago, I had VSCode working with Antlrvsix. Back then, rather than release what I had, I decided to put it off because I was concentrating my effort on expanding the server capabilities. But the main reason I didn’t release a VSCode extension was that I could not support “semantic highlighting” in my server in a standardized way, because the API I was using did not support it. Since then, the server capabilities have been enhanced. More importantly, I changed the API to get semantic highlighting in Antlrvsix to work with VSCode. I have now released Antlrvsix for VSCode to the Microsoft Marketplace for VSCode.

To get semantic highlighting working with VSCode, I decided to write a drop-in replacement for Microsoft’s Microsoft.VisualStudio.LanguageServer.Protocol API. While Microsoft does make a release of the API every three months or so, there are many features missing that have been in the LSP spec for years. Semantic highlighting is a crucial new addition, but I have no confidence that Microsoft will ever implement it based on the changes I’ve seen over the last year. This drop-in replacement is the current version with additions for semantic highlighting.

On the grammar transforms, I have a script for the “Trash” command-line tool of Antlrvsix that partially optimizes the Java9 grammar. The transforms for expressions aren’t there yet, but so far the parser for the optimized grammar runs several times faster.


AntlrTreeEditing library added

Yesterday I pulled all the Antlr parse tree editing routines into their own library. This library fills a gap between what is offered in the Antlr runtime (a pseudo-XPath library, AddChild and Parent accessors for ParserRuleContext) and a full-blown transformation system like ASF+SDF.

This library contains:

  • a beefed-up XPath version 2 library;
  • a tree construction routine from an s-expression notation and an Antlr parser and lexer;
  • Antlr parse tree node replace and delete; and
  • an enhanced parse tree node type that supports an observer pattern to keep data structures in sync when you modify a parse tree.

Right now it’s just in C#, but I plan at some point to translate it to Java because it is very useful.
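
To show what the observer idea buys you, here is a minimal sketch in Python. It is not the library’s C# API; the class and method names (ObservableNode, subscribe, add_child, remove_child) are made up for illustration. Each edit to the tree notifies callbacks registered on the edited node and on all of its ancestors, so a side table such as a symbol index can stay in sync.

```python
class ObservableNode:
    """Tree node that reports child additions and removals to observers."""

    def __init__(self, label):
        self.label = label
        self.parent = None
        self.children = []
        self._observers = []

    def subscribe(self, callback):
        """Register callback(event, parent, child) for edits at or below this node."""
        self._observers.append(callback)

    def _notify(self, event, child):
        # Walk up from the edited node so observers on ancestors hear about it too.
        node = self
        while node is not None:
            for callback in node._observers:
                callback(event, self, child)
            node = node.parent

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        self._notify("add", child)

    def remove_child(self, child):
        self.children.remove(child)
        child.parent = None
        self._notify("remove", child)

log = []
root = ObservableNode("rules")
root.subscribe(lambda event, parent, child: log.append((event, parent.label, child.label)))

rule = ObservableNode("ruleSpec")
root.add_child(rule)
rule.add_child(ObservableNode("ruleAltList"))
root.remove_child(rule)
print(log)
```

The single observer on the root sees all three edits, including the one made two levels down.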



A command-line approach to transforms of Antlr grammars

This is just a note to myself regarding some ideas on a command-line tool for Antlr grammars. It should be clear to anyone: grammars, especially Antlr grammars, are first-class objects that can be manipulated and changed to improve readability or performance. Antlrvsix now incorporates almost two dozen transformations, including unfold, fold, sort rules, remove useless rules, reformat rules, split grammars, and merge grammars.

The question is how best to structure these transformations for the extension and, more importantly, how to apply them in a completely automated manner, starting from a grammar read off a web page that contains the spec of a particular language such as Java or C#.

The main problem in the tool is how to identify the parts of the grammar that I want to change with a transform. Here, it looks like there are two possibilities: (1) a line/column number range; or (2) one or more XPath expressions that identify points or ranges in the grammar.

So, to make changes to the grammar, I could envision something like this:

cat Grammar.g4 | trash "//ruleSpec[/RULE_SPEC = 'e'] => unfold" | trash "=> split-grammar" 1> GrammarParser.g4 2> GrammarLexer.g4
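
As a rough illustration of the XPath option, the Python sketch below selects the rule named e from a toy grammar tree. The tree is hand-written XML standing in for an Antlr parse tree, the node name RULE_REF is my assumption about how the rule name would appear, and Python’s ElementTree only supports a small XPath subset, so this shows the selection idea rather than what Trash would run.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified parse tree of a two-rule grammar.
GRAMMAR_TREE = """
<grammarSpec>
  <ruleSpec><RULE_REF>e</RULE_REF><ruleAltList>e '+' t | t</ruleAltList></ruleSpec>
  <ruleSpec><RULE_REF>t</RULE_REF><ruleAltList>INT</ruleAltList></ruleSpec>
</grammarSpec>
""".strip()

root = ET.fromstring(GRAMMAR_TREE)

# Select every ruleSpec that has a RULE_REF child whose text is exactly "e".
selected = root.findall(".//ruleSpec[RULE_REF='e']")

for rule in selected:
    print(rule.find("RULE_REF").text, "->", rule.find("ruleAltList").text)
```

A transform such as unfold would then operate on the selected nodes only, which is what makes the expression more robust than a line/column range when the file is reformatted.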

Afterwards, I can write an online version of the Antlr transformation tool.

Also, a note to myself: this article by Figueira et al. is the only one I found that describes the denotational semantics of XPath in an unambiguous manner. Can fold be implemented using XPath, where one entire sub-tree (implemented, I suppose, as a node-set) can be compared to another sub-tree?



Adding text editor highlighting rules in a dynamic manner

This is a note of an idea I posted in two Twitter threads (here, here, and here). I think it’s important to capture the idea before it gets lost when blogs and Twitter disappear.

The problem with “semantic highlighting”, or what I would just call syntactic highlighting because there are really many levels of highlighting based on lexical, CFG, static and dynamic semantics, is that it’s nearly impossible for a programmer to augment his editor with rules to perform the type of check he wants. TextMate highlights the lexical syntax of a program. LSP “semantic highlighting” considers the static semantics of the program. But, if you would like the editor to highlight something more interesting, like the live/dead analysis of a variable, or constant propagation, you’re basically out of luck.

Having the entire parse tree does not by itself solve the problem of identifying the parts of the program that you are interested in. You are only interested in certain paths through the tree, and XPath is the best tool for selecting them.

With a grammar and a parse tree decorated with the results of semantic analysis, many types of highlighting are now possible using an XPath-based solution. For example, using Antlr’s notation for lexical and CFG symbols, comments could be tagged with “//COMMENT => green”, keywords tagged with “//keyword => blue”, and fields tagged with “//field_declaration//variable_declarator/identifier => pink”. To employ a new highlighting, one would simply tell the editor to re-tag the text using a new collection of rules.
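
The rule format above is simple enough to prototype. The Python sketch below parses the three example rules and applies them to a toy tree. The tree is hand-written XML that stands in for a decorated parse tree, the function names are hypothetical, and ElementTree’s limited XPath subset is used only because it ships with Python.

```python
import xml.etree.ElementTree as ET

RULES = """
//COMMENT => green
//keyword => blue
//field_declaration//variable_declarator/identifier => pink
"""

# Hypothetical parse tree for "// counter" followed by "int count;".
TREE = ET.fromstring(
    "<compilation_unit>"
    "<COMMENT>// counter</COMMENT>"
    "<field_declaration><keyword>int</keyword>"
    "<variable_declarator><identifier>count</identifier></variable_declarator>"
    "</field_declaration>"
    "</compilation_unit>"
)

def parse_rules(text):
    """Turn 'path => color' lines into (path, color) pairs."""
    rules = []
    for line in text.strip().splitlines():
        path, color = (part.strip() for part in line.split("=>"))
        rules.append((path, color))
    return rules

def highlight(tree, rules):
    """Return (text, color) tags for every node matched by a rule."""
    tags = []
    for path, color in rules:
        # ElementTree only accepts relative paths, so anchor "//" at the root.
        for node in tree.findall("." + path):
            tags.append((node.text, color))
    return tags

for text, color in highlight(TREE, parse_rules(RULES)):
    print(f"{color:5} {text}")
```

Switching to a different highlighting scheme means passing a different rule string; nothing in the editor-side code has to change.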

The only problem with this idea is implementing the static semantics for the problem you are interested in.



Tree transformations via XPath and S-expressions

I finally have the right tools to implement transforms over an Antlr parse tree.

The first part of a transform is identifying which nodes in the tree are going to be replaced. It turns out that the best tool for that is an XPath engine, which I’ve rewritten in C# from Java over the last month.

The other part of a transform, and one far simpler to implement, is a way to express a tree that is created and spliced into an existing parse tree. Here, the work I did in Piggy for S-expressions comes in handy. For example, the expression “( ruleAltList ( labeledAlt ( alternative { child } )))” identifies the Antlr parse tree to create, and splices a “child” node into the newly created “alternative” node. Note that this is a slightly different usage of “S-expressions” than you may be used to, in which a node is an unnamed pair, but it serves the same purpose.
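
To make the notation concrete, here is a small Python sketch of a reader for it. It is not the Piggy or Antlrvsix implementation: trees are plain (label, children) pairs, the build function and its bindings argument are hypothetical, and malformed input is only loosely checked.

```python
import re

def tokenize(text):
    """Split into parentheses, braces, and bare names."""
    return re.findall(r"[(){}]|[^\s(){}]+", text)

def build(text, bindings):
    """Build a (label, children) tree; '{ name }' splices in bindings[name]."""
    tokens = tokenize(text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "{":
            # '{ name }' refers to an existing node supplied by the caller.
            name = tokens[pos + 1]
            if tokens[pos + 2] != "}":
                raise SyntaxError("expected '}'")
            pos += 3
            return bindings[name]
        if tokens[pos] != "(":
            raise SyntaxError(f"unexpected token {tokens[pos]!r}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1  # consume ')'
        return (label, children)

    result = node()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return result

existing = ("element", [])  # stands in for a node taken from the parse tree
new_tree = build("( ruleAltList ( labeledAlt ( alternative { child } )))",
                 {"child": existing})
print(new_tree)
```

The spliced node is the same object that was passed in, not a copy, which is what lets an existing sub-tree be moved under newly created parents.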

With XPath and S-expressions, I can now rewrite all the transforms that I hardcoded in C# for Antlrvsix. The code implementing both parts is here, but I will be forking this code and placing it in the Antlrvsix source until I see a need to place it in a separate Nuget package.

At this moment, I’m not exactly sure what language and control structures to add on top of XPath and S-expressions. For now, these two tools should suffice, along with C# to glue the pieces together, in order to modify an Antlr parse tree. The other consideration is having intermediate results of an XPath expression. For example, I may want to get all ruleAltList nodes but then continue down the tree for a particular child. I’ve fixed a bug in the Eclipse XPath library to allow an intermediate result to be used as the context for another XPath expression. But I might consider extending XPath to bind an intermediate result to a C# variable.
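
The two-step pattern looks like this in a Python sketch: keep the first node-set in an ordinary variable, then evaluate a second, relative expression with each node as the context. The XML tree is a hand-written stand-in for an Antlr parse tree, and ElementTree is used only because it is in the standard library; the C# engine’s API will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified parse tree of a two-rule grammar.
root = ET.fromstring(
    "<rules>"
    "<ruleSpec><RULE_REF>e</RULE_REF><ruleAltList>"
    "<labeledAlt>e '+' t</labeledAlt><labeledAlt>t</labeledAlt>"
    "</ruleAltList></ruleSpec>"
    "<ruleSpec><RULE_REF>t</RULE_REF><ruleAltList>"
    "<labeledAlt>INT</labeledAlt>"
    "</ruleAltList></ruleSpec>"
    "</rules>"
)

# Step 1: the intermediate result is just a list of nodes.
alt_lists = root.findall(".//ruleAltList")

# Step 2: each node becomes the context for a second, relative expression.
first_alts = [alts.find("./labeledAlt[1]").text for alts in alt_lists]

print(first_alts)
```

Holding the intermediate node-set in a host-language variable is the glue-code version of binding it inside XPath itself.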

One other note: why am I not looking at term rewriting systems? I have. The problem is that they are not practical for two reasons: (1) integrating them with Antlr parse trees would not be easy; (2) most do not express manipulations directly on a tree. XSLT is one example. Here, the language isn’t specifying tree rewrites, but the construction of an entirely new tree. I also looked at TXL. Here, the language isn’t about trees, but term rewrites in the target language. I would need to convert the Antlr grammar into TXL grammar syntax. In all these systems, I would need to fit Antlr parse trees into the framework. Again, all I want is to manipulate trees.

What of Piggy? Unfortunately, Piggy is not a tree editing library. While it recognizes tree nodes, the problem is that it then executes user code that performs an output on a DFS traversal of the tree. Again, all I want is to manipulate an existing tree, then do something later with that tree.

