Tagging, TextMate, regular and context-free grammars

This is a question I recently asked in a Gitter conversation about LSP and colorization. It’s worth copying here because it raises a number of problems and possible solutions.

Hi Folks, I have an LSP client/server for Antlr for VS2019, but it does not perform “colorization” because [server] and [client] do not implement LSP version 3.15, which specifies colorization. In fact, colorization of symbols has been in the LSP spec for a couple of years (since 3.6), yet it hasn’t been implemented, so I doubt it ever will be. The two-year-old documentation [here] gives a passing hint to use TextMate. However, TextMate tmLanguage files encode the grammar of a language using an (unreadable) XML format, whereas the grammar of the language is already encoded (in a clean, human-readable grammar using Antlr) in the LSP server. That’s the whole point of LSP. And all I really want is to register a <symbol type, color> list with VS. Does anyone have a simple example of an LSP client for VS 2019 with a minimal TextMate spec that specifies just the colorization? The alternative seems to be to create an MEF classification tagger that communicates directly with my LSP server. Thanks for the info. – Ken

Later on, I note this:

I think I figured out how to add colorized tagging to my LSP client extension. Using TextMate for tagging is probably the wrong idea: all syntax is encoded in the server, and duplicating the grammar (somehow) in a TextMate .tmLanguage file is hard. I tried two existing Antlr .tmLanguage files ([here] and [here]), and while colorized tagging does occur, they both do a poor job: ANTLRv4Lexer.g4 is all green, and in ANTLRv4Parser.g4 some nonterminals are the wrong color. While it is possible to approximate the CFG rules for symbols with a regular grammar, it’s not easy. That is the point of LSP. Indeed, the better solution is to write another ITaggerProvider/ITagger. If I can find an existing tagger in buffer.Properties[], or something in the LSP client API that gives the symbols in the file, that would be great. Otherwise, I will have to query the LSP server directly for the symbols in the file.
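To make the regular-versus-context-free problem concrete, here is a minimal Python sketch. It is not taken from either .tmLanguage file; the function and tag names are hypothetical. It classifies identifiers purely by the case of their first letter (the Antlr convention for lexer versus parser rules), which is roughly what a regex-only highlighter can do.

```python
import re

# Hypothetical regex-only classifier: identifiers starting with an uppercase
# letter are tagged as terminals, all others as nonterminals.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def classify(text):
    """Return (identifier, tag) pairs using only a regular expression."""
    tags = []
    for match in IDENT.finditer(text):
        word = match.group(0)
        tag = "terminal" if word[0].isupper() else "nonterminal"
        tags.append((word, tag))
    return tags

# A parser rule with an embedded action block in the target language.
rule = "expr : expr PLUS term { Log(count); } ;"
result = classify(rule)
```

Here `Log` and `count` sit inside the action block, so they are target-language code rather than grammar symbols, yet the classifier tags them as a terminal and a nonterminal. Deciding where an action block ends requires matching nested braces, which is the kind of context a parser in the server already tracks.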

There is another possibility that could work, but that I just don’t have the time for: write a tool to derive the TextMate rules automatically from the Antlr grammar.
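As a rough idea of what such a tool might start with, the Python sketch below handles only the easiest case: lexer rules that are a single quoted literal, which it turns into one keyword pattern in the JSON form of a TextMate grammar. The scope names, function names, and sample grammar are all assumptions made for illustration. Everything beyond literal keywords (parser rules, modes, actions) is left out, and that is where the real difficulty lies.

```python
import json
import re

# Matches only the simplest lexer rules, e.g.  IF : 'if' ;
LITERAL_RULE = re.compile(
    r"^\s*([A-Z][A-Za-z0-9_]*)\s*:\s*'([^'\\]+)'\s*;", re.MULTILINE
)

def keyword_patterns(grammar_text, scope="keyword.control.example"):
    """Build one TextMate 'match' pattern from alphabetic literal lexer rules."""
    words = [
        literal
        for _name, literal in LITERAL_RULE.findall(grammar_text)
        if literal.isalpha()  # skip operators such as '+'
    ]
    if not words:
        return []
    return [{"name": scope, "match": r"\b(" + "|".join(words) + r")\b"}]

def textmate_grammar(grammar_text, scope_name="source.example"):
    """Wrap the patterns in a minimal TextMate grammar dictionary."""
    return {"scopeName": scope_name, "patterns": keyword_patterns(grammar_text)}

sample = """
IF : 'if' ;
WHILE : 'while' ;
PLUS : '+' ;
ID : [a-z]+ ;
"""

grammar = textmate_grammar(sample)
print(json.dumps(grammar, indent=2))
```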

Update Feb 1, 2020: The documentation describes a “Middlelayer” that can be used to intercept messages between client and server. It seems possible to write a classifier using the information captured from the server. After many failed attempts to get an ILanguageClientCustomMessage class to work, I happened to try ILanguageClientCustomMessage2 instead. It is a completely undocumented interface type, and I could find no examples on GitHub that use it. But it works!

Update Feb 3, 2020: I tried ILanguageClientCustomMessage2 with a “Middle Layer”, but it only sporadically calls the CanHandle() listener, and of course never for “TextDocumentDocumentSymbol”. As a result, I opened an issue on the GitHub pages for the VS documentation (4735). (I don’t think it will ever be fixed because it’s “Pri3”; the developers will probably leave it at low priority.) Next I’ll try to create a custom message to send to the server to get symbols. But this is getting a little ridiculous.

Update Feb 5, 2020: I think I have a solution. Using the ILanguageClientCustomMessage2 interface, I can send a custom message to my server using the supplied JsonRpc. This message would duplicate the TextDocumentDocumentSymbol method. I can then implement GetTags() of an ITagger to colorize the text. The client API seems to route the results correctly back to my procedure call, so I am now in a position to implement the tagger. When implementing the tagger, it is important to check that the JsonRpc has been supplied, because the order in which the tagger and the LSP client are created is non-deterministic.
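The ordering problem can be sketched independently of the Visual Studio API. The Python below is only an illustration of the logic, not the C# ITagger implementation. The class names, the "custom/documentSymbols" method name, the symbol fields, and the color table are all hypothetical. The tagger returns no tags until the RPC connection has been attached, and afterwards maps each symbol the server returns to a color from a <symbol type, color> list.

```python
# Hypothetical <symbol type, color> list; the kinds and colors are made up.
COLORS = {"terminal": "orange", "nonterminal": "purple"}

class FakeRpc:
    """Stand-in for the JSON-RPC connection that the LSP client supplies."""

    def invoke(self, method, params):
        # A real server would parse the document; this returns canned symbols.
        return [
            {"kind": "nonterminal", "start": 0, "end": 4},
            {"kind": "terminal", "start": 7, "end": 11},
        ]

class SymbolTagger:
    """Sketch of a tagger that may be created before the RPC exists."""

    def __init__(self):
        self._rpc = None  # not yet supplied by the language client

    def attach_rpc(self, rpc):
        """Called once the language client has its connection."""
        self._rpc = rpc

    def get_tags(self, uri):
        # Guard: the editor may ask for tags before the client is ready.
        if self._rpc is None:
            return []
        symbols = self._rpc.invoke("custom/documentSymbols", {"uri": uri})
        return [
            (s["start"], s["end"], COLORS.get(s["kind"], "default"))
            for s in symbols
        ]

tagger = SymbolTagger()
before = tagger.get_tags("file:///Expr.g4")  # no RPC yet, so no tags
tagger.attach_rpc(FakeRpc())
after = tagger.get_tags("file:///Expr.g4")   # tags derived from server symbols
```

In a real extension the tagger would also need to tell the editor to re-request tags once the connection arrives, since the early empty result would otherwise stay on screen.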

Update Feb 7, 2020: I now have colorized tagging working.
