If you’ve been programming in C# for a while, at some point you have found yourself needing to call C libraries. It doesn’t happen often, but when you have to do it, it’s like pulling teeth. One option is to set up a C++/CLI interface; the other is a p/invoke interface to a DLL containing the C code. If you only need to call a few C functions, it’s relatively easy to set up a p/invoke interface in your C# code for the functions the DLL exports. But if the API is large, you stare at the code for a while, deciding whether it is really worth writing out all the declarations you need to make the calls. Many people throw caution to the wind and write packages for large, popular C APIs so you don’t have to; you can find these on the NuGet website. One example is ManagedCuda, an API for CUDA programming from C#. Unfortunately, people get tired of trying to keep these packages up to date, and so the packages become obsolete. Another approach is automatic: a p/invoke generator reads the C or C++ header files (or the DLL) and outputs C# code with the p/invoke declarations that you can include in your code. These tools sometimes work, but often they don’t.

This blog entry is a note about my thoughts for a new type of p/invoke generator based on templates on abstract or concrete syntax trees.

Existing p/invoke generators

SWIG is one of several p/invoke generators. It reads C header files and generates code that you can add to your program. To use SWIG, you need to specify which functions to generate, and how to map data types between C# and C. Unfortunately, it doesn’t help much if the API you are trying to call changes often, forcing you to re-learn how to use SWIG’s byzantine rules. Alternatives to SWIG are CppSharp and ClangSharp. These tools use Clang and hardcoded AST visitor code to output finely tuned C# code. But they aren’t extensible, and they also often get confused about whether a function parameter is “in”, “out”, or “ref”. Some of these tools are no longer available, adding to the “fun”.
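To make the “in”/“out”/“ref” problem concrete, here is a minimal sketch in C#. The native library name example_native and the C function get_extent are hypothetical, made up purely for illustration. The C header says only void get_extent(int *width); and every one of the declarations below is a plausible reading of it, which is why a generator has to guess or be told.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

public static class ExtentNative
{
    // Hypothetical C declaration: void get_extent(int *width);
    // The header alone does not say whether 'width' is written, read, or both.

    // Reading 1: the callee only writes the value.
    [DllImport("example_native", EntryPoint = "get_extent", CallingConvention = CallingConvention.Cdecl)]
    public static extern void GetExtentOut(out int width);

    // Reading 2: the callee reads and then updates the value.
    [DllImport("example_native", EntryPoint = "get_extent", CallingConvention = CallingConvention.Cdecl)]
    public static extern void GetExtentRef(ref int width);

    // Reading 3: the pointer is really the start of an input array.
    [DllImport("example_native", EntryPoint = "get_extent", CallingConvention = CallingConvention.Cdecl)]
    public static extern void GetExtentArray([In] int[] width);

    // Reading 4: give up and pass a raw pointer.
    [DllImport("example_native", EntryPoint = "get_extent", CallingConvention = CallingConvention.Cdecl)]
    public static extern void GetExtentPtr(IntPtr width);
}

public static class ParamKindDemo
{
    // Uses reflection only, so the (nonexistent) native library is never loaded.
    public static string Describe(string methodName)
    {
        ParameterInfo p = typeof(ExtentNative).GetMethod(methodName).GetParameters()[0];
        if (p.ParameterType.IsByRef) return p.IsOut ? "out" : "ref";
        return p.ParameterType.IsArray ? "array" : "pointer";
    }

    public static void Main()
    {
        foreach (var name in new[] { "GetExtentOut", "GetExtentRef", "GetExtentArray", "GetExtentPtr" })
            Console.WriteLine(name + ": " + Describe(name));
    }
}
```

All four declarations compile against the same entry point; only knowledge of what the C function actually does can pick the right one.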

SWIG

  • http://www.swig.org/
  • Uses a custom compiler, which is often incompatible with newer code.
  • Very general, but its rules are mysterious and hard to use.
  • Does not seem to handle Clang-installed headers on Windows out of the box. Only after using -DCINDEX_LINKAGE -DCINDEX_DEPRECATED was I able to get it to work.

CppSharp

ClangSharp

  • https://github.com/Microsoft/ClangSharp
  • Uses Clang to construct AST.
  • Uses Clang Visitors to analyze ASTs and output C# type declarations. LlvmSharp contains a copy of ClangSharp which is different from ClangSharp’s official code.
  • Offers dozens upon dozens of command-line args for custom code generation.

P/Invoke Interop Assistant (defunct)

Pinvoker (defunct)

xInterop C++.Net Bridge (defunct)

  • Written by Shawn Liu.
  • A commercial product.

Samples from current p/invoke generators

Generated Output of Clang’s Index.h

        // Input clang-c/Index.h
        CINDEX_LINKAGE unsigned clang_defaultSaveOptions(CXTranslationUnit TU);
        // CppSharp
        // CS file output
        public static uint ClangDefaultSaveOptions(global::clang-c.CXTranslationUnitImpl TU)
        {
            var __arg0 = ReferenceEquals(TU, null) ? global::System.IntPtr.Zero : TU.__Instance;
            var __ret = __Internal.ClangDefaultSaveOptions(__arg0);
            return __ret;
        }
        // ...
        [DllImport(libraryPath, EntryPoint = "clang_defaultSaveOptions", CallingConvention = CallingConvention.Cdecl)]
        public static extern uint defaultSaveOptions(CXTranslationUnit @TU);
        // SWIG
        // SWIG C#
        public static uint clang_defaultSaveOptions(SWIGTYPE_p_CXTranslationUnitImpl TU) {
           uint ret = ClangPINVOKE.clang_defaultSaveOptions(SWIGTYPE_p_CXTranslationUnitImpl.getCPtr(TU));
           return ret;//unsigned int
        }
        // SWIG C# Pinvoke
        [global::System.Runtime.InteropServices.DllImport("gen", EntryPoint="CSharp_Clang_clang_defaultSaveOptions")]
        public static extern uint clang_defaultSaveOptions(global::System.Runtime.InteropServices.HandleRef jarg1);
        // SWIG C.
        SWIGEXPORT unsigned int SWIGSTDCALL CSharp_Clang_clang_defaultSaveOptions(void * jarg1) {
           unsigned int jresult ;
           CXTranslationUnit arg1 = (CXTranslationUnit) 0 ;
           unsigned int result;
           arg1 = (CXTranslationUnit)jarg1; 
           result = (unsigned int)clang_defaultSaveOptions(arg1);
           jresult = result; 
           return jresult;
        }

Is there a better way to do this?

More than 15 years after p/invoke was devised for C# to call C/C++, it’s surprising that there still isn’t a good solution.

Piggy is a tool I am writing to generate p/invoke decls using Clang and AST pattern matching to specify the source-to-source transformations. Like CppSharp and ClangSharp, it uses Clang’s AST to extract the data types to convert to C#. However, Piggy lets you explicitly write templates for AST pattern matching and output, instead of relying on hardcoded methods or switches. The context of the template is fully visible, so you can understand exactly where it applies.

Source-to-source transformations with Clang are nothing new; there are dozens of citations in Google Scholar. But I don’t think any tool has been written that uses custom templates on Clang ASTs for generating p/invoke declarations.

Clang-query

As pointed out by others, including VSC++ blog and Bendersky’s blog, Clang-query is a developer tool that implements an AST pattern matching language and automaton to find specific nodes in the AST.

To view a Clang AST, run

clang -cc1 -ast-dump test.h

or

clang -Xclang -ast-dump test.h

To view the reconstructed text from the AST,

clang -cc1 -ast-print test.h

While Clang-query is OK for some tree pattern matching, I found it to be limited in what I can express.

  • How do I express a pattern to find an enumDecl() that has an enumConstantDecl() with a specific integer value? There isn’t a clear way of constructing the pattern given an AST dump.
  • There is no clear identification of a node’s attributes. For example, take |-LinkageSpecDecl 0x2c1fb29f918 <C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\include\sal.h:2361:1, line:2967:1> line:2361:8 C. What is “C” called in Clang-query? Linkage, Name, or something else? Going to the AST Matcher specification, you ask “How do I find linkageSpecDecl’s of extern ‘C’?” You can find nodes with linkageSpecDecl, but you cannot make sure the linkage is “C” and not “C++”, which is also acceptable! Fundamentally, the AST Matcher language has shortcomings that cannot be overcome in the near term. The only solution is to rewrite both the output and the querying of ASTs.

The first part of the problem is to come up with a new serialized Clang AST.

Serializing a Clang AST

Piggy serializes a Clang AST using code derived from ASTDumper.cpp in libclang. Trees take the form of a parenthesized expression. Attributes of a node take the form Attribute=Value. After serialization, the tree is parsed to convert it into a normalized form that regular expressions can operate on.
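As a rough illustration of the parenthesized form with Attribute=Value pairs, here is a minimal C# sketch. It is not Piggy’s serializer: the AstNode class, the attribute names (Name, Type), the quoting of values, and the exact spacing are all assumptions made for the example, and Piggy’s real output may well differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical in-memory AST node: a kind, ordered attributes, and children.
public sealed class AstNode
{
    public string Kind { get; }
    public List<KeyValuePair<string, string>> Attributes { get; } = new List<KeyValuePair<string, string>>();
    public List<AstNode> Children { get; } = new List<AstNode>();

    public AstNode(string kind) { Kind = kind; }

    public AstNode Attr(string name, string value)
    {
        Attributes.Add(new KeyValuePair<string, string>(name, value));
        return this;
    }

    public AstNode Add(AstNode child)
    {
        Children.Add(child);
        return this;
    }
}

public static class AstSerializer
{
    // Produces "( Kind Attr="value" ... ( Child ... ) )" recursively.
    public static string Serialize(AstNode node)
    {
        var sb = new StringBuilder();
        Write(node, sb);
        return sb.ToString();
    }

    private static void Write(AstNode node, StringBuilder sb)
    {
        sb.Append("( ").Append(node.Kind);
        foreach (var a in node.Attributes)
            sb.Append(' ').Append(a.Key).Append("=\"").Append(Escape(a.Value)).Append('"');
        foreach (var child in node.Children)
        {
            sb.Append(' ');
            Write(child, sb);
        }
        sb.Append(" )");
    }

    // Values are quoted (an assumption), so backslashes and quotes are escaped.
    private static string Escape(string s)
    {
        return s.Replace("\\", "\\\\").Replace("\"", "\\\"");
    }
}

public static class SerializeDemo
{
    // Hand-built tree for the clang_defaultSaveOptions declaration shown earlier.
    public static AstNode Sample()
    {
        return new AstNode("FunctionDecl")
            .Attr("Name", "clang_defaultSaveOptions")
            .Attr("Type", "unsigned int (CXTranslationUnit)")
            .Add(new AstNode("ParmVarDecl").Attr("Name", "TU").Attr("Type", "CXTranslationUnit"));
    }

    public static void Main()
    {
        Console.WriteLine(AstSerializer.Serialize(Sample()));
    }
}
```

The point of a form like this is that every attribute has a name, so a pattern can say exactly which one it constrains.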

So, for this .h file, we get the following AST.

Piggy Templates

AST tree templates are the key to writing p/invoke declarations. These templates encode the tree match and the output expression to substitute. Let’s see how this might work.

Matching and translation occur in a DFS post-order traversal. So, in this regard, it’s like ClangSharp and CppSharp. Templates are matched in the order they appear. In addition, when a node is matched, the corresponding output is cached, so that results are composed bottom-up. In the above example, ParmDecls are rewritten, with an exception first for const wchar_t*.
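The traversal order, first-match-wins rule, and output cache described above can be sketched in C# as follows. This is only an illustration under stated assumptions, not Piggy’s template language: templates are modeled here as plain predicate/output delegate pairs, the Node and Template classes are hypothetical, the C function set_title is made up, and mapping const wchar_t * to an LPWStr string assumes a Windows-style 16-bit wchar_t. The node kind ParmVarDecl is Clang’s name for a parameter declaration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical AST node; the attribute names are illustrative only.
public sealed class Node
{
    public string Kind { get; }
    public Dictionary<string, string> Attrs { get; } = new Dictionary<string, string>();
    public List<Node> Children { get; } = new List<Node>();

    public Node(string kind, params Node[] children)
    {
        Kind = kind;
        Children.AddRange(children);
    }

    public Node With(string name, string value) { Attrs[name] = value; return this; }
}

// A stand-in for a template: a match predicate plus an output function that
// receives the already-computed outputs of the node's children.
public sealed class Template
{
    public Func<Node, bool> Matches { get; }
    public Func<Node, IReadOnlyList<string>, string> Emit { get; }

    public Template(Func<Node, bool> matches, Func<Node, IReadOnlyList<string>, string> emit)
    {
        Matches = matches;
        Emit = emit;
    }
}

public static class Translator
{
    public static string Translate(Node root, IReadOnlyList<Template> templates)
    {
        var cache = new Dictionary<Node, string>(); // keyed by node identity
        Visit(root, templates, cache);
        return cache[root];
    }

    private static void Visit(Node node, IReadOnlyList<Template> templates, Dictionary<Node, string> cache)
    {
        // Post-order: children are translated before their parent.
        foreach (var child in node.Children) Visit(child, templates, cache);
        var childOutputs = node.Children.Select(c => cache[c]).ToList();

        // Templates are tried in the order they appear; the first match wins.
        var template = templates.FirstOrDefault(t => t.Matches(node));
        cache[node] = template != null
            ? template.Emit(node, childOutputs)
            : string.Concat(childOutputs); // unmatched nodes just pass children through
    }
}

public static class TemplateDemo
{
    // Deliberately tiny type map; anything unknown becomes IntPtr.
    static string MapType(string cType)
    {
        return cType == "int" ? "int" : cType == "unsigned int" ? "uint" : "IntPtr";
    }

    public static string Run()
    {
        var templates = new List<Template>
        {
            // The special case comes first so it shadows the general ParmVarDecl rule.
            new Template(n => n.Kind == "ParmVarDecl" && n.Attrs["Type"] == "const wchar_t *",
                (n, kids) => "[MarshalAs(UnmanagedType.LPWStr)] string " + n.Attrs["Name"]),
            new Template(n => n.Kind == "ParmVarDecl",
                (n, kids) => MapType(n.Attrs["Type"]) + " " + n.Attrs["Name"]),
            new Template(n => n.Kind == "FunctionDecl",
                (n, kids) => "public static extern " + MapType(n.Attrs["ReturnType"]) + " "
                             + n.Attrs["Name"] + "(" + string.Join(", ", kids) + ");"),
        };

        // Hypothetical C declaration: int set_title(const wchar_t *title, int flags);
        var tree = new Node("FunctionDecl",
                new Node("ParmVarDecl").With("Name", "title").With("Type", "const wchar_t *"),
                new Node("ParmVarDecl").With("Name", "flags").With("Type", "int"))
            .With("Name", "set_title").With("ReturnType", "int");

        return Translator.Translate(tree, templates);
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Swapping the first two templates would make the general rule win for every parameter, which is why template order matters.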

As I said before, there’s a large body of research on similar problems in source-to-source transformations. Rose is a compiler that rewrites ASTs, then outputs the source for the new AST. But, with Rose at least, there are no output templates: a translator has to rewrite to a new AST, and then produce the translated source code from an AST traversal. However, we know that a DFS visit without any AST rewriting can work, because ClangSharp and CppSharp do exactly that with an AST visitor pattern.

Thoughts?

If any of you have some thoughts on this problem, let me know. I am interested. As I said, this problem has been going on for far too long and should be solved right. The code is still in its early stages, with an AST serializer, AST parser, template parser, and tree regular expression matcher.