Re-inventing the p/invoke generator

If you’ve been programming in C# for a while, at some point you have probably needed to call a C library. It doesn’t happen often, but when it does, it’s like pulling teeth. One option is to set up a C++/CLI interface; the other is a p/invoke interface to a DLL that exports the C code. Setting up a p/invoke interface is relatively easy if you only need to call a few C functions. But if the API is large, you stare at the code for a while, deciding whether it is really worth writing out all the declarations you need to make the calls. Some people throw caution to the wind and write packages for large, popular C APIs so you don’t have to; you can find these on the NuGet website. One example is ManagedCuda, an API for CUDA programming from C#. Unfortunately, people get tired of keeping these packages up to date, and the packages become obsolete. Another approach is automatic: a p/invoke generator reads the C or C++ header files (or the DLL) and outputs C# code with the p/invoke declarations, which you can include in your code. These tools sometimes work, but often they don’t.

This blog entry is a “heads up” note about my thoughts for a new type of p/invoke generator.

Existing p/invoke generators

SWIG is one of several p/invoke generators. It reads C header files and generates code that you can add to your program. To use SWIG, you need to specify which functions to generate wrappers for and how to map data types between C# and C. Unfortunately, it doesn’t help much if the API you are trying to call changes often, because each change forces you to re-learn SWIG’s byzantine rules. Alternatives to SWIG are CppSharp and ClangSharp. These tools use Clang and hardcoded AST visitor code to output finely tuned C# code. But they aren’t extensible, and they often get confused about whether a function parameter is “in”, “out”, or “ref”. Some of these tools are no longer available, adding to the “fun”.

SWIG

  • http://www.swig.org/
  • Uses a custom compiler, which is often incompatible with newer code.
  • Very general, but its rules are hard to use and often mysterious.
  • Does not seem to handle Clang’s installed headers on Windows out of the box. After adding -DCINDEX_LINKAGE -DCINDEX_DEPRECATED, I was able to get it to work.

CppSharp

  • https://github.com/mono/CppSharp
  • Written in C++.
  • Uses clang to build AST.
  • Uses Clang Visitors to analyze ASTs and output C# type declarations.
  • Has some bells and whistles, but does not have type mapping.

ClangSharp

  • https://github.com/Microsoft/ClangSharp
  • Uses Clang to construct AST.
  • Uses Clang Visitors to analyze ASTs and output C# type declarations.
  • LLVMSharp contains a copy of ClangSharp that differs from ClangSharp’s official code.
  • Hasn’t been maintained for a while, and requires LLVM 6.0.1 (the current version at the time of writing is 7.0.0).

P/Invoke Interop Assistant (unavailable)

Pinvoker (unavailable)

xInterop C++.Net Bridge (unavailable)

  • Written by Shawn Liu.
  • Commercial product, no longer available.

Samples from current p/invoke generators

Generated Output of Clang’s Index.h

        // Input clang-c/Index.h
        CINDEX_LINKAGE unsigned clang_defaultSaveOptions(CXTranslationUnit TU);

        // CppSharp
        // CS file output
        public static uint ClangDefaultSaveOptions(global::clang-c.CXTranslationUnitImpl TU)
        {
            var __arg0 = ReferenceEquals(TU, null) ? global::System.IntPtr.Zero : TU.__Instance;
            var __ret = __Internal.ClangDefaultSaveOptions(__arg0);
            return __ret;
        }
        // ...
        [DllImport(libraryPath, EntryPoint = "clang_defaultSaveOptions", CallingConvention = CallingConvention.Cdecl)]
        public static extern uint defaultSaveOptions(CXTranslationUnit @TU);

        // SWIG
        // SWIG C#
        public static uint clang_defaultSaveOptions(SWIGTYPE_p_CXTranslationUnitImpl TU) {
           uint ret = ClangPINVOKE.clang_defaultSaveOptions(SWIGTYPE_p_CXTranslationUnitImpl.getCPtr(TU));
           return ret;//unsigned int
        }
        // SWIG C# Pinvoke
        [global::System.Runtime.InteropServices.DllImport("gen", EntryPoint="CSharp_Clang_clang_defaultSaveOptions")]
        public static extern uint clang_defaultSaveOptions(global::System.Runtime.InteropServices.HandleRef jarg1);
        // SWIG C.
        SWIGEXPORT unsigned int SWIGSTDCALL CSharp_Clang_clang_defaultSaveOptions(void * jarg1) {
           unsigned int jresult ;
           CXTranslationUnit arg1 = (CXTranslationUnit) 0 ;
           unsigned int result;
           arg1 = (CXTranslationUnit)jarg1; 
           result = (unsigned int)clang_defaultSaveOptions(arg1);
           jresult = result; 
           return jresult;
        }

Isn’t there a better way to do this?

More than 15 years after p/invoke was devised, it’s surprising that there still isn’t a good solution. Piggy is a tool I am writing to generate p/invoke declarations; it uses Clang and AST pattern matching to specify the source-to-source transformations. Like CppSharp and ClangSharp, it uses Clang’s AST to extract the data types to convert to C#. However, Piggy uses templates for AST pattern matching and output, not hardcoded C#/C++ code.

Source-to-source transformation with Clang is nothing new, and there are many examples of it, but I don’t think any tool has been written that uses templates on Clang ASTs for the p/invoke problem.

Clang-query

As pointed out by others (VSC++ blog; Bendersky’s blog), Clang-query is a developer tool that implements an AST pattern matching language and automaton to find specific nodes in the AST.

To view a Clang AST, run

clang -cc1 -ast-dump test.h

or

clang -Xclang -ast-dump test.h

To view the source text reconstructed from the AST, run

clang -cc1 -ast-print test.h

While Clang-query is OK for some tree pattern matching, I found it to be limited in what I can express.

  • How do I express a pattern to find an enumDecl() that has an enumConstantDecl() with a specific integer value? There isn’t a clear way of constructing the pattern given an AST dump.
  • There is no clear identification of a node’s attributes. For example, take |-LinkageSpecDecl 0x2c1fb29f918 <C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\include\sal.h:2361:1, line:2967:1> line:2361:8 C. What is the trailing “C” called in Clang-query? Linkage, Name, or something else? Going to the AST Matcher specification, you ask, “How do I find the linkageSpecDecl nodes for extern ‘C’?” You can find nodes with linkageSpecDecl(), but you cannot check that the linkage is “C” rather than “C++”, which is just as valid a value.
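
To make the two questions above concrete, here is a minimal Python sketch of what such queries look like once every node carries explicitly named attributes. It is not Clang-query and not Piggy; the node shape (kind, attrs, children) and the attribute names "Value" and "Lang" are assumptions made only for illustration.

```python
# Hypothetical node shape: (kind, attrs, children). The attribute names
# "Value" and "Lang" are assumptions for this sketch, not Clang's names.
def node(kind, attrs=None, children=()):
    return (kind, dict(attrs or {}), list(children))

def walk(tree):
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree[2]:
        yield from walk(child)

def find(tree, kind, **attrs):
    """Return all nodes of the given kind whose attributes equal attrs."""
    return [n for n in walk(tree)
            if n[0] == kind and all(n[1].get(k) == v for k, v in attrs.items())]

def enums_with_constant_value(tree, value):
    """Return the names of EnumDecl nodes that own a constant with this value."""
    return [e[1]["Name"] for e in find(tree, "EnumDecl")
            if find(e, "EnumConstantDecl", Value=value)]

tree = node("TranslationUnitDecl", children=[
    node("LinkageSpecDecl", {"Lang": "C"}, [
        node("EnumDecl", {"Name": "e"}, [
            node("EnumConstantDecl", {"Name": "a", "Value": "0"}),
            node("EnumConstantDecl", {"Name": "b", "Value": "7"}),
        ]),
    ]),
    node("LinkageSpecDecl", {"Lang": "C++"}, [
        node("EnumDecl", {"Name": "f"}, [
            node("EnumConstantDecl", {"Name": "d", "Value": "0"}),
        ]),
    ]),
])

print(enums_with_constant_value(tree, "7"))
print(len(find(tree, "LinkageSpecDecl", Lang="C")))
```

With named attributes, both questions reduce to an ordinary predicate over nodes, which is the property I want from a new serialized form.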

Fundamentally, the AST Matcher language has shortcomings that cannot be overcome in the near term. The only solution I see is to redo both how ASTs are output and how they are queried.

The first part of the problem is to come up with a new serialized Clang AST.

Serializing a Clang AST

Piggy serializes a Clang AST using code derived from ASTDumper.cpp in libclang. Trees are written as parenthesized expressions, and node attributes take the form Attribute=Value. After serialization, the tree is parsed and converted into a normalized form that regular expressions can operate on.

So, for this .h file:

int j();
enum e { a, b, c};
enum f { d, e};

we get the following AST:

( TranslationUnitDecl Pointer="0x20c48ee2bf8" SrcRange="<invalid sloc>"  SrcLoc="<invalid sloc>"
  ( TypedefDecl Pointer="0x20c48ee34d0" SrcRange="<invalid sloc>"  SrcLoc="<invalid sloc>" Attrs="implicit" Name="__int128_t" Type="__int128"
    ( BuiltinType Pointer="0x20c48ee3190" BareType="__int128" Sugar=""
  ) )
  ( TypedefDecl Pointer="0x20c48ee3538" SrcRange="<invalid sloc>"  SrcLoc="<invalid sloc>" Attrs="implicit" Name="__uint128_t" Type="unsigned __int128"
    ( BuiltinType Pointer="0x20c48ee31b0" BareType="unsigned __int128" Sugar=""
  ) )
  ( TypedefDecl Pointer="0x20c48ee3898" SrcRange="<invalid sloc>"  SrcLoc="<invalid sloc>" Attrs="implicit" Name="__NSConstantString" Type="struct __NSConstantString_tag"
    ( RecordType Pointer="0x20c48ee3620" BareType="struct __NSConstantString_tag" Sugar=""
      ( CXXRecord Pointer="0x20c48ee3588"Name="__NSConstantString_tag"
  ) ) )
  ( CXXRecordDecl Pointer="0x20c66febcb0" SrcRange="<invalid sloc>"  SrcLoc="<invalid sloc>" Attrs="implicit" KindName=" class" Name="type_info" Attrs=""
    ( TypeVisibilityAttr Pointer="0x20c66febd70" SrcRange="<invalid sloc>" Attrs="Implicit" Value=" Default"
  ) )
  ( TypedefDecl Pointer="0x20c66febdc8" SrcRange="<invalid sloc>"  SrcLoc="<invalid sloc>" Attrs="implicit" Name="size_t" Type="unsigned long long"
    ( BuiltinType Pointer="0x20c48ee2dd0" BareType="unsigned long long" Sugar=""
  ) )
  ( TypedefDecl Pointer="0x20c66febe60" SrcRange="<invalid sloc>"  SrcLoc="<invalid sloc>" Attrs="implicit" Name="__builtin_ms_va_list" Type="char *"
    ( PointerType Pointer="0x20c66febe20" BareType="char *" Sugar=""
      ( BuiltinType Pointer="0x20c48ee2c90" BareType="char" Sugar=""
  ) ) )
  ( TypedefDecl Pointer="0x20c66febec8" SrcRange="<invalid sloc>"  SrcLoc="<invalid sloc>" Attrs="implicit" Name="__builtin_va_list" Type="char *"
    ( PointerType Pointer="0x20c66febe20" BareType="char *" Sugar=""
      ( BuiltinType Pointer="0x20c48ee2c90" BareType="char" Sugar=""
  ) ) )
  ( FunctionDecl Pointer="0x20c66febf78" SrcRange="c:\temp\include\help.h:2:1, col:7"  SrcLoc="col:5" Name="j" Type="int (void)" Attrs=""
  )
  ( EnumDecl Pointer="0x20c66fec058" SrcRange="line:3:1, col:17"  SrcLoc="col:6" Name="e"
    ( EnumConstantDecl Pointer="0x20c66fec120" SrcRange="col:10"  SrcLoc="col:10" Name="a" Type="enum e"
    )
    ( EnumConstantDecl Pointer="0x20c66fec170" SrcRange="col:13"  SrcLoc="col:13" Name="b" Type="enum e"
    )
    ( EnumConstantDecl Pointer="0x20c66fec1c0" SrcRange="col:16"  SrcLoc="col:16" Name="c" Type="enum e"
  ) )
  ( EnumDecl Pointer="0x20c66fec210" SrcRange="line:4:1, col:14"  SrcLoc="col:6" Name="f"
    ( EnumConstantDecl Pointer="0x20c66fec2d0" SrcRange="col:10"  SrcLoc="col:10" Name="d" Type="enum f"
    )
    ( EnumConstantDecl Pointer="0x20c66fec320" SrcRange="col:13"  SrcLoc="col:13" Name="e" Type="enum f"

) ) )
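
As a rough illustration of how this parenthesized form can be read back in, here is a small Python sketch of a recursive-descent parser. It is not Piggy’s parser; the (kind, attrs, children) result shape is an assumption, and the sample text is trimmed from the dump above.

```python
import re

# One token is "(", ")", an Attribute="Value" pair, or a bare word (node kind).
# The attribute alternative comes before the bare word so that quoted values
# containing parentheses, such as Type="int (void)", are consumed whole.
TOKEN = re.compile(r'\(|\)|(\w+)="([^"]*)"|(\w+)')

def parse(text):
    """Parse one parenthesized tree into a (kind, attrs, children) tuple."""
    tokens = list(TOKEN.finditer(text))
    pos = 0

    def peek():
        if pos >= len(tokens):
            raise ValueError("unbalanced parentheses")
        return tokens[pos]

    def parse_node():
        nonlocal pos
        if peek().group() != "(":
            raise ValueError("expected '('")
        pos += 1
        kind = peek().group(3)
        if kind is None:
            raise ValueError("expected a node kind after '('")
        pos += 1
        attrs, children = {}, []
        while peek().group() != ")":
            tok = peek()
            if tok.group() == "(":
                children.append(parse_node())
            elif tok.group(1) is not None:
                # A repeated attribute name keeps the last value seen.
                attrs[tok.group(1)] = tok.group(2)
                pos += 1
            else:
                raise ValueError("unexpected token " + tok.group())
        pos += 1  # consume ")"
        return kind, attrs, children

    return parse_node()

SAMPLE = r'''
( TranslationUnitDecl Pointer="0x20c48ee2bf8" SrcRange="<invalid sloc>"  SrcLoc="<invalid sloc>"
  ( FunctionDecl Pointer="0x20c66febf78" SrcRange="c:\temp\include\help.h:2:1, col:7"  SrcLoc="col:5" Name="j" Type="int (void)" Attrs=""
  )
  ( EnumDecl Pointer="0x20c66fec210" SrcRange="line:4:1, col:14"  SrcLoc="col:6" Name="f"
    ( EnumConstantDecl Pointer="0x20c66fec2d0" SrcRange="col:10"  SrcLoc="col:10" Name="d" Type="enum f"
    )
    ( EnumConstantDecl Pointer="0x20c66fec320" SrcRange="col:13"  SrcLoc="col:13" Name="e" Type="enum f"
) ) )
'''

kind, attrs, children = parse(SAMPLE)
for child_kind, child_attrs, grandchildren in children:
    print(child_kind, child_attrs["Name"], [g[1]["Name"] for g in grandchildren])
```

Because every attribute is named and quoted, the tokenizer needs no knowledge of individual node kinds, which is the point of the format.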

Piggy Templates

The key to writing p/invoke declarations is the AST tree template. Each template encodes both the tree pattern to match and the output to substitute for it. Let’s see how this might work.

// () ast match
// <> text
// {} code

// (... (...)) <> vs (... <> (...)). An AST expression matches a set of sub-trees. The
// first form matches the entire sub-tree. The latter matches the node, then matches additional
// sub-tree information (presumably for more template processing). In the implementation,
// the matcher works during a tree traversal, so templated text is output while walking
// the tree.


template (% ParmVarDecl Name=* Type="const wchar_t *"
   {
		System.Console.Write("int " + $1.Name);
   }
   %)
   ;

template (% ParmVarDecl Name=* Type=*
   {
		System.Console.Write(MapDefaultType($1.Type) + " " + $1.Name);
   }
   %)
   ;

template (% FunctionDecl Name=* Type=* (% ParmVarDecl %)*
   {
      System.Console.WriteLine("[DllImport(\"" + dll_name + "\", CallingConvention = global::System.Runtime.InteropServices.CallingConvention.ThisCall,");
      System.Console.WriteLine("\t EntryPoint=\"" + Mangled($1) + "\")]");
      System.Console.WriteLine("internal static extern " + Surgery($1.Type) + " " + $1.Name + "(" + $2.Output + ");");
   }
   %)
   ;

template
   (% EnumDecl Name=*
      < enum $1.Name { >
         ( 
            (% EnumConstantDecl Name=* Type=*
               (% IntegerLiteral Value=*
                  < {first?"":","; first = false;} $2.Name = $3.Value >
               %)
            %) |
            (% EnumConstantDecl Name=* Type=*
               < {first?"":","; first = false;} $5 >
            %)
          )*
      < } >
   %)
    ;

Matching and translation occur in a DFS post-order traversal, so in this regard Piggy is like ClangSharp and CppSharp. Templates are tried in the order they appear. In addition, when a node is matched, the corresponding output is cached, so that results are composed bottom up. In the above example, ParmVarDecl nodes are rewritten by the general template, except that the more specific template for const wchar_t * appears first and therefore takes precedence.
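
The traversal described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Piggy’s implementation: the node shape, the template table, and the helpers map_default_type and return_type (loose stand-ins for MapDefaultType and Surgery in the templates) are all assumptions, and the function name open_log is made up.

```python
def node(kind, attrs=None, children=()):
    """Hypothetical node shape: (kind, attrs, children)."""
    return (kind, dict(attrs or {}), list(children))

def map_default_type(c_type):
    # Assumed stand-in for MapDefaultType; the table is deliberately tiny.
    return {"int": "int", "unsigned int": "uint"}.get(c_type, "IntPtr")

def return_type(function_type):
    # Assumed stand-in for Surgery: "int (void)" -> "int".
    return function_type.split("(")[0].strip()

# Templates are tried in order, so the specific wchar_t case comes first.
# Each entry: (node kind, required attribute values, output function).
TEMPLATES = [
    ("ParmVarDecl", {"Type": "const wchar_t *"},
     lambda attrs, kids: "int " + attrs["Name"]),  # mirrors the first template
    ("ParmVarDecl", {},
     lambda attrs, kids: map_default_type(attrs["Type"]) + " " + attrs["Name"]),
    ("FunctionDecl", {},
     lambda attrs, kids: "internal static extern {} {}({});".format(
         map_default_type(return_type(attrs["Type"])),
         attrs["Name"], ", ".join(kids))),
]

def translate(tree, cache):
    """Post-order walk: children are translated before their parent."""
    kind, attrs, children = tree
    child_outputs = [translate(child, cache) for child in children]
    for t_kind, required, emit in TEMPLATES:
        if t_kind == kind and all(attrs.get(k) == v for k, v in required.items()):
            cache[id(tree)] = emit(attrs, child_outputs)
            break
    else:
        # No template matched: pass the children's output up unchanged.
        cache[id(tree)] = "\n".join(o for o in child_outputs if o)
    return cache[id(tree)]

tree = node("FunctionDecl",
            {"Name": "open_log", "Type": "int (const wchar_t *, unsigned int)"},
            [node("ParmVarDecl", {"Name": "path", "Type": "const wchar_t *"}),
             node("ParmVarDecl", {"Name": "flags", "Type": "unsigned int"})])

cache = {}
print(translate(tree, cache))
```

The cache holds one output string per matched node, and a parent’s output is built only from its children’s cached strings, which is what “composed bottom up” means here.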

As I said before, there’s a large body of research on similar problems in source-to-source transformation. Rose is a compiler that rewrites ASTs, then outputs the source for the new AST. But, with Rose at least, there are no output templates: a translator must rewrite the input into a new AST and then generate the translated source code from a traversal of that AST. However, we know that the simpler approach works, because ClangSharp and CppSharp use an AST visitor pattern, which is a DFS visit, without any AST rewriting.

Thoughts?

If any of you have some thoughts on this problem, let me know. I am interested. As I said, this problem has been going on for far too long and should be solved right. The code is still in its early stages: so far there is an AST serializer, an AST parser, a template parser, and a tree regular expression matcher.

Update, November 26, 2018: After working on this a bit more, I’ve updated the grammar and templates slightly. The code templates use three variables: tree accesses the AST nodes corresponding to the matching template pattern; vars is an associative array (a Dictionary<>) that passes values from one code template match to another; result is a StringBuilder to which output is appended. An example of their use is in the source code for Piggy.