Mono Mini in a Nutshell

What the hell is Mono Mini?

Mono Mini is Mono's JIT compiler. It translates CIL into its own internal representation and, when its LLVM backend is used, into LLVM IR.


Why is it important?

Mono is a platform-independent implementation of the .NET runtime and C#, so the same programs can run on Windows or Linux. The C# compiler translates C# into CIL code, and Mono Mini translates CIL into machine code that can be executed. For that last step, Mono Mini can use LLVM as its backend, a mature compiler project with a long development history.

You may not care about the details, but for my project Campy, which compiles C# code to parallelized GPU code, Mono Mini is highly relevant. Mono Mini contains a simple parser for CIL and a translator that converts it into LLVM IR. There are other libraries that compile CIL, like SharpLang and LLILC, but the source code of neither is easy reading.


Where is the code?

It’s located in the Mono GitHub repository.

The source for Mini is in the directory mono/mini/.


What is Mono Mini written in?

Good ol’ primitive C!


What documentation is there?

There is very little information on how to read the code or where to start. The Architecture section mostly describes what happens in LLVM, not the more interesting part: how CIL is converted into LLVM IR.


What LLVM API does Mono Mini use?

First, Mini uses a special fork of LLVM maintained by the Mono project.

Second, Mini uses the LLVM-C API.


Where is the main instruction decode/translate loop to convert CIL into LLVM IR?

The main loop is in the function mono_method_to_ir(), located in mono/mini/method-to-ir.c. It decodes each CIL instruction on the fly, switching on its opcode. The loop itself, “while (ip < end) {”, starts only after quite a bit of code that sets up the method.
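To illustrate the shape of that loop, here is a minimal sketch in C. It is not Mono's code: decode_cil() and the CIL_* constants are hypothetical names, it handles only a few one-byte CIL opcodes (byte values from ECMA-335, Partition III) plus ldc.i4.s with its one-byte operand, and it merely counts instructions instead of creating MonoInst objects. The real loop also deals with two-byte opcodes (prefixed by 0xFE), metadata tokens, branches and basic-block boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A handful of CIL opcodes; byte values from ECMA-335, Partition III. */
enum {
    CIL_NOP      = 0x00,
    CIL_LDARG_0  = 0x02,
    CIL_LDARG_1  = 0x03,
    CIL_LDC_I4_1 = 0x17,
    CIL_LDC_I4_S = 0x1F,  /* followed by a 1-byte signed operand */
    CIL_RET      = 0x2A,
    CIL_ADD      = 0x58
};

/* Hypothetical decode loop shaped like the one in mono_method_to_ir():
 * ip walks the IL bytes, the switch dispatches on the opcode, and each
 * case advances ip past the opcode and any inline operand.
 * Returns the number of instructions decoded, or -1 for an opcode this
 * sketch does not handle or an operand that runs past the end. */
int decode_cil(const uint8_t *code, size_t size)
{
    const uint8_t *ip = code;
    const uint8_t *end = code + size;
    int count = 0;

    assert(code != NULL);
    while (ip < end) {
        switch (*ip) {
        case CIL_NOP:
        case CIL_LDARG_0:
        case CIL_LDARG_1:
        case CIL_LDC_I4_1:
        case CIL_ADD:
        case CIL_RET:
            ip += 1;          /* opcode only, no inline operand */
            break;
        case CIL_LDC_I4_S:
            if (end - ip < 2)
                return -1;    /* truncated operand */
            ip += 2;          /* opcode + int8 operand */
            break;
        default:
            return -1;        /* e.g. 0xFE two-byte opcodes, not handled here */
        }
        count++;
    }
    return count;
}
```

For example, decode_cil() on the bytes 02 03 58 2A (ldarg.0, ldarg.1, add, ret) returns 4.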


How is an instruction first decoded/parsed?

A large switch-statement in the main loop parses the bytes of each instruction and creates an internal instruction. Most of the work is done in C macros that eventually call MONO_INST_NEW and MONO_ADD_INS. For example, loading a method argument uses EMIT_NEW_ARGLOAD():

#define EMIT_NEW_ARGLOAD(cfg,dest,num) do { NEW_ARGLOAD ((cfg), (dest), (num)); MONO_ADD_INS ((cfg)->cbb, (dest)); } while (0)

Expanding further:

#define NEW_ARGLOAD(cfg,dest,num) NEW_VARLOAD ((cfg), (dest), cfg->args [(num)], cfg->arg_types [(num)])

And further:

#define NEW_VARLOAD(cfg,dest,var,vartype) do { \
 MONO_INST_NEW ((cfg), (dest), OP_MOVE); \
 (dest)->opcode = mono_type_to_regmove ((cfg), (vartype)); \
 type_to_eval_stack_type ((cfg), (vartype), (dest)); \
 (dest)->klass = var->klass; \
 (dest)->sreg1 = var->dreg; \
 (dest)->dreg = alloc_dreg ((cfg), (MonoStackType)(dest)->type); \
 if ((dest)->opcode == OP_VMOVE) (dest)->klass = mono_class_from_mono_type ((vartype)); \
 } while (0)

And further, MONO_INST_NEW, which allocates a MonoInst structure from the method's memory pool to hold the decoded instruction and initializes its fields:

#define MONO_INST_NEW(cfg,dest,op) do { \
 (dest) = (MonoInst *)mono_mempool_alloc ((cfg)->mempool, sizeof (MonoInst)); \
 (dest)->inst_c0 = (dest)->inst_c1 = 0; \
 (dest)->next = (dest)->prev = NULL; \
 (dest)->opcode = (op); \
 (dest)->flags = 0; \
 (dest)->type = 0; \
 (dest)->dreg = -1; \
 (dest)->cil_code = (cfg)->ip; \
 } while (0)
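The macro chain above can be mimicked with plain functions, which makes the data flow easier to follow. This is a toy model under stated assumptions, not Mono's code: Inst, BasicBlock, inst_new(), add_ins() and emit_varload() are hypothetical names, malloc() stands in for mono_mempool_alloc(), only a few MonoInst fields are kept, and the move opcode that mono_type_to_regmove() would choose is fixed to a single OP_MOVE.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opcode number; in Mono the real opcodes come from mini-ops.h. */
enum { OP_MOVE = 1 };

/* Toy stand-ins for MonoInst and MonoBasicBlock, with only the fields used here. */
typedef struct Inst {
    int opcode;
    int dreg;              /* destination vreg, -1 until assigned */
    int sreg1;             /* source vreg, -1 if unused */
    struct Inst *next, *prev;
} Inst;

typedef struct {
    Inst *code;            /* first instruction, like bb->code */
    Inst *last_ins;        /* last instruction, like bb->last_ins */
} BasicBlock;

/* Counterpart of MONO_INST_NEW, using malloc instead of a memory pool. */
static Inst *inst_new(int opcode)
{
    Inst *ins = malloc(sizeof *ins);
    assert(ins != NULL);
    ins->opcode = opcode;
    ins->dreg = -1;
    ins->sreg1 = -1;
    ins->next = ins->prev = NULL;
    return ins;
}

/* Counterpart of MONO_ADD_INS: append ins to the block's doubly linked list. */
static void add_ins(BasicBlock *bb, Inst *ins)
{
    ins->prev = bb->last_ins;
    if (bb->last_ins)
        bb->last_ins->next = ins;
    else
        bb->code = ins;
    bb->last_ins = ins;
}

/* Counterpart of EMIT_NEW_ARGLOAD/NEW_VARLOAD: emit a move from the
 * variable's vreg into a freshly allocated vreg and append it to bb. */
static Inst *emit_varload(BasicBlock *bb, int var_dreg, int *next_vreg)
{
    Inst *ins = inst_new(OP_MOVE);
    ins->sreg1 = var_dreg;          /* like (dest)->sreg1 = var->dreg */
    ins->dreg = (*next_vreg)++;     /* like alloc_dreg() */
    add_ins(bb, ins);
    return ins;
}
```

Two consecutive calls to emit_varload() leave two linked OP_MOVE instructions in the block, each writing a freshly numbered register, which is the kind of list that is later walked starting from bb->code.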


Right. Where is the actual CIL instruction converted into LLVM IR?

Well, first off, by the time LLVM gets involved there is no CIL instruction anymore. Each one has already been converted into Mini's own intermediate representation, a pseudo-CIL instruction set (which I will call pCIL) made of the MonoInst structures described above. It contains things like OP_PHI and OP_MOVE, which do not exist in CIL.
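One reason pCIL looks so different from CIL is that the stack-based CIL is turned into register-based instructions while it is decoded. The following is a minimal sketch of that idea, not Mono's code: PInst, cil_to_vregs() and the P_* opcodes are hypothetical, and only four one-byte CIL opcodes (byte values from ECMA-335) are handled. It assumes the approach of keeping virtual register numbers, rather than values, on a compile-time evaluation stack; mono_method_to_ir() does something comparable with its stack of MonoInst pointers, though with far more bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One-byte CIL opcodes (values from ECMA-335). */
enum { CIL_LDARG_0 = 0x02, CIL_LDARG_1 = 0x03, CIL_RET = 0x2A, CIL_ADD = 0x58 };

/* Hypothetical register-based instruction, a toy stand-in for MonoInst. */
enum { P_ARG, P_ADD, P_RET };
typedef struct {
    int op;
    int dreg;          /* destination vreg, -1 if none */
    int sreg1, sreg2;  /* source vregs, -1 if unused */
    int arg;           /* argument index for P_ARG, -1 otherwise */
} PInst;

#define STACK_MAX 8

/* Converts stack-based CIL into register-based instructions. The evaluation
 * stack holds vreg numbers: a push records which vreg holds a result, and a
 * pop becomes a source register of the consuming instruction. Returns the
 * number of instructions written to out, or -1 on error. */
int cil_to_vregs(const uint8_t *ip, size_t size, PInst *out, int max_out)
{
    const uint8_t *end = ip + size;
    int stack[STACK_MAX];
    int sp = 0, next_vreg = 0, n = 0;

    assert(out != NULL || max_out == 0);
    while (ip < end) {
        uint8_t opcode = *ip++;
        if (n >= max_out)
            return -1;
        PInst *ins = &out[n++];
        ins->dreg = ins->sreg1 = ins->sreg2 = ins->arg = -1;

        switch (opcode) {
        case CIL_LDARG_0:
        case CIL_LDARG_1:
            if (sp >= STACK_MAX)
                return -1;
            ins->op = P_ARG;
            ins->arg = opcode - CIL_LDARG_0;
            ins->dreg = next_vreg++;
            stack[sp++] = ins->dreg;       /* push the result vreg */
            break;
        case CIL_ADD:
            if (sp < 2)
                return -1;                 /* stack underflow */
            ins->op = P_ADD;
            ins->sreg2 = stack[--sp];      /* pop right operand */
            ins->sreg1 = stack[--sp];      /* pop left operand */
            ins->dreg = next_vreg++;
            stack[sp++] = ins->dreg;
            break;
        case CIL_RET:
            ins->op = P_RET;
            if (sp > 0)
                ins->sreg1 = stack[--sp];  /* return value, if any */
            break;
        default:
            return -1;                     /* opcode not handled in this sketch */
        }
    }
    return n;
}
```

Fed the IL for a two-argument addition (ldarg.0, ldarg.1, add, ret), it produces an add whose sreg1 and sreg2 are the vregs written by the two argument loads, followed by a ret that reads the add's dreg. No stack remains in the output.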

The translation of pCIL to LLVM IR is in mono/mini/mini-llvm.c. Conversion is performed per method, starting in mono_llvm_emit_method(), whose comment reads “Emit LLVM IL from the mono IL, and compile it to native code using LLVM.” From there, a chain of calls eventually reaches process_bb(), which translates one basic block at a time: mini_method_compile() -> mono_llvm_emit_method() -> emit_method_inner() -> process_bb(). The main loop of process_bb() starts at “for (ins = bb->code; ins; ins = ins->next) {”. Inside that loop, the conversion depends on the opcode of each pCIL instruction, dispatched by the fat switch statement “switch (ins->opcode) {”.

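For each instruction, an LLVMValueRef holding its result is eventually created and stored in the array “values[]”, indexed by the instruction's destination virtual register, ins->dreg. Operands are read back the same way, from values[ins->sreg1] and values[ins->sreg2]. So although CIL is stack-based, the translation to LLVM IR does not work on a stack: by this point, method-to-ir.c has already replaced the evaluation stack with virtual registers (note the sreg1/dreg assignments in NEW_VARLOAD above).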
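To make the shape of the process_bb() loop concrete, here is a toy version. It is only a sketch under heavy assumptions: Inst, process_bb_sketch() and MAX_VREGS are hypothetical, the opcode names are borrowed from Mini but their numeric values are made up, and where mini-llvm.c would call an LLVMBuild* function and keep an LLVMValueRef, this sketch simply computes an int. What it keeps is the structure: a for-loop over a linked list of instructions, a switch on ins->opcode, and a values[] array indexed by the destination virtual register (ins->dreg), with operands looked up through ins->sreg1 and ins->sreg2.

```c
#include <assert.h>
#include <stddef.h>

/* Opcode names borrowed from Mini; the numeric values are made up for this sketch. */
enum { OP_ICONST = 1, OP_MOVE, OP_IADD };

/* Hypothetical, stripped-down instruction: just enough fields for the sketch. */
typedef struct Inst {
    int opcode;
    int dreg;              /* destination virtual register */
    int sreg1, sreg2;      /* source virtual registers, -1 if unused */
    int inst_c0;           /* constant operand (OP_ICONST) */
    struct Inst *next;     /* next instruction in the basic block */
} Inst;

#define MAX_VREGS 16

/* Toy version of the process_bb() loop. Instead of calling LLVMBuild*()
 * and storing an LLVMValueRef, each case computes an int and stores it in
 * values[ins->dreg]; operands are read from values[ins->sreg1/sreg2]. */
static void process_bb_sketch(const Inst *code, int values[MAX_VREGS])
{
    for (const Inst *ins = code; ins; ins = ins->next) {
        assert(ins->dreg >= 0 && ins->dreg < MAX_VREGS);
        switch (ins->opcode) {
        case OP_ICONST:
            values[ins->dreg] = ins->inst_c0;
            break;
        case OP_MOVE:
            values[ins->dreg] = values[ins->sreg1];
            break;
        case OP_IADD:
            values[ins->dreg] = values[ins->sreg1] + values[ins->sreg2];
            break;
        default:
            assert(0 && "opcode not handled in this sketch");
        }
    }
}
```

Running it over the sequence OP_ICONST 37 into vreg 0, OP_ICONST 5 into vreg 1, OP_IADD of vregs 0 and 1 into vreg 2, and OP_MOVE of vreg 2 into vreg 3 leaves 42 in both values[2] and values[3].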



