Series on program transformation systems: Coccinelle (2006)

Due to my work on Piggy, I’m starting to do a thorough review of the literature on program transformation systems, how Piggy relates to prior research, and what improvements I can make to Piggy. Note, a good list to start from is in Wikipedia: ASF+SDF, CIL (for C), Coccinelle (for C), DMS, Fermat, Spoon (for Java), Stratego/XT, TXL). This is the first entry in the series, on Coccinelle, a system that modifies C source code.

Coccinelle has its roots in the analysis of software for detecting bugs in device drivers in Linux [Padioleau, Lawall, Muller 2006]. In this work, the authors were trying to solve the problem of taking a bug fix of some code and applying the fix across the entire source code base. Coccinelle represents the program as a control-flow graph (CFG), and a language used to make transformations, SmPL, which represents computational tree logic (CTL). The pattern language uses a syntax that specifies the pattern and fix for source code transformations together. Patterns are written in C language, and the replacement code with “+” and “-“. So, patterns should be easy to construct. Note, Coccinelle does not guarantee semantics are preserved [Jones, Hansen 2007], but this isn’t usually a problem in practice. It also represents the C program prior to preprocessing in order to preserve the original look of the program.

Taking a closer look at their example, the bug is in pcf8563_remove(), where the function fails to clear the client data before calling kfree(). This is a dangling pointer, which is usually fixed by just clearing the buffer that contains the pointer so that it is never used. Patch can’t be used because the format of the code may interfere with the problem being recognized. Further, it doesn’t allow one to express missing function calls and various ways of writing the code.

    static int pcf8563_remove(struct i2c_client *client)
        struct pcf8563 *pcf8563 = i2c_get_clientdata(client);

        if (pcf8563->rtc)


        return 0;

The pattern to correct the problem detects whether there isn’t a call to i2c_set_clientdata() before the kfree(). If the pattern is detected, the call to
i2c_set_clientdata () is inserted.

    type T;
    identifier client, data;

    // Check if function uses clientdata
        i2c_set_clientdata(client, data);
        data = i2c_get_clientdata(client);
        T data = i2c_get_clientdata(client);
        // Anything in between is OK
        // If this pattern is found, clientdata is set to NULL before data is freed.
        // Do nothing and skip the rest of the alternation
        i2c_set_clientdata(client, NULL);
        // If this pattern is found, clientdata is set to NULL after data is freed.
        // Move it to the front and skip the rest of the alternation
    +	i2c_set_clientdata(client, NULL);
    -	i2c_set_clientdata(client, NULL);
        // Otherwise apply a fix if kfree() has been found in some code path
        // (doesn't need to be in all paths).
    +	i2c_set_clientdata(client, NULL);
    ? 	kfree(data);

Observations and notes

  • Coccinelle uses a syntax that intermingles the pattern and the fix, which is also how Piggy patterns are specified. But, pattern and fix seem to be only for C code. Piggy’s program transformation language is for ASTs, not CTL. Coccinelle uses an ellipsis (“…”) to denote control flow patterns where arbitrary code may appear. Piggy does not have the equivalent, because it is assumed that all AST nodes can have arbitrary content between siblings in an AST. However, Piggy uses Kleene star to denote items at an arbitrary depth in the subtree. Coccinelle’s language, SmPL, is large and complicated. There is little explanation of many constructs, e.g., what dep && dep means within the metavariable section. Indeed, they remark in one paper the value in not adding features for every damn thing.
  • Coccinelle declares sub-expressions using metavariables within the “@@ … @@” metavariable sections. Variables can be used in the replacement code. Piggy doesn’t provide that, but provides access to the AST through the Tree type, and allows one to specify any number of C# fields in the generated class.
  • Coccinelle alters the source code and provides next-in-chain pattern matching. Piggy currently does not provide a clear path to alter the AST.
  • Coccinelle is written in OCAML.
  • Coccinelle parses source code using a hand-written, top-down parser, then performs data flow analysis to derive a CTL. The system performs pattern matching on this graph, then converts it back to source code. It looks like it can handle #define’s but this was only added in the last few months.
  • The source file rtc-pcf8563.c no longer contains pcf8563_remove().
  • I really like the way Coccinelle allows you to use C source for the patterns. It makes sense to incorporate in Piggy C source code patterns in the language.


Padioleau Y, Lawall JL, Muller G. Understanding collateral evolution in Linux device drivers. In ACM SIGOPS Operating Systems Review 2006 Apr 18 (Vol. 40, No. 4, pp. 59-71). ACM. (DOI, pdf, GS)

Jones, N.D. and Hansen, R.R., 2007, November. The semantics of “semantic patches” in coccinelle: Program transformation for the working programmer. In Asian Symposium on Programming Languages and Systems (pp. 303-318). Springer, Berlin, Heidelberg. (DOI, pdf,GS)

Additional Reading

Stuart, H., 2008. Hunting bugs with Coccinelle. Master’s Thesis. (pdf), accessed Feb 27, 2019., accessed Feb 27, 2019., accessed Feb 27, 2019.

Lawall, J. and Muller, G., 2018. Coccinelle: 10 years of automated evolution in the Linux kernel. In 2018 {USENIX} Annual Technical Conference ({USENIX}{ATC} 18) (pp. 601-614). (pdf)

Open Source (GPLv2) at Github

Posted in Tip