Killing Off Wasabi - Part 2

Last time, I talked about our motivations for creating Wasabi, maintaining it for almost a decade, and finally killing it off. Today I would like to talk about Roslyn, the new piece of software that enabled the final phase. Roslyn is Microsoft's open source compiler platform for .NET, and it includes full implementations of both the C# and Visual Basic .NET compilers. I am also releasing Wasabi's new code generator as open source software. I believe this is the first real-world transpiler targeting C# with Roslyn to be open sourced. It is my hope that this article and the associated source code will serve as a rough guidepost for anyone who wants to write a .NET code generator using Roslyn.

The Previous Generator

When Wasabi 3.0 was first created, Fog Creek specified .NET as the new backend target. There are two common ways to generate a .NET assembly.
The first approach is to create a backend that generates Common Intermediate Language (CIL, formerly known as MSIL), the bytecode format for the .NET virtual machine. The advantage of creating a CIL backend is that compilation only has one step: from Wasabi directly into the final DLL shipped to customers. There are a few disadvantages, however. First, CIL is a binary format, so reading it requires special tools. Second, we wanted some assurance that the output of Wasabi was valid .NET and not corrupted in some weird way, and emitting CIL directly would give us no second compiler to check it. Finally, there is a large conceptual gap between Wasabi, the high-level language, and CIL, the virtual assembly language.
The second option was to write a transpiler. We preferred this option over generating raw CIL because we already had experience writing transpilers. Wasabi (née Thistle) started as a transpiler to PHP, then added VBScript and JavaScript transpilers with Wasabi 2.0. One downside is that a transpiler to a compiled language adds another full compilation phase on top of the resources used to compile your own language. But we would benefit from the additional verification provided by those phases: FogBugz would have to survive type checking in two different languages, and we trusted csc.exe to generate correct bytecode for us. Also, we were able to emit #line pragmas so that we could step through Wasabi code using the Visual Studio debugger. The transpiler also made XML documentation comments very easy to generate. We decided on C# as our target language, because Wasabi is written in C#, and semicolons are cool.
Once we had decided on transpilation, the next step was choosing a generation method. Once again, there were two options available to us. The first was to hand-roll the code generator, passing bare strings to TextWriter.Write. Our previous three backends used this method, so we were already quite comfortable with it. The second option, which we ended up using, was to use a code generation library like Microsoft CodeDOM, which represents common source code elements as an object graph, similar to how the browser DOM represents HTML elements.
We anticipated that CodeDOM would provide some level of static typing to our code generation process, as our previous generators always included a period of fixing obvious typographical errors (a missing semicolon here, an extra open-bracket there). CodeDOM also exposed a handy method for compiling the object graph directly to an assembly. However, we were slightly disappointed when we discovered that CodeDOM simply writes C# to disk and shells out to csc.exe. Another irritation was the word "common" I included in my description of CodeDOM: it is a lowest-common-denominator tool. Basically, the output was intended for a compiler, not human consumption. I will show some examples later. But we didn't need humans to be able to modify the generated C# code - it was just important that the C# was there, as a reassuring option if something went wrong.
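To make the contrast between the two generation methods concrete, here is a minimal sketch in Python. It is not CodeDOM's API and not code from any Wasabi backend; the node classes, the render function, and the GetBug and ixBug names are all hypothetical and exist only to show the shape of each approach.

```python
from dataclasses import dataclass


# Hand-rolled style: the generator concatenates bare strings, so a
# forgotten semicolon or bracket is only found when the output is compiled.
def emit_return_by_hand(expr_text: str) -> str:
    return "return " + expr_text + ";"


# Object-graph style: statements and expressions are typed nodes, and
# punctuation is added in exactly one place, the renderer.
@dataclass(frozen=True)
class Name:
    identifier: str


@dataclass(frozen=True)
class Call:
    target: Name
    arguments: tuple


@dataclass(frozen=True)
class Return:
    value: object


def render(node) -> str:
    if isinstance(node, Name):
        return node.identifier
    if isinstance(node, Call):
        args = ", ".join(render(a) for a in node.arguments)
        return f"{render(node.target)}({args})"
    if isinstance(node, Return):
        return f"return {render(node.value)};"
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Hypothetical statement: return GetBug(ixBug);
graph = Return(Call(Name("GetBug"), (Name("ixBug"),)))
```

Both routes can produce the same text for this statement. The difference is that the graph version cannot emit a call with an unbalanced parenthesis, because no caller ever writes a parenthesis.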

Roslyn

In early December 2014, I visited NYC HQ and chatted with John Daniels about some of the friction Wasabi was causing. Everyone who worked on FogBugz had to know two server languages - Wasabi and C#. New development had shifted into C#, but minor modifications and bug fixing had to be done in the older Wasabi portion of FogBugz. Also, as mentioned earlier, Wasabi had two separate compilation steps, which meant changes took over twice as long as we believed they should.
John told me he had experimented with writing a cleanup tool using Microsoft's new Roslyn library that removed some of the weird cruft from the CodeDOM output. We mused over the possibility of cutting out CodeDOM entirely and using Roslyn for code generation. He and several FogBugz developers encouraged me to follow that thought.
On Monday, February 9, I copied the CodeDOM generator file to RoslynGenerator.cs, removed CodeDOM's using statements and started replacing each method with the Roslyn equivalent. After a few days of struggling, the generator compiled but crashed on execution. Debugging and bug fixing continued for a few more days. Once I had any output at all, my next goal was to generate identical source code from the two generators, so I added a command-line flag that let me switch generators and initialized a Mercurial repository to check the diff between the two.
There were two major roadblocks to identical generation. First, there was an impedance mismatch between the two generation libraries' philosophies. CodeDOM's representation is a mutable graph, so I would often pass node references as parameters and mutate them in other methods. Roslyn represents source code as an immutable graph of nodes, so the correct way to generate a given node is from the bottom up. However, because I was trying to beat a copy of the CodeDOM generator into shape, I had to change the calling convention of some of the methods which depended on mutation.
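The change in calling convention can be sketched as follows. This is an illustrative Python model, not Roslyn or CodeDOM code: MutableMethod, ImmutableMethod, with_statement, and the Save method name are hypothetical stand-ins for a mutable graph node and an immutable one.

```python
from dataclasses import dataclass, field, replace


# Mutable style: a helper receives a node reference, changes it in place,
# and returns nothing.
@dataclass
class MutableMethod:
    name: str
    statements: list = field(default_factory=list)


def add_return_mutating(method: MutableMethod) -> None:
    method.statements.append("return;")


# Immutable style: nodes are frozen, so a helper has to return a new node
# and the caller has to keep that result.
@dataclass(frozen=True)
class ImmutableMethod:
    name: str
    statements: tuple = ()

    def with_statement(self, statement: str) -> "ImmutableMethod":
        # Returns a modified copy; self is left untouched.
        return replace(self, statements=self.statements + (statement,))


def add_return_immutable(method: ImmutableMethod) -> ImmutableMethod:
    return method.with_statement("return;")


mutable = MutableMethod("Save")
add_return_mutating(mutable)               # the caller's node is changed

original = ImmutableMethod("Save")
add_return_immutable(original)             # result discarded: nothing changes
updated = add_return_immutable(original)   # result kept: the copy has the statement
```

In the immutable version, a helper written with the old "mutate and return nothing" convention fails quietly: the program still runs, and the statement is simply absent from the output.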
There are many convenience methods in Roslyn to allow you to create a mutated copy of your node, but it wasn't always clear where the result should be stored. The symptom of storing it in the wrong place (or forgetting entirely) was that some output was missing. Usually, I succeeded at rewriting the code to be generated in an immutable-friendly order, but I had to create some Lists and instance fields where the mutability was difficult to avoid.
The second roadblock was comment generation. Internally, Wasabi represents comments as statements (they inherit from CStatement), as does CodeDOM (which uses CodeComments wrapped in CodeCommentStatements). However, Roslyn is designed to be able to represent any valid C# program in a lossless way, so it uses a much more flexible idiom, known as SyntaxTrivia. Both whitespace and comments can be represented this way, but trivia is a property of a node, not a node itself: nodes have collections of leading trivia and trailing trivia. As an example, the expression 1 + 2 could be represented with either of the following pseudocode trees:
(binary-expression
    operator: +
    lhs: (constant-expression value: 1)
    rhs: (constant-expression value: 2)
    leading-trivia: (trivia-list ' ')
    trailing-trivia: (trivia-list ' ')
)
Or
(binary-expression
    operator: +
    lhs: (constant-expression value: 1 trailing-trivia: (trivia-list ' '))
    rhs: (constant-expression value: 2 leading-trivia: (trivia-list ' '))
)
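The two trees can be modeled in a few lines of Python. This is a sketch, not Roslyn's API: Token, BinaryExpression, and to_full_string are hypothetical names, and the sketch assumes that the trivia in the first tree belongs to the + operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    leading_trivia: str = ""
    trailing_trivia: str = ""

    def to_full_string(self) -> str:
        # Trivia is not a node of its own; it is emitted around its owner.
        return self.leading_trivia + self.text + self.trailing_trivia


@dataclass(frozen=True)
class BinaryExpression:
    lhs: Token
    operator: Token
    rhs: Token

    def to_full_string(self) -> str:
        parts = (self.lhs, self.operator, self.rhs)
        return "".join(part.to_full_string() for part in parts)


# First tree: both spaces hang off the operator.
spaces_on_operator = BinaryExpression(
    Token("1"),
    Token("+", leading_trivia=" ", trailing_trivia=" "),
    Token("2"),
)

# Second tree: one space trails the 1, one space leads the 2.
spaces_on_operands = BinaryExpression(
    Token("1", trailing_trivia=" "),
    Token("+"),
    Token("2", leading_trivia=" "),
)

# A comment is just another piece of trivia on some token.
with_comment = BinaryExpression(
    Token("1", trailing_trivia=" "),
    Token("+"),
    Token("2", leading_trivia=" ", trailing_trivia=" /* total */"),
)
```

The first two values flatten to the same text even though the whitespace is owned by different parts of the tree.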
Either way, the result is a single space between the 1 and + and another between the + and 2. Comments are similarly stored as properties on any node, allowing inline comments (/* ... */), end-of-line comments, and comments on lines of their own. I really enjoy the idea of syntax trivia. I wish I could go back in time and add a similar feature to Wasabi.
However, instead of trying to replicate the exact weirdness of CodeDOM comment generation, I switched comment generation off in both generators for the purposes of correctness checking, then added all comments to the Roslyn generator later.
One complaint I have about Roslyn is with the SyntaxKind type - for performance reasons, it is implemented as a gigantic enum. However, this means it is not type safe - you can pass any SyntaxKind to any method that takes one, even where doing so will cause an inevitable crash. I ended up having to fix dozens of cases where I passed the wrong SyntaxKind (for example, passing a QuestionToken where I meant to pass a ConditionalExpression), and Roslyn would throw an exception. A type-safe version of this API would have been appreciated, even if it were a plain wrapper around an integral enum SyntaxKind under the covers.
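One possible shape for such a wrapper is sketched below in Python. Nothing here is Roslyn's API: the four SyntaxKind members and their numeric values are made up, and TokenKind, ExpressionKind, and to_raw_kind are hypothetical names. In C# the mismatch would be a compile-time error; in this sketch a static checker would flag it, and the isinstance check is a runtime backstop.

```python
from enum import Enum, IntEnum


# Stand-in for the single gigantic enum (values are invented).
class SyntaxKind(IntEnum):
    QUESTION_TOKEN = 1
    COLON_TOKEN = 2
    CONDITIONAL_EXPRESSION = 3
    ADD_EXPRESSION = 4


# Thin wrappers: each category gets its own type, but every member is
# still backed by a plain SyntaxKind value.
class TokenKind(Enum):
    QUESTION = SyntaxKind.QUESTION_TOKEN
    COLON = SyntaxKind.COLON_TOKEN


class ExpressionKind(Enum):
    CONDITIONAL = SyntaxKind.CONDITIONAL_EXPRESSION
    ADD = SyntaxKind.ADD_EXPRESSION


def to_raw_kind(kind: ExpressionKind) -> SyntaxKind:
    """Accept only expression kinds and unwrap to the underlying enum."""
    if not isinstance(kind, ExpressionKind):
        raise TypeError(f"expected an ExpressionKind, got {kind!r}")
    return kind.value
```

Passing TokenKind.QUESTION to to_raw_kind is rejected at the boundary, rather than deep inside node construction.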

Idioms and Readability

As I discussed before, the whole reason we were switching from CodeDOM to Roslyn was that the generated code, while it preserved meaning, was not fit for human consumption. I took a week off in the middle of the code generator rewrite, so it took about 12 developer-days to reimplement the CodeDOM generator in Roslyn. The first week of March, when the Roslyn output finally matched the output of CodeDOM, was when Roslyn really began to show its value. Once the reimplementation was done, I pressed hard on the abilities of Roslyn to make the output of Wasabi much more idiomatic:
  • Boolean negation uses the !x operator instead of comparing x == false
  • Similarly, not-equals changed from (a == b) == false to a != b
  • Negative numbers are represented as -n instead of (0 - n)
  • While loops now use a while(cond) loop, instead of using an initializer-less for( ; cond ; )
  • Likewise with do { } while(cond); loops
  • Classes with no base class no longer explicitly inherit from object
  • else-if clauses no longer have to be generated as else { if }, reducing the dreaded Arrow Anti Pattern
  • switch statements also no longer have to be generated as a nested chain of if, else-if, else-if, else-if, else...
  • Lambdas use C# lambdas instead of Wasabi-generated closure classes. (I had to make sure nobody was depending on the weirdness of Wasabi lambdas. Luckily, nobody was.)
  • Removed unnecessary parentheses, e.g. return (x - 1); and if ((s == z)) { }
  • Correctly translate Wasabi to C# precedence rules. (CodeDOM wrapped every single binary expression in parentheses.)
  • Replace Wasabi runtime functions for Xor, Imp, and Eqv with inline equivalents in C#
  • Rewrite almost all gotos as the equivalent break or return
  • Don't generate empty files, namespaces, or partial class Global : GlobalBase { }
  • Improve string literal generation
  • By request, developed a heuristic for "obnoxious parameters" so that functions passed many literal or null parameters have those parameters named
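The parenthesization items in the list above come down to comparing operator precedence between a node and its parent. The Python sketch below shows one common way to do that; it is not the Wasabi generator's code, the precedence table covers only a handful of left-associative C# binary operators, and all names are hypothetical.

```python
from dataclasses import dataclass

# Simplified table: a higher number binds tighter.
PRECEDENCE = {"*": 3, "/": 3, "+": 2, "-": 2, "==": 1, "!=": 1}


@dataclass(frozen=True)
class Binary:
    operator: str
    lhs: object  # either a str (an identifier or literal) or a Binary
    rhs: object


def render_all_parens(expr) -> str:
    """The old behavior: wrap every binary expression in parentheses."""
    if isinstance(expr, str):
        return expr
    lhs = render_all_parens(expr.lhs)
    rhs = render_all_parens(expr.rhs)
    return f"({lhs} {expr.operator} {rhs})"


def render(expr, parent_precedence: int = 0, is_right_operand: bool = False) -> str:
    """Add parentheses only where dropping them would change the meaning."""
    if isinstance(expr, str):
        return expr
    own = PRECEDENCE[expr.operator]
    lhs = render(expr.lhs, own)
    rhs = render(expr.rhs, own, is_right_operand=True)
    text = f"{lhs} {expr.operator} {rhs}"
    # A looser operator under a tighter one needs parentheses, and so does
    # an equal-precedence operator on the right of a left-associative one.
    needs_parens = own < parent_precedence or (
        own == parent_precedence and is_right_operand
    )
    return f"({text})" if needs_parens else text


tree = Binary("-", Binary("*", "a", "b"), Binary("-", "c", "d"))
```

For this tree, render_all_parens gives ((a * b) - (c - d)), while render gives a * b - (c - d): the inner subtraction keeps its parentheses because removing them would change the result.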
I also fixed several things that could have been fixed in the previous generator, but were ignored because the readability of the output had not been a concern:
  • References to constants are generated instead of outputting the literal value of the constant
  • Hexadecimal numbers are output as hexadecimal
  • for loops with constant bounds no longer generate local variables
  • Wasabi arrays start at index 1, so every index needs an offset. That calculation is now done at compile time when the index is a constant. For example, rg(1) is written as rg[0] instead of rg[1 - 1].
  • Initialize most fields inline instead of in the constructor
  • Reduced the number of unnecessary cast expressions (the majority of the casts generated were being added by a bug in the type checker!)
  • Rename the Wasabi thread-static singleton from __Global.__Current to Global.Current
  • Don't assign null to optional parameters -- that's guaranteed to be the default value.
  • The default value for ints is 0, not new int(). Similarly with longs (0L) and Booleans (false).
  • Automatically rewrite VBScript-style returns (assigning to the name of the function) as immediate returns
  • Removed some ancient profiling code generation
  • Got rid of the very few union types
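The array index item in the list above is a small case of constant folding. A minimal Python sketch of the idea follows; it is not the Wasabi generator's code, and the node classes and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Subtract:
    lhs: object
    rhs: object


def zero_based(index):
    """Convert a 1-based index expression to a 0-based one."""
    if isinstance(index, Literal):
        # Constant index: do the subtraction now, at generation time.
        return Literal(index.value - 1)
    # Anything else: leave the subtraction in the generated code.
    return Subtract(index, Literal(1))


def render(expr) -> str:
    if isinstance(expr, Literal):
        return str(expr.value)
    if isinstance(expr, Variable):
        return expr.name
    if isinstance(expr, Subtract):
        return f"{render(expr.lhs)} - {render(expr.rhs)}"
    raise TypeError(f"unknown node type: {type(expr).__name__}")


def render_element_access(array_name: str, one_based_index) -> str:
    return f"{array_name}[{render(zero_based(one_based_index))}]"
```

With these definitions, render_element_access("rg", Literal(1)) produces rg[0], while a variable index such as Variable("i") still produces rg[i - 1].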
Thankfully, the bugs in this second list were even easier to fix in the new Roslyn generator than they were in the old CodeDOM generator.
On March 27, the FogBugz team and I decided that I was hitting the point of diminishing returns and closed the project. I demonstrated my results to the founders on the 30th, and they asked me to get the developer lead and two most prolific FogBugz contributors to work exclusively in the generated C# branch for a while to see how it felt before merging. Once the developers added their approval, I merged my changes into FogBugz integration in the late evening of April 21, and FogBugz Version 8.13.104 went out to 100% of FogBugz On Demand customers on May 21.
I'm happy to announce that Wasabi's Roslyn generator is now available on GitHub. It's certainly not a beautiful piece of code: it only needs to run one time, on one piece of code (FogBugz) that I can modify as needed. But I think it could be useful for anyone who is writing a C# code generator using Roslyn.