Compilers are important, but most people use their favorite programming language and tools day to day without thinking much about them, ignoring what happens under the covers.
However, peeking into that black box and learning to write a compiler gives you super powers. It will allow you to write custom tools and mini languages/DSLs, make your own fully fledged language or, as in “Create Your Own Compiler”, transform one language into another!
But a compiler’s most popular application is translating a program from a higher-level language to a lower-level one in order to create an executable, as C compilers do.
Each compiler works by executing several well-defined phases, each phase taking as its input the output of the previous one, until it finally produces runnable code.
The first phase is tokenization, performed by the part called the lexer. It takes a stream of characters and, typically using regular expressions, groups them according to the language syntax into what are called tokens: keywords, function names, operators, etc.
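As a rough illustration of this phase, here is a minimal sketch of a regex-based lexer in Python. The token categories, the keyword set and the `tokenize` function are all assumptions made up for this example; a real language would define its own.

```python
import re

# Hypothetical token categories for a tiny expression language.
# Order matters: KEYWORD must be tried before IDENT.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("KEYWORD", r"\b(?:let|if|else)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
# One master pattern with a named group per token category.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Turn a string of characters into a list of (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace carries no meaning in this toy language
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        tokens.append((kind, text))
    return tokens


print(tokenize("let x = 3 + 4.5"))
```

For the input `let x = 3 + 4.5` this yields a keyword, an identifier, an operator, a number, an operator and a number, in that order. Hand-written lexers that avoid regular expressions are also common; the regex approach is simply the one described above.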
The next phase is parsing. The parser takes the stream of tokens made by the lexer and represents them in a structure, the abstract syntax tree (AST), which is much easier to work with.
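To make the idea concrete, here is a minimal sketch of a recursive-descent parser in Python for arithmetic expressions. It assumes tokens arrive as `(kind, text)` pairs and represents the AST as nested tuples; both shapes, and the `parse` function itself, are choices made for this example only.

```python
def parse(tokens):
    """Parse a list of (kind, text) tokens into a nested-tuple AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expression():  # expression := term (("+" | "-") term)*
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            op = advance()[1]
            node = ("binop", op, node, term())
        return node

    def term():  # term := factor (("*" | "/") factor)*
        node = factor()
        while peek() in (("OP", "*"), ("OP", "/")):
            op = advance()[1]
            node = ("binop", op, node, factor())
        return node

    def factor():  # factor := NUMBER | IDENT | "(" expression ")"
        kind, text = advance()
        if kind == "NUMBER":
            return ("num", float(text))
        if kind == "IDENT":
            return ("var", text)
        if (kind, text) == ("OP", "("):
            node = expression()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    tree = expression()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree


# Tokens for the source text "1 + 2 * x"
tokens = [("NUMBER", "1"), ("OP", "+"), ("NUMBER", "2"), ("OP", "*"), ("IDENT", "x")]
print(parse(tokens))
```

Because `term` is called from inside `expression`, multiplication binds tighter than addition, so the tree for `1 + 2 * x` has the addition at the root and the multiplication as its right child.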
The next phase is semantic analysis, where the compiler considers the language’s constraints that go beyond syntax, such as the data types. It makes sure that the code is well-formed and well-typed.
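A minimal sketch of what such a check might look like in Python follows. The nested-tuple AST, the two types `number` and `string`, and the typing rules are all assumptions invented for this example, not the rules of any particular language.

```python
def check(node, env):
    """Return the type of an AST node, or raise if the node is ill-formed or ill-typed.

    `env` maps declared variable names to their types (a hypothetical symbol table).
    """
    kind = node[0]
    if kind == "num":
        return "number"
    if kind == "str":
        return "string"
    if kind == "var":
        name = node[1]
        if name not in env:
            raise NameError(f"undeclared variable {name!r}")
        return env[name]
    if kind == "binop":
        _, op, left, right = node
        left_type = check(left, env)
        right_type = check(right, env)
        if left_type != right_type:
            raise TypeError(f"cannot apply {op!r} to {left_type} and {right_type}")
        if left_type == "string" and op != "+":
            raise TypeError(f"operator {op!r} is not defined for strings")
        return left_type
    raise ValueError(f"unknown node kind {kind!r}")


env = {"count": "number", "name": "string"}
print(check(("binop", "*", ("var", "count"), ("num", 2)), env))
```

Here `count * 2` is accepted and typed as a number, whereas `name * 2` or a reference to an undeclared variable would be rejected before any code is generated.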
The next phase is to optimize the AST, for example by eliminating dead code using techniques like tree shaking. The result of this phase is the Intermediate Representation, or IR. The IR itself then undergoes optimizations specific to the target CPU architecture to produce machine code.
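As a small illustration of optimizing a tree, here is a sketch in Python that performs constant folding and drops a branch that can never run. Note that this is not tree shaking, which removes unused modules or exports; it is a simpler form of dead code elimination chosen for brevity. The nested-tuple AST and the `if` node shape are assumptions of this example.

```python
import operator

# Operators that are safe to evaluate at compile time in this sketch.
# Division is left out to avoid folding a division by zero.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def optimize(node):
    """Return an equivalent but simpler nested-tuple AST."""
    kind = node[0]
    if kind == "binop":
        _, op, left, right = node
        left, right = optimize(left), optimize(right)
        if op in FOLDABLE and left[0] == "num" and right[0] == "num":
            # Both operands are constants: compute the result now.
            return ("num", FOLDABLE[op](left[1], right[1]))
        return ("binop", op, left, right)
    if kind == "if":
        _, cond, then_branch, else_branch = node
        cond = optimize(cond)
        if cond[0] == "num":
            # The condition is known at compile time, so one branch is dead code.
            return optimize(then_branch if cond[1] != 0 else else_branch)
        return ("if", cond, optimize(then_branch), optimize(else_branch))
    return node


tree = ("if",
        ("binop", "-", ("num", 2), ("num", 2)),       # always 0, i.e. false
        ("var", "a"),                                  # dead branch
        ("binop", "+", ("num", 1), ("binop", "*", ("num", 2), ("num", 3))))
print(optimize(tree))  # ('num', 7)
```

The whole conditional collapses to the constant 7: the condition folds to 0, the `then` branch is discarded, and the remaining arithmetic is evaluated ahead of time.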
The last step is to produce a standalone executable. (Runtimes that work with bytecode, like the JVM, work with IR instead of creating an executable.) This is usual in C programming, but with the tools now available even high-level languages like Java can, under GraalVM, compile to native executables.
The above list is simplified, of course, but in general the steps you have to take in order to transform an input source into the desired output are:
- Building up an Abstract Syntax Tree (AST)
- Generating IR code for the given AST
- Optimizing the generated IR code
- Generating machine code
Add to those the steps of defining the syntax of your new programming language, if you want to go that way.
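To give a feel for the step of generating IR from an AST, here is a minimal sketch in Python that flattens a nested-tuple AST into three-address instructions, one common style of IR. The instruction format `(target, op, left, right)`, the temporary names `t1`, `t2`, ... and the `lower` function are assumptions of this example.

```python
import itertools


def lower(ast):
    """Flatten a nested-tuple AST into a list of three-address instructions.

    Returns (instructions, name_holding_the_result).
    """
    code = []
    temps = itertools.count(1)  # source of fresh temporary names

    def emit(node):
        kind = node[0]
        if kind == "num":
            return str(node[1])
        if kind == "var":
            return node[1]
        if kind == "binop":
            _, op, left, right = node
            left_name = emit(left)    # operands are lowered first...
            right_name = emit(right)
            target = f"t{next(temps)}"
            code.append((target, op, left_name, right_name))  # ...then combined
            return target
        raise ValueError(f"unknown node kind {kind!r}")

    result = emit(ast)
    return code, result


# AST for the expression a + 2 * b
ast = ("binop", "+", ("var", "a"), ("binop", "*", ("num", 2), ("var", "b")))
instructions, result = lower(ast)
for target, op, left, right in instructions:
    print(f"{target} = {left} {op} {right}")
```

This prints `t1 = 2 * b` followed by `t2 = a + t1`. Each instruction has at most one operator, which is what makes this kind of IR convenient for later optimization passes and for translation to machine code.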
In “Create Your Own Compiler”, each stage is broken into multiple steps and each step comes with annotated, interactive code. It’s a great way to get your feet wet and to grasp the bare concepts.
The other, post-modern way of building compilers is the Roslyn way: writing the compiler for a language in that language itself. Microsoft has done that with its state-of-the-art compiler platform, Roslyn.
As for the question of what Roslyn actually is, what could be better than an authoritative answer from a member of the Roslyn team, the renowned C# guru himself, Eric Lippert? The opportunity came about in the form of an interview that he gave us back in 2014:
NV: Roslyn’s official definition states that it is a “project to fully rewrite the Visual Basic and C# compilers and language services in their own respective managed code language; Visual Basic is being rewritten in Visual Basic and C# is being rewritten in C#.”
How is C# being rewritten in C# ?
EL: When I was at Microsoft I saw so many people write their own little C# parsers or IDEs or little mini compilers or whatever, for their own purposes. That’s very difficult, it’s time-consuming, it’s expensive, and it’s almost impossible to do right. Roslyn changes all that, by giving everyone a library of analysis tools for C# and VB which is correct, very fast, and designed specifically to make tool builder’s lives better. I’m so excited that it’s almost done! I worked on it for many years and can’t wait to get my hands on the released version.
The rest of Eric’s comments can be read in the full interview, C# Guru – An Interview With Eric Lippert.
Building your compiler using Roslyn gives you distinct advantages:
- Massive performance improvement and a built-in mechanism for handling dynamic objects. Roslyn supplies crucial functionality for emitting code, parsing assemblies and structuring the compiler itself, which results in portable assemblies and the possibility of integrating with tools available only for C# (code analysis, VS extensions).
- Cross-platform capability, since Roslyn produces portable class libraries compatible with Mono and .NET Core.
- Visual Studio integration and other functionality, including code colourization, syntax highlighting and IntelliSense.
Create your own compiler
C# Guru – An Interview With Eric Lippert
Fable – Write Front-End Apps For The Web In F#
Sorbet – Making Ruby Statically Typed
How To Create Pragmatic, Lightweight Languages
Take Cornell’s CS 6120 Advanced Compilers For Free