<< Home

Before I start diving into the deep sea of compiler internals, I would like to familiarize you with the Candor programming language and its Virtual Machine.

This is the thing I was working on last 10 months, and one of the most wonderful and complex things I've been working on since the start of my software development career.

Candor is an Ecmascript-inspired language, but while the newer versions of the Ecmascript standard are adding new functionality and syntax features, my language aims to make the syntax as simple as possible.

No exceptions

Caller can always be sure that function will return after the call. You should either invoke a callback with an error argument, return negative number on error, or do anything else to let caller know about errors that has happened.

No undefined and null

There is the only one value and type that represents undefined value - nil. Thus, less checks and a more understandable behaviour of your application.

No implicit global variables

Every global variable access should be done explicitly, by loading/storing properties of the global object. To my mind, it's the most simplest and powerful way to prevent global leaks.

No default runtime

Candor has no default APIs that are doing 'high-level' things with objects and arrays. These routines should be implemented by embedder (like candor.io).

Removing runtime from VM is good in terms of support, less dependencies - less things to care about, and leaving things out of the core keeps it compact.

No prototype chains

Objects are just magic-less hash-maps without special properties like toString or __proto__. Additionally you can have both numeric and string keys in objects (in other words, a[0] and a['0'] are not the same thing).

Also there're no length property of array, it's replaced by sizeof keyword. Example: sizeof [1,2,3] == 3 or even sizeof "string".

No complicated type coercion

Objects, arrays and nil are always converted either to empty string or to zero, depending on type of another argument. For example, this lets you increment uninitialized variables without getting any errors or unexpected behaviour: nil + 1 == 1.

Dart-like function syntax

No function keyword, yay! Just write:

function_name(arguments) {
  //body
}

Syntax

You can learn more about syntax and play with it on the official website.

Compiler

Since the start of this year I have been working on delivering very primitive JIT compiler and VM for Candor. The first version was generating pretty ugly machine code, which was ineffective and massive.

It was using the following algorithm:

  1. Visit AST node.
  2. Generate all it's children, and place their results into rax, rbx, rcx (depending on child's index). (Just in case - x86-64)
  3. Generate code that calculates the result of operation and return value in rax.

Pros - fast compilation, easy to understand algorithm. Cons - hard way to deal with different CPU architectures (i.e. it needed more than 6 registers), dumb generated machine code.

Thanks to v8 and Dart hacker Vyacheslav Egorov and Andy Wingo's blog, I've figured out that there're much better ways to do JIT code generation, but it was too complex for me to understand at that time. And despite I've created new branch feature-ssa and written tons of code, I've never got something truly working.

I got stuck at implementing registry allocator, mostly because of wrong design decisions that I made before, and continuing development of this branch in this form was impossible.

That's why I took a long break (for almost 6 months) and worked on other projects, until I realized how this thing should be implemented.

Candor returns

After this pause I've considered many things and finally did it. Even more, Candor now has two compilers: non-optimizing and optimizing. The non-optimizing is used where it needs to compile a lot of source as fast as possible, and the optimizing compiler is used for small functions that might be quickly optimized.

Main things that helped me to got to this state:

  1. Understanding how CFG and SSA should be really handled and represented. CFG is a way to represent tree of input source code (AST) in a linear form, by placing instructions in blocks and connecting them with the control-flow edges like: goto and branch (which is used in if and while statements). What I was missing is that the instruction and it's value should be the same object, otherwise it's very problematic to exploit def-use chains, which are very useful for getting type information and performing dead code elimination.
  2. I was detecting variable conflicts in the blocks with two incoming edges in a over-complicated way. I was using lists of active variables and performing very complex analysis to propagate them to blocks that needed them. Apparently, it's very cool and simple to do it in a way v8 does it. By creating environment for each basic block in CFG, placing variables into it and copying it as-it-is when adding successor to the block.
  3. I didn't understand that low-level intermediate representation should operate on uses which a parts of variable's liveness intervals... Previous version was doing simplified linear-scan register allocation without holes in variable's liveness intervals, which isn't resulting in good allocation.

The main difference between optimizing and non-optimizing compiler is that the former is trying to place everything in registers, while the latter operates only on the stack slots (i.e. doing memory access on every variable load and store).

By having a register allocator that's capable of allocating registers in very generic terms, it was really straightforward to add support for a 32bit code generation. And now Candor is officially running on two platforms: ia32 and x64.

Plans

Now that there are two brand new compilers, I'm going to work on adaptive optimization/deoptimization for it. Candor should be capable of optimizing hot functions on the fly and inlining small functions into their callers. Also, it's quite practical to generate code that's very fast in common cases, and falls back to unoptimized code in all other cases.

ARM support is also the part of my future plans for Candor, and I'll start working on it as soon as I'll receive my Raspberry PI.

More info

If you want to ask questions and/or learn more about Candor you can subscribe to our google group or join the #candor IRC channel on freenode.

Created at: Wed Nov 21 2012