Reference Documentation

Here is some related documentation that you will probably find useful when hacking on Jato:

Java Virtual Machine

Intel i386/x86-64


Dynamic Compilation

Method Invocation

Register Allocation

Garbage Collection

Exception Handling


Compiler Design Overview

The Front-End

The front-end is responsible for parsing bytecodes and generating expression trees for the instruction selector to consume. However, when you add front-end support for new bytecodes, you're strongly encouraged to write the corresponding back-end passes (instruction selection and code emission) at the same time to make sure the high-level intermediate representation makes sense.

For the front-end, we use a high-level intermediate representation (HIR) that is a forest of expression trees. That is, a compilation unit (a method) is divided into basic blocks, each of which contains a list of statements, and each statement can operate on an expression tree. Examples of statements include STMT_STORE, which stores an expression to a local variable, and STMT_IF, which performs a conditional branch. The simplest form of expression is EXPR_VALUE, which represents a constant value, but there are more complex types of expressions, including binary operations (EXPR_BINOP) and method invocation (EXPR_INVOKE). The relationships between a compilation unit, basic blocks, statements, and expressions are illustrated in Figure 1.
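To make the shape of the HIR concrete, here is a minimal sketch of how a basic block, its statements, and their expression trees could be laid out in C. Only the STMT_* and EXPR_* names come from the text above; the struct layouts, field names, and the two helper functions are hypothetical and are not Jato's actual definitions, which live in include/jit/statement.h and include/jit/expression.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts for illustration only; Jato's real definitions differ. */
enum expr_type { EXPR_VALUE, EXPR_BINOP, EXPR_INVOKE };
enum stmt_type { STMT_STORE, STMT_IF };

struct expression {
	enum expr_type type;
	long value;                /* EXPR_VALUE: the constant */
	char op;                   /* EXPR_BINOP: operator, e.g. '+' (assumed encoding) */
	struct expression *left;   /* EXPR_BINOP: left operand, else NULL */
	struct expression *right;  /* EXPR_BINOP: right operand, else NULL */
};

struct statement {
	enum stmt_type type;
	unsigned long local_index; /* STMT_STORE: target local variable slot */
	struct expression *expr;   /* root of the expression tree */
	struct statement *next;    /* next statement in the same basic block */
};

struct basic_block {
	struct statement *stmts;   /* head of the statement list */
	struct basic_block *next;  /* next block in the compilation unit */
};

/* Count the nodes of one expression tree by walking both operands. */
static size_t expr_count_nodes(const struct expression *e)
{
	if (e == NULL)
		return 0;
	return 1 + expr_count_nodes(e->left) + expr_count_nodes(e->right);
}

/* Count the statements in one basic block. */
static size_t bb_count_statements(const struct basic_block *bb)
{
	size_t n = 0;

	assert(bb != NULL);
	for (const struct statement *s = bb->stmts; s != NULL; s = s->next)
		n++;
	return n;
}
```

Under these assumptions, the Java statement `x = 1 + 2` would become a single STMT_STORE whose expression tree is an EXPR_BINOP node with two EXPR_VALUE children, so the block holds one statement and the tree has three nodes.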

The individual bytecodes are converted either to statements or to expressions, depending on whether they have side effects and how the results of the operations are used by other bytecodes (see include/jit/statement.h and include/jit/expression.h for further details). You can find more information about the bytecode instruction set in Chapter 6 of the Java Virtual Machine Specification.
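One common way to perform this kind of conversion is to simulate the JVM operand stack at compile time: bytecodes without side effects push expression nodes, and bytecodes with side effects pop their operands and produce a statement. The sketch below assumes that approach for four real JVM opcodes (iconst_1, iconst_2, iadd, istore_0). The `converter` struct, its fixed-size pools, and all function names are hypothetical and are not taken from the Jato sources.

```c
#include <assert.h>
#include <stddef.h>

enum expr_type { EXPR_VALUE, EXPR_BINOP };
enum stmt_type { STMT_STORE };

struct expression {
	enum expr_type type;
	long value;                       /* EXPR_VALUE only */
	struct expression *left, *right;  /* EXPR_BINOP only */
};

struct statement {
	enum stmt_type type;
	unsigned local_index;
	struct expression *expr;
};

/* Opcode values as given in the JVM specification. */
enum {
	OPC_ICONST_0 = 0x03, OPC_ICONST_1 = 0x04, OPC_ICONST_2 = 0x05,
	OPC_ISTORE_0 = 0x3b, OPC_IADD = 0x60
};

#define MAX_NODES 16

/* Hypothetical converter state: node pools plus a simulated operand stack. */
struct converter {
	size_t nr_exprs, nr_stmts, depth;
	struct expression exprs[MAX_NODES];
	struct statement stmts[MAX_NODES];
	struct expression *stack[MAX_NODES];
};

static struct expression *new_expr(struct converter *c, enum expr_type type)
{
	assert(c->nr_exprs < MAX_NODES);
	struct expression *e = &c->exprs[c->nr_exprs++];
	e->type = type;
	e->value = 0;
	e->left = e->right = NULL;
	return e;
}

static void push(struct converter *c, struct expression *e)
{
	assert(c->depth < MAX_NODES);
	c->stack[c->depth++] = e;
}

static struct expression *pop(struct converter *c)
{
	assert(c->depth > 0);
	return c->stack[--c->depth];
}

/* Convert a bytecode sequence; returns 0 on success, -1 on an unsupported opcode. */
static int convert(struct converter *c, const unsigned char *code, size_t len)
{
	for (size_t pc = 0; pc < len; pc++) {
		switch (code[pc]) {
		case OPC_ICONST_1:
		case OPC_ICONST_2: {            /* no side effects: push a constant */
			struct expression *e = new_expr(c, EXPR_VALUE);
			e->value = code[pc] - OPC_ICONST_0;
			push(c, e);
			break;
		}
		case OPC_IADD: {                /* no side effects: combine two operands */
			struct expression *e = new_expr(c, EXPR_BINOP);
			e->right = pop(c);
			e->left = pop(c);
			push(c, e);
			break;
		}
		case OPC_ISTORE_0: {            /* side effect: becomes a statement */
			assert(c->nr_stmts < MAX_NODES);
			struct statement *s = &c->stmts[c->nr_stmts++];
			s->type = STMT_STORE;
			s->local_index = 0;
			s->expr = pop(c);
			break;
		}
		default:
			return -1;
		}
	}
	return 0;
}
```

Feeding the sequence iconst_1, iconst_2, iadd, istore_0 through `convert()` yields one STMT_STORE statement whose expression is an EXPR_BINOP over the constants 1 and 2, and leaves the simulated stack empty.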


Figure 1: Conceptual model of the Compiler

The Back-End

The back-end is responsible for instruction selection, register allocation, and code emission. The compiler doesn't do any optimizations yet. Both instruction selection and code emission are architecture-specific, whereas register allocation only has some per-architecture parts. The instruction selector takes the HIR as input and outputs a list of instructions for each basic block as a low-level intermediate representation (LIR), as illustrated in Figure 1. The per-architecture LIR is very similar to the target machine code, with the exception of branch instructions, for which we need to calculate branch target offsets very late in the code emission phase.
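Branch offsets can only be computed once the final position of every basic block is known, which is why they are resolved late. A common way to handle this is back-patching: emit each branch with a placeholder displacement, remember where it is, and fill it in after all blocks have been emitted. The sketch below assumes that technique for the x86 `jmp rel32` instruction (opcode 0xE9, displacement relative to the end of the instruction). The `emitter` and `fixup` structures and all function names are hypothetical; the text above does not say how Jato implements this step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 8
#define MAX_FIXUPS 8
#define CODE_SIZE  64

/* A branch whose 32-bit displacement still has to be filled in. */
struct fixup {
	size_t patch_offset;     /* where the displacement bytes start */
	unsigned target_block;   /* basic block the branch jumps to */
};

/* Hypothetical emitter state for one compilation unit. */
struct emitter {
	size_t len;
	size_t nr_fixups;
	size_t block_offset[MAX_BLOCKS];
	struct fixup fixups[MAX_FIXUPS];
	unsigned char code[CODE_SIZE];
};

static void emit_byte(struct emitter *e, unsigned char b)
{
	assert(e->len < CODE_SIZE);
	e->code[e->len++] = b;
}

/* Record the code offset at which a basic block starts. */
static void begin_block(struct emitter *e, unsigned block)
{
	assert(block < MAX_BLOCKS);
	e->block_offset[block] = e->len;
}

/* Emit `jmp rel32` with a zero placeholder and remember it for patching. */
static void emit_jmp(struct emitter *e, unsigned target_block)
{
	assert(target_block < MAX_BLOCKS);
	assert(e->nr_fixups < MAX_FIXUPS);
	emit_byte(e, 0xE9);
	e->fixups[e->nr_fixups].patch_offset = e->len;
	e->fixups[e->nr_fixups].target_block = target_block;
	e->nr_fixups++;
	for (int i = 0; i < 4; i++)
		emit_byte(e, 0x00);
}

/* After all blocks are emitted, patch every recorded branch. */
static void resolve_branches(struct emitter *e)
{
	for (size_t i = 0; i < e->nr_fixups; i++) {
		const struct fixup *f = &e->fixups[i];
		/* rel32 is measured from the end of the jmp instruction. */
		long next_insn = (long)f->patch_offset + 4;
		long target = (long)e->block_offset[f->target_block];
		uint32_t rel = (uint32_t)(int32_t)(target - next_insn);

		/* x86 stores the displacement in little-endian byte order. */
		for (int b = 0; b < 4; b++)
			e->code[f->patch_offset + b] = (unsigned char)(rel >> (8 * b));
	}
}
```

For example, if block 0 consists of a jump to block 1 followed by a one-byte `nop`, and block 1 jumps back to block 0, the forward jump is patched with a displacement of 1 and the backward jump with a displacement of -11.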

The architecture-specific instruction selector is generated with Monoburg, a code generator generator that produces tree-pattern matchers from a Burg-like specification.