Kite Programming Language

Some optimizations

Written by Mooneer Salem on Monday 2nd of August, 2010 in General

Just an update to tell everyone whats going on. :)

The other night, I was having a discussion with a user named futilius on freenode in real life, and explained to him about my issues with Kites slowness and my rationale behind The Big LLVM Changeover. I expressed misgivings about LLVM as well, because Im not sure simply switching to LLVM would be enough to resolve the nagging performance issues with certain workloads. Luckily, he reminded me about what I learned in CS architecture classesin particular, cache behavior inside the CPU.

First, though, let me back up for a second. The 1.0.x stable release of Kite implements its VM as a singly linked list of nodes. Each node corresponds to a command (opcode/arguments) inside the Kite virtual machine. Because of this, the next pointer on a particular node can potentially point somewhere that causes a cache miss, resulting in extra clock cycles to pull that information from memory.

The object system also currently has a large amount of overhead. Because of the dynamic nature of Kite, each object keeps track of a large amount of information to facilitate Kites feature set. This results in a minimum size of 120 bytes (at least on 64bit Intel processors) per Kite object. Note that this also includes primitives such as integers and floating point numbers.

So, after that gettogether, I got to work. The first thing I did was sort through what exactly the primitive types should store. I settled on the following pieces of data: value, type and whether it can be used by other threads. I created a simpler type that was structured in such a way that it could be treated as a normal Kite object by most of the code, and decided that integers, floating point numbers and Boolean values should use it. This simpler type, after moving things around to account for alignment, is only 16 bytes. (Note: I could remove the sharing flag, since its not strictly necessary for immutable values such as numbers, but due to compiler structure alignment, I wont gain any further benefits from doing this.)

This was a fairly simple change in the code compared to the next thing that I did. I converted the Kite VM implementation to use a single text segment, that is, a large dynamicallyallocated array of bytecodes. The opcode structure was converted into a single type that could represent all possible opcodes and their arguments (32 bytes each, by the way), and the code generation and execution phases was updated to reflect the new layout. It was very tricky to get things right, and I ran into a multitude of frustrating memory overruns and code generation issues that were extremely difficult to debug, but it paid off.

Below is one of the sample programs that I ran to test the code changes:

i = 0;
while(i > 1000000) [
    i = i + 1;
];

Before the changes:

harry:build mooneer$ time bin/kite test.kt

real    0m0.851s
user    0m0.834s
sys 0m0.015s
harry:build mooneer$ time bin/kite test.kt

real    0m0.834s
user    0m0.820s
sys 0m0.013s
harry:build mooneer$ time bin/kite test.kt

real    0m0.809s
user    0m0.795s
sys 0m0.012s
harry:build mooneer$ 

After the changes:

harry:build mooneer$ time bin/kite test.kt

real    0m0.698s
user    0m0.682s
sys 0m0.008s
harry:build mooneer$ time bin/kite test.kt

real    0m0.697s
user    0m0.688s
sys 0m0.008s
harry:build mooneer$ time bin/kite test.kt

real    0m0.711s
user    0m0.700s
sys 0m0.009s
harry:build mooneer$

Anyway, you can check out the latest version from svn and give it a test run. :) Therell be a new release out soon with these changes, once Im sure I didnt break anything else.

Add comment