Kite Programming Language

On debug symbols

Written by Mooneer Salem on Sunday 2nd of September, 2012 in General

LLVM has a facility to emit debug symbols. As you may know, debug symbols are what allow debuggers such as gdb to map running machine code back to source files, line numbers and function names. I'm happy to report that kite-llvm now emits debug symbols and allows gdb to have basic functionality:

(gdb) break exception.cpp:49
Breakpoint 1 at 0x144e609: file src/stdlib/System/exceptions/exception.cpp, line 49.
(gdb) run test_exc.kt 
Starting program: /home/mooneer/kite-llvm/kite test_exc.kt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, kite::stdlib::System::exceptions::exception::throw_exception (this=0x215a2d0)
    at src/stdlib/System/exceptions/exception.cpp:49
49                      int num_traces = backtrace(buf, NUM_TRACE);
(gdb) bt
#0  kite::stdlib::System::exceptions::exception::throw_exception (this=0x215a2d0)
    at src/stdlib/System/exceptions/exception.cpp:49
#1  0x0000000001421d7f in kite::stdlib::System::exceptions::exception::s_throw (exc=0x215a2d0)
    at src/stdlib/System/exceptions/exception.h:59
#2  0x00007ffff7f48168 in __static_init____o () at test_exc.kt:2
#3  0x000000000144a31b in kite::stdlib::language::kite::kite::ExecuteCode (ast=..., context=0x215afa0, 
    suppressExec=false) at src/stdlib/language/kite.cpp:265
#4  0x0000000001449ff1 in kite::stdlib::language::kite::kite::ExecuteCode (ast=..., suppressExec=false)
    at src/stdlib/language/kite.cpp:201
#5  0x00000000013ea54b in main (argc=1, argv=0x7fffffffe6a0) at src/apps/kite.cpp:98
(gdb) up
#1  0x0000000001421d7f in kite::stdlib::System::exceptions::exception::s_throw (exc=0x215a2d0)
    at src/stdlib/System/exceptions/exception.h:59
59                      static void s_throw(exception *exc) { exc->throw_exception(); }
(gdb) 
#2  0x00007ffff7f48168 in __static_init____o () at test_exc.kt:2
2       (make System.exceptions.TypeMismatch())|throw;
(gdb) list
1   run [
2       (make System.exceptions.TypeMismatch())|throw;
3   ] catch [
4       __exc|print;
5   ];
(gdb) 

Unfortunately, this works only on Linux (Apple's gdb is too old, as it turns out). Eventually I want to be able to show function names, files and line numbers in exception stack traces on both OS X and Linux, but this may involve some work. Alternatively, would it be possible to automatically start gdb with the application's current process ID immediately upon getting an unhandled exception? This might require additional debug info to be output, though.
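As a rough illustration of the "start gdb automatically" idea, here is a minimal C++ sketch for POSIX systems. It relies on gdb's documented `-p <pid>` option for attaching to a running process. The function names `gdb_attach_args` and `attach_debugger_to_self` are hypothetical and not part of kite-llvm, and the sketch assumes gdb is on the PATH. Some Linux systems restrict attaching to a process that is not a child of the debugger, so this may need a ptrace policy change or extra privileges to work.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Build the argument list for attaching gdb to a running process.
// "gdb -p <pid>" asks gdb to attach to the process with that ID.
std::vector<std::string> gdb_attach_args(pid_t pid)
{
    assert(pid > 0);
    return { "gdb", "-p", std::to_string(pid) };
}

// Hypothetical hook (an assumption, not kite-llvm code): call this from
// the unhandled-exception path. It forks a child that execs gdb against
// the parent, then waits for gdb to exit. Returns false if the debugger
// could not be started.
bool attach_debugger_to_self()
{
    // getpid() runs before fork(), so this is the crashing process's ID.
    const std::vector<std::string> args = gdb_attach_args(getpid());

    // Build the argv array before forking so the child only has to exec.
    std::vector<char *> argv;
    for (const std::string &arg : args)
        argv.push_back(const_cast<char *>(arg.c_str()));
    argv.push_back(nullptr);

    const pid_t child = fork();
    if (child < 0)
        return false;              // fork failed
    if (child == 0) {
        execvp(argv[0], argv.data());
        _exit(127);                // only reached if exec failed
    }

    int status = 0;
    if (waitpid(child, &status, 0) < 0)
        return false;
    return WIFEXITED(status) && WEXITSTATUS(status) != 127;
}
```

The runtime would call `attach_debugger_to_self()` just before giving up on an unhandled exception, so the developer lands in gdb with the failing stack still intact.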

Interesting LLVM "bug" found

Written by Mooneer Salem on Monday 16th of January, 2012 in General

Today I added parser and codegen support for the return keyword. As in other languages, it's designed for the developer to exit a method early. Unfortunately, it didn't work right away:

$ ./kite
method x()
[
    return 1;
];

x()|print;
^D
System.exceptions.NotImplemented: Could not find method print that takes 0 argument(s).
    in (main program) + 0x79
$

Investigating further, I noticed that it was generating the following LLVM intermediate code:

; ModuleID = '__root_module'

@0 = internal constant [5 x i8] c"x__o\00"
@1 = internal constant [6 x i8] c"print\00"

define i32* @__static_init____o(i32* %this) {
entry:
  %0 = alloca i32*
  store i32* %this, i32** %0
  %1 = call i32* @kite_method_alloc(i32* bitcast (i32* (i32*)* @x__o to i32*), i32 1)
  %2 = load i32** %0
  %3 = call i32** @kite_dynamic_object_get_property(i32* %2, i8* getelementptr inbounds ([5 x i8]* @0, i32 0, i32 0), i1 true)
  store i32* %1, i32** %3
  %4 = load i32** %0
  %5 = call i32* @x__o(i32* %4)
  %6 = call i32* @kite_find_funccall(i32* %5, i8* getelementptr inbounds ([6 x i8]* @1, i32 0, i32 0), i32 1)
  %7 = bitcast i32* %6 to i32* (i32*)*
  %8 = call i32* %7(i32* %5)
  ret i32* %8
}

define i32* @x__o(i32* %this) {
entry:
  %0 = alloca i32*
  store i32* %this, i32** %0
  %1 = call i32* @System__integer__obj__i(i32 1)
  ret i32* %1
  ret i32* %1
}

declare i32* @System__integer__obj__i(i32)

declare i32* @kite_method_alloc(i32*, i32)

declare i32** @kite_dynamic_object_get_property(i32*, i8*, i1)

declare i32* @kite_find_funccall(i32*, i8*, i32)

Note the two consecutive ret instructions at the end of @x__o. Two ret statements are being output: the one from return, and the normal one that's generated at the end of every method. Normally LLVM would eventually generate machine code that simply skips over the second ret statement. Or so you'd think.

What I noticed is that System__integer__obj__i() would do the correct thing and return a valid System::object pointer, while kite_find_funccall() would get a totally different pointer, even though, according to the above, the return value from obj__i() is returned by x__o() and fed directly into kite_find_funccall(). I confirmed this time and time again in gdb.

The fix? Using the same code suppression logic that I implemented for break/continue to suppress the last ret. Now it works as intended:

$ ./kite
method x()
[
    return 1;
];

x()|print;
^D
1
$
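The suppression idea can be sketched without any LLVM dependency. LLVM expects each basic block to end in exactly one terminator instruction (ret, br and so on), so anything emitted after the first terminator leaves the block malformed. The toy `BlockEmitter` class below is hypothetical and is not the kite-llvm code. It simply remembers whether the current block has been closed and drops anything emitted afterwards. A real code generator could make the equivalent check against the block it is inserting into, for example with LLVM's `BasicBlock::getTerminator()`.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a code generator's current basic block.
// (Hypothetical type for illustration; not the kite-llvm or LLVM API.)
class BlockEmitter {
public:
    // Append an ordinary instruction unless the block is already closed.
    void emit(const std::string &instruction)
    {
        if (terminated_)
            return;  // unreachable code after a terminator: suppress it
        lines_.push_back(instruction);
    }

    // Append a terminator (ret, br, ...) and close the block.
    void emit_terminator(const std::string &instruction)
    {
        if (terminated_)
            return;  // a second terminator would make the block malformed
        lines_.push_back(instruction);
        terminated_ = true;
    }

    bool terminated() const { return terminated_; }
    const std::vector<std::string> &lines() const { return lines_; }

private:
    std::vector<std::string> lines_;
    bool terminated_ = false;
};

// Generate the body of "method x() [ return 1; ];".
std::vector<std::string> generate_method_x()
{
    BlockEmitter block;
    block.emit("%1 = call i32* @System__integer__obj__i(i32 1)");
    block.emit_terminator("ret i32* %1");  // from the explicit return
    block.emit_terminator("ret i32* %1");  // implicit end-of-method ret, dropped
    return block.lines();
}
```

With this guard in place, the generated body contains the call followed by a single ret, regardless of how many return paths the front end tries to emit into the same block.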

EDIT: this was on LLVM 2.9. I have yet to try 3.0 to see if this issue is fixed there. If not, a bug report might be pending in the future.

EDIT 2: Link to LLVM bug report. Was able to duplicate the problem in 3.0 as well.

Long overdue update!

Written by Mooneer Salem on Saturday 11th of June, 2011 in General

Hi everyone! I know I haven't updated in a while, but I've had life go on since my last update. I was able to get back to working on the LLVM port of Kite, though, so I implemented a few things:

  1. I moved kite-llvm development to a Git repository. This is mostly an experiment to see how it goes. I like it so far, though.
  2. Implemented basic exceptions in kite-llvm (no stack trace yet):
    harry:kite-llvm mooneer$ ./kite
    run 
    [ 
        (make this.System.exceptions.exception())|throw; 
    ] 
    catch 
    [
        __exc.message|print; 
    ];
    ^D
    Exception thrown
    harry:kite-llvm mooneer$ 
  3. Implemented an ikt port:
    harry:kite-llvm mooneer$ ./ikt
    Interactive Kite console
    ikt> "hello world"|print;
    hello world
    ---> hello world
    ikt> ^D
    harry:kite-llvm mooneer$  
  4. Class definition and instantiation support:
    harry:kite-llvm mooneer$ ./kite
    class X
    [
        method __init__() [ this.elite = 1337; ];
    ];
    
    (make X()).elite|print;
    ^D
    1337
    harry:kite-llvm mooneer$  

Anyway, more updates soon :)

Boost Spirit experimentation

Written by Mooneer Salem on Saturday 25th of December, 2010 in General with 3 comments

I've been playing with Boost Spirit as an alternative to Bison/Flex lately for Kite. So far, it's actually pretty nice in terms of syntax; you only need to use C++ to describe your language. I've already found a few disadvantages, though, as shown below:

harry:kite-llvm mooneer$ time sh compile.sh 
Compiling src/apps/kite.cpp...
Compiling src/codegen/llvm_compile_state.cpp...
Compiling src/codegen/llvm_node_codegen.cpp...
Compiling src/codegen/syntax_tree_node_printer.cpp...
Compiling src/codegen/syntax_tree_printer.cpp...
Compiling src/parser/parser.cpp...
Linking...

real    0m34.831s
user    0m32.598s
sys     0m2.048s
harry:kite-llvm mooneer$ ls -lh
total 12240
-rwxr-xr-x@ 1 mooneer  staff   238B Dec 25 00:42 compile.sh
-rwxr-xr-x  1 mooneer  staff   4.0M Dec 25 15:02 kite
-rw-r--r--  1 mooneer  staff   9.2K Dec 25 15:01 kite.o
-rw-r--r--  1 mooneer  staff    10K Dec 25 15:01 llvm_compile_state.o
-rw-r--r--  1 mooneer  staff    72K Dec 25 15:01 llvm_node_codegen.o
-rw-r--r--  1 mooneer  staff   1.8M Dec 25 15:02 parser.o
drwxr-xr-x  6 mooneer  staff   204B Dec 23 21:34 src
-rw-r--r--  1 mooneer  staff   5.7K Dec 25 15:01 syntax_tree_node_printer.o
-rw-r--r--  1 mooneer  staff   9.6K Dec 25 15:01 syntax_tree_printer.o
harry:kite-llvm mooneer$

(tl;dr: very large binaries, even with g++ -Os, and very long compile times.) I've only implemented the math operations so far for the above test.

Anyway, I'll put up the code when I have a bit more to show. :)

Some optimizations

Written by Mooneer Salem on Monday 2nd of August, 2010 in General

Just an update to tell everyone what's going on. :)

The other night, I was having a discussion in real life with a user named futilius from freenode, and explained to him my issues with Kite's slowness and my rationale behind The Big LLVM Changeover. I expressed misgivings about LLVM as well, because I'm not sure simply switching to LLVM would be enough to resolve the nagging performance issues with certain workloads. Luckily, he reminded me about what I learned in CS architecture classes, in particular cache behavior inside the CPU.

First, though, let me back up for a second. The 1.0.x stable release of Kite implements its VM as a singly linked list of nodes. Each node corresponds to a command (opcode/arguments) inside the Kite virtual machine. Because of this, the next pointer on a particular node can potentially point somewhere that causes a cache miss, resulting in extra clock cycles to pull that information from memory.

The object system also currently has a large amount of overhead. Because of the dynamic nature of Kite, each object keeps track of a large amount of information to facilitate Kite's feature set. This results in a minimum size of 120 bytes (at least on 64-bit Intel processors) per Kite object. Note that this also includes primitives such as integers and floating point numbers.

So, after that get-together, I got to work. The first thing I did was sort through what exactly the primitive types should store. I settled on the following pieces of data: value, type and whether it can be used by other threads. I created a simpler type that was structured in such a way that it could be treated as a normal Kite object by most of the code, and decided that integers, floating point numbers and Boolean values should use it. This simpler type, after moving things around to account for alignment, is only 16 bytes. (Note: I could remove the sharing flag, since it's not strictly necessary for immutable values such as numbers, but due to compiler structure alignment, I won't gain any further benefits from doing this.)
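To illustrate how member order and alignment produce those numbers, here is a small C++ sketch. The struct and field names are made up for illustration and do not reflect Kite's actual object layout. The sizes in the comments assume a typical 64-bit ABI where an 8-byte value is 8-byte aligned.

```cpp
#include <cstdint>

// Hypothetical type tag for the three primitive kinds named in the post.
enum class ValueType : std::uint8_t { Integer, Float, Boolean };

// The payload: large enough for any of the three primitive kinds.
union Payload {
    std::int64_t integer;
    double       floating;
    bool         boolean;
};

// Careless ordering: the 1-byte tag forces 7 bytes of padding before the
// 8-byte payload, and the trailing flag forces 7 more at the end.
// Typically 24 bytes.
struct PaddedValue {
    ValueType type;
    Payload   value;
    bool      shareable;
};

// Payload first, small fields packed together afterwards.
// Typically 16 bytes: 8 (payload) + 1 + 1 + 6 bytes of tail padding.
struct CompactValue {
    Payload   value;
    ValueType type;
    bool      shareable;  // may other threads use this value?
};

// Dropping the sharing flag does not help: the struct is still rounded
// up to a multiple of the payload's 8-byte alignment. Typically 16 bytes.
struct NoFlagValue {
    Payload   value;
    ValueType type;
};

// Convenience constructor for an integer primitive.
CompactValue make_integer(std::int64_t n)
{
    CompactValue v;
    v.value.integer = n;
    v.type = ValueType::Integer;
    v.shareable = true;
    return v;
}
```

The last struct shows why removing the sharing flag gains nothing: the compiler pads the structure back up to 16 bytes either way.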

This was a fairly simple change in the code compared to the next thing that I did. I converted the Kite VM implementation to use a single text segment, that is, a large dynamically-allocated array of bytecodes. The opcode structure was converted into a single type that could represent all possible opcodes and their arguments (32 bytes each, by the way), and the code generation and execution phases were updated to reflect the new layout. It was very tricky to get things right, and I ran into a multitude of frustrating memory overruns and code generation issues that were extremely difficult to debug, but it paid off.
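The general shape of such a VM can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the opcode set, the register model and every name below are invented, and only the ideas of a fixed-size 32-byte instruction and a single contiguous array come from the description above. Because instructions sit next to each other in memory, advancing the program counter usually touches an adjacent cache line rather than chasing a pointer to an arbitrary address.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Invented opcode set, just large enough to express a counting loop.
enum class Op : std::uint8_t { LoadConst, Add, JumpIfNotLess, Jump, Halt };

// One fixed-size slot per instruction: 8 bytes for the opcode (with
// padding) plus three 8-byte operands, 32 bytes on typical 64-bit ABIs.
struct Instruction {
    Op           op;
    std::int64_t operand[3];
};

// Execute a program stored as one contiguous array ("text segment").
// Returns the final value of register 0.
std::int64_t run(const std::vector<Instruction> &text)
{
    std::int64_t reg[4] = {0, 0, 0, 0};
    std::size_t pc = 0;  // index into the array, not a node pointer
    while (pc < text.size()) {
        const Instruction &ins = text[pc];
        switch (ins.op) {
        case Op::LoadConst:      // reg[a] = constant b
            reg[ins.operand[0]] = ins.operand[1];
            ++pc;
            break;
        case Op::Add:            // reg[a] = reg[b] + reg[c]
            reg[ins.operand[0]] = reg[ins.operand[1]] + reg[ins.operand[2]];
            ++pc;
            break;
        case Op::JumpIfNotLess:  // if !(reg[a] < reg[b]) goto c
            if (reg[ins.operand[0]] < reg[ins.operand[1]])
                ++pc;
            else
                pc = static_cast<std::size_t>(ins.operand[2]);
            break;
        case Op::Jump:           // goto a
            pc = static_cast<std::size_t>(ins.operand[0]);
            break;
        case Op::Halt:
            return reg[0];
        }
    }
    return reg[0];
}

// Bytecode for: i = 0; while (i < limit) [ i = i + 1; ];
std::vector<Instruction> counting_loop(std::int64_t limit)
{
    return {
        {Op::LoadConst,     {0, 0, 0}},      // 0: r0 = 0 (i)
        {Op::LoadConst,     {1, limit, 0}},  // 1: r1 = limit
        {Op::LoadConst,     {2, 1, 0}},      // 2: r2 = 1
        {Op::JumpIfNotLess, {0, 1, 6}},      // 3: if !(r0 < r1) goto 6
        {Op::Add,           {0, 0, 2}},      // 4: r0 = r0 + r2
        {Op::Jump,          {3, 0, 0}},      // 5: goto 3
        {Op::Halt,          {0, 0, 0}},      // 6: stop
    };
}
```

Calling `run(counting_loop(1000000))` executes the same kind of loop as a counting test program, with every instruction fetch indexing into one array.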

Below is one of the sample programs that I ran to test the code changes:

i = 0;
while(i < 1000000) [
    i = i + 1;
];

Before the changes:

harry:build mooneer$ time bin/kite test.kt

real    0m0.851s
user    0m0.834s
sys     0m0.015s
harry:build mooneer$ time bin/kite test.kt

real    0m0.834s
user    0m0.820s
sys     0m0.013s
harry:build mooneer$ time bin/kite test.kt

real    0m0.809s
user    0m0.795s
sys     0m0.012s
harry:build mooneer$ 

After the changes:

harry:build mooneer$ time bin/kite test.kt

real    0m0.698s
user    0m0.682s
sys     0m0.008s
harry:build mooneer$ time bin/kite test.kt

real    0m0.697s
user    0m0.688s
sys     0m0.008s
harry:build mooneer$ time bin/kite test.kt

real    0m0.711s
user    0m0.700s
sys     0m0.009s
harry:build mooneer$

Anyway, you can check out the latest version from svn and give it a test run. :) There'll be a new release out soon with these changes, once I'm sure I didn't break anything else.