Pre-announcing VaLLVM

Currently Vala simply generates C code and lets the C compiler do its job. While in many cases this produces good code, there are cases where the compiler stack could use more domain-specific knowledge. The problem is that C was designed as a human-readable language and its optimization keywords are aimed at people writing it directly, while Vala has higher-level knowledge and has to make the compiler deduce properties that may be trivial for the code reader or writer.

For example, consider the code below. It is very simple, but the Vala compiler will treat the lambda as a full-scale closure. As a result, instead of being allocated on the stack, the captured data will be allocated on the heap. After the lambda, every access will go through a pointer indirection, because the data has escaped and the evil stdout.printf might access it (for all the compiler knows, it does). Furthermore, the widget and this will be referenced during the cast and unreferenced afterwards – unnecessary, as both values are live the whole time.

class Test {
    void foo(Gee.List<Gtk.Widget> list) {
        double powersum = 0;
        int no = 0;
        int x = 0;
        list.foreach ((widget) => {
            no++;
            i++;
            Gtk.SpinButton? spin = widget as Gtk.SpinButton;
            if (spin != null) {
                double val = spin.get_value ();
                powersum += val * val;
                stdout.printf("%d: %f, %d\n", no, powersum, x);
            }
        });
        for (int i = 0; i < 1000; i++) {
            if (x != 0) {
                stdout.printf("An error\n");
            } else {
                stdout.printf("");
            }
        }
    }
    private int i = 0;
}
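
For comparison, the C that valac emits for such a closure looks roughly like the sketch below. This is a simplified, illustrative reconstruction from memory – not the exact compiler output, and the names are made up – but it shows the heap-allocated, reference-counted data block and the pointer indirections described above.

#include <glib.h>
#include <glib-object.h>

/* Simplified, illustrative sketch of the kind of C valac emits today;
   the real output differs in names and details. */
typedef struct _Test Test;         /* the GObject class from the example  */
typedef struct _GeeList GeeList;   /* stand-in for the Gee.List parameter */

typedef struct {
    int ref_count;
    Test *self;                    /* captured `this` keeps the object alive */
    double powersum;
    gint no;
    gint x;
} Block1Data;

static void block1_data_unref (Block1Data *data) {
    if (g_atomic_int_dec_and_test (&data->ref_count)) {
        g_object_unref (data->self);
        g_slice_free (Block1Data, data);
    }
}

static void test_foo (Test *self, GeeList *list) {
    Block1Data *data;

    data = g_slice_new0 (Block1Data);   /* captured locals live on the heap */
    data->ref_count = 1;
    data->self = g_object_ref (self);
    data->powersum = 0;
    data->no = 0;
    data->x = 0;

    /* The lambda body becomes a static function that receives `data` as
       user_data; every access to powersum/no/x goes through the pointer. */
    /* ... call to Gee.List.foreach elided ... */

    block1_data_unref (data);           /* drop the scope's reference */
}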

Solving the problem on the Vala side is possible, but in the long term it would require writing a whole optimizing compiler in it. Worse, it would require debugging and maintaining that compiler while being left behind in speed by large projects like LLVM and GCC. While I have heard proposals to write a Vala frontend for GCC, it would take significant time and would force Vala to release on GCC's cycle. An LLVM backend for Vala, as proposed in one of the GSoC ideas, would also take significant time.

VaLLVM

Initially (i.e. a month ago) I planned to write a special middle-end to be put between Vala and C. However, as more and more features were added I ran into the same problem – the design was getting similar to a general-purpose optimizer (and, given my experience and skill, probably one with square wheels). LLVM, on the other hand, does allow writing plugins with additional passes. It even has (some) documentation and a tutorial – and, more importantly, an IRC channel with friendly people who helped me understand how the passes operate.

VaLLVM is supposed to be a collection of passes that optimize Vala-produced code with, ideally, only trivial changes to the compiler. The Vala compiler would produce C code, as usual. The required information would be passed via intrinsic functions, which, for non-VaLLVM-accelerated builds, could (ideally) be implemented in plain C. Any additional information would be passed in a YAML file (LLVM has a library to read it), and non-Vala-generated code would not be altered in any way.

Currently there is very little code (I have started with just a few simple passes to get information out of functions for the sub-project described later), no build system (I am thinking of using CMake to leverage the LLVM macros) and no license, so this is only a pre-announcement.

Local closures

While it is not the only optimization that is planned, it is the first one I am planning to implement (and it illustrates the change from C to ‘VaLLVM C’). A local closure (this is my name – sorry if I renamed an existing concept) is a closure that does not access the captured data outside that data's scope. This allows several optimizations; the most important one may be that the data can be allocated on the stack. If we know when the closure is called, we can flush the data to storage only when it is required, etc.

While the fine details of the design are not finalized yet, the basic idea is to add the following intrinsic functions (a sketch of the resulting ‘VaLLVM C’ follows the list):

  • void block_new(Block **) – creates a new scope
  • T *block_get_variable(Block *) G_GNUC_CONST – gets a pointer to the variable
  • block_free(Block *) – marks the end of a scope. Due to escape analysis it might not necessarily free the block
  • DelegateData *closure_new(Delegate, Block) – creates the data for a new lambda.
  • GDestroyNotify *magic_prefix_disown(DelegateData *) – denotes that the lambda will be passed as an owned parameter, stored in an owned variable, etc.
  • void magic_prefix_escape(DelegateData *) – denotes that the lambda has escaped control and may no longer be considered local. Depending on the actual usage of the escaped value, various optimizations might still be performed (for example, if it is detected that it is always called within the lifetime of the scope, the scope can be allocated on the stack).
  • void magic_prefix_free(DelegateData *) – denotes that the closure was freed locally.
  • DelegateData *magic_prefix_import(void *, GDestroyNotify) – allows mapping an external closure to a VaLLVM one. External closures (i.e. any passed through parameters) cannot be optimized as easily, but this function allows creating a DelegateData * for them.
  • void *magic_prefix_get_data(DelegateData *) – gets the user_data parameter. This may not be strictly required for LLVM, but it is required for an easy pure-C implementation (and for that reason it needs to be called before disowning or escaping).
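
To make the change from plain C to ‘VaLLVM C’ a bit more concrete, below is a heavily simplified sketch of what the generated code for the foo() method above could look like with these intrinsics. The ordering, the way individual variables are selected from the block and the foreach call itself are my assumptions – the real design is not finalized.

/* Hypothetical 'VaLLVM C' for Test.foo above - the intrinsics, Block,
   Delegate and DelegateData are the ones declared in the list; everything
   else (names, ordering) is an assumption, not the final design. */
static void _test_foo_lambda (GtkWidget *widget, gpointer user_data);

static void test_foo (Test *self, GeeList *list) {
    Block *block;
    DelegateData *closure;
    double *powersum;
    gint *no;
    gint *x;
    gpointer user_data;

    block_new (&block);                      /* open the scope */
    /* how a particular variable is selected is not specified yet */
    powersum = block_get_variable (block);
    no = block_get_variable (block);
    x = block_get_variable (block);
    *powersum = 0; *no = 0; *x = 0;

    closure = closure_new ((Delegate) _test_foo_lambda, block);

    /* get_data() has to be called before any disown/escape; here the lambda
       is only passed as an unowned callback, so it never escapes and the
       VaLLVM passes may keep the whole block on the stack. */
    user_data = magic_prefix_get_data (closure);
    /* ... call to Gee.List.foreach with _test_foo_lambda and user_data ... */

    magic_prefix_free (closure);             /* the closure dies locally */
    block_free (block);                      /* close the scope */
}

For a non-VaLLVM build these intrinsics would be plain C functions doing roughly what the compiler emits today; under VaLLVM the passes would recognize them and, when the closure is provably local, keep the block on the stack.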

The C implementation of these functions closely matches what the Vala compiler is doing now, so the switch should be relatively easy. Not everything will be done at once, so I presume ‘easy’ code (for example, passing a lambda directly to a function) may remain faster for some time. In addition, Vala would need to generate a YAML file describing the hierarchy of scopes, the variables in them, etc., as this information may have been lost during earlier LLVM passes.
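
No schema for that file has been settled on yet; purely as an illustration of the kind of information it would carry, it could look something like this (structure and field names are made up):

# Purely illustrative - no schema has been fixed yet.
functions:
  - name: test_foo
    scopes:
      - block: block1
        variables: [powersum, no, x]
        closures:
          - delegate: _test_foo_lambda
            escapes: false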

In addition, the Vala language might consider adding a [CCode (scope = "local")] attribute for arguments, as otherwise all lambdas would need to be treated as potentially escaping. Local lambdas could, in principle, also access ref and out arguments, but some clear and simple rules would probably be good (for example, only when the lambda is passed directly to the function).

Final words

While writing software is fun, I wonder whether there are other projects regarding closures in the wild (either applicable to Vala, so I would not need to write this one, or similar, so I could build upon them). If anyone knows of one – please let me know (research literature showing how deeply misguided and naïve my approach is, is also welcome).


11 Responses to Pre-announcing VaLLVM

  1. I think this is a great idea. Ultimately, Vala shouldn’t really be involved: what one would really want is for LLVM to be able to optimise C code written in the GLib style, which Vala just happens to produce. That could mean a number of other things too: any reads from a class definition can be considered constants. There’s probably a lot of optimisation that could be done around the way asynchronous methods are handled. Having LLVM able to optimise those would mean that all the C code written in the GLib style could be optimised with the same compiler.

    Obviously, the first step is to figure out how bad the existing output of LLVM is when compiling that sort of code.

    • There are several problems with optimizing just the GObject style:

      – Hand-written C allows for much more trivial optimizations (in some cases it is even easier to just write optimized code by hand) compared with Vala-autogenerated code.
      – On the other hand, there are ways to sneak information about the code into LLVM from Vala (say, the YAML files mentioned before with a description of the scope hierarchy). Part of the nature of intrinsics is that they allow rearranging certain parts without worrying about breaking something.
      – C code can contain many more implicit invariants than Vala code. I prefer to take a conservative approach – an escape from an unowned method is very unlikely, but it is possible nonetheless, so the default must be to preserve the old behaviour and modify only the code I was instructed to modify. I suspect that LLVM is close to optimal for GObject-style code moving within those invariants.

      While optimizing the GObject style in general would be great, I am trying to simplify the problem as much as possible. If VaLLVM is released, the Vala compiler will need to be modified to take advantage of it (what I aimed for, however, was the ability to double-target both VaLLVM and normal compilers such as plain clang or gcc).

      Regarding asynchronous methods – that is one of the future extensions I have been thinking about. For now I will be happy when I finish the local closures optimization in some reasonable time.

      • Perhaps the right approach then is the `__attribute__()` approach. The same mechanism by which functions can be decorated with the information that they are `printf`-like could be extended.

        The idea of having to pass secondary information into the compiler is extremely weird.

      • That was my original plan, but it looks like clang does not allow ad-hoc passing of information to LLVM passes, and unknown attributes are simply ignored. There are also problems with potentially over-eager optimization passes.

        In any case, the design should be flexible enough to be adaptable, and changing the behaviour should be as easy as changing one of the subpasses. I chose this method because it was the quickest one to start hacking on, even if it wasn’t the least dirty one.

      • Okay. Then I guess the better question is: how do you intend to measure improvement as a function of “typical” Vala code?

      • Most of the optimizations I am planning are local rather than global, and they mostly allow LLVM to optimize the code further rather than rearranging low-level issues in it. So I would not expect any slowdown – allocation on the stack should always be faster than on the heap, and the avoided heap allocation probably outweighs any possible cost.

        Regarding ‘typical’ Vala code – so far I have mostly used the Vala compiler itself as a reference. My initial goal was to move Vala to gee-0.8 instead of its own, built-in gee-0.0.9. This would have several benefits, including access from Vala to such advanced features as sorting or sorted sets and maps, not having two divergent codebases, etc. However, for various reasons this involved slight penalties (see Vala ABI and branch prediction for details, as well as Perform more sophisticated escape analysis). Part of those penalties involve the GObject implementation, so they won't be solved at the LLVM level and need GLib alterations – some optimizations (like local closures, for example) are better performed in the middle-end.

      • I might get in trouble for suggesting this, but it seems like Vala’s code gen has become a labyrinth of special cases. I think it would be highly beneficial to separate out code gen into a predicate dispatch system so that new optimisations can be added cleanly. (as an aside, having predicate dispatch be a Vala language feature might be interesting.)

      • Does it mean you’re volunteering to do it? 😉

        Jokes aside – it would probably require some thought about how to make predicate dispatch ABI-stable (if it is supposed to cross the ABI boundary – it could be interesting for plugins). It would probably introduce new wonderful regressions. And first and foremost – Vala devs probably simply don’t have time to do it (I have unreviewed bugfixing patches for Vala – I prefer not to think how they would react to a patch rewriting the codegen 😉 ). Personally, I would rather push to get co(ntra)variant generics support (which is not otherwise related to this discussion).

        And while I consider the Vala codegen scary, at least part of it is necessary complexity. To be honest, I know too little about the codegen in Vala to either criticize or praise it (the only thing I can say is that I dislike the large chain of classes).

        PS. If you really want to do it then you should probably talk to Jürg about it.

      • Co/contra-variance support would be nice.

        I know that the codegen is the way it is because it has grown to handle all these special cases. It’s not that I dislike it; it’s just that I know it makes adding things difficult. I have been working on a Vala compiler for AVR microcontrollers, and making asynchronous methods work without GLib has been rather trying.

        If LINQ ever were to happen in Vala, it seems like having a more pluggable code-gen would be extremely helpful.

      • Better asynchronous codegen would be nice. I guess part of the problem can be pushed down the stack to operate on a lower level, which would allow for better optimizations.

        If you are willing to do something with the codegen, that would be cool – Vala, like any other FLOSS project, is understaffed, so an extra keyboard will help.

  2. I believe messing with LLVM at this point is not cost-free; I’d rather invest in some optimization in Vala itself. The wip/transform branch is already on the path of moving information from the codegen up to the Vala multipass, by rewriting Vala code into other Vala code, whereas the codegen alone would have too little information.
    That means a for loop is no longer going to be transformed into a while loop from the AST. Rather, the transformations will decide whether to use a while loop for more complex cases, or a straight for loop for simple ones. The codegen would then simply transform the AST into plain C, without much cleverness.
