Futures and promises in libgee

Recent unstable releases of libgee contain Futures and Promises (please note that the API is not frozen yet, so the exact details may change).

A future, in its simplest form, is a value that might not be available yet. From the interface point of view it does not matter how exactly the value is produced (it can be computed in another thread, received through RPC, computed lazily etc.). Setting aside that a value is functional, and functional programming is cool, it has concrete benefits over callbacks, such as composability.
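For example, with the classes in the current unstable API (Gee.Promise and Gee.Future – names may still change), the producer and consumer sides look roughly like this:

```vala
// Producer side: a promise that will eventually hold a value.
Gee.Promise<int?> promise = new Gee.Promise<int?> ();

// Consumer side: the future can be handed around before the value exists.
Gee.Future<int?> future = promise.future;
future.when_done ((value) => {
	stdout.printf ("got %d\n", (int) value);
});

// Later, possibly from another thread:
promise.set_value (42);
```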

While Vala has syntax support for asynchronous programming, it was tuned to I/O operations, and there are still a few missing pieces needed to use it fully in this way (including, but not limited to, synchronisation).

Motivational example

Consider a very simple example – we want to fetch two pages, count the occurrences of a certain character on each, and print the result. Since the pages might be long we don’t want to stall the main loop for too long. Sample code to do it is presented below (I haven’t tested it):

internal async uint64 count (Soup.MessageBody body, uint8 byte) {
	uint64 count = 0;
	for (uint64 i = 0; i < body.length; i++) {
		if (body.data[i] == byte) {
			count++;
		}
		// Yield back to the main loop every 4 MiB so we don't stall it.
		if (i % (4 * 1024 * 1024) == 0) {
			yield;
		}
	}
	return count;
}

int main() {
	GLib.MainLoop loop = new GLib.MainLoop();

	Soup.Session session = new Soup.Session ();
	Soup.Message g_message = new Soup.Message ("GET", ...);
	Soup.Message b_message = new Soup.Message ("GET", ...);

	uint64 g_count = 0;
	uint64 b_count = 0;
	uint64 done = 0;

	session.send_async.begin (g_message, (obj, res) => {
		session.send_async.end (res);
		count.begin (g_message.response_body, 'g', (obj, res) => {
			g_count = count.end (res);
			if (++done == 2) {
				stdout.printf ("g + b: %llu\n", g_count + b_count);
				loop.quit ();
			}
		});
	});
	session.send_async.begin (b_message, (obj, res) => {
		session.send_async.end (res);
		count.begin (b_message.response_body, 'b', (obj, res) => {
			b_count = count.end (res);
			if (++done == 2) {
				stdout.printf ("g + b: %llu\n", g_count + b_count);
				loop.quit ();
			}
		});
	});

	loop.run ();

	return 0;
}

However, this code has several problems:

  • It depends on a ‘trick’ counter, and the constant it is compared against would need to be changed everywhere to remain correct. Even if it were factored out, it would still need to be updated whenever the number of tasks changes.
  • It computes everything in the thread that runs the main loop. While care was taken to ensure this won’t cause latency problems, it would be nice to parallelize the work if we had a more compute-intensive application than character counting, more I/O tasks etc.
  • The code flow is not coherent. Even if the common parts were factored out, the dependencies would remain implicit rather than explicit.

Assuming libsoup featured a future interface, the code could be written as follows:

internal Future<uint64?> count (Soup.MessageBody body, uint8 byte) {
	return task<uint64?> (() => {
		uint64 count = 0;
		for (uint64 i = 0; i < body.length; i++) {
			if (body.data[i] == byte) {
				count++;
			}
		}
		return count;
	});
}

int main() {
	GLib.MainLoop loop = new GLib.MainLoop();

	Soup.Session session = new Soup.Session ();
	Soup.Message g_message = new Soup.Message ("GET", ...);
	Soup.Message b_message = new Soup.Message ("GET", ...);

	Future<uint64?> g_count = session.send_future (g_message).flat_map ((val) => {
		return count (g_message.response_body, 'g');
	});
	Future<uint64?> b_count = session.send_future (b_message).flat_map ((val) => {
		return count (b_message.response_body, 'b');
	});
	Future<uint64?> sum = g_count.join<uint64?>((g, b) => {return g + b;}, b_count);

	sum.when_done ((val) => {
		stdout.printf ("g + b: %llu\n", val);
		loop.quit ();
	});

	loop.run ();

	return 0;
}

This code has none of the problems listed before. It has no explicit synchronization, so the problem of hand-writing it is avoided. It has a semantic flow – get the count of ‘g’s and the count of ‘b’s, then add them. Finally, it uses threads from a thread pool (via task), avoiding computation in the main loop.
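The task helper itself (as it exists in the unstable API; the exact signature may still change) takes a closure, runs it on a pool thread, and returns a future for its result:

```vala
// Run an expensive computation on a thread-pool thread and get a future
// for its result. Gee.task may throw GLib.ThreadError if no thread can
// be spawned, so a real caller has to handle that.
Gee.Future<uint64?> answer;
try {
	answer = Gee.task<uint64?> (() => {
		uint64 acc = 0;
		for (uint64 i = 1; i <= 1000; i++) {
			acc += i;
		}
		return acc;
	});
} catch (GLib.ThreadError e) {
	error ("failed to spawn task: %s", e.message);
}

// The future completes on the pool thread; when_done registers a
// callback, wait () would block the calling thread instead.
answer.when_done ((val) => {
	stdout.printf ("task finished\n");
});
```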

As a side note – I hope to provide a compatibility layer between futures and GIO async. For that, Vala would need to support async delegates:

Future<uint64?> g_count = future_async(() => {
	yield session.send_async (g_message);
	return count (g_message.response_body, 'g');
});
Future<uint64?> b_count = future_async(() => {
	yield session.send_async (b_message);
	return count (b_message.response_body, 'b');
});

Existing solutions

Currently I know of two similar things in the GObject ecosystem: GTask and Gee.Lazy values.

While GTask performs a similar role to the task function in libgee, it is nonetheless much lower-level. It is ultimately written to be a GAsyncResult completed in a separate thread. The futures take a different approach, concentrating on values rather than callbacks.

On the other hand, GTask does more than Gee.task currently does. GTask performs additional bookkeeping, including priorities, cancellation etc. For that reason it might be worthwhile to use GTask as a backend for Gee.task.

The lazy values have a similar interface but different semantics. By contract, Futures are thread-safe while lazy values are not. Adding such guarantees to lazy values would inhibit their usefulness, as they would be larger and have higher overhead. It might be useful to implement a lazy future, or to transform a lazy value into a future.
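Turning a lazy value into a future could, for instance, be sketched by forcing the lazy value on a pool thread (a hypothetical helper – nothing like it exists in the API yet):

```vala
// Hypothetical bridge: evaluate a (non-thread-safe) Gee.Lazy exactly once
// on a pool thread and expose the result through a thread-safe future.
Gee.Future<string?> lazy_to_future (Gee.Lazy<string?> lazy) throws GLib.ThreadError {
	return Gee.task<string?> (() => {
		lazy.eval ();      // force the computation
		return lazy.value; // safe: only this thread touches the lazy value
	});
}
```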

Push or pull

One of the design decisions was whether the futures should implement push or pull semantics. From a higher-level perspective the difference is: if future A depends on future B, should A request a notification for when B is done (push), with B keeping a list of callbacks, or should A call into B when it thinks it needs the value (pull)?
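Both styles surface in the consumer API – when_done registers a callback (push), while wait blocks until the value arrives (pull). A sketch of future A consuming future B either way (error handling omitted):

```vala
// b stands in for the dependency; here it is just a pool-thread task.
Gee.Future<int?> b = Gee.task<int?> (() => { return 6 * 7; });

// Push: the consumer asks b to notify it; b keeps a list of callbacks
// until it completes.
b.when_done ((value) => {
	// ... continue the dependent computation with b's value ...
});

// Pull: the consumer blocks at the point it actually needs the value
// (wait_until additionally takes a deadline).
int? value = b.wait ();
```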

In pure functional programs with tracing garbage collection the pull scheme has advantages. If the value is not needed, nothing keeps a (non-weak) reference to it, and therefore it can be collected. No effects are lost when the computation is not run, as there are no side effects and no one refers to the value.

In imperative programs the situation is not so simple. In our example, after the when_done call the sum variable is not referenced anymore and (assuming tracing garbage collection) it might not be present on the stack anymore. With deterministic reference counting, as in Vala, the example would be slightly more complicated, as it would involve going out of scope. In either case the computation does have effects, even if its result is not referenced.

Furthermore, the GLib implementation of weak references uses a global mutex to protect internal data structures and prevent race conditions, which is probably a bit problematic for a parallelisation primitive.

Error handling

One of the open questions is how to handle errors. Conceptually, futures are simply values, and getting a value should not raise an error. On the other hand, such an approach might be impractical.

I am still considering how exactly it should be handled. The possible solutions are:

  • Move error handling to the user (return either a value or an error, a nullable value with a separate error future etc.) – this is what is currently implemented.
  • Use C++-style exception handling, where wait & co. can simply throw any error.
  • Use Java-style exception handling, where wait & co. can throw an error indicating that something went wrong.
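The first option can be illustrated roughly as follows (hypothetical shape – the implemented API may differ in its details): the value future yields null on failure, and a companion future carries the error, if any:

```vala
// Hypothetical user-side error handling: a nullable value future plus a
// separate future for the error.
Gee.Promise<string?> value_promise = new Gee.Promise<string?> ();
Gee.Promise<GLib.Error?> error_promise = new Gee.Promise<GLib.Error?> ();

// The producer completes both promises, whichever way the operation ends.
void complete (string? result, GLib.Error? error) {
	value_promise.set_value (result);
	error_promise.set_value (error);
}

value_promise.future.when_done ((result) => {
	if (result == null) {
		// Inspect error_promise.future to find out what went wrong.
	}
});
```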

Side note: Overhead & light maps

Most futures require at least a mutex, a condition variable and the result value. This may be improved in later versions, but it is good enough for heavy-weight tasks. However, it would be overkill if we had a future yielding a temperature and we needed to convert it from Fahrenheit to Celsius or the other way round – especially if we wanted to chain a larger number of such maps.

To overcome this limitation, light maps were introduced. They have much lower memory consumption and don’t require any synchronization. On the other hand, the mapping function might be called any number of times.
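The temperature conversion above is exactly such a case – with light_map (as it appears in the unstable API) the sketch would be:

```vala
var promise = new Gee.Promise<double?> ();
Gee.Future<double?> fahrenheit = promise.future;

// light_map stores no result and takes no lock; the conversion may be
// re-run on every read, so it must be cheap and side-effect free.
Gee.Future<double?> celsius = fahrenheit.light_map<double?> ((f) => {
	return ((double) f - 32.0) * 5.0 / 9.0;
});

promise.set_value (212.0);
// celsius now yields 100, recomputed on each access.
```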
