Pogoscript Notes

What it means to be async

One of the things that really appealed to me about Node.js was its single-threaded non-blocking approach to IO. It must have been around the time I read the paper about the next generation of SQL databases: they described a database architecture that had dozens of single-threaded nodes executing queries one-at-a-time. This meant they didn’t have to worry about all the problems that came with programming in a multithreaded environment - the code was less buggy, and because of the absence of locks, it was faster too.

This made a lot of sense, especially if you can keep the duration of a transaction pretty short: you can trick the user into thinking that the server is actually parallel. It’s a bit like how time-sharing in an OS tricks user programs into thinking that they’re executing in parallel with other programs, except that for users the time-slice can be much bigger before they notice; tens of milliseconds is OK. 140ms seems to be a magic number: if a user presses a button, you have about 140ms to show some feedback before the user suspects it isn’t working.

Aside from async IO, Node.js was based upon one of the fastest dynamic languages known to man. I was hooked. But async IO was seriously complicated. I mean, complicated in a way that made you think, which is not usually a bad thing, but eventually you just want to get stuff done. Kind of like how I used Ruby. Ruby for me was a workhorse. It was the tool I used to put stuff together without having to think about the technical details. So when I wanted to use Node.js as my workhorse it didn’t work out. In fact, I struggled to do even the simplest of operations, for example, running stat on the files in a directory.

In Ruby, it looks a bit like this:

files = Dir['*']
files.each do |file|
    puts "#{file} #{File.stat(file).size}"
end
puts "count: #{files.size}"

In Node.js, it’s like this:

var fs = require('fs');

fs.readdir('.', function (err, files) {
    var outstanding = 0;

    for (var n = 0; n < files.length; n++) {
        var file = files[n];
        (function (file) {
            outstanding++;
            fs.stat(file, function (err, fileStat) {
                console.log(file + " " + fileStat.size);
                outstanding--;
                if (outstanding==0) {
                    console.log("count: " + files.length);
                }
            });
        })(file);
    }
});

You could get lost in that. Granted, there are easier ways to do this with the abundance of async control flow libraries that now exist. But this code, and even the libraries that tried to fix it, were a sign of a deeper problem. One that I believed could be fixed at the programming language level.

So I was compelled to design a new programming language: Pogoscript. Pogoscript contains an operator that can be used to make asynchronous calls without having to think about it. The above code can now be rewritten like this:

fs = require 'fs'

files = fs.readdir! '.'

for each @(file) in (files)
    console.log "#(file) #(fs.stat! (file).size)"

console.log "count: #(files.length)"

Thankfully more akin to our beloved Ruby.

Now you may argue that Ruby is still cleaner, not least because we don’t need ! operators everywhere we make calls to IO. But I want to argue that this is to Pogoscript’s advantage.

It all comes back to the idea that IO must be asynchronous to be performant. In Ruby, IO is not asynchronous and performance is terrible. This is not just because Ruby has a slower runtime. Asynchronous IO means that you can process other things while an IO operation completes, and more, you can make further IO requests too. This has major implications in web servers and is the primary reason why Nginx is faster than Apache. So to optimise the ls operations above in Node.js, you have the option to execute all the stats concurrently. This is an extremely efficient use of resources, and far better than executing stat on multiple threads, as you might be tempted to do in Java or C#. I can’t speak for Java, but I know that C# has reasonably good support for async IO. It’s still in a tricky multithreaded environment, though, and it fails to show the same performance as Node.js (in my tests, about 3 times slower).

I say you have the option because sometimes you don’t want things to be run concurrently. Even though concurrency in a single-threaded application is orders of magnitude more predictable than concurrency in a multithreaded application, it’s still not something you want to do blind. You still need to know that IO requests can complete in an order different to the order you made them.
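As a rough sketch of that option in plain JavaScript, here is the same directory listing done one stat at a time and with every stat in flight at once. It assumes a Node.js version that provides fs.promises (which postdates the callback code above), and the names statSequential and statConcurrent are made up for this illustration.

```javascript
const fs = require('fs');
const path = require('path');

// Sequential: each stat starts only after the previous one has finished.
async function statSequential(dir) {
    const files = await fs.promises.readdir(dir);
    const results = [];
    for (const file of files) {
        const stat = await fs.promises.stat(path.join(dir, file));
        results.push({ file: file, size: stat.size });
    }
    return results;
}

// Concurrent: every stat is issued up front, then all are awaited together.
async function statConcurrent(dir) {
    const files = await fs.promises.readdir(dir);
    return Promise.all(files.map(async function (file) {
        const stat = await fs.promises.stat(path.join(dir, file));
        return { file: file, size: stat.size };
    }));
}
```

Promise.all hands the results back in the order the requests were made, even though the underlying stats may complete in any order. Code that reacts to each completion individually gets no such guarantee, which is exactly the thing you need to keep in mind.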

A very interesting programming language is Scheme. Among many other things, Scheme included the notion that all function calls are executed by passing their continuation as an argument. Now this statement may seem a little foreign to many, but if you’re familiar with Javascript, or more generally with asynchronous IO, then you’ll see some parallels. The idea is that when you call a function, you pass another function that is to be called with the result of the original called function. In Javascript, you could see it a bit like this:

var add = function(a, b, continuation) {
    continuation(a + b);
};

add(3, 4, function(result) {
    console.log(result);
});

But imagine that done for all functions in your program. And then imagine that you don’t actually write it like that, but you write it normally, like this:

var add = function(a, b) {
    return a + b;
};

console.log(add(3, 4));

And the compiler rewrites it into the first form.

This form is known as CPS, or Continuation Passing Style. Programs written in CPS allow for some very funky semantics. In Scheme, you can access the current continuation, and when you call it, program execution jumps back to where you accessed the continuation. Continuations are like program-wide goto statements, but they’re far more powerful: instead of just jumping back to somewhere in your code, you can pass values to the jumped-to code. You can rewind your program’s history, this time with a slight correction in time. This is pretty much how Prolog works: it tries all kinds of program branches, backtracking when it fails and eventually finding results that work. Lambda: the ultimate mind fuck.
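To make the rewrite concrete, here is a hand-written sketch of a nested call in both styles. The mul function and the Cps-suffixed names are invented for the example; a CPS-transforming compiler would generate something of this shape rather than exactly this code.

```javascript
// Direct style: results flow back through return values.
function add(a, b) { return a + b; }
function mul(a, b) { return a * b; }

var direct = add(mul(2, 3), 4);

// CPS: each function takes an extra argument, its continuation,
// and "returns" by calling that continuation with the result.
function addCps(a, b, continuation) { continuation(a + b); }
function mulCps(a, b, continuation) { continuation(a * b); }

var viaCps;
mulCps(2, 3, function (product) {
    // The rest of the program after mul lives inside this function.
    addCps(product, 4, function (sum) {
        viaCps = sum;
    });
});

console.log(direct, viaCps); // both are 10
```

Notice that the nesting order is reversed: in direct style the innermost call is evaluated first, while in CPS it is written outermost, with everything that follows it tucked inside its continuation.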

So naturally I was tempted to think that Pogoscript could be the same. Why not make everything asynchronous? CPS all the way down. I was tempted, but I knew that compatibility with existing Javascript libraries would be nightmarish. But it turns out to be far more fundamental than that: with CPS, you can’t really know what’s going to happen to your state between calls.

Consider the following classic concurrency problem: you need to transfer money from Jeff’s account to Mary’s account.

transfer (amount) from (fromAccount) to (toAccount) =
    fromAccount.withdraw (amount)
    toAccount.deposit (amount)

transfer 50 from (jeffsAccount) to (marysAccount)

Ordinarily this would work in a single-threaded environment, every time. There are some simplifications here: we’re assuming that the accounts exist in memory, so there’s no IO for loading and saving them. Now imagine if the withdraw and deposit methods were asynchronous, so we add the ! operator:

transfer! (amount) from (fromAccount) to (toAccount) =
    fromAccount.withdraw! (amount)
    toAccount.deposit! (amount)

transfer! 50 from (jeffsAccount) to (marysAccount)

That would probably work every time too. But because those operations are asynchronous, all kinds of other things can happen immediately after the withdraw!; in fact, we can’t be sure that withdraw! will ever return. Indeed, we can’t even be sure that withdraw! will return just once: it could return multiple times. Also, if by chance you calculated the total balance of all accounts between the withdraw! and the deposit!, you’d be short. These are the same risks you run in multithreaded code: all bets are off, and you’d be tempted to wrap these statements in mutexes, or use STM or other such obtuse mechanisms. Now imagine if we used CPS implicitly like in Scheme, where every function or method call was asynchronous. It would look like the first example, but behave like the second. That’s where single-threaded, or cooperatively threaded, programs are at an advantage: you can know that whole sections of code will be executed at once, without other code being scheduled in to mess with your inconsistent state.
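The shortfall is easy to reproduce in plain JavaScript. The sketch below is a simulation under stated assumptions, not Pogoscript output: tick is a hypothetical stand-in for real IO that simply yields to the event loop, the accounts are plain objects, and async/await plays the role of the ! operator.

```javascript
// Hypothetical stand-in for an IO operation: completes on a later turn
// of the event loop, giving other code a chance to run first.
function tick() {
    return new Promise(function (resolve) { setImmediate(resolve); });
}

async function withdraw(account, amount) {
    await tick();
    account.balance -= amount;
}

async function deposit(account, amount) {
    await tick();
    account.balance += amount;
}

async function transfer(amount, fromAccount, toAccount) {
    await withdraw(fromAccount, amount);
    // Other code may be scheduled here, while the money is "in flight".
    await deposit(toAccount, amount);
}

async function demo() {
    var jeff = { balance: 100 };
    var mary = { balance: 100 };
    function total() { return jeff.balance + mary.balance; }

    var observed = [total()];
    var done = false;
    var pending = transfer(50, jeff, mary).then(function () { done = true; });

    // An unrelated observer that sums the accounts on every turn
    // of the event loop until the transfer has finished.
    while (!done) {
        await tick();
        observed.push(total());
    }
    await pending;
    return observed;
}

demo().then(function (observed) { console.log(observed); });
```

The observer starts and ends with a total of 200, but in between it catches the accounts 50 short. Remove the awaits from transfer, withdraw and deposit and no observer could ever run between the two updates.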

The concurrency combination that Node.js offers - cooperative on the inside, message passing on the outside - is very exciting, and I wonder if it might just get us past the end of Moore’s Law.

The right approach for languages on this platform is to make asynchronous calls easy, but explicit. It strikes a good balance, I think. Pogoscript is a language that lets you play with continuations for fun and has non-blocking asynchronous IO, but also operates in a single-threaded, predictable environment. It’s fast, concurrent, powerful and frees you up to think about the big things.
