Thursday, June 11, 2015

Uniform Call Syntax in Parles

After a long hiatus, I've come back to do a little more work on Parles. The GitHub repository now has a very simple but functioning compiler and VM that can run very simple programs. Now, I've got some thoughts on the next feature I'd like to add. Uniform Function Call Syntax is a concept implemented in several modern programming languages (like D, or Magpie) which helps make the core language smaller by treating object-oriented method calls, of the form foo.bar(baz), as simple syntactic sugar for function calls of the form bar(foo, baz). If you remove the parentheses, it becomes obvious that this is a fairly simple reordering rule: foo . bar baz gets rewritten as bar foo baz, with bar and foo switching places.

Parles already has two similar bits of token-reordering syntactic sugar (the ; and | operators), so adding a third one should really be no big deal. Since parentheses are already allowed in exactly the positions that you would put them in a method call in a language like C++ or JavaScript, this minor addition would allow Parles to simulate OO method call syntax exactly. We just need to figure out the rules for what exactly gets swapped with what.

The obvious simple solution is to just swap expressions (individual words or blocks). But, what if the expression preceding . produces multiple values? Calling a method on multiple values doesn't make much sense. It would be nice to define the . operator so that it requires the programmer to use it in a way that "makes sense" in terms of OO method calls, rather than as an arbitrary re-ordering operator. To handle that, we'll need to look at type information to guarantee that the output of foo is in fact a single value. It's also important that foo not require any arguments- i.e., that it actually be a value, not a partially-applied function call.

Additionally, we need to ensure that the method name (whatever comes after a .) is in fact a name - a single word - not an arbitrary expression. In other words, we want to make sure that programmers can write things like foo.bar, but not foo."hello world" or foo.(+ 1 2). That's a very simple syntax check that doesn't require any type information. It may also be a good idea to ensure that bar is a function that takes at least one argument (namely, foo).

These typing rules are similar to the restrictions that come with parens, and can be handled in a similar way. So, we'll start out by simply swapping the expressions on either side of a ., but adding implicit type annotations to the AST so that the type checker can ensure that a method call "make sense".

Method Chaining

Swapping single expressions on either side of a . lets us do simple method calls like foo.bar(baz), which gets re-written as bar foo (baz). But what if we want to chain method calls together, writing something like foo.bar(baz).qux(fred)? If . associates to the left, this gets re-written in two steps as bar foo (baz).qux(fred) and then bar foo qux (baz) (fred), which is not what we want! It should be qux bar foo (baz) (fred). We don't want to use just the first expression to the left as the first argument to qux- we want to use the whole preceding method call. There are several possible approaches to fixing this.

First, we could try using type information, just like we would for auto-parenthesization. There is some circularity involved if we use type information to figure out how to re-order things, because we can't actually do type unification until we know the proper order for all expressions, but there is a way around that: only the left argument is potentially ambiguous (since we already said the right argument must always be a single word), so the type checker just needs to keep track of the types of all contiguous subsequences of expressions up to the point where it encounters a .. That takes O(n^2) time for arbitrarily long subsequences, but we can skip sequences that cross block boundaries or ; or | operators. In that case, splitting a long program into two halves with a single block or re-ordering operator also cuts the typechecking time in half. As long as the basic chunks are small enough, this should be reasonable fast, close to O(n) time to typecheck a complete program. With progressive typechecking, every time a . is encountered, the compiler can then select some previous sequence that takes no arguments and returns a single value, and use that sequence as the left argument for the . reordering operator. If the Parles implementation does auto-parenthesization, the necessary type information will have to be calculated anyway, and re-using it for method call left-argument identification adds no additional typechecking overhead. Given that there may be more than one type-compatible left subsequence, though, we still need a rule for how to consistently select exactly one of them. There are two sane options: take the longest type-appropriate sequence, or the shortest.
The longest possible sequence is unlikely to be what you really want in most cases, and it potentially requires programmers to hold a lot more stuff in their heads to understand how any particular program is put together. Using the shortest possible sequence seems like a much better choice; it keeps the scope of reordering smaller and easier to think about, and in most cases will identify just a single expression anyway. Unfortunately, that is also a problem. For example, if baz returns a single value, then foo.bar(baz).qux(fred) will end up being re-written as bar foo qux (baz) (fred), which is not what we wanted! This approach would work fine for chaining method calls with functions that take zero, two, or more arguments, but fails for methods of one argument, which is not an uncommon case.

So, it seems that relying on the typechecker to tell us how to rewrite method calls is not going to work. The second option is to simply ignore the problem, continue to only swap single expressions on either side of a ., and make programmers explicitly mark off whatever they want to function as the left argument in a method call with a block of some sort whenever it is larger than a single word. In that case, we'd just require foo.bar(baz).qux(fred) to be written instead as (foo.bar baz).qux(fred), or {foo.bar baz}.qux(fred). That fails to faithfully simulate how method chaining works in other languages, though, and is thus rather unsatisfying.

The third option is to still require explicit delimiters for left method arguments, but in a slightly more subtle way: we can simply always collect everything to the left of a . up to the last enclosing block boundary, variable binding, |, or ;, regardless of type information. That means that, if you want to embed foo.bar(baz) in the middle of a larger line, you'll probably have to explicitly mark it off as {foo.bar(baz)} so that the . operator doesn't suck up anything more to the left of foo. But honestly, if you're writing Parles code in an imperative, OO style where method call syntax makes sense, you're not likely to want to embed {foo.bar(baz)} in the middle of a line- you'll stick it on a line by itself terminated by a semicolon and beginning with a variable binding and/or preceded by a block boundary or another line terminated by a semicolon.

After all that consideration, then, that's what Parles will do: . becomes a straightforward re-ordering operator working purely at the surface syntactic level, like ; and |, which swaps everything to the left up the the last enclosing block, variable binding, semicolon, or pipe, with a single following word, and which introduces implicit type annotations similar to those implied by parentheses.

Addendum: the git commit which implements method call syntax can be seen here on GitHub.