Sunday, March 9, 2014

Parles: A Language With Less Irritating Superfluous Parenthesization

Well, I haven't blogged in rather a while. But, what with being in a Linguistics MA program, I've been feeling the need to take some me-time and do some Computer Science recently, and I might as well tell the world about it.

Building off my thoughts from Lisp Without Parentheses and Pointful Concatenative Programming, I've started working on a new language called Parles.

Most programming languages seem to me to have a rather inconsistent idea about evaluation order within a line- it usually goes left-to-right, the direction you read in, except when you have to jump backwards to the left to figure out what function to apply to all those argument expressions you just read left-to-right. Most concatenative languages are more straightforward- everything always goes left-to-right; but despite the wonderful consistency, this seems to cause Discomfort and Confusion to people coming from traditional Algol-style languages beyond what comes from just dealing with weird syntax (or the utter lack thereof- even less syntax than LISPs!). I have tried to resolve this issue in the design of Parles.

At its most basic level, Parles is essentially FORTH, read backwards: evaluation proceeds from right to left, with arguments evaluated before function calls (ignoring for a moment that all arguments *are* function calls), and appearing after them in the program text.

There are two ways to alter this basic behavior: semicolons and pipes.
The semicolon is the sequence operator; it is eliminated at parse time and causes everything parsed prior to the semicolon to be post-fixed to whatever comes after, up to the next semicolon or end-of-block. If you end every line with a semicolon, this will cause lines to be evaluated top-to-bottom, just as pretty much everybody ever expects them to be. Thus,

+ 1 2;
* 3 4;

is equivalent to

* 3 4 + 1 2

The trailing semicolon is irrelevant- it causes the entire program to be postfixed to the empty string, which is a no-op.

The pipe (|) operator behaves like a unix shell pipe- take the output from the stuff on the left, and feed it to the stuff on the right. It's essentially identical to the semicolon, except that it binds more tightly (re-ordering occurs up to the next pipe, semicolon, or block boundary) and encodes a type assertion that the output row type of the left side exactly matches the input row type of the right side. E.g., while

1 2 3; +

is totally valid, and results in "3 3" on the stack (with + popping 1 and 2 off the stack and replacing them with 3, and the original three coasting underneath untouched)

1 2 3 | +

is a type error, as the compound expression "1 2 3" has output type "int int int" while + expects only two ints as input. Pipe can be used to simulate infix operators, as long as there is a block boundary or semicolons to properly delimit the expressions whose types are to be matched. E.g.,

1 | + 2

is a valid infix version of

+ 2 1

There are three kinds of blocks, using three kinds of brackets: [], (), and {}.
[] blocks are called quotations and are taken pretty much straight from FORTH: they produce a function value whose type is equal to the type of the body. These are Parles's primary means of abstraction and delayed computation. In addition to all of the things that you would usually use functions for, they are necessary for using control-flow operators. For example, "if" in Parles is a function which takes a boolean and two quotations, and applies one or the other of them depending on the value of the boolean. Its usage looks like this:

if < 2 3
[print "true"]
[print "false"]

This is an excellent example of a circumstance where you do not want to end a line with a semicolon. Inserting semicolons would require re-ordering lines as follows (so that they will be properly un-reordered by the parser):

[print "false"];
[print "true"];
if < 2 3

which is the scary weird order that most FORTHs will make you write it in (modulo each line still being written backwards).

() blocks are argument blocks. They are evaluated immediately, but introduce a type assertion that the contained expression requires no arguments itself- i.e., takes nothing off the stack. Thus, they represent things that you would put parentheses around if you were writing in a traditional LISP. Indeed, you can write Parles to look like Scheme if you feel like putting parentheses everywhere you possibly can and not doing any interesting tricks with floating extra arguments on the stack. I considered extending the type assertion to require that the body terminate having placed exactly one value on the stack to serve as the return value, but plenty of LISPs already allow multi-value return, and it'd be a shame to eliminate that feature from a language to which it belongs so naturally.
Aside from organizing and documenting things for the sake of future readers of the code, the primary purpose of argument blocks is to annotate sections of code that can be executed in parallel due to a lack of input dependencies. A smart compiler would be able to identify such regions automatically, rather than just verify them when pointed out by a human, but that's Hard, so I probably won't implement it for a while, and putting in some clues to make your code easier to read is a good thing anyway.

{} blocks are just regular blocks, and they have no particularly special function except delimiting scope. As mentioned above, block boundaries limit semicolon and pipe re-ordering. In addition, however, all three kinds of blocks allow for an optional argument declaration that binds local variables by popping values off the stack. Thus, binding local variables is likely to be the most common use for regular blocks. That usage looks like this:

{a: str -> print print a a} "waka"

which binds "waka" to the string variable "a" and prints it twice ("wakawaka"). (This is slightly different from the syntax I played with in Pointful Concatenative Programming because I ended up needing the pipe character for something else.) All arguments (at least for now, until the type inference system is more advanced) must have a type declaration, and variables listed in the same order as the expressions which produced the values they bind to. E.g.,

{a: str b: num -> b a} "hi" 1

will bind "hi" to "a" and "1" to "b" (and swap their positions on the stack when it's done).

Quotations are a little bit special. Their argument binding behavior is just like what one would expect from anonymous functions in any other language: arguments are taken from the stack at the point of application, not at the point of creation (technically, this is true for all blocks, but for the others those are the same time). Quotations also form closures over their free variables that are bound in higher scopes.

So far, all functions (except built-in ones like "if") are anonymous. There is an "apply" operator for executing quotations stored on the top of the stack, but it is also possible to give quotations names by binding them to variables (as in pretty much every other language with first-class functions). In order to ensure equality between built-in and programmer-defined functions, using a variable that holds a quotation does not place that quotation value back on the stack, like accessing another value would; rather, it results in the application of the stored quotation. This means you can bind quotations to variable names and then use them just like the built-in words, without needing to use "apply". If you don't want them to be executed, just re-quote.

There is also one additional way to bind variables: using a lambda binding. This consists of the unicode lowercase letter lambda (λ) or a backslash (\) followed immediately by a name and type assertion. This is translated at parse time into a block binding a single variable whose scope extends left until the next explicit block boundary (or EOF). The intent is that you put these things at the beginning of a line terminated by a semicolon, kind of like a regular variable declaration in an Algol-style language, but you can put them anywhere you like. So, one might decide to declare a new function like this:

\sayhi [print "!" print print "hi "];
sayhi "Bob"

or

\sayhi [name: str -> print "!" print name print "hi "];
sayhi "Bob"

which will both print out "hi Bob!"

All variables are (for now) immutable, but can be shadowed as much as you like. That includes all of the built-in functions; so far, the only keywords or reserved symbols are: the various brackets, the argument arrow "->", the colon for introducing type assertions, the backslash and lambda characters, and the semicolon and pipe characters. Everything else is free game for redefinition by shadowing. Use this power wisely.

As of now, you sadly can't actually run any Parles programs; there's an outline of a virtual machine for it, but no code-generator. The parser and type-checker (a critical component of the compiler, as you can't figure out what hooks up with what unless you can figure out the in- and out-arity of every word) are working though, so you can at least parse and type-check Parles programs with the Python scripts in the Github repo.