Thursday, June 22, 2017

Language & Storytelling

Or, Why Authors Should Care About Conlangs

"Show, don't tell."

This is possibly the most oft-repeated, best-known bit of writing advice ever uttered. But what does it actually mean? Aside from the occasional illustration or diagram, writing is not a primarily visual medium. You are very restricted in what you can actually show. Typically, it breaks down into a metaphor for things like "give good description" and "let the reader make their own inferences"; i.e. "don't be straightforward and boring".

But there is one thing that you actually can literally show in writing, without resorting to illustrations or diagrams--one thing which nevertheless is fairly consistently ignored by most proponents of the "show, don't tell" mantra: The language itself. The language you are writing in, and the language your characters speak.

There is, however, another bit of advice which seems to preclude showing off the manipulation of language itself as a literary device in prose: that being that the language should be invisible. Ideally, the reader forgets that they are reading, becoming immersed in the story. And it turns out that things like accurately transcribed, realistic dialog, accurate phonetic representations of dialectal speech, and similar uses of showing language are super annoying and hard to read! But that does not mean that you can never show language at all--it just means that you have to be careful. When appropriate, doing so can be very powerful.

Showing and even calling attention to features of language itself is powerful because language is intimately tied up with personal and group identity. At the simplest level, this is why good characters have distinct voices; personal identity comes through in the sometimes tiny idiolectal differences in how people use their language, making them identifiable even when you can't hear their literal voice. Being able to write distinct voices, different from the author's own natural voice, such that the reader can tell who is speaking without the need of dialog tags is a relatively rare but extremely valuable skill, and it is all about literally showing rather than telling.

But the significance of language to identity goes much deeper than that. Beyond individual voice, people's choice of style, dialect, and even what language to speak in often hinges on establishing, or refuting, membership in a group. This is a large part of the function of slang--using it to show that you're part of the "cool" crowd, watching who doesn't use it, or who uses it incorrectly, to identify outsiders to your peer group, or even refusing to use it to distance yourself from that group.

One of the best examples I know of using language to establish identity comes from the film Dances With Wolves. At one point our protagonist, John Dunbar, has been taken prisoner by the army he used to be a part of, and is being interrogated--in English, of course. Now, John is a native speaker of English, and he is clearly capable of answering in English. He could simply refuse to speak. He could say "I refuse to answer your questions", or any number of other things. But what he actually says is:


"Sugmanitutanka ob washte..."

(Followed by some additional words of Lakota which I have not been able to find transcribed.)

Through subtitles, the audience knows that he is telling his captors "My name is Dances with Wolves. I will not talk to you anymore. You are not worth talking to."

Now, imagine if he simply said that, in English. Would it have nearly the same emotional impact for the viewer? Would it have the same effect on the other characters? No, of course not! By his choice of language, John is communicating his changed sense of identity--that he now considers himself not an American soldier, but a Lakota.

In films and TV shows, if you are going to portray linguistic diversity at all, you basically have no choice but to, well, actually portray it. I.e., have the actors actually speak a different language! (Even in Star Trek, with the conceit of the universal translator in effect, they at least have the alien characters speak gibberish in those rare instances when the translator breaks.) The complexity of ensuring the audience remains engaged, and the extra costs involved, more often than not result in simply avoiding or ignoring issues of linguistic diversity, but if you're going to do it, you have to do it. And when the details of the story make it implausible or politically unwise to use a real human language, you get someone to make a conlang--hence Klingon, Na'vi, Dothraki, etc.

Writers, on the other hand, have a cheat. You can always just write, "My name is Dances with Wolves," he said in Lakota, and never have to learn or invent a single word of another language, or figure out how to keep the reader engaged. The language remains invisible. But you have then committed the grave error of telling when you could have shown! Indeed, of telling in the one singular situation where literal showing is actually possible!

"But," you say, "then the reader won't understand what's going on!"

Well, for one thing, maybe that's the point. Sometimes, the narrator or viewpoint character won't understand what's being said, and the reader doesn't need to understand it either. There may be legitimate disagreement on this one, but consider: is it better to write something like:

She said something I couldn't understand, but seemed to be in distress.

or something like

She said something like "Pomogitye! Moya sobaka gorit!", which of course I didn't understand a word of, but I could tell by her tone she was distressed.

Personally, I'm inclined to go with option 2. And if the reader is supposed to understand what's going on? As a writer, you have it so much easier than the film makers! They have to resort to subtitles, which distract the viewer from the scene. But your readers are already reading! Sticking in a subtitle costs you nothing!

"Sugmanitutanka ob washte..." John said. My name is Dances with Wolves.

And you don't need to be J.R.R. Tolkien to do a good job with a conlang in a story. In fact, while Tolkien was a fantastic conlanger and worldbuilder, and while I may be metaphorically burned at the stake for daring to say this... Tolkien wasn't really the best at integrating different languages into his stories. Dropping in a page of non-English poetry every once in a while, while exciting for language nerds like me, generally results in readers just skimming over that page, even when the reader is a language nerd like me.

So, writers: consider the languages your characters use. Consider actually showing them. And, if appropriate, consider a conlang. If you're not up to conlanging or translating yourself, help is easy to find.

Tuesday, June 20, 2017

Stochastic Democracy

Or, a method for the fair and equitable distribution of political power.

Much thought has been given to methods of ensuring fair and accurate representation in a representative democracy. Ideally, if, for example, 40% of the population supports one party and 60% supports another, then 40% of the representatives in any given governing body should be from the first party and 60% from the second (or at least, 40% and 60% of the votes should come from each party, if we relax the assumption that one rep gets one vote), within the limits of rounding errors introduced by having a small finite number of representatives.

Let us assume, however, that you have instituted a perfect system of ranked-choice voting (or something similar) and a perfect and unbiased system for drawing districts from which representatives will be selected, so that you always have such perfect representation. Or, presume that you have a perfect direct democracy, so issues of proportional representation never come up in the first place. At that point, there is still another problem to be solved: the tyranny of the majority.

The trouble is that what we really want is not proportionate representation at all; it is proportionate distribution of power. Representation is merely a poor, but easier to measure, approximation for power. And in a perfect representative government where 60% of the constituency supports one party, and 60% of the governing votes are controlled by that party, they do not have 60% of the power; they have all of it, presuming you go with a simple majority voting scheme to pass legislation. If you require some larger threshold, however, the problem still does not go away; supermajority voting requirements simply mean that it takes a larger majority to become tyrannical, and in the meantime you have a roadblock: the minority party may not be able to pass anything, but it can keep the other side from passing anything either! Nobody getting anything done may be an equitable distribution of power, but only because any percentage of 0 is still 0.

It would be better if we could somehow guarantee that a 60% majority party would get what they want 60% of the time when in conflict with the minority, and a 40% minority would get what they want 40% of the time when in conflict with the majority. It turns out that there is a remarkably simple way to guarantee this result!

Rather than allowing the passing of legislation to be decided by a purely deterministic process, we introduce an element of randomness. When any issue is voted upon, the action to be taken should not be determined simply by whichever side gets the most votes; rather, the result of a poll is a random selection from a distribution of options, weighted by the number of votes cast for each one. If all representatives always vote strictly along party lines, then over a large number of votes on issues over which the two parties disagree, 60% will be decided in favor of the majority party, and 40% in favor of the minority party- but with no way to predict ahead of time which 60 or 40% they will be, such that it is impossible to game the system by strategically timing the introduction of certain bills.
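The weighted draw described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the names stochasticVote and simulate, the tally format, and the injectable rng parameter are all hypothetical, introduced here to make the idea concrete and testable.

```javascript
// Pick one option at random, weighted by the number of votes it received.
// `tally` maps option name -> vote count (hypothetical format);
// `rng` returns a number in [0, 1) and defaults to Math.random.
function stochasticVote(tally, rng = Math.random) {
  const entries = Object.entries(tally);
  const total = entries.reduce((sum, [, votes]) => sum + votes, 0);
  if (total <= 0) throw new Error("no votes cast");
  // Choose a point along the stacked vote counts and find whose span it lands in.
  let threshold = rng() * total;
  for (const [option, votes] of entries) {
    threshold -= votes;
    if (threshold < 0) return option;
  }
  // Floating-point safety net: fall back to the last option listed.
  return entries[entries.length - 1][0];
}

// Simulate many strict party-line votes with a 60/40 split.
function simulate(rounds, rng = Math.random) {
  const wins = { majority: 0, minority: 0 };
  for (let i = 0; i < rounds; i++) {
    wins[stochasticVote({ majority: 60, minority: 40 }, rng)]++;
  }
  return wins;
}
```

Calling simulate with a large number of rounds should give win counts close to a 60/40 ratio, though any individual outcome remains unpredictable.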

Even if we imagine a more extreme split, like 90% vs. 10%, it is still impossible under this system for the majority to run away with tyrannical measures that harm the minority for very long. Should they try, it incentivizes the minority party to introduce ever more extreme countermeasure proposals more and more frequently, one of which is guaranteed to eventually pass! The majority party therefore has its own incentive to attempt to build consensus even with small minority factions, creating legislation that will benefit everyone.


Of course, it seems highly unlikely that any country could be convinced to run their government this way in real life! But there are, I think, two good reasons for considering the idea. First, this could make for an interesting political backdrop in a sci-fi or fantasy story; perhaps your fantasy society finds the inclusion of randomness in their political process to be a religious imperative, as it is through the interpretation of random events that the will of the gods is revealed. Second, it highlights the mostly-overlooked but very important distinction between equitable representation and equitable distribution of power. Hopefully, having looked at this simple solution to the problem will help someone to discover another, perhaps more practically tenable, one.

Wednesday, January 4, 2017

Building a 4D World

In my last post, I described how to orient a virtual camera in four dimensions. Now, we're going to give that camera something to look at.

Making art assets for regular games is hard enough; building full 4D models - things which allow you to look at them from any arbitrary vantage point in 4 dimensions - is just ridiculous. So procedural generation and simple geometric objects will be our friends. In fact, let's just do nothing but try to look at a randomly-generated 4D maze, constructed entirely out of right-angled, unit-sized hypercubes, as viewed through a 3D hyperplane slicing through the maze at an arbitrary angle and position.

Maze generation algorithms are all basically variations on finding spanning trees for the graph of maze cells, and as such are not strongly tied to any particular dimensionality; any of them can be fairly easily converted to work on arbitrary graphs, or in grids of arbitrary dimension. So, we'll leave maze generation as a problem space aside, and just assume that we can get a good maze map.
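To illustrate how little the dimensionality matters, here is a rough sketch of one common choice, randomized depth-first search, written for a grid of any dimension. It is not the generator used in the game; the function name generateMaze, the flat cell indexing, and the returned list of passages are assumptions made for this example.

```javascript
// Carve a maze on a grid of arbitrary dimension using randomized depth-first
// search. `size` is an array of grid extents, e.g. [3, 3, 3, 3] for a 4D grid.
// Returns the passages (pairs of flat cell indices), which form a spanning tree.
function generateMaze(size, rng = Math.random) {
  const dims = size.length;
  const total = size.reduce((a, b) => a * b, 1);

  // Strides convert between per-axis coordinates and a flat cell index.
  const strides = [];
  let stride = 1;
  for (const extent of size) {
    strides.push(stride);
    stride *= extent;
  }
  const coordsOf = (index) =>
    size.map((extent, d) => Math.floor(index / strides[d]) % extent);

  // Grid neighbors differ by one step along exactly one axis.
  const neighbors = (index) => {
    const c = coordsOf(index);
    const result = [];
    for (let d = 0; d < dims; d++) {
      if (c[d] > 0) result.push(index - strides[d]);
      if (c[d] < size[d] - 1) result.push(index + strides[d]);
    }
    return result;
  };

  const visited = new Array(total).fill(false);
  const passages = [];
  const stack = [0];
  visited[0] = true;
  while (stack.length > 0) {
    const current = stack[stack.length - 1];
    const options = neighbors(current).filter((n) => !visited[n]);
    if (options.length === 0) {
      stack.pop(); // dead end: backtrack
      continue;
    }
    const next = options[Math.floor(rng() * options.length)];
    visited[next] = true;
    passages.push([current, next]);
    stack.push(next);
  }
  return passages;
}
```

Because the result is a spanning tree, a grid with N cells always ends up with exactly N - 1 passages, whether the grid is 2D or 4D.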

One way to render our view of the maze would be to represent the walls of the maze with polygons in 4 dimensions, and calculate the intersection of every edge with the currently-visible hyperplane to generate a 3D model that can be rendered in The Usual Way, using your 3D rendering library of choice.

Calculating all of those intersections, though, seems really complicated, and there's a much simpler option that takes advantage of the fact that our maze is restricted to a hypercubical grid: we can use raycasting!

Raycasting is more typically used to give the illusion of 3D with a 2D map, in games like Wolfenstein 3D. It involves marching a ray from the camera's position through the grid until you find a wall, and then drawing a column of pixels with height scaled based on the length of the ray, so that walls appear to recede into the distance. The code for this is fairly simple, and when you only have to cast one ray per pixel of horizontal resolution, it's fast enough that you can even do it in pure JavaScript in a web browser. In order to render a 3D slice of a 4D map, we'll have to instead cast one ray for every pixel on the whole two-dimensional screen. That's significantly slower, but the code for casting each individual ray is not much more complex; it's merely about twice as long as that for a traditional raycaster, due to handling twice as many cases.

Casting Rays & Finding Walls

To get reasonable framerates, I implemented my 4D raycaster in GLSL to run on a GPU. The first step is calculating the appropriate ray given a pixel position:
    uniform float u_depth;
    uniform vec2 u_resolution;
    uniform vec4 u_origin;
    uniform vec4 u_rgt;
    uniform vec4 u_up;
    uniform vec4 u_fwd;

    void main(){
      vec2 coords = gl_FragCoord.xy - (u_resolution / 2.0);
      vec4 ray = u_fwd*u_depth + u_rgt*coords.x + u_up*coords.y;
      gl_FragColor = cast_vec(u_origin, ray, 10.0);
    }
(For the uninitiated, "uniform" variables have their values passed in from outside the shader program.)

This code converts the pixel location into a new coordinate system with the origin at the center of the field of view, instead of one corner. Then it calculates a direction in which to cast a ray based on the camera's egocentric coordinate system; the depth of the virtual camera in front of the screen specifies how much of the forward direction to use, which constrains the viewing angle, while the x and y pixel coordinates indicate how far to go along the right and up axes. Note that, so far, aside from the fact that all of the vectors have four components, this looks entirely like 3D code; there is no mention of the fourth, ana, axis. This is because we only care about casting rays contained in the hyperplane defined by the F, R, and U axes, so we only need to construct rays with components from those basis vectors. If we wanted to view a projection of the world into a 3D hyperplane, rather than a slice, then we could define an arbitrary w-resolution, in addition to the x and y resolutions, and cast additional rays with components from the ana basis vector of the camera's coordinate system. But multiplying the number of rays like that does not do good things to your framerate!

(Note that, although we only use a 3D subset of the camera's egocentric coordinate system, the resulting ray can, and usually will, have non-zero components in all four of the x, y, z, and w axes. This will only fail to be the case if the egocentric coordinate system is exactly parallel to the map grid.)
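For checking values on the CPU, the same ray setup can be mirrored in JavaScript. This is a sketch only: the function name pixelRay and its argument layout are assumptions for illustration, and vectors are plain objects with x, y, z, and w fields.

```javascript
// Build the ray direction for a pixel from the camera's forward, right,
// and up basis vectors (each an {x, y, z, w} object in map coordinates).
// `resolution` is {x, y} in pixels; `depth` is the camera depth in pixels.
function pixelRay(px, py, resolution, depth, fwd, rgt, up) {
  // Re-center pixel coordinates on the middle of the screen.
  const cx = px - resolution.x / 2;
  const cy = py - resolution.y / 2;
  const combine = (axis) => fwd[axis] * depth + rgt[axis] * cx + up[axis] * cy;
  return { x: combine("x"), y: combine("y"), z: combine("z"), w: combine("w") };
}
```

With a camera aligned to the grid (forward along z, right along x, up along y), every ray has a zero w component. As soon as the forward vector is rotated partly into the w axis, the rays pick up a w component even though the ana basis vector is never used.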

After calculating the ray corresponding to a particular pixel, we pass the origin point, the ray, and a distance limit into the raycasting algorithm, which returns a pixel color. So, let's look at how that works. First, a useful utility function:
    // Find the distance to the next cell boundary
    // for a particular vector component
    float cast_comp(vec4 v, float o, out int sign, out int m){
      float delta, fm;
      if(v.x > 0.0){
        sign = 1;
        fm = floor(o);
        delta = fm + 1.0 - o;
      }else{
        sign = -1;
        fm = ceil(o - 1.0);
        delta = fm - o;
      }
      m = int(fm);
      return length(vec4(delta,delta*v.yzw/v.x));
    }

This does three things:
  1. Calculates the distance you have to move in the direction of a ray, starting from a particular origin point, before you hit a cell boundary in a particular grid-axis-aligned direction.
  2. Calculates the sign of the ray's propagation along a particular grid-axis direction.
  3. Calculates the integer position of the grid cell in which the ray originates along a particular grid-axis.
All of this information is necessary to initialize the main raycasting algorithm.
    // Starting from the player, we find the nearest gridlines
    // in each dimension. We move to whichever is closer and
    // check for a wall. Then we repeat until we've traced the
    // entire length of the ray.
    vec4 cast_vec(vec4 o, vec4 v, float range){
      v = normalize(v);

      // Get the initial distances from the starting
      // point to the next cell boundaries.
      ivec4 s, m;
      vec4 dists = vec4(
        cast_comp(v.xyzw, o.x, s.x, m.x),
        cast_comp(v.yxzw, o.y, s.y, m.y),
        cast_comp(v.zxyw, o.z, s.z, m.z),
        cast_comp(v.wxyz, o.w, s.w, m.w)
      );

      // Inverting the elements of a normalized vector
      // gives the distances you have to move along that
      // vector to hit a cell boundary perpendicular
      // to each dimension.
      vec4 deltas = abs(vec4(1.0/v.x, 1.0/v.y, 1.0/v.z, 1.0/v.w));

The deltas give us the last bit of information we need to initialize the algorithm- how big the steps are in the direction of the ray in order to move one full grid unit in a grid-axis-aligned direction. This is different from the initial distance needed to get from an origin point somewhere inside a cell to the cell boundary, calculated above.
      // Keep track of total distance.
      float distance = 0.0;

      // Keep track of the dimension perpendicular
      // to the last cell boundary, and the value of the
      // last cell the ray passed through.
      int dim, value;

      // while loops are not allowed, so we have to use
      // a for loop with a fixed large max number of iterations
      for(int i = 0; i < 1000; i++){
        // Find the next closest cell boundary
        // and increment distances appropriately
        if(dists.x < dists.y && dists.x < dists.z && dists.x < dists.w){
          dim = 1*s.x;
          m.x += s.x;
          distance = dists.x;
          dists.x += deltas.x;
        }else if(dists.y < dists.z && dists.y < dists.w){
          dim = 2*s.y;
          m.y += s.y;
          distance = dists.y;
          dists.y += deltas.y;
        }else if(dists.z < dists.w){
          dim = 3*s.z;
          m.z += s.z;
          distance = dists.z;
          dists.z += deltas.z;
        }else{
          dim = 4*s.w;
          m.w += s.w;
          distance = dists.w;
          dists.w += deltas.w;
        }

In this section, we keep track of the length of the ray that it would take to get to the next cell boundary in any of the four grid axes, stored in the dists vector. Whichever one is smallest becomes the new length of the ray. Additionally, we record which axis we stepped along (in dim), update the coordinates of the cell through which the ray will pass next (in the m vector), and then increment the length of the theoretical ray which would hit the next cell boundary along the same grid axis.

After that, we just check to see if we've hit a wall yet. If there were any other objects in the game, this is where we'd add more detailed intersection checks for the objects within a particular cell, but with just a simple maze we just have to check the value of the cell to see if it is a wall or not:
        value = get_cell(m.x, m.y, m.z, m.w);

        // If the cell is a wall, or we've gone too far, terminate.
        if(value == 255 || distance >= range){
          break;
        }
      }

      // Calculate the actual intersection coordinates
      // and use them to look up a texture
      vec4 ray = o + distance * v;
      vec3 tex = calc_tex(dim, ray);
      return vec4(tex, 1.0);
    }

Once the casting loop terminates, we can use the coordinates of the end of the completed ray to look up the color of the bit of wall we ran into.
The conversion of the vec3 color to a vec4 in the last line is just to add in an alpha channel, which is always 1, and is thus left out of the procedural texture code below for simplicity.
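The grid-marching loop is easier to experiment with outside a shader. The following is a rough JavaScript port, written as a sketch rather than a faithful copy: the name castRay4, the array-based vectors, the isWall callback, and the returned object are all assumptions for illustration, and ties between equally distant boundaries are broken in favor of the lowest-numbered axis rather than the highest as in the shader.

```javascript
// Cast a ray through a 4D unit grid. `origin` and `dir` are 4-element arrays;
// `isWall(cell)` reports whether the integer cell [x, y, z, w] is solid.
// Returns the ray length, the axis crossed last (0-3), and the final cell.
function castRay4(origin, dir, range, isWall) {
  const len = Math.hypot(...dir);
  const v = dir.map((c) => c / len);
  const cell = origin.map((o) => Math.floor(o));
  const step = v.map((c) => (c > 0 ? 1 : -1));
  // Ray length needed to cross one full grid unit along each axis.
  const deltas = v.map((c) => (c === 0 ? Infinity : Math.abs(1 / c)));
  // Ray length needed to reach the first cell boundary along each axis.
  const dists = v.map((c, i) => {
    if (c === 0) return Infinity;
    const toBoundary = c > 0 ? cell[i] + 1 - origin[i] : origin[i] - cell[i];
    return toBoundary * deltas[i];
  });

  let distance = 0;
  let axis = -1;
  // Fixed iteration cap, mirroring the shader's bounded for loop.
  for (let i = 0; i < 1000; i++) {
    // Step across whichever cell boundary is closest along the ray.
    axis = 0;
    for (let d = 1; d < 4; d++) {
      if (dists[d] < dists[axis]) axis = d;
    }
    cell[axis] += step[axis];
    distance = dists[axis];
    dists[axis] += deltas[axis];
    if (distance >= range || isWall(cell)) break;
  }
  return { distance, axis, cell };
}
```

For example, a ray starting at the center of cell (0, 0, 0, 0) and heading along +x toward a wall occupying every cell with x of 3 or more should report a distance of 2.5.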

3D Textures

Once we've found a wall at the end of a ray, the coordinate for the axis perpendicular to that wall (identified by the value of dim), will have an integer value, while the fractional parts of the remaining three coordinates describe the position inside a cube which forms the boundary of one side of the hypercubical grid cell. This is exactly analogous to the 3D situation, where we would have two non-integer coordinates describing a location inside a square that forms the boundary of a cubical grid cell, or the 2D situation (where raycasting is more typically applied) where we have one non-integer coordinate identifying a position along a line forming the boundary of a square.

In order to figure out the color of a particular spot on the 3D wall of a 4D cell, we'll need three-dimensional textures. Say that we want our walls to be 512 pixels wide. For a 2D wall around a 3D cell, with one byte for each of three color channels, that means we'd need a texture taking up 768KB in memory (512 x 512 x 3 bytes). That's entirely reasonable. We could easily have different textures for walls facing in all six directions around a cube, to help orient you in the maze. But for a 3D wall around a 4D cell, we'd need 384 megabytes for each texture (512 x 512 x 512 x 3 bytes). Even if I had the digital artistic talent to create full 3D textures like that, that's rather a lot of memory to devote to textures. Once again, we can turn to procedural generation to create interesting wall textures.
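As a quick sanity check on how texture memory scales with dimension, here is a small calculation assuming uncompressed textures, 512 pixels per side, and three one-byte color channels. The helper name textureBytes is hypothetical.

```javascript
// Raw size in bytes of an uncompressed texture that is `width` pixels
// along each of `dimensions` axes.
function textureBytes(width, dimensions, channels = 3, bytesPerChannel = 1) {
  return width ** dimensions * channels * bytesPerChannel;
}

const flat = textureBytes(512, 2);   // 2D wall of a 3D cell: 786432 bytes
const solid = textureBytes(512, 3);  // 3D wall of a 4D cell: 402653184 bytes
console.log(flat / 1024, "KiB per 2D texture");      // 768
console.log(solid / 1024 ** 2, "MiB per 3D texture"); // 384
```

Each extra dimension multiplies the size by the full width in pixels, which is why a texture that is trivially small in 2D becomes impractical in 3D.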

There are lots of ways to do procedural texture generation, but the algorithm I picked looks basically like this:
    uniform vec3 u_seed;
    // Color lookup texture sampled below; declared here so the
    // snippet is complete (type inferred from the texture2D call).
    uniform sampler2D u_colorscale;

    const vec3 grey = vec3(0.2);
    const vec3 red = vec3(1.0,0.5,0.5);
    const vec3 green = vec3(0.5,1.0,0.5);
    const vec3 blue = vec3(0.5,0.5,1.0);
    const vec3 yellow = vec3(0.71,0.71,0.5);

    vec3 calc_tex(int dim, vec4 ray){
      ray = fract(ray);

      // Pick the three in-wall coordinates and a tint
      // based on which axis the wall is perpendicular to.
      vec3 coords, tint;
      if(dim == 1 || dim == -1){
        coords = ray.yzw;
        tint = red;
      }
      else if(dim == 2 || dim == -2){
        coords = ray.xzw;
        tint = green;
      }
      else if(dim == 3 || dim == -3){
        coords = ray.xyw;
        tint = blue;
      }
      else if(dim == 4 || dim == -4){
        coords = ray.xyz;
        tint = yellow;
      }

      float h = julia(coords, u_seed);
      if(h == 0.0){
        return mix(tint/16.0, grey, layered_noise(coords, 3, 3));
      }

      vec3 base = texture2D(u_colorscale, vec2(h, 0.5)).rgb;
      return mix(tint/8.0, base, layered_noise(coords, 5, 3));
    }

It starts with a basic case analysis to pick some parameters based on which direction we hit the wall from, and thus which kind of hypercube boundary cell we're calculating the texture for.

Then, the basic texture is drawn from a 3D generalization of a Julia fractal. In order to break up the psychedelic color bands produced at the edges, we layer on top of this a bunch of 3D simplex noise at multiple frequencies; the layered_noise function (implementation not shown) takes a start and a length for the range of octaves of basic noise that should be added together.

In principle, we could produce a different 3D texture for each of the 8 cubes that form the boundary of each 4D cell. In fact, with the full coordinates of the ray endpoint, we could even produce different textures for every cube in the entire maze. In practice, however, that turns out to be more code for little benefit. Since the appearance of any particular wall depends heavily on your precise 4D position and orientation, revealing a very specific 3D slice of the 4D whole, having unique textures for every cube in the maze doesn't really seem to help that much with identifying your position.

So, to help with orientation, instead of calculating a completely different texture for every 3D face of a hypercube, we just add a tint of a particular color to the basic texture to identify the orientation of each cube with respect to the grid.

That ends up producing views something like this:


Note that not all of the walls in this scene seem to be at right angles! This despite the fact that the world consists of nothing but maze walls aligned with a perfect right-angled hypercubical grid.

The reason for this is fairly straightforward; the viewing volume isn't parallel with the wall grid. Just like an oblique 2D slice through a 3D cube can end up with a hexagonal outline, a 3D slice through a 4D cube can also end up producing an outline with 2D segments meeting at all sorts of weird angles.

Tuesday, January 3, 2017

Implementing 4D rotation

In my last post, I explored various options for allowing a player to control their orientation in 4D space. Now, I want to look at how 4D rotations can be implemented, regardless of the control scheme.

For my 4D maze game, the only thing that needs to be rotated is the virtual camera that provides the player's first-person point of view. Dealing only with that eliminates a lot of accidental complexity that would be involved in applying arbitrary rotations to models in the world, but the procedures described here can be extended for that purpose.

The first thing to decide is how to represent the player's current orientation. This could be done with hyperspherical coordinates, which work pretty much just like regular spherical coordinates but with some extra angles added. However, working with hyperspherical coordinates gets really confusing really fast, and they suffer from the same limitations as spherical coordinates regarding gimbal lock. So, we'll use a different representation: a set of four unit vectors which define a basis for an egocentric coordinate system, in terms of the game map coordinate system.

While the game map coordinates are given in terms of x, y, z, and w axes, the egocentric coordinate system we use for orienting the camera translates these into forward, right, up, and ana (or F, R, U, A) components. It's important to make sure these basis vectors for the egocentric coordinate system always remain mutually perpendicular (i.e., that they form an orthonormal basis); that means there is some redundancy in this system. It would be possible to use a leaner representation, such as 3 basis vectors where the 4th can be recovered computationally as needed, or a pair of right-isoclinic and left-isoclinic quaternions, but despite the redundancy the use of explicit basis vectors makes the algorithms for arbitrary rotations, and the math employed therein, much easier to follow, and thus less prone to implementation error and easier to debug.

The basic 3D rotations (pitch, roll, and yaw) correspond to movement in the forward-up (FU), up-right (UR) and forward-right (FR) planes, respectively. In order to execute, for example, a pitch up, we would want to rotate the forward and up vectors in the FU plane so that the new forward vector points a little more in the old up direction, and the new up vector points a little more in the old negative forward direction. Thus, rotation in a plane can be expressed in terms of rotating each of the basis vectors (or any vector that falls in the same plane) towards or away from another vector in that plane. Vectors in the perpendicular plane are unaffected by rotations, so the remaining bases of our egocentric coordinate system can just be left alone, and there aren't any other vectors to worry about!

Therefore, the basic building-block we'll need to implement arbitrary rotations of our egocentric coordinates is a function that can rotate some vector in the plane defined by itself and a second, perpendicular vector. It turns out that this is much simpler than the general formula for rotating an arbitrary point parallel to an arbitrary plane. The code looks like this:

    // Rotate a vector v in the vk plane,
    // by angle t, assuming v & k are orthogonal
    function vec_rot(v, k, t){
      let cos = Math.cos(t);
      let sin = Math.sin(t);
      return {
        x: v.x*cos + k.x*sin,
        y: v.y*cos + k.y*sin,
        z: v.z*cos + k.z*sin,
        w: v.w*cos + k.w*sin
      };
    }

We gain quite a lot of simplifications when we can assume that all of our vectors are normalized and orthogonal!

A complete rotation of the camera requires two calls to this function- one for each of the basis vectors in the plane of rotation. If we did just one, they would no longer be orthogonal, and that would cause all kinds of problems! The code for executing a complete pitch rotation thus looks like this:

    Player.prototype.pitch = function(t){
      let fprime = vec_rot(this.f, this.u, t);
      let uprime = vec_rot(this.u, this.f, -t);
      this.f = fprime;
      this.u = uprime;
    };

Note that the first basis vector moves towards the second, while the second basis vector moves away from the first; thus, the second half-rotation uses a negative angle. Also note that this implementation re-calculates the same sine and cosine values (up to a sign difference) twice, so a slight efficiency improvement can be gained by inlining vec_rot and eliminating the extra calculations, at the cost of some code duplication and a little loss of clarity. The implementations for roll and yaw are the same, simply replacing f and u with u and r or f and r, respectively.

Having defined our 3D rotations in terms of planes rather than axes, the generalization to 4D rotations is trivial: just add three more rotation functions that act in the ana-axis planes: FA, UA, and RA. Arbitrary rotations in non-basis planes can then be achieved simply by calling these basic functions in series. It's not quite as efficient as building a 4D rotation matrix to do everything at once, but it's good enough most of the time, and very easy to understand and hack on.
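Since all six basic rotations share the same shape, they can be written once as a function of the two basis names. The following is a sketch under that assumption; rotateInPlane and the plain-object basis are hypothetical and stand in for the Player methods described here, with vec_rot repeated so the example is self-contained.

```javascript
// Rotate a vector v in the vk plane by angle t,
// assuming v and k are orthogonal unit vectors.
function vec_rot(v, k, t) {
  const cos = Math.cos(t);
  const sin = Math.sin(t);
  return {
    x: v.x * cos + k.x * sin,
    y: v.y * cos + k.y * sin,
    z: v.z * cos + k.z * sin,
    w: v.w * cos + k.w * sin,
  };
}

// Rotate the basis vectors named `a` and `b` (e.g. "f" and "a") within
// their shared plane. Returns a new basis; the other two vectors are unchanged.
function rotateInPlane(basis, a, b, t) {
  const aPrime = vec_rot(basis[a], basis[b], t);  // a moves towards b
  const bPrime = vec_rot(basis[b], basis[a], -t); // b moves away from a
  return { ...basis, [a]: aPrime, [b]: bPrime };
}

// Example: start grid-aligned, then chain rotations in the three ana planes.
let basis = {
  f: { x: 0, y: 0, z: 1, w: 0 },
  r: { x: 1, y: 0, z: 0, w: 0 },
  u: { x: 0, y: 1, z: 0, w: 0 },
  a: { x: 0, y: 0, z: 0, w: 1 },
};
basis = rotateInPlane(basis, "f", "a", 0.3);
basis = rotateInPlane(basis, "u", "a", 1.1);
basis = rotateInPlane(basis, "r", "a", -0.7);
```

Each call preserves orthonormality up to floating-point error, so rotations can be chained freely between occasional re-normalizations.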

There is, however, one problem with the system as it stands- due to the accumulation of floating-point errors, after a large number of rotations the basis vectors for our egocentric coordinates can become denormalized and non-orthogonal, which can cause a variety of strange rendering errors. Thus, it is prudent to periodically re-normalize our coordinate system. This consists of two parts: normalizing each basis vector, and ensuring that the basis vectors are all orthogonal to each other.

Normalizing a single vector is fairly easy:

    function normalize(v){
      let {x,y,z,w} = v;
      let len = Math.sqrt(x*x+y*y+z*z+w*w);
      return {x:x/len, y:y/len, z:z/len, w:w/len};
    }

And it's also not that hard to take a vector and get the next closest vector to it that is perpendicular to some other vector:

    // subtract the projection of v onto k from v
    function orthogonalize(v,k){
      let {x:vx,y:vy,z:vz,w:vw} = v;
      let {x:kx,y:ky,z:kz,w:kw} = k;
      let kk = kx*kx+ky*ky+kz*kz+kw*kw;
      let vk = vx*kx+vy*ky+vz*kz+vw*kw;
      let scale = vk/kk;
      return {
        x: vx - kx*scale,
        y: vy - ky*scale,
        z: vz - kz*scale,
        w: vw - kw*scale
      };
    }

If we know that k is pre-normalized, this can be further simplified:

    // subtract the projection of v onto k from v,
    // where k is assumed to be normalized
    function orthogonalize(v,k){
      let {x:vx,y:vy,z:vz,w:vw} = v;
      let {x:kx,y:ky,z:kz,w:kw} = k;
      let vk = vx*kx+vy*ky+vz*kz+vw*kw;
      return {
        x: vx - kx*vk,
        y: vy - ky*vk,
        z: vz - kz*vk,
        w: vw - kw*vk
      };
    }

When combining these, it is important to remember to orthogonalize first, then normalize. Otherwise, orthogonalization will undo normalization. Orthonormalizing any two vectors is thus made very simple:
    let fprime = normalize(this.f);
    let rprime = normalize(orthogonalize(this.r, fprime));

But what about the remaining plane? In 3D, we could just generate a new one by calculating the cross-product, but a binary cross-product doesn't exist in higher dimensions, for the simple reason that there is no longer a unique (up to a sign) direction perpendicular to any two vectors. In the general case, we would need an algorithm for computing the null space of a matrix to figure out what the perpendicular vectors should be... but once again, we can take advantage of the fact that the vectors we need to be perpendicular to are themselves already orthonormalized to do some major simplifications.

Fixing the last two axes just consists of repeatedly calling orthogonalize for each of the previously-orthonormalized axes, and then normalizing the result.

// Ensure the egocentric basis is orthonormal
Player.prototype.renormalize = function(){
  let {f, r, u, a} = this;
  let fprime = normalize(f);
  let rprime = normalize(orthogonalize(r, fprime));
  let uprime = normalize(orthogonalize(
    orthogonalize(u, fprime), rprime));
  let aprime = normalize(orthogonalize(
    orthogonalize(
      orthogonalize(a, fprime), rprime), uprime));
  this.f = fprime;
  this.r = rprime;
  this.u = uprime;
  this.a = aprime;
};

The fact that this works may not be completely obvious at first glance. After all, how do we know that orthogonalizing the basis vector for the U axis against rprime won't mess up the previous orthogonalization against fprime? Well, we know that rprime is guaranteed to be orthogonal to fprime, because we constructed it to be so. That means that the projection of u onto rprime will also be orthogonal to fprime. Adding or subtracting a component orthogonal to fprime clearly cannot introduce any components to uprime that are parallel to fprime, and so we can conclude that orthogonalizing against rprime, or any vector which is also orthogonal to fprime, cannot cause uprime to become non-orthogonal to fprime. The same reasoning can be extended to prove the validity of the orthogonalization procedure for the A axis, and this procedure trivially generalizes to arbitrary higher dimensions as well.
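To make the generalization to higher dimensions concrete, here is a minimal sketch of the same procedure (it is essentially the Gram-Schmidt process) written for any number of axes. It assumes vectors are stored as plain arrays rather than the {x,y,z,w} objects used above, and the names dot, orthonormalize, and drifted are illustrative only, not part of the game's code.

```javascript
// Dot product of two vectors stored as plain arrays of equal length.
function dot(a, b) {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Orthonormalize a list of linearly independent vectors, in order.
// Each vector is orthogonalized against every previously finished
// (already unit-length) vector, and only then normalized.
function orthonormalize(vectors) {
  const basis = [];
  for (const v of vectors) {
    let w = v.slice();
    for (const k of basis) {
      const vk = dot(w, k); // k is unit length, so no division by k.k
      w = w.map((wi, i) => wi - k[i] * vk);
    }
    const len = Math.sqrt(dot(w, w));
    basis.push(w.map(wi => wi / len));
  }
  return basis;
}

// A slightly skewed 4D basis, as might result from accumulated error.
const drifted = [
  [1.001, 0.002, 0, 0],
  [0.003, 0.998, 0.001, 0],
  [0, 0.002, 1.002, 0.001],
  [0.001, 0, 0.003, 0.999],
];
const fixed = orthonormalize(drifted);
```

Passing the axes in the order f, r, u, a reproduces the structure of renormalize: the first vector is only normalized, and each later one is cleaned against all of the earlier results before being normalized.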

As with the rotation itself, it is possible to gain some efficiency here at the cost of some code duplication, but since this operation need be performed only very rarely it has yet to become a bottleneck in my code.

Next: Building a 4D World

A Series of Options for 4D Game Controls

For several years, I have been working on-and-off on a 4D maze game. At any given point, your view is restricted to a 3D hyperplane sliced through the complete 4D world, such that the next part of the path may be out of sight due to its lying in a direction perpendicular to your current viewing volume. Successfully navigating the maze therefore requires the ability to perform rotations in 4 dimensions, to bring the extra coordinate axis into view.

As you might expect, there are a lot more ways to rotate in 4D than in 3D, which presents a problem in designing sensible & ergonomic video game controls.

How Rotation Works

Before getting to how a player might control their 4D orientation, it will be useful to go over some basics of how rotation actually works. In three dimensions, we are used to thinking of rotations as occurring about an axis. That, however, is a simplification unique to 3D space. In the generic n-dimensional case, rotations are defined by a set of planes.

In one dimension, there are no planes, so rotation is impossible. In two dimensions, there is exactly one plane, and rotations in that plane can be defined simply by their magnitude left or right. In 3 dimensions, however, you suddenly have 3 possibilities for basic planes of rotation, and an infinite number of linear combinations thereof. Specifically, one can rotate in the XY plane, the XZ plane, and the YZ plane. Whichever plane you choose, there is one axis left over, and the unique one-to-one pairing of axes and their perpendicular planes in three dimensions is what lets us describe rotations with axes. In 4D, however, there are six basic planes (XY, XZ, XW, YZ, YW, and ZW), and there is a whole additional plane perpendicular to any chosen plane of rotation, not a single unique axis. The simplest way of describing a rotation in 4 dimensions thus involves directly specifying the plane in which it occurs.
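As a small illustration of the counting argument, the sketch below lists the basic planes for any set of axes (one per unordered pair of axes) and rotates a vector within a single chosen plane. The function names basicPlanes and rotateInPlane, and the lowercase axis names, are made up for this example.

```javascript
// List every basic plane of rotation for a given set of axis names:
// one plane per unordered pair of axes, so n*(n-1)/2 planes in n dimensions.
function basicPlanes(axes) {
  const planes = [];
  for (let i = 0; i < axes.length; i++) {
    for (let j = i + 1; j < axes.length; j++) {
      planes.push(axes[i] + axes[j]);
    }
  }
  return planes;
}

// Rotate vector v (an object keyed by axis name) by theta radians in the
// plane spanned by axes a and b; all other components are left untouched.
function rotateInPlane(v, a, b, theta) {
  const c = Math.cos(theta);
  const s = Math.sin(theta);
  return { ...v, [a]: v[a] * c - v[b] * s, [b]: v[a] * s + v[b] * c };
}

const planes3 = basicPlanes(['x', 'y', 'z']);      // three planes
const planes4 = basicPlanes(['x', 'y', 'z', 'w']); // six planes
// A quarter turn in the XW plane carries the X axis onto the W axis.
const turned = rotateInPlane({ x: 1, y: 0, z: 0, w: 0 }, 'x', 'w', Math.PI / 2);
```

Note that rotateInPlane never needs to know which axes are "left over", which is why specifying the plane works in any number of dimensions while specifying an axis only works in three.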

(Incidentally, the fact that any chosen plane is perpendicular to an entire additional plane means that 4D objects can have two independent rates of rotation in perpendicular planes, something not possible in 3D. But, that's a complication I can safely ignore for maze-navigating purposes.)

Straightforward Control Schemes

In 3D, we can often just assign one key to each direction of rotation in each basic plane; e.g., using WASD+QE for pitch, yaw, and roll. With six basic planes, however, that would require 12 keys. Now, that's doable; we could, for example, mirror the WASDQE arrangement on the other side of the keyboard, assigning the XW, YW, and ZW planes to UIOJKL. As far as I can tell, this is the maximally flexible scheme, allowing for linear combinations of any set of basic rotations.

But, we can cut down the number of keys used to just six again by exploiting the nature of 4D rotations: with one key for each of the four coordinate axes, and two keys for positive and negative rotation, the player can directly select which plane they want to rotate in. This (with some modifications to allow traditional controls for purely 3D motion) was the original scheme I implemented for my game.

The downside of the simple plane-selection scheme, however, is that it requires simultaneously pressing three keys to perform any rotation: two axis keys to define the plane, and a sign key for which way to rotate in that plane.
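A rough sketch of how such a three-key chord might be decoded is shown below. The key names ('x', 'y', 'z', 'w' for the axes, '+' and '-' for the sign) are hypothetical placeholders rather than the bindings actually used in the game, and selectRotation is an illustrative name.

```javascript
// Hypothetical bindings: one key per coordinate axis, plus two sign keys.
const AXIS_KEYS = { x: 'x', y: 'y', z: 'z', w: 'w' };
const SIGN_KEYS = { '+': 1, '-': -1 };

// Given the set of currently held keys, return the selected rotation as
// { plane: [axisA, axisB], sign }, or null if the chord is incomplete
// or ambiguous (wrong number of axis keys, or not exactly one sign key).
function selectRotation(pressed) {
  const axes = Object.keys(AXIS_KEYS)
    .filter(key => pressed.has(key))
    .map(key => AXIS_KEYS[key]);
  const signs = Object.keys(SIGN_KEYS).filter(key => pressed.has(key));
  if (axes.length !== 2 || signs.length !== 1) return null;
  return { plane: axes, sign: SIGN_KEYS[signs[0]] };
}

const chord = selectRotation(new Set(['x', 'w', '+']));
```

Returning null for incomplete chords makes the three-key cost explicit: nothing happens until both axis keys and a sign key are held together.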


Alternatives for Reducing the Number of Keys


Another option is simply to assign one key to each of the six basic planes, again with two keys to determine the sign of the rotation. That's a total of 8 keys, requiring two key presses for each rotation. I worry, however, that remembering the key mappings for each plane might get annoying.

We can improve things a little more by shifting some of the responsibility for selecting planes onto the sign keys. If we have two sets of sign keys (say, the up/down and left/right arrow keys), then we only need half as many plane keys; a single key selects one plane to be controlled by the up/down arrows (or whatever else you pick for the sign keys), and another (preferably perpendicular) plane to be controlled by the left/right keys. That's a total of 7 keys, again requiring two simultaneous key presses for any rotation, and regaining some of the ability to compose basic rotations. If you choose a default set of planes for the sign keys, however, then you need only six keys total, as in the straightforward plane-selection scheme, but only 2/3 of all rotations require holding multiple keys, and you get the ability to compose some rotations.


Hybrid Approaches


In order to reduce a player's learning curve, it would be nice to ensure that normal, purely 3D rotations still use traditional controls. It turns out there's a very simple way to achieve that, extending the same logic used above. If we have six sign keys, and a default selection of planes, then only one plane-selection key is required to switch the default three rotation planes out for the other three rotation planes. It is then trivial to choose the default planes to match 3D pitch, roll, and yaw. All 3D rotations then require only a single keypress, with an average of 1.5 simultaneous keys required for all rotations.
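One way this 7-key hybrid could be wired up is sketched below. Everything specific here is an assumption made for illustration: the use of Shift as the plane-selection key, the assignment of WASDQE to slots, and the choice of which plane counts as pitch, yaw, or roll.

```javascript
// Hypothetical layout: WASDQE are the six sign keys, grouped into three
// slots with a positive and a negative key each.
const SIGN_KEY_SLOTS = {
  w: { slot: 0, sign: +1 }, s: { slot: 0, sign: -1 },
  a: { slot: 1, sign: +1 }, d: { slot: 1, sign: -1 },
  q: { slot: 2, sign: +1 }, e: { slot: 2, sign: -1 },
};

// Default planes are the purely 3D ones (assumed axis convention);
// holding the plane-selection key swaps in the three planes involving W.
const DEFAULT_PLANES = ['yz', 'xz', 'xy'];
const ALTERNATE_PLANES = ['yw', 'xw', 'zw'];

function hybridRotation(key, selectHeld) {
  const binding = SIGN_KEY_SLOTS[key];
  if (!binding) return null;
  const planes = selectHeld ? ALTERNATE_PLANES : DEFAULT_PLANES;
  return { plane: planes[binding.slot], sign: binding.sign };
}

// Three planes need one key and three need two, hence the 1.5 average.
const averageKeys = (3 * 1 + 3 * 2) / 6;
```

Because the plane-selection key applies to all three slots at once, this layout cannot combine a default plane with an alternate plane in the same instant, which is the limitation the expanded scheme with more plane-selection keys is meant to address.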

The current scheme I have implemented for my game is a slight expansion on this idea, with 3 plane-selection keys (for a total of 9 keys) that map the sign keys (implemented with WASDQE) to different sets of planes. The idea behind this design is to make it possible to create rotations in combinations of any two planes simultaneously, without requiring the full 12-key arrangement. Whether this is the best approach, however, I am still uncertain. It may be that, at least for this kind of game, there really isn't any great need for the ability to compose rotations in multiple basic planes, in which case the 7-key hybrid scheme may be better. On the other hand, if that is a desirable property, it might be best to simply go all-out with the full 12-key scheme.

Conclusion: More experimentation is required!

Next: Implementing 4D Rotation

Saturday, October 15, 2016

A Phonology Without Phonemic Consonants

There are languages that have been analyzed as lacking phonemic vowels, with all vowels being completely predictable from the consonant string. That doesn't mean that they aren't pronounced with vowels, merely that vowels serve no contrastive function.

So, how about a phonology that does the exact opposite: packs all of the contrast into underlying phonological vowels, with phonetic consonants being completely predictable from the vowel string?

Now, there are lots of ways to do this in an ad-hoc manner. Say, an /i/ and an /a/ always have a [t] inserted between them, while a /u/ and an /a/ get a [ʒ], just because. But I'm gonna look at something that is potentially naturalistic, where the choice of consonant phones and where they get inserted is a reasonable consequence of vocalic features. There are probably lots of ways to do that, too, but here's just one of them:

For simplicity, we'll start with just /i/, /a/, /u/ as the basic, plain vowels. You could use more, but these are sufficient to demonstrate the rules I have in mind, which I will describe in terms of generic vowel features such that one could add more basic vowels and already know exactly how they would behave. Each of these can come in plain, breathy, rhotic, and nasal varieties, or any combination thereof; i.e., one could have a breathy nasal rhotic vowel, with all three extra qualities at once. I'll assume that any vowel can have any combination of these qualities, and there are no phonotactic restrictions on the underlying vowel string (although certain combinations might require sticking in a syllable boundary to break them up). Changing either of those assumptions could introduce further structural interestingness in other similar phonologies.

All of these vowels can also be long or short, with syllables being maximally 3 morae; thus, one can have one short vowel, two short vowels, three short vowels, one long vowel, or one short vowel and one long vowel per syllable, where all of the vowels in a single syllable must share all of their voicing, rhotic, and nasal features. For consonant-induction purposes a "long syllable" is any syllable containing a long vowel, or a triphthong (long, long+short, short+long, and short+short+short, but not short+short). Ignoring length, this results in 8 possible versions of every basic vowel, which can be transcribed as V, hV, Vn, Vr, hVn, hVr, Vrn, and hVrn. That results in a total of 24 phonemes:

i /i/ a /a/ u /u/
hi /i̤/ ha /a̤/ hu /ṳ/
in /ĩ/ an /ã/ un /ũ/
ir /i˞/ ar /a˞/ ur /u˞/
hin /ĩ̤/ han /ã̤/ hun /ṳ̃/
hir /i̤˞/ har /a̤˞/ hur /ṳ˞/
irn /ĩ˞/ arn /ã˞/ urn /ũ˞/
hirn /ĩ̤˞/ harn /ã̤˞/ hurn /ṳ̃˞/

Or 48, if we count the long versions as separate phonemes.
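The count is easy to check mechanically. The sketch below generates the romanized inventory from the three binary features, following the transcription pattern above (h- prefix for breathy, then -r for rhotic, then -n for nasal); the function names romanize and inventory are just for illustration.

```javascript
const BASIC_VOWELS = ['i', 'a', 'u'];

// Build the romanization for one vowel with a given feature combination:
// "h" prefix for breathy, "r" suffix for rhotic, then "n" for nasal.
function romanize(vowel, { breathy = false, rhotic = false, nasal = false } = {}) {
  return (breathy ? 'h' : '') + vowel + (rhotic ? 'r' : '') + (nasal ? 'n' : '');
}

// Every basic vowel combined with every subset of the three features.
function inventory(vowels = BASIC_VOWELS) {
  const result = [];
  for (const vowel of vowels) {
    for (const breathy of [false, true]) {
      for (const rhotic of [false, true]) {
        for (const nasal of [false, true]) {
          result.push(romanize(vowel, { breathy, rhotic, nasal }));
        }
      }
    }
  }
  return result;
}

const phonemes = inventory(); // 3 vowels x 8 feature combinations
```

Since the rules are stated in terms of features rather than individual vowels, adding a fourth basic vowel to the list would simply extend the inventory by another eight entries.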

Tautosyllabic vowels can turn into glides. An /i/ becomes [j], while short /u/ turns into [w]. In long syllables, medial vowels are glided first, such that, e.g., /uia/ becomes [uja], not [wia]. Sequences of /iii/ become [ji:] and /uuu/ become [wu:]; sequences of /aaa/ must be broken into two syllables, either [a:.a] or [a.a:]. Since all vowels in a syllable must have matching features, we can romanize these by grouping the vowels together within one set of breathy/rhotic/nasal letters. E.g., huin /ṳ̃͡ĩ̤/ [w̤̃ĩ̤], or iar /i˞͡a˞/ [ja˞].

That provides us with two phonetic consonants so far: [j] and [w].

Other consonants are induced when transitioning from a vowel that has a certain quality to one that doesn't, or at syllable or morpheme boundaries.

Breathy-voiced vowels basically induce an onset [h] (hence the romanization convention) morpheme-initially or after a non-breathy vowel, but in certain situations this can be mutated into aspiration or lenition of a previous phonetic consonant instead (see below). (A reasonable phonotactic restriction on the vowel string might be that plain vowels can't follow breathy vowels in the same morpheme, just because I find it difficult to perceive the transition from breathy to plain voice. But I'll ignore that for now.)

Plain vowels induce plain coda stops. High front vowels (/i/) induce [t], high back vowels (/u/) induce [k], and low vowels (/a/) induce [ʔ]. These all become aspirated when followed by unstressed breathy voice, absorbing the [h] induced by the following vowel unless it crosses a morpheme boundary; if the breathy syllable is stressed, then [t] becomes [t͡s] and [k] becomes [x], again replacing the [h] induced by the following vowel unless it crosses a morpheme boundary, while [ʔ] is unaffected.
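The stop rules can be summarized as a small lookup, sketched below. This deliberately ignores morpheme boundaries, gemination, and the question of when a coda is induced at all; it only picks which stop appears. Leaving [ʔ] plain before unstressed breathy voice is an assumption on my part, made because no aspirated glottal stop is listed among the resulting consonants. The name plainCoda and the shape of the next argument are illustrative.

```javascript
// Coda stop induced by each plain vowel, per the rules above.
const PLAIN_STOP = { i: 't', u: 'k', a: 'ʔ' };
// Variants before unstressed and stressed breathy vowels, respectively.
const ASPIRATED = { t: 'tʰ', k: 'kʰ' };
const MUTATED = { t: 't͡s', k: 'x' };

// vowel: 'i', 'u', or 'a'. next: null, or { breathy, stressed } describing
// the vowel that follows within the same morpheme.
function plainCoda(vowel, next = null) {
  const stop = PLAIN_STOP[vowel];
  if (!next || !next.breathy) return stop;
  if (next.stressed) return MUTATED[stop] || stop; // [ʔ] is unaffected
  // Assumption: [ʔ] also stays plain here, since no [ʔʰ] is listed.
  return ASPIRATED[stop] || stop;
}
```

For instance, plainCoda('u', { breathy: true, stressed: true }) picks the fricative [x], which then stands in for the [h] that the breathy vowel would otherwise induce.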

Now, we have 8 more phonetic consonants: [t], [tʰ], [t͡s], [k], [kʰ], [x], [ʔ] and [h].

Since plain vowels have no other properties to lose, these consonant sounds will not occur every time there is a transition to a different class of vowel. Instead, they will only occur at non-utterance-final morpheme boundaries if the syllable is short, and at any non-word-final syllable boundary if the syllable is long; additionally, the consonants will be geminated in long syllables, stealing duration from the vowel. Thus, something like <uuha> /u:a̤/ or <iiha> would be rendered phonetically as [uk:ʰa̤]/[ukxa̤] or [it:ʰa̤]/[itt͡sa̤] respectively, depending on stress placement, and assuming it's monomorphemic.

Nasal vowels induce nasal stops, with non-back vowels (/i/ and /a/) inducing a coda [m] at the end of a word and [n] in other positions, and back vowels (/u/) inducing [ŋ]. Nasalization also interacts with syllable length; like the induced plain stops, induced [n] and [ŋ] will steal length from a long nucleus and become geminated.

Successive non-breathy nasal vowels in different syllables induce an epenthetic [ʁ]. Why? Because that's what I discovered I naturally do when trying to pronounce them! I don't know what exactly the articulatory phonetic justification is, but there must be one! Thus, something like <unan> /ũã/ comes out as [ũʁãm], while monosyllabic long /ĩĩ/ (romanized <iin>) is distinguished from disyllabic /ĩ.ĩ/ (romanized <inin>) by the phonetic realizations [ĩ:n] and [ĩʁĩn], respectively. The results of a genuinely rhotic initial vowel (<irnin> /ĩ˞.ĩ/) look different still, as described below.

So far, that adds another 4 phonetic consonants ([m], [n], [ŋ], and [ʁ]), for a total of 14.

Rhotic vowels get a little complicated, due to interaction with other qualities. With combined rhotic+nasal vowels, the coda consonants are ordered R-N. High front non-nasal vowels (/i/) induce [ɾ] word-medially, and [ɹ] word-finally or with nasalization. Low non-back non-nasal vowels (/a/) induce [ɹ], which is ambisyllabic word-medially unless it is produced by a non-breathy vowel followed by a breathy vowel (with an induced [h] onset pushing the ambisyllabic [ɹ] out of the way). Nasal mid vowels induce [ʐ] or [ʒ] in free variation. Back vowels (/u/) induce [ʀ] or [ʁ] in free variation, which like [ɹ] is ambisyllabic unless followed by an induced onset [h] or paired with nasalization.

That's another 3-ish phonetic consonants, leaving us with a total inventory looking something like this:

Stops: [t], [tʰ], [k], [kʰ], [ʔ]
Fricatives/affricates: [h], [x], [t͡s], [ʐ]/[ʒ]
Nasals: [m], [n], [ŋ]
Rhotics: [ʀ]/[ʁ], [ɹ], [ɾ]
Glides: [w], [j]

which really doesn't look that bad! It's sorta-kinda Iroquoian-looking, if you squint, with extra rhotics. Several natural languages get along with fewer consonant phones than that. But, it can still be written mostly-unambiguously (save for specifying morpheme/syllable boundaries) purely as a string of vowels from a 24-character all-vowel alphabet; or perhaps a featural script with three basic vowels and diacritics for the various combinations of nasal, rhotic, and breathy features, and maybe length.

Of course, there are other possible re-analyses of words generated this way. The romanization scheme already embodies one: a three-vowel, three-consonant analysis, where the consonants and vowels have some fairly complex interactions generating a lot of allophones of each, and some particularly strange distributional restrictions (like, /h/ is the only consonant that can start a word!). A native speaker of such a language might, however, go for a four-consonant analysis, adding /t/ → [t], [tʰ], [k], [kʰ], [ʔ], [t͡s], [x]; or even breaking things down further, with no realization of the significance of the extremely limited distribution of these sounds. Speakers might also group things like /t/ → [t], [tʰ], [k], [kʰ], [ʔ], [t͡s]; /h/ → [h], [x]; /z/ → [ʐ], [ʒ]; /r/ → [ʀ], [ʁ], [ɹ], [ɾ]; based on perceptual similarity, thus confusing the disparate origins of [h] vs. [x] and masking the commonality of [ʐ] and [ʒ] with the rhotics.

If one were to start with something like this and then evolve it historically, one could easily get a more "normal"-looking inventory (e.g., maybe that tap [ɾ] ends up turning into an [l], and maybe [t͡s] simplifies to plain [s]) with a steadily more opaque relationship to the underlying vocalic features, despite still being regularly predictable from them.

If one were to do an intrafictional description of the language, such as might be written by native linguists, I would be somewhat inclined to go with one of these alternative analyses as the standard native conception, and then dive into the argument for why it should be re-analyzed as consisting purely of underlying vowels instead. It would, however, be a shame to miss out on the opportunity for a native writing system consisting of a 24-vowel alphabet.

Friday, September 9, 2016

Thoughts on Sign Language Design

Previously: General Thoughts on Writing Signs and A System for Coding Handshapes

One of the problems with designing a constructed sign language is that so little is actually known about sign languages compared to oral languages. For many conlangers and projects (e.g., sign engelangs or loglangs, etc.), this isn't really a big deal, but it is a serious problem for the aspiring naturalistic con-signer subscribing to the diachronic or statistical naturalism schools.

I have, however, come across one account of the historical development of modern ASL from Old French Sign Language. While it is hard to say if the same trends evidenced here would generalize to all sign languages, they do seem pretty reasonable, and provide a good place for con-signers to start. Additionally, it turns out that many of these diachronic tendencies mesh rather well with the goal of designing a language with ease of writing in mind.

Unsurprisingly, despite the relative ease of visual iconicity in a visual language, actual iconicity seems to disappear pretty darn easily. But I, at least, find it difficult to come up with totally arbitrary signs for things - much more difficult than it is to make up spoken words - and the Diachronic Method is generally considered a good thing anyway, so knowing exactly how iconicity is eroded should allow a con-signer to start with making up iconic proto-signs, and then artificially evolving them into non-iconic "modern" signs.

The general trends in this account of ASL evolution can be summed up as follows:
  1. Signs that require interaction with the environment (like touching a table top) either disappear entirely, replaced by something else, or simplify to avoid the need for props. That seems pretty obvious.
  2. Signs that require the use of body parts other than the hands for lexical (as opposed to grammatical) content tend to simplify to eliminate non-manual components. E.g., facial expressions may indicate grammatical information like mood, but won't change the basic meaning of a word.
  3. Signs tend to move into more restricted spaces; specifically, around the face, and within the space around the body that is easily reached while still keeping the elbows close in. This is essentially a matter of improving ease of articulation.
  4. Signs that occur around the head and face tend to move to one side, while signs occurring in front of the torso tend to centralize. This makes sense for keeping the face in view, especially if facial expressions are grammatically significant.
  5. Two-handed signs around the head and face tend to become one-handed signs performed on just one side. In contrast, one-handed signs performed in front of the torso tend to become symmetrical two-handed signs.
  6. Asymmetrical two-handed signs tend to undergo assimilation in hand shape and motion, so that there is only one hand shape or motion specified for the whole sign, though not necessarily place or contact. This is a matter of increasing ease of articulation (reducing how much different stuff you have to do with each hand), as well as increased signalling redundancy.
  7. Signs that involve multiple sequential motions or points of contact "smooth out".
  8. There is an analog to "sound symbolism", where, if a large group of signs in a similar semantic domain happen to share a particular articulatory feature (similar shape, similar motion, etc.), that feature will be analogically spread to other signs in the same semantic domain.
And, of course, multiple of these can apply to a single proto-sign, such that it, for example, eliminates head motion in favor of hand motion, loses a hand, and smooths the resulting hand motion.

Most of the time, all of those trends reduce iconicity and increase arbitrariness of signs, but iconicity increases in cases where it does not contradict those other principles. Thus, a lot of antonyms end up being dropped and replaced by reverse-signs; e.g., you get morphological lexical negation by signing a word backwards, and temporal signs move to end up grouped along a common time-line in the signing space.

Symmetrization makes writing easier because you don't have to encode as much simultaneous stuff. Even though two hands might be used, you don't have to write down the actions of two simultaneous hands if they are doing the same thing. Reduction of the signing space also means you need fewer symbols to express a smaller range of variation in the place and motion parameters, and smoothing simplifies writing essentially by making words shorter, describable with a single type of motion.

Many two-handed ASL signs are still not entirely symmetric. Some, like the verb "to sign", are anti-symmetric, with circling hands offset by 180 degrees. One-handed signing is, however, a thing, and communication can still proceed successfully if only the dominant hand performs its half of the sign, while the other hand is occupied. (I imagine there is some degradation, like eating while talking orally, but I don't know enough about ASL to tell exactly how significant that effect is or how it qualitatively compares to oral impediments.) Thus, it appears that it would not be terribly difficult to make the second hand either completely redundant, or limited in its variations (such as symmetric vs. antisymmetric movement, and nothing else) to make two-handed signs extremely easy to write, and minimize information loss in one-handed signing.

Given the restriction of two-handed signs to particular places (i.e., not around the face), it might even make sense to encode the action of the second hand as part of the place. One could imagine, for example, a non-symmetric case of touching the second hand as a place specification (which would typically remain possible even if that hand is occupied), as well as symmetric second hand and anti-symmetric second hand.

I have no idea if native signers of ASL or any other sign language actually think of the second hand as constituting a Place, just like "at the chin," or "at the shoulder," rather than a separate articulation unto itself, but treating the secondary hand as a place does seem like a very useful way to think for a con-sign-lang. Not only does it significantly reduce the complexity required in a writing system, it also ends up smoothing out the apparent surface differences between near-face signs and neutral space signs; in each case, there is underlyingly only one lexical hand, with the second hand filling in when the face is absent to better specify Place information.