Experience Report: Bidirectional type checking of GraphQL

The core idea of GraphQL is this: let clients have a small, typed, total, functional (declarative) language where they can push programs to the server side. The server then interprets these programs and provides data to the client. The natural core for this is a lambda calculus: a GraphQL fragment is a lambda over the free variables of that fragment, for instance. A query is also a lambda, but no variables are free, and it is “exported” for others to use. The only reason it isn’t a full functional core is because the current language is more familiar to typical client programmers.

Aside: GraphQL is the ultimate function-as-a-service implementation. In typical serverless implementations of the FaaS scheme a client can execute a singular function on the server side at a time, without caring about the underlying server infrastrucutre and its maintenance. GraphQL amends this by having the client push a program to the server side, so part of the client runs server-side, using the predetermined functions in the GraphQL Schema. How the functions are implemented, executed and maintained is ignored by the client. It only knows of the GraphQL endpoint as a factor.

This empowers clients and makes software be driven by client code. It also improves latency since the program is executed server-side with locality. And it empowers clients with flexibility insofar the server can evolve and adapt without clients having to change. End of Aside

Because we are working with a language, the best way to approach the problem is to treat it as such. Processing GraphQL requests runs the normal gauntlet of a interpreter:

Scan incoming byte stream into tokens
Parse tokens into an abstract syntax tree
Elaborate the tree and type check it as well
Execute: use an interpreter to execute the request

It is important to stress interpretation is often enough. In a typical implementation, the work done in the execution phase is not where time is spent. Usually, we measure the time in μs for the interpretation step, whereas reading data is usually more costly in the ms-range. If you have any distributed access in the query, that is going to be where time is spent. The only counter-example is when you have all data in memory on the Erlang node, readily available. In principle, you could pre-compile an execution to Erlang code, and then directly execute said code if this proved to be too slow. But we aren’t there yet in the current implementation.

The Erlang implementation currently uses interpretation directly on the AST and doesn’t compile via a Lambda calculus core. This, I think, is mostly a mistake to be fixed at a later point in time.

Type checking

In the GraphQL specification, there are two sections of interest: Type System and Validation. Where you can execute a GraphQL request without worrying about its well-typedness, it seems rather futile to do so. So we decided to bake the validation into a proper checking phase in the engine.

In hindsight, we think this has been the correct choice. The language is quite amenable to a classic operational semantics on its types. The transformation into a proper logic uncovered many corner cases in the specification which needed clarification. Also, we’ve given opinions from time to time on the specification based on what is possible in a logic, and also what is easy to implement in a logic. This helps coherence of the language quite a bit.

The first type checker we wrote were based on a classic algorithm. It had many bugs, mostly because it was built and extended as we went along implementing more and more language features. So with that experiment behind us, we embarked on the idea to rewrite it using a more modern style, hopefully simplifying and squashing further bugs in the process. At this point, we had a sizable test suite, so a rewrite would not introduce too many futher faults.

I’m pretty sure bidirectional type checking is part of the folk-lore, but there are some really good expositions on them. One is written by David R. Christensen, and another by Frank Pfenning:

Once you start using this style of type checker, you will want to write every type checker in this style.

The core idea

The key observation is that type checking can be split into a couple of different building blocks. Once these blocks are established, we can recombine them, as you would in typical algebraic fashion. There are obvious similarities to attribute grammars, as we will see.

We split type checking into three building blocks. We annotate each entry with a + if they are given (have positive mode in the relation), and - if they are returned by the algorithm (have negative mode):

Synthesis/Inference: G+ |- e+ ⇒ t- given an environment G and an expression e, we figure out a type for it, t.
Checking: G+ |- e+ ⇐ t+ check that the expression has given type t
Subsumption: sub t+ s+ check that the type t is a subtype of s. That is that t “fits inside” s in an obvious fashion.

The reason one wants to split into these building blocks are that the type checker becomes simpler. You so-to-speak use checking to push down information you have into subexpressions, and then use synthesis on the way to switch judgement mode and gather knowledge about types. This pendulum of back-and forth between the two building blocks makes a lot of type checking rules simple. The subtype relation is used to verify you don’t break rules along the way.

Example: If we have a type which is non-nullable in GraphQL: t! and we are checking it against a type s which is nullable. Then this amounts to ignoring the non-nullability and check t against s. This is because a non-nullable value is “stricter” than a nullable one and fits nicely inside it.

A typical pendulum operation when checking a request is that of a schema-lookup. To check against a given type we synthesize a schema-lookup, hence obtaining a proper type for the subexpression, which we then proceed to check.

What is omitted in the above, compared to the real checker, is that we also elaborate the AST. We annotate the tree with the types we found. This eliminates a lot of later hash-table lookups because the needed type information is already in the tree.

Also, literal values are input-coerced in the type checker. This is done as a partial execution of the query for the parts which are constant.

Flow Polarity

The second observation is that in GraphQL, there are two major flows:

Input: Client → Server (positive flow)
Output: Server → Client (negative flow)

In turn, every GraphQL term is either present in one or both of these flows. For instance, an input type has positive flow, whereas an object type returned has negative flow. There are also nonpolar flows in scalars and enums.

The flows guide how one should check against types. In the positive flow direction, we cannot trust the client, but the GraphQL schema on the server side can be trusted. Hence, we should recurse over the types of the schema, not what the client provided. This guards against the client omitting a required argument for instance.

In contrast, the negative flow reverses the recursor: here, we are only interested in what the client wants, so we only check those values. If the schema/server provides more values but they are illegal, they are ignored since the client did not request them.

Using these observations, it is possible to figure out many questions one might have when writing the type checker. Most loops write themselves based on the flow rules. And most checking rules writes themselves via the three bidirectional building blocks. The consequence is a small and lean type checker for GraphQL.

Results

Code size

The approach used here currently amounts to about 1000 LOC, including comments. The reference implementation uses around 3500 LOC on the same thing, also including comments.

In addition, our code reads as typical operational semantics, which makes it far easier to validate and verify. Many bugs have been fixed by addition of a simple rule, or rearrangement of the rule checking order.

If one wanted to translate GraphQL type rules into a logical framework such as Twelf, or into a proof assistant such as Agda, it should be fairly straightforward. Also, the static semantics in operational form should be easier to write down if one wanted a more formal approach to the type checking of GraphQL.

Speed

The type checker usually runs much faster than the interpreter phase. A query is static, and not executed per fetched node, whereas the interpreter has to walk over data returned. Factors of 1000:1 in favor of the type checker is not unheard of. This argues one should write the type checker for simplicity rather than speed.