jlouis' Ramblings

Musings on tech, software, and other things

Ranking 4 Million Quake Live Duels in 1.5 seconds

The last couple of days, I’ve been toying with a little hobby project of mine.

The project is to rank players who play Quake Live duels, and I have reported on it before. I’ve been gathering duels from February 2012 up to now. The project is written as a hybrid: most of the code for storage, retrieval and presentation is written in Erlang, while the ranking code is written in Go.

At the moment, I have gathered exactly 4067842 matches, of which I deem 3898355 to be eligible for ranking. Sometimes Quake Live reports a match as having only one player, and sometimes a match lasted only a couple of seconds. I remove these from the ranking as I don’t think they are good enough. There are around 165000 players in the database right now, but the number of played matches varies a lot from player to player. Some have played only a single duel, while some clock in over 60 duels a week (which is a lot, given that duels often take 10 minutes).

Everything is stored in my database of choice, PostgreSQL. I recently upgraded to 9.4, which meant I could change the internal representation from Erlang’s term_to_binary/1 and binary_to_term/1 conversions to the jsonb storage type directly in Postgres. The price is storage: where Erlang’s serialization format took up 6.8 gigabytes of disk space, jsonb takes up 9.7 gigabytes. The advantage is ease. I can now run queries directly on the JSON representations in the database; take the following query as an example,

SELECT id
FROM core.raw_match
WHERE ((content ->> 'RANKED')::int = 0)

which will look inside the ‘content’ JSON column and grab rows having a ‘RANKED’ top-level key whose integer representation is equal to 0. This simplifies a lot of processing, as I can use UPDATE statements to pre-analyze large swaths of duels without having to do low-level work on them in Erlang.
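As an illustration, such a pre-analysis pass could be driven from Go via database/sql; the ‘eligible’ column and the connection string below are made up for the sketch, not the actual schema:

package main

import (
    "database/sql"
    "log"

    _ "github.com/lib/pq" // PostgreSQL driver
)

func main() {
    db, err := sql.Open("postgres", "dbname=quake sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // One jsonb-aware UPDATE rules out every match QuakeLive itself
    // marked unranked; no per-row processing in application code.
    res, err := db.Exec(`
        UPDATE core.raw_match
           SET eligible = false
         WHERE (content ->> 'RANKED')::int = 0`)
    if err != nil {
        log.Fatal(err)
    }
    if n, err := res.RowsAffected(); err == nil {
        log.Printf("ruled out %d matches", n)
    }
}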

Erlang

The choice of Erlang for the processing has proved its worth again and again. The number of restarts due to fatal failures is around 2. The reason is the fault tolerance of Erlang: small mistakes will not affect the system as a whole. Often, these mistakes are not on our end, but on the end of http://quakelive.com (the game is now on Steam). To handle these, there is a fuse/circuit breaker installed[0] on the request code, which will clamp down on the operation if the site experiences problems, either because it is unreachable or because we trip a timer when requests are too slow to process. This means our end backs off if the Quake Live site experiences trouble.
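The fuse application[0] itself is Erlang, so the following is only a minimal Go sketch of the circuit-breaker idea it implements: count recent failures, and once a threshold trips, refuse work for a cool-down period before probing again. The threshold and period here are illustrative numbers:

package breaker

import (
    "errors"
    "sync"
    "time"
)

// breaker blows after too many failures and rejects calls until a
// cool-down has passed.
type breaker struct {
    mu       sync.Mutex
    failures int
    openTill time.Time
}

var errBlown = errors.New("fuse blown; backing off")

// Call runs f unless the fuse is blown, melting the fuse on failure.
func (b *breaker) Call(f func() error) error {
    b.mu.Lock()
    if time.Now().Before(b.openTill) {
        b.mu.Unlock()
        return errBlown // clamp down while the site has problems
    }
    b.mu.Unlock()

    err := f()

    b.mu.Lock()
    defer b.mu.Unlock()
    if err != nil {
        b.failures++
        if b.failures >= 5 { // illustrative threshold
            b.openTill = time.Now().Add(5 * time.Minute) // recheck later
            b.failures = 0
        }
        return err
    }
    b.failures = 0 // a success heals the fuse
    return nil
}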

In order to limit concurrent access to the Quake Live site, we have installed the safetyvalve application[1], which defines a request queue and a policy for how fast that queue is emptied. This means we can run the Erlang system with an internal concurrency level of around 150 outstanding processes, all of which try to fetch from Quake Live. The queue then controls the policy and sets up limits (a minimal Go sketch of the idea follows the list):

  • How many outstanding concurrent requests do we allow?
  • At what frequency do we start new requests?
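The safetyvalve application[1] is likewise Erlang; below is a small Go sketch of those two knobs, with a buffered channel capping outstanding requests and a ticker spacing out new ones (both limits are made-up numbers):

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    const maxOutstanding = 10 // cap on concurrent requests (made up)
    tokens := make(chan struct{}, maxOutstanding)
    tick := time.Tick(100 * time.Millisecond) // start frequency (made up)

    var wg sync.WaitGroup
    for i := 0; i < 150; i++ { // 150 processes all want to fetch
        <-tick               // at most one new request per tick
        tokens <- struct{}{} // blocks while too many are outstanding
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            defer func() { <-tokens }()
            fmt.Println("fetching match", n) // stand-in for the HTTP call
        }(i)
    }
    wg.Wait()
}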

In short, Erlang is the right tool for a long-running daemon service which must not go down, even if the database does, or the foreign system on which we rely does. One example of this power is when QuakeLive upgrades their service and takes it down: I had to change 0 lines of code to handle that. The fuse simply blows, and the system then rechecks connectivity every 5 minutes.

Another important design decision was to make all database operations idempotent. Since PostgreSQL gives us atomic operations, we can simply make sure jobs are idempotent and can be retried later on. The system is always trying to ‘catch up’ to the current state, and it is always behind real time by some (small) factor. If we should go down, we can handle the situation by catching up, and the virtue of idempotence will save us. QuakeLive only removes matches some 4–5 weeks after they have been played (for non-paying customers; I bet they keep them internally). So we have at least 28 days before we have to act on a grave problem.
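As a sketch of the idempotency idea, assuming matches are keyed by their QuakeLive id: storing the same match twice is a no-op, so any job can be retried safely. PostgreSQL 9.4 predates INSERT ... ON CONFLICT, hence the NOT EXISTS guard:

package store

import (
    "database/sql"

    _ "github.com/lib/pq" // PostgreSQL driver
)

// StoreMatch inserts a raw match unless it is already present, so
// running the same job twice leaves the database unchanged. A unique
// key on id catches the rare concurrent race.
func StoreMatch(db *sql.DB, id int64, content string) error {
    _, err := db.Exec(`
        INSERT INTO core.raw_match (id, content)
        SELECT $1, $2::jsonb
        WHERE NOT EXISTS
              (SELECT 1 FROM core.raw_match WHERE id = $1)`,
        id, content)
    return err
}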

Go

Erlang shines due to its fault tolerance and robustness, and by being a nice functional language in which you can write succinct code. Go is somewhat the opposite: imperative, with explicit data layout, statically typed with a simple type system. However, Go is compiled to native code, provides good parallelism, and has some very nice unique features in interfaces and channel-based message passing. Some of the areas where I think Go shines are:

  • Syntax. For a C-like imperative language, it is the first language in about 35 years to actually innovate and simplify. When writing Go, you write only what is necessary, and this is important. Another brilliant decision is to get rid of semicolons, use a simple layout rule (while still keeping a mostly LALR(1) grammar), and provide the excellent ‘go fmt’ tool to re-indent code to a pre-defined default.
  • Packages. Go’s way of handling libraries and imported code is something other languages should be picking up by now. The trick is that an import statement is a string, like “github.com/jlouis/glicko2”, but then an external tool understands how to parse this string and automatically fetch the source code for the library and compile it. This coalesces the concept of package dependencies into the language itself and removes a lot of external boilerplate management.
  • Toolchain. Quick recompilation of software helps a lot when developing and removes the overhead of compilation. Coming from other languages with very fast compilers, like OCaml or Erlang, this is nice; you shouldn’t have to wait on the compiler. One design decision is that transitive dependencies don’t have to be recompiled: if A depends on B, which depends on C, then when compiling B, everything related to C is pulled into the resulting object file. In turn, when compiling A, we only have to look at B and can avoid C. This helps compilation times a lot.
  • Interfaces. In Go, packages and interfaces are your structuring tools, which allow you to break a large system into smaller parts. This increases the modularity of the code base, since altering one module of the code is less likely to require alterations in other modules. Decoupling is by far the most important construction for handling large code bases. In Erlang, you write an independent application; in OCaml, you create a module or a functor; and in Go, you write a separate package and eventually use an interface. The abstraction provided is different from parameterization as in a Java generic or an OCaml functor, but I have not really experienced any limitations of the approach yet.

In an earlier post of mine, I looked at Glicko2 rankings for different approaches in different languages[r1]. I ended up choosing Go over OCaml and Erlang: Erlang is not strong at number crunching in the floating-point domain, and OCaml’s current lack of parallelism excluded it. I have recently updated the glicko2 Go package[2] so it is better and simpler to use than ever.

The main interface to the package is the following:

type Opponent interface {
    R() float64     // the opponent's rating
    RD() float64    // the opponent's rating deviation
    Sigma() float64 // the opponent's volatility
    SJ() float64    // the score of the game against this opponent
}

func Rank(r, rd, sigma float64, opponents []Opponent, tau float64) (nr, nrd, nsigma float64)

To rank a player, you call Rank with the player parameters ‘r’, ‘rd’ and ‘sigma’, a set of opponents given as a slice, and a configuration parameter for the system called ‘tau’. Opponent is an interface which you have to implement. By making Opponent an interface, we avoid the problem where a caller has to take their code and mangle it to fit our scheme. Rather, they can wrap their data structures and provide Opponent interfaces for them. In turn, we can rank games.
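As a usage example, a caller might wrap their own match records like this; the duel type and the score convention (1 for a win, 0 for a loss) are illustrative assumptions on my part:

package main

import (
    "fmt"

    "github.com/jlouis/glicko2"
)

// duel wraps one game: the opponent's rating state plus the outcome.
type duel struct {
    r, rd, sigma float64 // opponent's rating, deviation and volatility
    sj           float64 // score of the game against this opponent
}

func (d duel) R() float64     { return d.r }
func (d duel) RD() float64    { return d.rd }
func (d duel) Sigma() float64 { return d.sigma }
func (d duel) SJ() float64    { return d.sj }

func main() {
    opponents := []glicko2.Opponent{
        duel{r: 1700, rd: 80, sigma: 0.06, sj: 1},  // a win
        duel{r: 1450, rd: 110, sigma: 0.06, sj: 0}, // a loss
    }
    // Rank a player at r = 1500, rd = 200, sigma = 0.06, with tau = 0.5.
    nr, nrd, nsigma := glicko2.Rank(1500, 200, 0.06, opponents, 0.5)
    fmt.Println(nr, nrd, nsigma)
}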

The code can also optimize the configuration parameters tau and the initial ‘rd’ to use in the system, by running a Nelder-Mead optimization routine[r2][3]. Here, the API is

func Optimize(f func([]float64) float64, start [][]float64, cf func([]float64)) ([]float64, int, int)

where you will optimize the function “f” subject to start values “start” and under constraints “cf”. Note that speed is of no concern here, as the overhead is in the “f” function and its computation. To make everything work out, we have adapted a simple parallel variant of the ranking computation to speed it up. A full ranking of 150 weeks of matches takes less than 2 seconds. Full optimization completes in 43 seconds (Core i7-4900MQ, 16 gigabytes of RAM, Lenovo W540). All of the technical code is at [4].
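Here is a usage sketch against a toy objective, assuming the package name from [3], that “cf” clamps a candidate vector in place, and that the two returned integers are iteration and evaluation counts. In the real system, “f” would replay the full ranking and return a prediction error:

package main

import (
    "fmt"

    "github.com/jlouis/nmoptim" // assumed import path, cf. [3]
)

func main() {
    // Toy objective with its minimum at tau = 0.5, rd = 300.
    f := func(x []float64) float64 {
        d0, d1 := x[0]-0.5, x[1]-300
        return d0*d0 + d1*d1
    }
    // Constrain a candidate to sensible ranges, clamping in place.
    cf := func(x []float64) {
        if x[0] < 0.1 {
            x[0] = 0.1
        }
        if x[1] < 0 {
            x[1] = 0
        }
    }
    // Nelder-Mead starts from a simplex: one candidate vector per row.
    start := [][]float64{{0.3, 350}, {0.5, 200}, {0.7, 300}}
    xmin, iterations, evaluations := nmoptim.Optimize(f, start, cf)
    fmt.Println(xmin, iterations, evaluations)
}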

D3.js — the next steps

The next phase of the project is to employ D3.js in order to provide nice graphical output of the data. I initially tested the output in R with the very nice plotting package “ggplot2” by Hadley Wickham. But in order to make it easier for everyone to use the system, I’ve decided to build a front-end which uses D3.js to plot results. This work is currently ongoing, but once it is done, I should have a way to present the data.

References

[0] https://github.com/jlouis/fuse
[1] https://github.com/jlouis/safetyvalve
[2] http://godoc.org/github.com/jlouis/glicko2
[3] http://godoc.org/github.com/jlouis/nmoptim
[4] http://godoc.org/github.com/jlouis/rank
[r1] https://medium.com/@jlouis666/glicko2-benchmarking-1-548b3f99136e
[r2] https://medium.com/@jlouis666/glicko2-benchmarking-2-775b573c086f