jlouis' Ramblings

Musings on tech, software, and other things

Breaking Erlang Maps #3

Easter hacking on the Maps Erlang QuickCheck model is progressing nicely. And I have new interesting findings to report. I started by trying to generate very large maps, in order to get some collisions on the underlying hash value. At 25.000 elements and a 32-bit hash, we can approximate the chance of a collision by a Taylor expansion of the birthday paradox. If we take

n = 25000
k = 65536 * 65536
C = 1-exp(-(n*(n-1)/2)/k)

and my approximation calculation is right, then C is 0.07 which is 7% chance at a hash collision.

This is enough that we can build maps of 25000 elements and if we build enough of those, then there will be random hash collisions. However, such a large map is bad when we consider shrinking, and just printing out a model with 25000 elements is not going to be easy. Furthermore, operating on such a large map takes more computational time.

Switching strategy

A far better strategy is to change our generator. We first write a helper module which uses a Pseudo-Random Number Generator to generate random key values. Then we check those against the hash value of other keys. If the same hash is generated, we have an overlapping keypair, [K1, K2]. A couple of seconds generates thousands of these, and thus we have a lot of keys which will collide.

Now, we don’t need large maps to create collisions. We just have to pick some keys which are colliding. This is way easier to handle and as long as we begin toying with the HAMT structure, we are likely to find lots of errors, while keeping the maps small. We can then easily alter the generator for ‘map_key()’ to take the current state into account. We store a list of collisions as a list of lists, [[K01, K02], [K11, K12], …] where the inner lists have keys which are known to collide. Then we can generate colliding keys. First a little helper function:

find_eligible_collisions(Keys, Cols) ->
    lists:flatten(eligible(Keys, Cols)).

eligible([], _) -> [];
eligible([K | Ks], Cols) ->
    Pairs = [Cs || Cs <- Cols, lists:member(K, Cs)],
    [Pairs -- [K] | eligible(Ks, Cols)].

We use the find_eligible_collisions/2 to search among a set of Keys. It then returns the “other” keys among the ones which collide. In turn, it produces a list of possibly colliding keys. With this, we can fix the generator for map keys:

map_key(#state { collisions = [] }) ->
      map_term();
map_key(#state { collisions = Cols, contents = Contents }) ->
      Keys = [K || {K, _} <- Contents],
      case find_eligible_collisions(Keys, Cols) of
          [] -> oneof([map_term(),
                       oneof(lists:flatten(Cols))]);
          Eligible ->
              frequency([
                  {10, map_term()},
                  {10, oneof(lists:flatten(Cols))},
                  {20, oneof(Eligible)}])
      end.

If there are no collisions, we just generate a map_term() as always. If there are collisions, we search among these to find possibly eligible colliders. And if there are, we make it slightly more likely to generate one of those as a key.

Running this test on 18.0-rc1+patches for every error we know of right now, we get:

1> eqc:module({numtests, 1000}, maps_eqc).
prop_map:
....................................................................................................(x10)..........................User defined signal 2

which is due to an allocation error where we can’t allocate more memory. This is due to some looping inside maps:merge/2 where a flatmap and a hashmap are merged together. It is likely that this is a bug we haven’t discovered yet.

The next phase of testing will have to consider code coverage of the maps C code in order to figure out what we are not covering yet. Given the excellent debug-tooling in Erlang/OTP, this should be fairly straightforward to get a grip at.