Erik Rigtorp

Choosing a non-cryptographic hash function

  1. For user supplied data use SipHash to prevent hash flood attacks.
  2. Use xxHash as a fast hash for generic data.
  3. For integers or small data (less than 64 bits) use a mixing function as described below.

Integer hashing / mixing

When hashing small data (less than 64 bits) such as integers it’s worthwhile to use a specialized hash function that is small, fast and can be inlined by the compiler. In practice using the CRC32 instruction provides a very good speed versus collision trade-off. I have used it with linear probing hash tables with good results. I tried different hashes such as Murmur3 finalizer, rrmxmx and splitmix64, but CRC32 seems to provide the better speed vs collision trade-off. Linear probing hash tables needs a good hash function, but in my experience performance is not very sensitive to the hash function, it only needs to be good enough. Note that CRC32 is only fast when using a hardware implementation.

CRC32 implementation in C++:

struct crc32_hash {
  size_t operator()(uint64_t h) const noexcept {
    return _mm_crc32_u64(0, h);
  }
};

Or seeded version with some hash flooding protection:

struct crc32_seeded_hash {
  uint32_t seed;

  crc32_seeded_hash() {
    while (!_rdrand32_step(&seed));
  }

  size_t operator()(uint64_t h) const noexcept {
    return _mm_crc32_u64(seed, h);
  }
};

An alternative that doesn’t require a CRC32 instruction is the shift-xor-multiply style mixers. These are used as a finalizer in many of the fast hash functions, for example Murmur3, CityHash, clhash, xxHash, etc. It’s also used by the splitmix64 random number generator.

C++ implementation using optimized constants from Pelle Evensen:

struct moremur_hash {
  size_t operator()(uint64_t x) const noexcept {
    x ^= x >> 27;
    x *= 0x3C79AC492BA7B653UL;
    x ^= x >> 33;
    x *= 0x1C69B3F74AC4AE35UL;
    x ^= x >> 27;
    return x;
  }
};

Other alternative parametrizations of the above function:

References

SMHasher is suite of hash function quality and speed tests: https://github.com/rurban/smhasher

Assorted list of hash functions:

Useful links: