The Holy Grail! A Hash Array Mapped Trie for C++
C++ has a handful of associative containers. We started with set and map, both based on node-based red-black trees. These are fine but are not the most efficient and, in particular, suffer from more cache misses than we’d like. If we want to build persistent versions of them it’s achievable but aggravates the problems even more and adds considerable extra complexity. (I know — I’ve done it!) C++11 brought the hash-map based unordered_set and unordered_map, which are generally much faster, with better cache locality — but can be less memory-efficient and also don’t translate so easily into persistent versions. But there exists another general-purpose data structure that combines many of the characteristics of trees and hash tables into something that in many important ways is superior to both, and with minimal downside (they are close but not quite as fast as pure hash tables). Hash Array Mapped Tries are more memory-efficient than hash tables and, as a bonus, are trivially made persistent — with big implications for concurrency, functional programming, and other applications that benefit from being able to treat them immutably (as well as share large amounts of common state in memory at once). This talk will describe how this data structure works from the ground up and look at a reference implementation I am writing with the intention of proposing as a Boost library — and possibly later for standardisation. We’ll also look at how it can be used in practice, and at some of the performance characteristics.