TL;DR: Unless you’re careful, using foldMap
to build HashMap
s (from unordered-containers
) can take O(n2) time. This can be avoided by:
Building a Map
(from containers
) instead;
Using foldr
or foldl
with insert
; or
Using foldMap
to build an intermediate list [(k, v)]
and then building the HashMap
with fromList
.
Shout-out to Luke Worth who found this out while we were chasing down performance problems.
foldMap
The foldMap
function is a very convenient way to build up maps, especially if you use the newtype wrappers from monoidal-containers
:
foldMap :: (Foldable t, Monoid m) => (a -> m) -> t a -> m
-- Specialise m ~ MonoidalMap k v:
foldMap
:: (Foldable t, Eq k, Ord k, Monoid v)
=> (a -> MonoidalMap k v) -> t a -> MonoidalMap k v
You map each a
to a map (usually with the singleton
function), and the Monoid
instance combines each map into a single final result. This is a convenient idiom, but it takes quadratic time if you try to use it to build HashMap
s. Let’s look at why.
The Monoid
instance for HashMap
is implemented in terms of union
(MonoidalHashMap
uses unionWith (<>)
, but is similar). union
has O(m + n) runtime, and the computation we get from our foldMap
has the form:
`union` (m2 `union` (m3 `union` (m4 `union` (...)))) m1
Assume each m_i
is a singleton map (size 1). Then the total cost of the union
s will add up like this:
K1, the cost of the innermost m_(n-1) `union` m_n
, costs 1 + 1 = 2;
K2, the cost of the two innermost union
s (m_(n-2) `union` (m_(n-1) `union` m_n)
) costs 1 + 2 + 2 = 5, because the map costs 1 + 2 = 3 to build, plus another 2 for the innermost union
;
K3, the cost of the three innermost union
s (m_(n-3) `union` (m_(n-2) `union` (m_(n-1) `union` m_n))
) costs 1 + 3 + 5 = 9, because we pay 1 + 3 = 4 to build the map, plus another 5 to build the union of the three final entries;
Generally, the cost of the ith union, Ki, will cost 1 + i + Ki − 1.
If you expand out the KN term, you will find a 1 + 2 + 3 + … + N sum, which shows that KN ∈ O(N2).
Map
Fine?union
for Map
has running time O(m × log (n/m + 1)); m ≤ n. Since we’re dealing with singleton maps, we can say that m = 1 is the smaller side of the union
, and the runtime simplifies to O(log n). From this, we can see that the runtime of all the union
s will be O(nlog n).
HashMap
?There are at least a couple of ways to turn a Foldable
into a HashMap
without hitting this performance trap:
HashMap
’s insert
has O(log n) performance. Using it with the Foldable
version of foldl
or foldr
will give you a HashMap
that takes O(nlog n) time to build.
If you’re not building singleton maps, you could try building an intermediate list of (k, v)
pairs, and calling fromList
on it. The Foldable
instance for []
right-associates the applications of (<>)
, so concatenating singleton lists should be performant. HashMap
’s fromList
has O(nlog n) complexity, which is also fine.
Map
can be surprisingly fast, and HashMap
can be surprisingly slow. None of this says, “never use HashMap
”, but I was surprised to see an idiom as natural as foldMap
become a performance trap. This also isn’t the first time I’ve been able to get better performance out of Map
— functions like fromDistinctAscList
can exploit input structure for better performance — and I often find the O(log n) lookups quite acceptable. I’m left wondering whether good ol’ Map
should be my default mapping type, unless I have a demonstrated need for HashMap
’s features.