Posted on October 16, 2020
by Jack Kelly

**TL;DR:** Unless you’re careful, using `foldMap`

to build `HashMap`

s (from `unordered-containers`

) can take *O*(*n*^{2}) time. This can be avoided by:

Building a

`Map`

(from`containers`

) instead;Using

`foldr`

or`foldl`

with`insert`

; orUsing

`foldMap`

to build an intermediate list`[(k, v)]`

and then building the`HashMap`

with`fromList`

.

Shout-out to Luke Worth who found this out while we were chasing down performance problems.

`foldMap`

The `foldMap`

function is a very convenient way to build up maps, especially if you use the newtype wrappers from `monoidal-containers`

:

```
foldMap :: (Foldable t, Monoid m) => (a -> m) -> t a -> m
-- Specialise m ~ MonoidalMap k v:
foldMap
:: (Foldable t, Eq k, Ord k, Monoid v)
=> (a -> MonoidalMap k v) -> t a -> MonoidalMap k v
```

You map each `a`

to a map (usually with the `singleton`

function), and the `Monoid`

instance combines each map into a single final result. This is a convenient idiom, but it takes quadratic time if you try to use it to build `HashMap`

s. Let’s look at why.

The `Monoid`

instance for `HashMap`

is implemented in terms of `union`

(`MonoidalHashMap`

uses `unionWith (<>)`

, but is similar). `union`

has *O*(*m* + *n*) runtime, and the computation we get from our `foldMap`

has the form:

``union` (m2 `union` (m3 `union` (m4 `union` (...)))) m1 `

Assume each `m_i`

is a singleton map (size 1). Then the total cost of the `union`

s will add up like this:

*K*_{1}, the cost of the innermost`m_(n-1) `union` m_n`

, costs 1 + 1 = 2;*K*_{2}, the cost of the two innermost`union`

s (`m_(n-2) `union` (m_(n-1) `union` m_n)`

) costs 1 + 2 + 2 = 5, because the map costs 1 + 2 = 3 to build, plus another 2 for the innermost`union`

;*K*_{3}, the cost of the three innermost`union`

s (`m_(n-3) `union` (m_(n-2) `union` (m_(n-1) `union` m_n))`

) costs 1 + 3 + 5 = 9, because we pay 1 + 3 = 4 to build the map, plus another 5 to build the union of the three final entries;Generally, the cost of the

*i*^{th}union,*K*_{i}, will cost 1 +*i*+*K*_{i − 1}.

If you expand out the *K*_{N} term, you will find a 1 + 2 + 3 + … + *N* sum, which shows that *K*_{N} ∈ *O*(*N*^{2}).

`Map`

Fine?`union`

for `Map`

has running time *O*(*m* × log (*n*/*m* + 1)); *m* ≤ *n*. Since we’re dealing with singleton maps, we can say that *m* = 1 is the smaller side of the `union`

, and the runtime simplifies to *O*(log *n*). From this, we can see that the runtime of all the `union`

s will be *O*(*n*log *n*).

`HashMap`

?There are at least a couple of ways to turn a `Foldable`

into a `HashMap`

without hitting this performance trap:

`HashMap`

’s`insert`

has*O*(log*n*) performance. Using it with the`Foldable`

version of`foldl`

or`foldr`

will give you a`HashMap`

that takes*O*(*n*log*n*) time to build.If you’re not building singleton maps, you could try building an intermediate list of

`(k, v)`

pairs, and calling`fromList`

on it. The`Foldable`

instance for`[]`

right-associates the applications of`(<>)`

, so concatenating singleton lists should be performant.`HashMap`

’s`fromList`

has*O*(*n*log*n*) complexity, which is also fine.

`Map`

can be surprisingly fast, and `HashMap`

can be surprisingly slow. None of this says, “never use `HashMap`

”, but I was surprised to see an idiom as natural as `foldMap`

become a performance trap. This also isn’t the first time I’ve been able to get better performance out of `Map`

— functions like `fromDistinctAscList`

can exploit input structure for better performance — and I often find the *O*(log *n*) lookups quite acceptable. I’m left wondering whether good ol’ `Map`

should be my default mapping type, unless I have a demonstrated need for `HashMap`

’s features.