I’ve been wrangling with an interesting decoding problem over the weekend. I started out fiddling with profunctors and arrows, but it morphed into something more lenslike for a while. Since Chris Penner has just published his article about fitting `Witherable` into an optical framework, I thought I’d write up what I have so far, and see if I can get some useful thoughts out while this stuff is flavour of the week.

`amazonka-dynamodb` uses an `AttributeValue` type to represent data stored in a DynamoDB table. Underneath all the lenses, this is morally a sum type: an `AttributeValue` can be a string, number, boolean, binary, null, string set, binary set, &c.

When storing or retrieving data (e.g., with the `PutItem` and `Query` operations), amazonka often uses `HashMap Text AttributeValue` in its request and response types. We’re looking for a solution with the following properties:

- It needs to be composable. There should be little difference between (de)serialising a primitive `AttributeValue` and an `AttributeValue` at some specific column in the table.
- (De)serialising should be as close to bidirectional as possible, but only when that makes sense. Basic use cases treat a DynamoDB table as a key-value store, and deserialisation becomes very similar to serialisation (read/write *these* fields at *those* keys). DynamoDB wizards pack many access patterns into a single table, and each access pattern could well need a different (de)serialiser.

**Aside: Single-Table Design in DynamoDB**

Alex DeBrie has a great article on single-table design in DynamoDB. In it, he quotes Forrest Brazeal (whose article is also pretty good):

> In fact, a well-optimized single-table DynamoDB layout looks more like machine code than a simple spreadsheet — despite all the bespoke, human finagling it took to create it.

Rick Houlihan has also given some excellent talks about DynamoDB: one in 2018, and one in 2017. Watch the 2018 talk first, because it does a better job of talking through technology shifts, and motivating the use of denormalised data. The 2017 talk has a lot of overlap with the 2018 talk, but there’s enough different material to make it worth watching too.

- It should be symmetric. Where it makes sense, encoding and decoding should have similar interfaces.
- It should be ergonomic. This is somewhat subjective, but important. Bad ergonomics is one reason I didn’t pursue monoidal functors any further — the `newtype`s that select which bifunctor to use added a lot of noise.
- It should have good error reporting. It’s unacceptable to return `Nothing` on failure.

## `To`/`From` Classes

`aeson`-style encoder/decoder typeclasses are well-understood. We need to deserialise both `AttributeValue` and `HashMap Text AttributeValue`, which means we need two classes for each direction:

```
data DecodeError = Something -- Placeholder

class FromAttributeValue a where
  fromAttributeValue :: AttributeValue -> Either DecodeError a

class FromAttributeValues a where
  fromAttributeValues :: HashMap Text AttributeValue -> Either DecodeError a

class ToAttributeValue a where
  toAttributeValue :: a -> AttributeValue

class ToAttributeValues a where
  toAttributeValues :: a -> HashMap Text AttributeValue
```
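For concreteness, here is roughly what instances of the `From`/`To` classes would look like. This is a sketch: it uses a cut-down stand-in for amazonka’s `AttributeValue` (the real type lives in `amazonka-dynamodb` and has many more branches), and the `DecodeError` constructor is invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Toy stand-in for amazonka's AttributeValue sum type
data AttributeValue = S Text | N Int | Null
  deriving (Eq, Show)

-- Hypothetical error type; the post leaves DecodeError as a placeholder
newtype DecodeError = TypeMismatch Text
  deriving (Eq, Show)

class FromAttributeValue a where
  fromAttributeValue :: AttributeValue -> Either DecodeError a

instance FromAttributeValue Int where
  fromAttributeValue (N n) = Right n
  fromAttributeValue _     = Left (TypeMismatch "expected a number")

class ToAttributeValue a where
  toAttributeValue :: a -> AttributeValue

instance ToAttributeValue Int where
  toAttributeValue = N
```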

This is not acceptable for several reasons:

- Type classes for serialisation become awkward across package boundaries. Libraries that provide data structures end up depending on serialisation libraries (potentially pulling in large dependency trees), or are forced to create a separate package with either orphan instances or newtypes just to hold instances. None of these are satisfactory options.
- Type-class-based serialisation libraries sometimes provide instances that use an element’s (de)serialisers to (de)serialise a collection (e.g., `instance FromJSON a => FromJSON [a]`). This is convenient, but presupposes a single canonical transformation, which is not necessarily the case.
- There’s no symmetry between encoders and decoders. Decoders may fail; encoders always succeed.

We want to be able to abstract our inputs over `AttributeValue` and `HashMap Text AttributeValue`, so the obvious thing to do is to have a type variable for the input as well as the output. We’d also like to deal with different ways of failing (`Maybe`, `Either SomeError`, accumulating errors into a `Validation`, `These`, …). Let’s hack up some types:

```
class Failing f where
  warn :: String -> a -> f a
  err :: String -> f a

newtype Coder f i o = Coder { runCoder :: i -> f o }
  deriving stock Functor
  deriving newtype (Semigroup, Monoid)
  deriving (Applicative, Monad) via (Star f i)

hoistCoder :: (forall a . f a -> g a) -> Coder f i o -> Coder g i o
hoistCoder nt (Coder f) = Coder $ nt . f
```
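To see what the derived instances buy us without pulling in `profunctors`, here’s a sketch with the `Applicative` instance (the one `Star f i` would provide) written out by hand, and `Failing` instantiated at `Either String`. The field names and the `Item` type are invented for the example; note `Either` has nowhere to stash warnings, so this `warn` just drops them.

```haskell
{-# LANGUAGE DeriveFunctor #-}

newtype Coder f i o = Coder { runCoder :: i -> f o }
  deriving Functor

-- What `deriving via (Star f i)` gives us: thread the same input to
-- both sides and combine the results in f.
instance Applicative f => Applicative (Coder f i) where
  pure = Coder . const . pure
  Coder f <*> Coder g = Coder $ \i -> f i <*> g i

class Failing f where
  warn :: String -> a -> f a
  err  :: String -> f a

instance Failing (Either String) where
  warn _ a = Right a -- Either can't accumulate warnings; a sketch
  err      = Left

-- Combining two field decoders applicatively:
data Item = Item { itemName :: String, itemCount :: Int }
  deriving (Eq, Show)

decodeName :: Coder (Either String) (String, Int) String
decodeName = Coder (Right . fst)

decodeCount :: Coder (Either String) (String, Int) Int
decodeCount = Coder (Right . snd)

decodeItem :: Coder (Either String) (String, Int) Item
decodeItem = Item <$> decodeName <*> decodeCount
```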

(`Star` is `Data.Profunctor.Star`, and is representationally equivalent to `Coder`.)

This gives us a whole pile of useful instances, and we can even use instances from `Control.Arrow` or `product-profunctors` to construct analogues to `Divisible` and `Decidable`. (I gave a talk about this at fp-syd the other week, but it was quite exploratory and I’m not sure if there’s good video.)

Let’s assume the existence of a simple decoder of `Int` from `AttributeValue`, and write a lifting function that can extract from a map key:

```
decodeInt :: Failing f => Coder f AttributeValue Int
decodeInt = undefined -- Not important

decodeAtKey :: (Eq k, Hashable k, Show k, Failing f)
            => k -> Coder f v o -> Coder f (HashMap k v) o
decodeAtKey k (Coder f) = Coder $ \i ->
  maybe (err $ "missing key: " <> show k) f $ HashMap.lookup k i
```
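As a sanity check, the same shape runs fine over `Data.Map` instead of `HashMap` (so only GHC boot libraries are needed), with `Failing` specialised to `Either String` and `decodeInt` given a real body against a toy `AttributeValue`. The `Ord` constraint stands in for `(Eq, Hashable)`:

```haskell
import qualified Data.Map as Map

newtype Coder f i o = Coder { runCoder :: i -> f o }

-- Toy stand-in for amazonka's AttributeValue
data AttributeValue = S String | N Int
  deriving (Eq, Show)

-- Failing specialised to Either String
err :: String -> Either String a
err = Left

decodeInt :: Coder (Either String) AttributeValue Int
decodeInt = Coder $ \av -> case av of
  N n -> Right n
  _   -> err "expected a number"

-- decodeAtKey as in the post, over Data.Map
decodeAtKey :: (Ord k, Show k)
            => k
            -> Coder (Either String) v o
            -> Coder (Either String) (Map.Map k v) o
decodeAtKey k (Coder f) = Coder $ \i ->
  maybe (err $ "missing key: " <> show k) f $ Map.lookup k i
```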

So far, so good. We could go even further and decode the `[HashMap Text AttributeValue]` that the `Query` operation returns. What about encoding? Let’s suppose the existence of an encoder from `Int` to `AttributeValue` and try to lift it into encoding at a map key:

```
encodeInt :: Coder Identity Int AttributeValue
encodeInt = undefined -- Not important

encodeAtKey :: (Functor f, Eq k, Hashable k)
            => k -> Coder f i v -> Coder f i (HashMap k v)
encodeAtKey k (Coder f) = Coder $ \i ->
  f i <&> \v -> HashMap.singleton k v
```

Here’s where the cracks creep in. We’d like to `mconcat` our outputs together to make a larger structure, but that can be surprisingly slow. We actually want the `Monoid` instance of `Endo m`:

```
encodeAtKey :: (Eq k, Hashable k)
            => k -> Coder Identity i v -> Coder Endo i (HashMap k v)
encodeAtKey k (Coder f) = Coder $ \i ->
  Endo $ HashMap.insert k (runIdentity $ f i)

collectHashMap :: [Coder Endo i (HashMap k v)] -> Coder Identity i (HashMap k v)
collectHashMap coders = Coder $ \i ->
  Identity $ appEndo (runCoder (mconcat coders) i) HashMap.empty
```
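Here’s the `Endo` trick end-to-end, as a self-contained sketch over `Data.Map` (again swapping `Ord` for `Hashable`; the `Person` type and field names are invented). Each column encoder contributes a single `insert`, and the `collectHashMap` analogue applies the composed insertions to one empty map, rather than unioning a pile of singletons:

```haskell
import Data.Functor.Identity (Identity (..))
import Data.Monoid (Endo (..))
import qualified Data.Map as Map

newtype Coder f i o = Coder { runCoder :: i -> f o }

-- Pointwise instances, standing in for `deriving newtype (Semigroup, Monoid)`
instance Semigroup (f o) => Semigroup (Coder f i o) where
  Coder f <> Coder g = Coder (\i -> f i <> g i)

instance Monoid (f o) => Monoid (Coder f i o) where
  mempty = Coder (const mempty)

encodeAtKey :: Ord k => k -> Coder Identity i v -> Coder Endo i (Map.Map k v)
encodeAtKey k (Coder f) = Coder $ \i -> Endo $ Map.insert k (runIdentity $ f i)

collectMap :: [Coder Endo i (Map.Map k v)] -> Coder Identity i (Map.Map k v)
collectMap coders = Coder $ \i ->
  Identity $ appEndo (runCoder (mconcat coders) i) Map.empty

data Person = Person { personName :: String, personAge :: Int }

encodePerson :: Coder Identity Person (Map.Map String String)
encodePerson = collectMap
  [ encodeAtKey "name" (Coder (Identity . personName))
  , encodeAtKey "age"  (Coder (Identity . show . personAge))
  ]
```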

To do this, we’ve had to abandon any pretense of generality in the “`Functor`” type parameter `f`. We can’t use `Compose Endo f` because it doesn’t have the `Monoid` instance we want.

Let’s go back to when things were going well, and look more closely at `decodeAtKey`. What happens if we strip away the `newtype`s?

```
decodeAtKey :: (Eq k, Hashable k, Show k, Failing f)
            => k -> Coder f v o -> Coder f (HashMap k v) o
--             k -> (v -> f o)  -> (HashMap k v -> f o)
--             k -> LensLike f (HashMap k v) o v o
```

This seems promising, and if we step into the `lens` universe we can use `Prism`s for our primitives. This also gives us a start on our “bidirectional where it makes sense” goal:

```
int :: Prism' AttributeValue Int
int = undefined -- Not important

preview int :: AttributeValue -> Maybe Int
review int  :: Int -> AttributeValue
```

Trouble is, it’s very easy to lose prism-ness, and find ourselves stuck with a `Traversal` or `Fold`. To lift our primitives to work over map keys, we need to wrangle the prism itself:

```
atKey :: (Eq k, Hashable k) => k -> Prism' v a -> Prism' (HashMap k v) a
atKey k pr = prism'
  (HashMap.singleton k . review pr)
  (HashMap.lookup k >=> preview pr)
```
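To watch both the lift and the singleton-map behaviour concretely, here’s `atKey` over a first-order stand-in for lens’s `Prism'` (just the preview/review pair; profunctor optics aren’t needed to make the point), again using `Data.Map` and a toy `AttributeValue`:

```haskell
import Control.Monad ((>=>))
import qualified Data.Map as Map

-- First-order stand-in for lens's Prism': a preview/review pair
data Prism' s a = Prism'
  { preview :: s -> Maybe a
  , review  :: a -> s
  }

-- Toy stand-in for amazonka's AttributeValue
data AttributeValue = S String | N Int
  deriving (Eq, Show)

int :: Prism' AttributeValue Int
int = Prism' fromAV N
  where
    fromAV (N n) = Just n
    fromAV _     = Nothing

atKey :: Ord k => k -> Prism' v a -> Prism' (Map.Map k v) a
atKey k pr = Prism'
  (Map.lookup k >=> preview pr)
  (Map.singleton k . review pr)
```

Note that `review (atKey "n" int) 3` builds a fresh singleton map for each field, which is exactly the performance concern raised earlier.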

We’ve found ourselves back at our first problem: we’re making singleton maps again. There might be ways around this (Yair Chuchem has some examples of using `Prism`s to implement codecs and “roundtrip” ASTs), but even then you have to solve the error-reporting problem. Yair proposes a `VerbosePointed` typeclass to smuggle error information into the `f` parameter of an optic, but Ed Kmett notes that they ‘played with “coindexed” prisms which looked somewhat similar to these, but ran into problems with inference for them when folks would compose them with indexed traversals’, so that doesn’t sound promising.

How else can we keep encoding and decoding together? I came across a 2004 Functional Pearl, “Pickler Combinators”, which defines a “pickler/unpickler” type:

```
type St = [Char]
data PU a = PU { appP :: (a, St) -> St, appU :: St -> (a, St) }
```
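To get a feel for how `PU`s compose, here’s a sketch of a couple of base picklers in the paper’s style; the definitions are adapted rather than transcribed, so treat them as illustrative. The `char` unpickler is deliberately partial, which is part of the safety problem discussed below.

```haskell
type St = [Char]

data PU a = PU { appP :: (a, St) -> St, appU :: St -> (a, St) }

-- Pickle a single character onto the front of the state
char :: PU Char
char = PU (\(c, s) -> c : s) (\(c : s) -> (c, s)) -- partial: assumes input remains

-- Pickle a pair by pickling the second component first, so that
-- unpickling reads the components back in order
pair :: PU a -> PU b -> PU (a, b)
pair pa pb = PU
  (\((a, b), s) -> appP pa (a, appP pb (b, s)))
  (\s -> let (a, s')  = appU pa s
             (b, s'') = appU pb s'
         in ((a, b), s''))

pickle :: PU a -> a -> St
pickle p value = appP p (value, [])

unpickle :: PU a -> St -> a
unpickle p = fst . appU p
```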

Many examples in the paper do the equivalent of `invmap`ing with a partial function on one side, which definitely doesn’t give us the safety we’re after. We also want to extend this to support varying input types, which means we have a `data PU i o = ...` type where **both** `i` and `o` are invariant. This quickly leads down a similar road as the monoidal functor stuff, but two invariant type variables make it twice as awkward.

I’m not really satisfied by any of the options in this post, but the `Coder` profunctor/arrow experiment dissatisfies me least. I’ll probably develop this further at some point, and it might even be worth putting a “`Functor`-shaped parameter” on the input side, to allow a `Coder` to consume input from a streaming library:

`newtype Coder f g a b = Coder { unCoder :: f a -> g b }`

While I haven’t found a good story for bidirectional {en,de}coding, the `g` parameter enables a bunch of neat tricks:

- With `g ~ Either AnError`, you can report failure;
- With `g ~ Endo`, you can build up maps or other data structures;
- With `g ~ Compose IO (Const ())`, you can execute actions on the coded structure; or
- With `g ~ ((,) i)`, you can pass unconsumed input to a downstream `Coder`, giving it the flavour of a parser combinator library.
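A quick sketch of that last trick, with all names invented for the example: with `g ~ ((,) i)`, each `Coder` hands its leftover input to the next one, which is exactly the parser-combinator shape. The head-taking `Coder` is partial; a real one would pair this with an error-reporting functor.

```haskell
import Data.Functor.Identity (Identity (..))

newtype Coder f g a b = Coder { unCoder :: f a -> g b }

-- Consume one Char, returning the rest of the String as leftover.
-- Partial: assumes non-empty input.
splitHead :: Coder Identity ((,) String) String Char
splitHead = Coder $ \(Identity (c : rest)) -> (rest, c)

-- Sequence two Coders, feeding the first one's leftover input to the second
andThen :: Coder Identity ((,) i) i b
        -> Coder Identity ((,) i) i c
        -> Coder Identity ((,) i) i (b, c)
andThen (Coder f) (Coder g) = Coder $ \(Identity i) ->
  let (i', b)  = f (Identity i)
      (i'', c) = g (Identity i')
  in (i'', (b, c))
```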

There’s a few things I’m hoping to do, to see whether or not this actually has legs:

- Build a vocabulary of combinators to lift `Coder`s in and out of maps, lists, and other monoids; sequence `Coder`s and build up structures using `Applicative`/`Divisible`/`Decidable`-style combinators; &c.;
- Build out a vocabulary of primitive {en,de}coders for DynamoDB `AttributeValue`s;
- Work out how to make the different functors for the `g` parameter compose well. I suspect something MTL-ish will emerge, as we’ll need to consider (e.g.) `Coder`s that pass input through **and** report errors; and
- Decide whether the `f` parameter on the input is worth the ergonomic cost. I suspect it is not; while decoding from an input stream is appealing, setting `f ~ Stream (Of a) m` severely constrains the type of the `g` parameter, and the `a` in `f a` won’t even refer to the stream item type (it names the type of the final value returned by the streaming computation).