I’ve been wrangling with an interesting decoding problem over the
weekend. I started out fiddling with profunctors and arrows, but it
morphed into something more lenslike for a while. Since Chris Penner has
just published his article about fitting
Witherable
into an optical framework, I thought I’d
write up what I have so far, and see if I can get some useful thoughts
out while this stuff is flavour of the week.
amazonka-dynamodb uses an AttributeValue type to represent data stored in a DynamoDB table. Underneath all the lenses, this is morally a sum type: an AttributeValue can be a string, number, boolean, binary, null, string set, binary set, &c.
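To fix ideas, here’s roughly the sum type hiding underneath. This is a sketch only: amazonka’s real AttributeValue is a record of Maybe fields behind lenses, and the constructor names below are mine, not its API:

import Data.ByteString (ByteString)
import Data.Text (Text)

-- Illustrative only; morally equivalent to amazonka's AttributeValue.
data AttributeValue
  = S Text            -- string
  | N Text            -- number (DynamoDB transmits numbers as strings)
  | BOOL Bool         -- boolean
  | B ByteString      -- binary
  | NULL              -- null
  | SS [Text]         -- string set
  | BS [ByteString]   -- binary set
  -- &c.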
When storing or retrieving data (e.g., with the PutItem and Query operations), amazonka often uses HashMap Text AttributeValue in its request and response types. We’re looking for a solution with the following properties:

- We should be able to (de)serialise both a bare AttributeValue and an AttributeValue at some specific column in the table.
- (De)serialising should be as close to bidirectional as possible, but only when that makes sense. Basic use cases treat a DynamoDB table as a key-value store, and deserialisation becomes very similar to serialisation (read/write these fields at those keys). DynamoDB wizards pack many access patterns into a single table, and each access pattern could well need a different (de)serialiser.
Alex DeBrie has a great article on single-table design in DynamoDB. In it, he quotes Forrest Brazeal (whose article is also pretty good):
In fact, a well-optimized single-table DynamoDB layout looks more like machine code than a simple spreadsheet — despite all the bespoke, human finagling it took to create it.

Rick Houlihan has also given some excellent talks about DynamoDB: one in 2018, and one in 2017. Watch the 2018 talk first, because it does a better job of talking through technology shifts, and motivating the use of denormalised data. The 2017 talk has a lot of overlap with the 2018 talk, but there’s enough different material to make it worth watching too.
My early profunctor/arrow experiments hit two snags: newtypes that select which bifunctor to use added a lot of noise, and signalling failure with a bare Nothing tells you nothing about what went wrong.
To/From Classes

aeson-style encoder/decoder typeclasses are well-understood. We need to deserialise both AttributeValue and HashMap Text AttributeValue, which means we need two classes for each direction:
data DecodeError = Something -- Placeholder

class FromAttributeValue a where
  fromAttributeValue :: AttributeValue -> Either DecodeError a

class FromAttributeValues a where
  fromAttributeValues :: HashMap Text AttributeValue -> Either DecodeError a

class ToAttributeValue a where
  toAttributeValue :: a -> AttributeValue

class ToAttributeValues a where
  toAttributeValues :: a -> HashMap Text AttributeValue
This is not acceptable for several reasons:
- Type classes for serialisation become awkward across package boundaries. Libraries that provide data structures end up depending on serialisation libraries (potentially pulling in large dependency trees), or are forced to create a separate package with either orphan instances or newtypes just to hold instances (sketched below). None of these are satisfactory options.
- Type-class-based serialisation libraries sometimes provide instances that use an element’s (de)serialisers to (de)serialise a collection (e.g., instance FromJSON a => FromJSON [a]). This is convenient, but presupposes a single canonical transformation, which is not necessarily the case.
- There’s no symmetry between encoders and decoders: decoders may fail, but encoders always succeed.
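To make the first point concrete, here’s the usual newtype workaround, sketched with a stand-in Foo type:

-- Stand-in for a type from an upstream package that shouldn't have to
-- know about our serialisation classes.
data Foo = Foo

-- The workaround: a third package defines a newtype purely to hold the
-- instance, since neither upstream package sensibly owns it.
newtype WrappedFoo = WrappedFoo Foo

instance ToAttributeValue WrappedFoo where
  toAttributeValue (WrappedFoo Foo) = undefined -- Not important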
We want to be able to abstract our inputs over AttributeValue and HashMap Text AttributeValue, so the obvious thing to do is to have a type variable for the input as well as the output. We’d also like to deal with different ways of failing (Maybe, Either SomeError, accumulating errors into a Validation, These, …). Let’s hack up some types:
class Failing f where
  warn :: String -> a -> f a
  err :: String -> f a

newtype Coder f i o = Coder { runCoder :: i -> f o }
  deriving stock Functor
  deriving newtype (Semigroup, Monoid)
  deriving (Applicative, Monad) via (Star f i)
hoistCoder :: (forall a . f a -> g a) -> Coder f i o -> Coder g i o
hoistCoder nt (Coder f) = Coder $ nt . f

(Star is Data.Profunctor.Star, and is representationally equivalent to Coder.)
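The Failing class leaves its instances to the imagination; here are two plausible ones (my guesses, not canon) to show the intent. Neither Maybe nor Either has anywhere to stash a warning, which is part of why types like These are on the wishlist:

instance Failing Maybe where
  warn _ a = Just a -- nowhere to record the warning
  err _ = Nothing   -- ...or the error message

instance Failing (Either String) where
  warn _ a = Right a -- still no channel for warnings
  err = Left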
This gives us a whole pile of useful instances, and we can even use instances from Control.Arrow or product-profunctors to construct analogues to Divisible and Decidable. (I gave a talk about this at fp-syd the other week, but it was quite exploratory and I’m not sure if there’s good video.)
Let’s assume the existence of a simple decoder of Int from AttributeValue, and write a lifting function that can extract from a map key:

decodeInt :: Failing f => Coder f AttributeValue Int
decodeInt = undefined -- Not important

decodeAtKey :: (Eq k, Hashable k, Show k, Failing f)
  => k -> Coder f v o -> Coder f (HashMap k v) o
decodeAtKey k (Coder f) = Coder $ \i ->
  maybe (err $ "missing key: " <> show k) f $ HashMap.lookup k i
So far, so good. We could go even further and decode the [HashMap Text AttributeValue] that the Query operation returns. What about encoding? Let’s suppose the existence of an encoder from Int to AttributeValue and try to lift it into encoding at a map key:
encodeInt :: Coder Identity Int AttributeValue
encodeInt = undefined -- Not important

encodeAtKey :: (Functor f, Eq k, Hashable k)
  => k -> Coder f i v -> Coder f i (HashMap k v)
encodeAtKey k (Coder f) = Coder $ \i ->
  f i <&> \v -> HashMap.singleton k v
Here’s where the cracks creep in. We’d like to mconcat our outputs together to make a larger structure, but repeatedly unioning singleton HashMaps can be surprisingly slow. We actually want the Monoid instance of Endo m:
encodeAtKey :: (Eq k, Hashable k)
  => k -> Coder Identity i v -> Coder Endo i (HashMap k v)
encodeAtKey k (Coder f) = Coder $ \i ->
  Endo $ HashMap.insert k (runIdentity $ f i)

collectHashMap :: [Coder Endo i (HashMap k v)] -> Coder Identity i (HashMap k v)
collectHashMap coders = Coder $ \i ->
  Identity $ appEndo (runCoder (mconcat coders) i) HashMap.empty
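Continuing the hypothetical Point example on the encoding side: we never derived Profunctor for Coder, so here’s a hand-rolled lmap-alike, which lets collectHashMap assemble per-column encoders:

-- Pre-process a Coder's input; this is Profunctor's lmap, written out by
-- hand because we only derived the instances shown earlier.
lmapCoder :: (j -> i) -> Coder f i o -> Coder f j o
lmapCoder g (Coder f) = Coder (f . g)

encodePoint :: Coder Identity Point (HashMap Text AttributeValue)
encodePoint = collectHashMap
  [ encodeAtKey "x" (lmapCoder x encodeInt)
  , encodeAtKey "y" (lmapCoder y encodeInt)
  ]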
To do this, we’ve had to abandon any pretense of generality in the “Functor” type parameter f. We can’t use Compose Endo f because it doesn’t have the Monoid instance we want.
Let’s go back to when things were going well, and look more closely at decodeAtKey. What happens if we strip away the newtypes?

decodeAtKey :: (Eq k, Hashable k, Show k, Failing f)
  => k -> Coder f v o -> Coder f (HashMap k v) o
-- k -> (v -> f o) -> (HashMap k v -> f o)
-- k -> LensLike f (HashMap k v) o v o
This seems promising, and if we step into the lens universe we can use Prisms for our primitives. This also gives us a start on our “bidirectional where it makes sense” goal:

int :: Prism' AttributeValue Int
int = undefined -- Not important

preview int :: AttributeValue -> Maybe Int
review int :: Int -> AttributeValue
Trouble is, it’s very easy to lose prism-ness, and find ourselves stuck with a Traversal or Fold. To lift our primitives to work over map keys, we need to wrangle the prism itself:

atKey :: (Eq k, Hashable k) => k -> Prism' v a -> Prism' (HashMap k v) a
atKey k pr = prism'
  (HashMap.singleton k . review pr)
  (HashMap.lookup k >=> preview pr)
We’ve found ourselves back at our first problem: we’re making singleton maps again. There might be ways around this (Yair Chuchem has some examples of using Prisms to implement codecs and “roundtrip” ASTs), but even then you have to solve the error-reporting problem. Yair proposes a VerbosePointed typeclass to smuggle error information into the f parameter of an optic, but Ed Kmett notes that they ‘played with “coindexed” prisms which looked somewhat similar to these, but ran into problems with inference for them when folks would compose them with indexed traversals’, so that doesn’t sound promising.
How else can we keep encoding and decoding together? I came across a 2004 Functional Pearl, “Pickler Combinators”, which defines a “pickler/unpickler” type:
type St = [Char]
data PU a = PU { appP :: (a, St) -> St, appU :: St -> (a, St) }
Many examples in the paper do the equivalent of invmap-ing with a partial function on one side, which definitely doesn’t give us the safety we’re after. We also want to extend this to support varying input types, which means we need a data PU i o = ... type where both i and o are invariant. This quickly leads down a similar road to the monoidal functor stuff, but two invariant type variables make it twice as awkward.
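Spelling that out (my sketch, not the paper’s): once the stream type St becomes a parameter i, both type variables appear in positive and negative positions, so neither is co- or contravariant:

-- i is both argument and result (invariant); o is consumed by the
-- pickler and produced by the unpickler (also invariant).
data PU i o = PU
  { appP :: (o, i) -> i   -- pickle an o onto the stream
  , appU :: i -> (o, i)   -- unpickle an o off the stream
  }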
I’m not really satisfied by any of the options in this post, but the Coder profunctor/arrow experiment dissatisfies me least. I’ll probably develop this further at some point, and it might even be worth putting a “Functor-shaped parameter” on the input side, to allow a Coder to consume input from a streaming library:
newtype Coder f g a b = Coder { unCoder :: f a -> g b }
While I haven’t found a good story for bidirectional {en,de}coding, the g parameter enables a bunch of neat tricks:

- With g ~ Either AnError, you can report failure;
- with g ~ Endo, you can build up maps or other data structures;
- with g ~ Compose IO (Const ()), you can execute actions on the coded structure; or
- with g ~ ((,) i), you can pass unconsumed input to a downstream Coder, giving it the flavour of a parser combinator library (sketched just below).
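Here’s a small sketch of that last trick, with names of my own invention: each Coder hands leftover input to the next, which is exactly a parser combinator’s shape:

-- Pop one element off a list of Ints, returning leftovers alongside the result.
takeInt :: Coder Identity ((,) [Int]) [Int] (Maybe Int)
takeInt = Coder $ \(Identity xs) -> case xs of
  [] -> ([], Nothing)
  (n : rest) -> (rest, Just n)

-- Run one Coder, then feed its unconsumed input to the next.
andThen :: Coder Identity ((,) i) i a
        -> Coder Identity ((,) i) i b
        -> Coder Identity ((,) i) i (a, b)
andThen (Coder f) (Coder g) = Coder $ \(Identity i) ->
  let (i', a) = f (Identity i)
      (i'', b) = g (Identity i')
  in (i'', (a, b))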
There are a few things I’m hoping to do, to see whether or not this actually has legs:

- lift Coders in and out of maps, lists, and other monoids; sequence Coders and build up structures using Applicative/Divisible/Decidable-style combinators, &c.;
- put all of this to work on real AttributeValues;
- check that different choices of the g parameter compose well. I suspect something MTL-ish will emerge, as we’ll need to consider (e.g.) Coders that pass input through and report errors; and
- decide whether the f parameter on the input is worth the ergonomic cost. I suspect it is not; while decoding from an input stream is appealing, setting f ~ Stream (Of a) m severely constrains the type of the g parameter, and the a in f a won’t even refer to the stream item type (it names the type of the final value returned by the streaming computation).