Profunctor Decoders? Optical Decoders?

Posted on November 3, 2020 by Jack Kelly
Tags: haskell, coding, dynamodb

I’ve been wrangling with an interesting decoding problem over the weekend. I started out fiddling with profunctors and arrows, but it morphed into something more lenslike for a while. Since Chris Penner has just published his article about fitting Witherable into an optical framework, I thought I’d write up what I have so far, and see if I can get some useful thoughts out while this stuff is flavour of the week.

The Problem

amazonka-dynamodb uses an AttributeValue type to represent data stored in a DynamoDB table. Underneath all the lenses, this is morally a sum type: an AttributeValue can be a string, number, boolean, binary, null, string set, binary set, &c.
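As a rough illustration of that "moral" shape (this is not amazonka's actual representation, which is a record of Maybe fields hidden behind lenses), think of something like:

```haskell
-- A toy sketch of the sum type that AttributeValue morally is.
-- Constructor names follow DynamoDB's wire-format tags; the real
-- amazonka-dynamodb type looks quite different.
data AttributeValue
  = S    String    -- string
  | N    String    -- number (DynamoDB sends numbers as strings)
  | BOOL Bool      -- boolean
  | B    String    -- binary (Base64 text on the wire)
  | NULL           -- null
  | SS   [String]  -- string set
  | BS   [String]  -- binary set
  deriving (Eq, Show)
```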

When storing or retrieving data (e.g., with the PutItem and Query operations), amazonka often uses HashMap Text AttributeValue in its request and response types. We’re looking for a solution with the following properties:

  1. It needs to be composable. There should be little difference between (de)serialising a primitive AttributeValue and an AttributeValue at some specific column in the table.
  2. (De)serialising should be as close to bidirectional as possible, but only when that makes sense. Basic use cases treat a DynamoDB table as a key-value store, and deserialisation becomes very similar to serialisation (read/write these fields at those keys). DynamoDB wizards pack many access patterns into a single table, and each access pattern could well need a different (de)serialiser.

    Aside: Single-Table Design in DynamoDB

    Alex DeBrie has a great article on single-table design in DynamoDB. In it, he quotes Forrest Brazeal (whose article is also pretty good):

    In fact, a well-optimized single-table DynamoDB layout looks more like machine code than a simple spreadsheet — despite all the bespoke, human finagling it took to create it.
    Rick Houlihan has also given some excellent talks about DynamoDB: one in 2018, and one in 2017. Watch the 2018 talk first, because it does a better job of talking through technology shifts, and motivating the use of denormalised data. The 2017 talk has a lot of overlap with the 2018 talk, but there’s enough different material to make it worth watching too.
  3. It should be symmetric. Where it makes sense, encoding and decoding should have similar interfaces.
  4. It should be ergonomic. This is somewhat subjective, but important. Bad ergonomics is one reason I didn’t pursue monoidal functors any further — the newtypes that select which bifunctor to use added a lot of noise.
  5. It should have good error reporting. It’s unacceptable to return Nothing on failure.

Non-Solution: To/From Classes

aeson-style encoder/decoder typeclasses are well-understood. We need to deserialise both AttributeValue and HashMap Text AttributeValue, which means we need two classes for each direction:

data DecodeError = Something -- Placeholder

class FromAttributeValue a where
  fromAttributeValue :: AttributeValue -> Either DecodeError a

class FromAttributeValues a where
  fromAttributeValues :: HashMap Text AttributeValue -> Either DecodeError a

class ToAttributeValue a where
  toAttributeValue :: a -> AttributeValue

class ToAttributeValues a where
  toAttributeValues :: a -> HashMap Text AttributeValue

This is not acceptable for several reasons: typeclasses allow only one instance per type, while requirement 2 says each access pattern may need its own (de)serialiser; the To and From classes are completely independent, so nothing keeps encoding and decoding in agreement (requirement 3); and the duplication between the AttributeValue and HashMap Text AttributeValue classes works against composability (requirement 1).

Partial Solution: Profunctor/Arrow Coders

We want to be able to abstract our inputs over AttributeValue and HashMap Text AttributeValue, so the obvious thing to do is to have a type variable for the input as well as the output. We’d also like to deal with different ways of failing (Maybe, Either SomeError, accumulating errors into a Validation, These, …). Let’s hack up some types:

class Failing f where
  warn :: String -> a -> f a
  err :: String -> f a

newtype Coder f i o = Coder { runCoder :: i -> f o }
  deriving stock Functor
  deriving newtype (Semigroup, Monoid)
  deriving (Applicative, Monad) via (Star f i)

hoistCoder :: (forall a . f a -> g a) -> Coder f i o -> Coder g i o
hoistCoder nt (Coder f) = Coder $ nt . f

(Star is Data.Profunctor.Star, and is representationally equivalent to Coder.)

This gives us a whole pile of useful instances, and we can even use instances from Control.Arrow or product-profunctors to construct analogues to Divisible and Decidable. (I gave a talk about this at fp-syd the other week, but it was quite exploratory and I’m not sure if there’s good video.)

Let’s assume the existence of a simple decoder of Int from AttributeValue, and write a lifting function that can extract from a map key:

decodeInt :: Failing f => Coder f AttributeValue Int
decodeInt = undefined -- Not important

decodeAtKey
  :: (Eq k, Hashable k, Show k, Failing f)
  => k -> Coder f v o -> Coder f (HashMap k v) o
decodeAtKey k (Coder f) = Coder $ \i ->
  maybe (err $ "missing key: " <> show k) f $ HashMap.lookup k i
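To see how these pieces compose, here is a self-contained sketch. The toy AttributeValue, the Either DecodeError instance of Failing, and the hand-written Functor/Applicative instances (which the post derives via Star) are all assumptions for illustration, and Data.Map stands in for HashMap to avoid the unordered-containers dependency:

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Toy stand-ins for the real amazonka types.
data AttributeValue = S String | N String deriving (Eq, Show)
newtype DecodeError = DecodeError String deriving (Eq, Show)

class Failing f where
  err :: String -> f a

instance Failing (Either DecodeError) where
  err = Left . DecodeError

newtype Coder f i o = Coder { runCoder :: i -> f o }

-- Written out by hand here; the post derives these via Star f i.
instance Functor f => Functor (Coder f i) where
  fmap g (Coder f) = Coder (fmap g . f)

instance Applicative f => Applicative (Coder f i) where
  pure = Coder . const . pure
  Coder f <*> Coder g = Coder (\i -> f i <*> g i)

-- A primitive decoder: DynamoDB numbers arrive as strings.
decodeInt :: (Applicative f, Failing f) => Coder f AttributeValue Int
decodeInt = Coder $ \av -> case av of
  N s | [(n, "")] <- reads s -> pure n
  _                          -> err "expected a number"

decodeAtKey
  :: (Ord k, Show k, Failing f)
  => k -> Coder f v o -> Coder f (Map k v) o
decodeAtKey k (Coder f) = Coder $ \i ->
  maybe (err ("missing key: " ++ show k)) f (Map.lookup k i)

data Point = Point Int Int deriving (Eq, Show)

-- The Applicative instance composes per-key decoders into a
-- decoder for a whole row, with per-key error reporting for free.
decodePoint :: Coder (Either DecodeError) (Map String AttributeValue) Point
decodePoint = Point <$> decodeAtKey "x" decodeInt <*> decodeAtKey "y" decodeInt
```

Running `decodePoint` over a map containing numeric `"x"` and `"y"` entries yields a `Right Point`, and a missing key reports which key was missing rather than a bare `Nothing`.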

So far, so good. We could go even further and decode the [HashMap Text AttributeValue] that the Query operation returns. What about encoding? Let’s suppose the existence of an encoder from Int to AttributeValue and try to lift it into encoding at a map key:

encodeInt :: Coder Identity Int AttributeValue
encodeInt = undefined -- Not important

encodeAtKey
  :: (Functor f, Eq k, Hashable k)
  => k -> Coder f i v -> Coder f i (HashMap k v)
encodeAtKey k (Coder f) = Coder $ \i ->
  f i <&> \v -> HashMap.singleton k v

Here’s where the cracks creep in. We’d like to mconcat our outputs together to make a larger structure, but that can be surprisingly slow. We actually want the Monoid instance of Endo m:

encodeAtKey
  :: (Eq k, Hashable k)
  => k -> Coder Identity i v -> Coder Endo i (HashMap k v)
encodeAtKey k (Coder f) = Coder $ \i ->
  Endo $ HashMap.insert k (runIdentity $ f i)

collectHashMap :: [Coder Endo i (HashMap k v)] -> Coder Identity i (HashMap k v)
collectHashMap coders = Coder $ \i ->
  Identity $ appEndo (runCoder (mconcat coders) i) HashMap.empty

To do this, we’ve had to abandon any pretense of generality in the “Functor” type parameter f. We can’t use Compose Endo f because it doesn’t have the Monoid instance we want.
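Putting the Endo trick together, here is a runnable sketch of the encoding side. The toy types, the `lmapCoder` helper (ordinarily this would be `lmap` from the Profunctor instance), and the pointwise Semigroup/Monoid instances are assumptions for illustration, and Data.Map again stands in for HashMap:

```haskell
{-# LANGUAGE FlexibleContexts #-}

import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import Data.Functor.Identity (Identity (..))
import Data.Monoid (Endo (..))

data AttributeValue = N String deriving (Eq, Show)

newtype Coder f i o = Coder { runCoder :: i -> f o }

-- Pointwise instances on the output, matching the newtype-derived
-- Semigroup/Monoid from the post's definition of Coder.
instance Semigroup (f o) => Semigroup (Coder f i o) where
  Coder f <> Coder g = Coder (\i -> f i <> g i)
instance Monoid (f o) => Monoid (Coder f i o) where
  mempty = Coder (const mempty)

-- lmap from the Profunctor instance, written out by hand here.
lmapCoder :: (j -> i) -> Coder f i o -> Coder f j o
lmapCoder g (Coder f) = Coder (f . g)

encodeInt :: Coder Identity Int AttributeValue
encodeInt = Coder (Identity . N . show)

-- Each field contributes an insertion, not a singleton map.
encodeAtKey :: Ord k => k -> Coder Identity i v -> Coder Endo i (Map k v)
encodeAtKey k (Coder f) = Coder $ \i -> Endo $ Map.insert k (runIdentity (f i))

-- Run all the insertions against one empty map at the end.
collectMap :: [Coder Endo i (Map k v)] -> Coder Identity i (Map k v)
collectMap coders = Coder $ \i ->
  Identity $ appEndo (runCoder (mconcat coders) i) Map.empty

data Point = Point { px :: Int, py :: Int } deriving (Eq, Show)

encodePoint :: Coder Identity Point (Map String AttributeValue)
encodePoint = collectMap
  [ encodeAtKey "x" (lmapCoder px encodeInt)
  , encodeAtKey "y" (lmapCoder py encodeInt)
  ]
```

`runIdentity (runCoder encodePoint (Point 1 2))` builds the two-entry map in one pass of insertions rather than mconcat-ing singletons.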

What About Optics?

Let’s go back to when things were going well, and look more closely at decodeAtKey. What happens if we strip away the newtypes?

decodeAtKey
  :: (Eq k, Hashable k, Show k, Failing f)
  => k -> Coder f v o  -> Coder f (HashMap k v) o
  -- k -> (v -> f o) -> (HashMap k v -> f o)
  -- k -> LensLike f (HashMap k v) o v o

This seems promising, and if we step into the lens universe we can use Prisms for our primitives. This also gives us a start on our “bidirectional where it makes sense” goal:

int :: Prism' AttributeValue Int
int = undefined -- Not important

preview int :: AttributeValue -> Maybe Int
review int :: Int -> AttributeValue

Trouble is, it’s very easy to lose prism-ness, and find ourselves stuck with a Traversal or Fold. To lift our primitives to work over map keys, we need to wrangle the prism itself:

atKey :: (Eq k, Hashable k) => k -> Prism' v a -> Prism' (HashMap k v) a
atKey k pr = prism'
  (HashMap.singleton k . review pr)
  (HashMap.lookup k >=> preview pr)
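To make atKey concrete without pulling in the lens machinery, here is a naive stand-in Prism' that just pairs the two directions (lens really encodes prisms with profunctors, and Data.Map again stands in for HashMap, but the atKey construction is the same):

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import Text.Read (readMaybe)

data AttributeValue = N String deriving (Eq, Show)

-- A naive stand-in for lens's Prism': the preview and review
-- directions, paired up. Field names are primed to avoid
-- clashing with the real lens functions.
data Prism' s a = Prism'
  { preview' :: s -> Maybe a
  , review'  :: a -> s
  }

int :: Prism' AttributeValue Int
int = Prism' (\(N s) -> readMaybe s) (N . show)

atKey :: Ord k => k -> Prism' v a -> Prism' (Map k v) a
atKey k pr = Prism'
  (\m -> Map.lookup k m >>= preview' pr)
  (Map.singleton k . review' pr)  -- the singleton-map problem, again
```

Preview and review round-trip through a one-key map, but every review builds a fresh singleton.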

We’ve found ourselves back at our first problem: we’re making singleton maps again. There might be ways around this (Yair Chuchem has some examples of using Prisms to implement codecs and “roundtrip” ASTs), but even then you have to solve the error-reporting problem. Yair proposes a VerbosePointed typeclass to smuggle error information into the f parameter of an optic, but Ed Kmett notes that they ‘played with “coindexed” prisms which looked somewhat similar to these, but ran into problems with inference for them when folks would compose them with indexed traversals’, so that doesn’t sound promising.

Pickler Combinators

How else can we keep encoding and decoding together? I came across a 2004 Functional Pearl, “Pickler Combinators”, which defines a “pickler/unpickler” type:

type St = [Char]
data PU a = PU { appP :: (a, St) -> St, appU :: St -> (a, St) }

Many examples in the paper do the equivalent of invmapping with a partial function on one side, which definitely doesn’t give us the safety we’re after. We also want to extend this to support varying input types, which means we need a data PU i o = ... type where both i and o are invariant. This quickly leads down a similar road to the monoidal functor stuff, but two invariant type variables make it twice as awkward.
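For a concrete taste of the paper's scheme, and of where the partiality lurks, here is a minimal transcription; `char`, `pair`, and the roundtrip helpers are my own simplifications of the paper's combinators:

```haskell
type St = [Char]
data PU a = PU { appP :: (a, St) -> St, appU :: St -> (a, St) }

-- The simplest pickler: one character, prepended to the state.
-- Note that appU is already partial: it crashes on "".
char :: PU Char
char = PU (\(c, s) -> c : s) (\(c : s) -> (c, s))

-- Sequence two picklers as a pair.
pair :: PU a -> PU b -> PU (a, b)
pair pa pb = PU
  (\((a, b), s) -> appP pa (a, appP pb (b, s)))
  (\s -> let (a, s')  = appU pa s
             (b, s'') = appU pb s'
         in ((a, b), s''))

pickle :: PU a -> a -> St
pickle p a = appP p (a, "")

unpickle :: PU a -> St -> a
unpickle p = fst . appU p
```

Pickling and unpickling round-trip, but nothing in the types stops `unpickle` from blowing up on malformed input.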

What Next?

I’m not really satisfied by any of the options in this post, but the Coder profunctor/arrow experiment dissatisfies me least. I’ll probably develop this further at some point, and it might even be worth putting a “Functor-shaped parameter” on the input side, to allow a Coder to consume input from a streaming library:

newtype Coder f g a b = Coder { unCoder :: f a -> g b }

While I haven’t found a good story for bidirectional {en,de}coding, the g parameter enables a bunch of neat tricks.

There are a few things I’m hoping to do, to see whether or not this actually has legs.
