I’ve been wrangling with an interesting decoding problem over the weekend. I started out fiddling with profunctors and arrows, but it morphed into something more lenslike for a while. Since Chris Penner has just published his article about fitting Witherable into an optical framework, I thought I’d write up what I have so far, and see if I can get some useful thoughts out while this stuff is flavour of the week.
`amazonka-dynamodb` uses an `AttributeValue` type to represent data stored in a DynamoDB table. Underneath all the lenses, this is morally a sum type: an `AttributeValue` can be a string, number, boolean, binary, null, string set, binary set, &c.
When storing or retrieving data (e.g., with the `PutItem` and `Query` operations), amazonka often uses `HashMap Text AttributeValue` in its request and response types. We’re looking for a solution with the following properties:

- It can (de)serialise both a standalone `AttributeValue` and an `AttributeValue` at some specific column in the table.
- (De)serialising should be as close to bidirectional as possible, but only when that makes sense. Basic use cases treat a DynamoDB table as a key-value store, and deserialisation becomes very similar to serialisation (read/write these fields at those keys). DynamoDB wizards pack many access patterns into a single table, and each access pattern could well need a different (de)serialiser.
Alex DeBrie has a great article on single-table design in DynamoDB. In it, he quotes Forrest Brazeal (whose article is also pretty good):

> In fact, a well-optimized single-table DynamoDB layout looks more like machine code than a simple spreadsheet — despite all the bespoke, human finagling it took to create it.

Rick Houlihan has also given some excellent talks about DynamoDB: one in 2018, and one in 2017. Watch the 2018 talk first, because it does a better job of talking through technology shifts, and motivating the use of denormalised data. The 2017 talk has a lot of overlap with the 2018 talk, but there’s enough different material to make it worth watching too.
Earlier experiments were unconvincing: `newtype`s that select which bifunctor to use added a lot of noise, and decoders that simply return `Nothing` on failure can say nothing useful about what went wrong.
## To/From Classes

`aeson`-style encoder/decoder typeclasses are well-understood. We need to deserialise both `AttributeValue` and `HashMap Text AttributeValue`, which means we need two classes for each direction:
```haskell
data DecodeError = Something -- Placeholder

class FromAttributeValue a where
  fromAttributeValue :: AttributeValue -> Either DecodeError a

class FromAttributeValues a where
  fromAttributeValues :: HashMap Text AttributeValue -> Either DecodeError a

class ToAttributeValue a where
  toAttributeValue :: a -> AttributeValue

class ToAttributeValues a where
  toAttributeValues :: a -> HashMap Text AttributeValue
```
This is not acceptable for several reasons:
- Type classes for serialisation become awkward across package boundaries. Libraries that provide data structures end up depending on serialisation libraries (potentially pulling in large dependency trees), or are forced to create a separate package with either orphan instances or newtypes just to hold instances. None of these are satisfactory options.
- Type-class-based serialisation libraries sometimes provide instances that use an element’s (de)serialisers to (de)serialise a collection (e.g., `instance FromJSON a => FromJSON [a]`). This is convenient, but presupposes a single canonical transformation, which is not necessarily the case.
- There’s no symmetry between encoders and decoders: decoders may fail, while encoders always succeed.
We want to be able to abstract our inputs over `AttributeValue` and `HashMap Text AttributeValue`, so the obvious thing to do is to have a type variable for the input as well as the output. We’d also like to deal with different ways of failing (`Maybe`, `Either SomeError`, accumulating errors into a `Validation`, `These`, …). Let’s hack up some types:
```haskell
class Failing f where
  warn :: String -> a -> f a
  err :: String -> f a

newtype Coder f i o = Coder { runCoder :: i -> f o }
  deriving stock Functor
  deriving newtype (Semigroup, Monoid)
  deriving (Applicative, Monad) via (Star f i)

hoistCoder :: (forall a . f a -> g a) -> Coder f i o -> Coder g i o
hoistCoder nt (Coder f) = Coder $ nt . f
```
(`Star` is `Data.Profunctor.Star`, and is representationally equivalent to `Coder`.)
This gives us a whole pile of useful instances, and we can even use instances from `Control.Arrow` or `product-profunctors` to construct analogues to `Divisible` and `Decidable`. (I gave a talk about this at fp-syd the other week, but it was quite exploratory and I’m not sure if there’s good video.)
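To make that “pile of useful instances” concrete, here is a self-contained sketch. It hand-rolls the `Applicative` instance that `deriving via (Star f i)` would otherwise generate, uses `Either String` as a stand-in failure functor, and combines two toy decoders (`len` and `firstChar`, both invented for this example) over the same input:

```haskell
import Control.Applicative (liftA2)

newtype Coder f i o = Coder { runCoder :: i -> f o }

-- The instances that `deriving via (Star f i)` would otherwise provide.
instance Functor f => Functor (Coder f i) where
  fmap g (Coder h) = Coder (fmap g . h)

instance Applicative f => Applicative (Coder f i) where
  pure = Coder . const . pure
  Coder g <*> Coder h = Coder (\i -> g i <*> h i)

-- Two tiny decoders over the same input, combined applicatively.
len :: Coder (Either String) String Int
len = Coder (Right . length)

firstChar :: Coder (Either String) String Char
firstChar = Coder $ \s -> case s of
  (c:_) -> Right c
  []    -> Left "empty input"

both :: Coder (Either String) String (Int, Char)
both = liftA2 (,) len firstChar

-- runCoder both "abc" ==> Right (3, 'a')
-- runCoder both ""    ==> Left "empty input"
```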
Let’s assume the existence of a simple decoder of `Int` from `AttributeValue`, and write a lifting function that can extract from a map key:
```haskell
decodeInt :: Failing f => Coder f AttributeValue Int
decodeInt = undefined -- Not important

decodeAtKey :: (Eq k, Hashable k, Show k, Failing f)
            => k -> Coder f v o -> Coder f (HashMap k v) o
decodeAtKey k (Coder f) = Coder $ \i ->
  maybe (err $ "missing key: " <> show k) f $ HashMap.lookup k i
```
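Here is a runnable sketch of `decodeAtKey` in action. To stay within the packages GHC ships with, it swaps `HashMap` for `Data.Map`, uses a plain `String` as a toy stand-in for `AttributeValue`, and gives `Failing` an `Either String` instance (`warn` omitted for brevity):

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

newtype Coder f i o = Coder { runCoder :: i -> f o }

class Failing f where
  err :: String -> f a
  -- (warn omitted for brevity)

instance Failing (Either String) where
  err = Left

-- Data.Map stand-in for the HashMap version above.
decodeAtKey :: (Ord k, Show k, Failing f)
            => k -> Coder f v o -> Coder f (Map k v) o
decodeAtKey k (Coder f) = Coder $ \i ->
  maybe (err $ "missing key: " <> show k) f (Map.lookup k i)

-- Toy "AttributeValue" = String, decoded with reads.
decodeInt :: Coder (Either String) String Int
decodeInt = Coder $ \s -> case reads s of
  [(n, "")] -> Right n
  _         -> Left ("not an Int: " <> s)

-- runCoder (decodeAtKey "age" decodeInt) (Map.fromList [("age", "42")])
--   ==> Right 42
-- runCoder (decodeAtKey "age" decodeInt) Map.empty
--   ==> Left "missing key: \"age\""
```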
So far, so good. We could go even further and decode the `[HashMap Text AttributeValue]` that the `Query` operation returns. What about encoding? Let’s suppose the existence of an encoder from `Int` to `AttributeValue` and try to lift it into encoding at a map key:
```haskell
encodeInt :: Coder Identity Int AttributeValue
encodeInt = undefined -- Not important

encodeAtKey :: (Functor f, Eq k, Hashable k)
            => k -> Coder f i v -> Coder f i (HashMap k v)
encodeAtKey k (Coder f) = Coder $ \i ->
  f i <&> \v -> HashMap.singleton k v
```
Here’s where the cracks creep in. We’d like to `mconcat` our outputs together to make a larger structure, but repeatedly unioning singleton maps can be surprisingly slow. We actually want the `Monoid` instance of `Endo m`:
```haskell
encodeAtKey :: (Eq k, Hashable k)
            => k -> Coder Identity i v -> Coder Endo i (HashMap k v)
encodeAtKey k (Coder f) = Coder $ \i ->
  Endo $ HashMap.insert k (runIdentity $ f i)

collectHashMap :: [Coder Endo i (HashMap k v)] -> Coder Identity i (HashMap k v)
collectHashMap coders = Coder $ \i ->
  Identity $ appEndo (runCoder (mconcat coders) i) HashMap.empty
```
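A self-contained sketch of the `Endo` trick, again using `Data.Map` in place of `HashMap`; the `Person` record and its encoder are invented for this example:

```haskell
{-# LANGUAGE FlexibleContexts #-}

import Data.Functor.Identity (Identity (..))
import Data.Monoid (Endo (..))
import qualified Data.Map as Map
import Data.Map (Map)

newtype Coder f i o = Coder { runCoder :: i -> f o }

-- The Monoid we'd otherwise derive: pointwise over the function arrow.
instance Semigroup (f o) => Semigroup (Coder f i o) where
  Coder g <> Coder h = Coder (\i -> g i <> h i)
instance Monoid (f o) => Monoid (Coder f i o) where
  mempty = Coder (const mempty)

encodeAtKey :: Ord k => k -> Coder Identity i v -> Coder Endo i (Map k v)
encodeAtKey k (Coder f) = Coder $ \i -> Endo $ Map.insert k (runIdentity (f i))

collectMap :: [Coder Endo i (Map k v)] -> Coder Identity i (Map k v)
collectMap coders = Coder $ \i ->
  Identity $ appEndo (runCoder (mconcat coders) i) Map.empty

data Person = Person { name :: String, age :: Int }

-- Build the whole map with inserts into one accumulator, no singleton unions.
encodePerson :: Coder Identity Person (Map String String)
encodePerson = collectMap
  [ encodeAtKey "name" (Coder (Identity . name))
  , encodeAtKey "age"  (Coder (Identity . show . age))
  ]

-- runIdentity (runCoder encodePerson (Person "Ada" 36))
--   ==> fromList [("age","36"),("name","Ada")]
```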
To do this, we’ve had to abandon any pretense of generality in the “`Functor`” type parameter `f`. We can’t use `Compose Endo f` because it doesn’t have the `Monoid` instance we want.
Let’s go back to when things were going well, and look more closely at `decodeAtKey`. What happens if we strip away the `newtype`s?
```haskell
decodeAtKey :: (Eq k, Hashable k, Show k, Failing f)
            => k -> Coder f v o -> Coder f (HashMap k v) o
-- k -> (v -> f o) -> (HashMap k v -> f o)
-- k -> LensLike f (HashMap k v) o v o
```
This seems promising, and if we step into the `lens` universe we can use `Prism`s for our primitives. This also gives us a start on our “bidirectional where it makes sense” goal:
```haskell
int :: Prism' AttributeValue Int
int = undefined -- Not important

-- preview int :: AttributeValue -> Maybe Int
-- review  int :: Int -> AttributeValue
```
Trouble is, it’s very easy to lose prism-ness, and find ourselves stuck with a `Traversal` or `Fold`. To lift our primitives to work over map keys, we need to wrangle the prism itself:
```haskell
atKey :: (Eq k, Hashable k) => k -> Prism' v a -> Prism' (HashMap k v) a
atKey k pr = prism'
  (HashMap.singleton k . review pr)
  (HashMap.lookup k >=> preview pr)
```
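To see the behaviour concretely without pulling in `lens`, here is a toy stand-in for `Prism'` as a preview/review pair (the real thing is profunctor-encoded, but at this level it behaves the same), with a scaled-down `AttributeValue` invented for the example:

```haskell
import Control.Monad ((>=>))
import qualified Data.Map as Map
import Data.Map (Map)

-- Toy stand-in for lens's Prism': just the two directions.
data Prism' s a = Prism' { preview :: s -> Maybe a, review :: a -> s }

-- Scaled-down AttributeValue with the same sum-type shape.
data AttributeValue = S String | N String deriving (Eq, Show)

int :: Prism' AttributeValue Int
int = Prism'
  (\v -> case v of
      N s | [(n, "")] <- reads s -> Just n
      _ -> Nothing)
  (N . show)

atKey :: Ord k => k -> Prism' v a -> Prism' (Map k v) a
atKey k pr = Prism'
  (Map.lookup k >=> preview pr)
  (Map.singleton k . review pr)   -- the singleton-map problem again

-- preview (atKey "age" int) (Map.fromList [("age", N "42")]) ==> Just 42
-- review  (atKey "age" int) 7 ==> fromList [("age", N "7")]
```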
We’ve found ourselves back at our first problem: we’re making singleton maps again. There might be ways around this (Yair Chuchem has some examples of using `Prism`s to implement codecs and “roundtrip” ASTs), but even then you have to solve the error-reporting problem. Yair proposes a `VerbosePointed` typeclass to smuggle error information into the `f` parameter of an optic, but Ed Kmett notes that they ‘played with “coindexed” prisms which looked somewhat similar to these, but ran into problems with inference for them when folks would compose them with indexed traversals’, so that doesn’t sound promising.
How else can we keep encoding and decoding together? I came across a 2004 Functional Pearl, “Pickler Combinators”, which defines a “pickler/unpickler” type:

```haskell
type St = [Char]
data PU a = PU { appP :: (a, St) -> St, appU :: St -> (a, St) }
```
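A tiny, hand-written example of the `PU` type in action (the `bool` pickler here is mine, not from the paper; `pair` follows the shape of the paper’s combinator of the same name):

```haskell
type St = [Char]

data PU a = PU { appP :: (a, St) -> St, appU :: St -> (a, St) }

-- Hand-rolled pickler for Bool: one character on the front of the state.
bool :: PU Bool
bool = PU
  (\(b, st) -> (if b then 'T' else 'F') : st)
  (\(c:st) -> (c == 'T', st))

-- Sequence two picklers: pickle right-to-left, unpickle left-to-right.
pair :: PU a -> PU b -> PU (a, b)
pair pa pb = PU
  (\((a, b), st) -> appP pa (a, appP pb (b, st)))
  (\st -> let (a, st')  = appU pa st
              (b, st'') = appU pb st'
          in ((a, b), st''))

-- appP (pair bool bool) ((True, False), "") ==> "TF"
-- appU (pair bool bool) "TF" ==> ((True, False), "")
```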
Many examples in the paper do the equivalent of `invmap`ing with a partial function on one side, which definitely doesn’t give us the safety we’re after. We also want to extend this to support varying input types, which means we need a `data PU i o = ...` type where both `i` and `o` are invariant. This quickly leads down a similar road as the monoidal functor stuff, but two invariant type variables make it twice as awkward.
I’m not really satisfied by any of the options in this post, but the `Coder` profunctor/arrow experiment dissatisfies me least. I’ll probably develop this further at some point, and it might even be worth putting a “`Functor`-shaped parameter” on the input side, to allow a `Coder` to consume input from a streaming library:

```haskell
newtype Coder f g a b = Coder { unCoder :: f a -> g b }
```

While I haven’t found a good story for bidirectional {en,de}coding, the `g` parameter enables a bunch of neat tricks:
- When `g ~ Either AnError`, you can report failure;
- When `g ~ Endo`, you can build up maps or other data structures;
- When `g ~ Compose IO (Const ())`, you can execute actions on the coded structure; or
- When `g ~ ((,) i)`, you can pass unconsumed input to a downstream `Coder`, giving it the flavour of a parser combinator library.

There are a few things I’m hoping to do, to see whether or not this actually has legs:
- Lift `Coder`s in and out of maps, lists, and other monoids; sequence `Coder`s and build up structures using `Applicative`/`Divisible`/`Decidable`-style combinators, &c.;
- Write some nontrivial (de)serialisers for `AttributeValue`s;
- Check that the different instantiations of the `g` parameter compose well. I suspect something MTL-ish will emerge, as we’ll need to consider (e.g.) `Coder`s that pass input through and report errors;
- Work out whether the `f` parameter on the input is worth the ergonomic cost. I suspect it is not: while decoding from an input stream is appealing, setting `f ~ Stream (Of a) m` severely constrains the type of the `g` parameter, and the `a` in `f a` won’t even refer to the stream item type (it names the type of the final value returned by the streaming computation).
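As a parting sketch of that `g ~ ((,) i)` trick: with `f ~ Identity` and `g ~ ((,) String)`, a `Coder` consumes input and hands its leftovers downstream, much like a (total) parser step. The `char` and `andThen` names here are invented for the example:

```haskell
import Data.Functor.Identity (Identity (..))

-- The two-parameter Coder from above.
newtype Coder f g a b = Coder { unCoder :: f a -> g b }

-- f ~ Identity, g ~ ((,) String): consume a String, return leftovers + result.
type Parserish = Coder Identity ((,) String) String

char :: Parserish (Maybe Char)
char = Coder $ \(Identity s) -> case s of
  (c:rest) -> (rest, Just c)
  []       -> ("", Nothing)

-- Feed one Coder's leftovers to the next by re-wrapping in Identity.
andThen :: Parserish a -> Parserish b -> Parserish (a, b)
andThen (Coder p) (Coder q) = Coder $ \i ->
  let (rest, a)  = p i
      (rest', b) = q (Identity rest)
  in (rest', (a, b))

-- unCoder (char `andThen` char) (Identity "abc")
--   ==> ("c", (Just 'a', Just 'b'))
```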