I’ve been wrangling with an interesting decoding problem over the
weekend. I started out fiddling with profunctors and arrows, but it
morphed into something more lenslike for a while. Since Chris Penner has
just published his article about fitting
Witherable
into an optical framework, I thought I’d
write up what I have so far, and see if I can get some useful thoughts
out while this stuff is flavour of the week.
amazonka-dynamodb uses an AttributeValue type to represent data stored in a DynamoDB table. Underneath all the lenses, this is morally a sum type: an AttributeValue can be a string, number, boolean, binary, null, string set, binary set, &c.
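To fix ideas, here’s roughly the sum type hiding underneath. This is a sketch only: amazonka’s real AttributeValue is a record of Maybe fields behind lenses, and the constructor names below are mine, not its API:

import Data.ByteString (ByteString)
import Data.Text (Text)

-- Illustrative only; morally equivalent to amazonka's AttributeValue.
data AttributeValue
  = S Text            -- string
  | N Text            -- number (DynamoDB transmits numbers as strings)
  | BOOL Bool         -- boolean
  | B ByteString      -- binary
  | NULL              -- null
  | SS [Text]         -- string set
  | BS [ByteString]   -- binary set
  -- &c.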
When storing or retrieving data (e.g., with the PutItem and Query operations), amazonka often uses HashMap Text AttributeValue in its request and response types. We’re looking for a solution with the following properties:

- We should be able to (de)serialise both a bare AttributeValue and an AttributeValue at some specific column in the table.
- (De)serialising should be as close to bidirectional as possible, but only when that makes sense. Basic use cases treat a DynamoDB table as a key-value store, and deserialisation becomes very similar to serialisation (read/write these fields at those keys). DynamoDB wizards pack many access patterns into a single table, and each access pattern could well need a different (de)serialiser.
Alex DeBrie has a great article on single-table design in DynamoDB. In it, he quotes Forrest Brazeal (whose article is also pretty good):
In fact, a well-optimized single-table DynamoDB layout looks more like machine code than a simple spreadsheet — despite all the bespoke, human finagling it took to create it.

Rick Houlihan has also given some excellent talks about DynamoDB: one in 2018, and one in 2017. Watch the 2018 talk first, because it does a better job of talking through technology shifts, and motivating the use of denormalised data. The 2017 talk has a lot of overlap with the 2018 talk, but there’s enough different material to make it worth watching too.
My early profunctor/arrow experiments hit two snags: newtypes that select which bifunctor to use added a lot of noise, and signalling failure with a bare Nothing tells you nothing about what went wrong.
To/From Classes

aeson-style encoder/decoder typeclasses are well-understood. We need to deserialise both AttributeValue and HashMap Text AttributeValue, which means we need two classes for each direction:
data DecodeError = Something -- Placeholder

class FromAttributeValue a where
  fromAttributeValue :: AttributeValue -> Either DecodeError a

class FromAttributeValues a where
  fromAttributeValues :: HashMap Text AttributeValue -> Either DecodeError a

class ToAttributeValue a where
  toAttributeValue :: a -> AttributeValue

class ToAttributeValues a where
  toAttributeValues :: a -> HashMap Text AttributeValue
This is not acceptable for several reasons:
- Type classes for serialisation become awkward across package boundaries. Libraries that provide data structures end up depending on serialisation libraries (potentially pulling in large dependency trees), or are forced to create a separate package with either orphan instances or newtypes just to hold instances (sketched below). None of these are satisfactory options.
- Type-class-based serialisation libraries sometimes provide instances that use an element’s (de)serialisers to (de)serialise a collection (e.g., instance FromJSON a => FromJSON [a]). This is convenient, but presupposes a single canonical transformation, which is not necessarily the case.
- There’s no symmetry between encoders and decoders: decoders may fail, but encoders always succeed.
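To make the first point concrete, here’s the usual newtype workaround, sketched with a stand-in Foo type:

-- Stand-in for a type from an upstream package that shouldn't have to
-- know about our serialisation classes.
data Foo = Foo

-- The workaround: a third package defines a newtype purely to hold the
-- instance, since neither upstream package sensibly owns it.
newtype WrappedFoo = WrappedFoo Foo

instance ToAttributeValue WrappedFoo where
  toAttributeValue (WrappedFoo Foo) = undefined -- Not important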
We want to be able to abstract our inputs over AttributeValue and HashMap Text AttributeValue, so the obvious thing to do is to have a type variable for the input as well as the output. We’d also like to deal with different ways of failing (Maybe, Either SomeError, accumulating errors into a Validation, These, …). Let’s hack up some types:
class Failing f where
  warn :: String -> a -> f a
  err :: String -> f a

newtype Coder f i o = Coder { runCoder :: i -> f o }
  deriving stock Functor
  deriving newtype (Semigroup, Monoid)
  deriving (Applicative, Monad) via (Star f i)
hoistCoder :: (forall a . f a -> g a) -> Coder f i o -> Coder g i o
hoistCoder nt (Coder f) = Coder $ nt . f

(Star is Data.Profunctor.Star, and is representationally equivalent to Coder.)
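The Failing class leaves its instances to the imagination; here are two plausible ones (my guesses, not canon) to show the intent. Neither Maybe nor Either has anywhere to stash a warning, which is part of why types like These are on the wishlist:

instance Failing Maybe where
  warn _ a = Just a -- nowhere to record the warning
  err _ = Nothing   -- ...or the error message

instance Failing (Either String) where
  warn _ a = Right a -- still no channel for warnings
  err = Left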
This gives us a whole pile of useful instances, and we can even use instances from Control.Arrow or product-profunctors to construct analogues to Divisible and Decidable. (I gave a talk about this at fp-syd the other week, but it was quite exploratory and I’m not sure if there’s good video.)
Let’s assume the existence of a simple decoder of Int from AttributeValue, and write a lifting function that can extract from a map key:

decodeInt :: Failing f => Coder f AttributeValue Int
decodeInt = undefined -- Not important

decodeAtKey :: (Eq k, Hashable k, Show k, Failing f)
  => k -> Coder f v o -> Coder f (HashMap k v) o
decodeAtKey k (Coder f) = Coder $ \i ->
  maybe (err $ "missing key: " <> show k) f $ HashMap.lookup k i
So far, so good. We could go even further and decode the [HashMap Text AttributeValue] that the Query operation returns. What about encoding? Let’s suppose the existence of an encoder from Int to AttributeValue and try to lift it into encoding at a map key:
encodeInt :: Coder Identity Int AttributeValue
encodeInt = undefined -- Not important

encodeAtKey :: (Functor f, Eq k, Hashable k)
  => k -> Coder f i v -> Coder f i (HashMap k v)
encodeAtKey k (Coder f) = Coder $ \i ->
  f i <&> \v -> HashMap.singleton k v
Here’s where the cracks creep in. We’d like to mconcat our outputs together to make a larger structure, but repeatedly unioning singleton HashMaps can be surprisingly slow. We actually want the Monoid instance of Endo m:
encodeAtKey :: (Eq k, Hashable k)
  => k -> Coder Identity i v -> Coder Endo i (HashMap k v)
encodeAtKey k (Coder f) = Coder $ \i ->
  Endo $ HashMap.insert k (runIdentity $ f i)

collectHashMap :: [Coder Endo i (HashMap k v)] -> Coder Identity i (HashMap k v)
collectHashMap coders = Coder $ \i ->
  Identity $ appEndo (runCoder (mconcat coders) i) HashMap.empty
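Continuing the hypothetical Point example on the encoding side: we never derived Profunctor for Coder, so here’s a hand-rolled lmap-alike, which lets collectHashMap assemble per-column encoders:

-- Pre-process a Coder's input; this is Profunctor's lmap, written out by
-- hand because we only derived the instances shown earlier.
lmapCoder :: (j -> i) -> Coder f i o -> Coder f j o
lmapCoder g (Coder f) = Coder (f . g)

encodePoint :: Coder Identity Point (HashMap Text AttributeValue)
encodePoint = collectHashMap
  [ encodeAtKey "x" (lmapCoder x encodeInt)
  , encodeAtKey "y" (lmapCoder y encodeInt)
  ]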
To do this, we’ve had to abandon any pretense of generality in the “Functor” type parameter f. We can’t use Compose Endo f because it doesn’t have the Monoid instance we want.
Let’s go back to when things were going well, and look more closely at decodeAtKey. What happens if we strip away the newtypes?

decodeAtKey :: (Eq k, Hashable k, Show k, Failing f)
  => k -> Coder f v o -> Coder f (HashMap k v) o
-- k -> (v -> f o) -> (HashMap k v -> f o)
-- k -> LensLike f (HashMap k v) o v o
This seems promising, and if we step into the lens universe we can use Prisms for our primitives. This also gives us a start on our “bidirectional where it makes sense” goal:

int :: Prism' AttributeValue Int
int = undefined -- Not important

preview int :: AttributeValue -> Maybe Int
review int :: Int -> AttributeValue
Trouble is, it’s very easy to lose prism-ness, and find ourselves stuck with a Traversal or Fold. To lift our primitives to work over map keys, we need to wrangle the prism itself:

atKey :: (Eq k, Hashable k) => k -> Prism' v a -> Prism' (HashMap k v) a
atKey k pr = prism'
  (HashMap.singleton k . review pr)
  (HashMap.lookup k >=> preview pr)
We’ve found ourselves back at our first problem: we’re making singleton maps again. There might be ways around this (Yair Chuchem has some examples of using Prisms to implement codecs and “roundtrip” ASTs), but even then you have to solve the error-reporting problem. Yair proposes a VerbosePointed typeclass to smuggle error information into the f parameter of an optic, but Ed Kmett notes that they ‘played with “coindexed” prisms which looked somewhat similar to these, but ran into problems with inference for them when folks would compose them with indexed traversals’, so that doesn’t sound promising.
How else can we keep encoding and decoding together? I came across a 2004 Functional Pearl, “Pickler Combinators”, which defines a “pickler/unpickler” type:
type St = [Char]
data PU a = PU { appP :: (a, St) -> St, appU :: St -> (a, St) }
Many examples in the paper do the equivalent of invmap-ing with a partial function on one side, which definitely doesn’t give us the safety we’re after. We also want to extend this to support varying input types, which means we need a data PU i o = ... type where both i and o are invariant. This quickly leads down a similar road to the monoidal functor stuff, but two invariant type variables make it twice as awkward.
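Spelling that out (my sketch, not the paper’s): once the stream type St becomes a parameter i, both type variables appear in positive and negative positions, so neither is co- or contravariant:

-- i is both argument and result (invariant); o is consumed by the
-- pickler and produced by the unpickler (also invariant).
data PU i o = PU
  { appP :: (o, i) -> i   -- pickle an o onto the stream
  , appU :: i -> (o, i)   -- unpickle an o off the stream
  }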
I’m not really satisfied by any of the options in this post, but the Coder profunctor/arrow experiment dissatisfies me least. I’ll probably develop this further at some point, and it might even be worth putting a “Functor-shaped parameter” on the input side, to allow a Coder to consume input from a streaming library:
newtype Coder f g a b = Coder { unCoder :: f a -> g b }
While I haven’t found a good story for bidirectional {en,de}coding, the g parameter enables a bunch of neat tricks:

- With g ~ Either AnError, you can report failure;
- with g ~ Endo, you can build up maps or other data structures;
- with g ~ Compose IO (Const ()), you can execute actions on the coded structure; or
- with g ~ ((,) i), you can pass unconsumed input to a downstream Coder, giving it the flavour of a parser combinator library (sketched just below).
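Here’s a small sketch of that last trick, with names of my own invention: each Coder hands leftover input to the next, which is exactly a parser combinator’s shape:

-- Pop one element off a list of Ints, returning leftovers alongside the result.
takeInt :: Coder Identity ((,) [Int]) [Int] (Maybe Int)
takeInt = Coder $ \(Identity xs) -> case xs of
  [] -> ([], Nothing)
  (n : rest) -> (rest, Just n)

-- Run one Coder, then feed its unconsumed input to the next.
andThen :: Coder Identity ((,) i) i a
        -> Coder Identity ((,) i) i b
        -> Coder Identity ((,) i) i (a, b)
andThen (Coder f) (Coder g) = Coder $ \(Identity i) ->
  let (i', a) = f (Identity i)
      (i'', b) = g (Identity i')
  in (i'', (a, b))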
There are a few things I’m hoping to do, to see whether or not this actually has legs:

- lift Coders in and out of maps, lists, and other monoids; sequence Coders and build up structures using Applicative/Divisible/Decidable-style combinators, &c.;
- put all of this to work on real AttributeValues;
- check that different choices of the g parameter compose well. I suspect something MTL-ish will emerge, as we’ll need to consider (e.g.) Coders that pass input through and report errors; and
- decide whether the f parameter on the input is worth the ergonomic cost. I suspect it is not; while decoding from an input stream is appealing, setting f ~ Stream (Of a) m severely constrains the type of the g parameter, and the a in f a won’t even refer to the stream item type (it names the type of the final value returned by the streaming computation).