The Road to Amazonka 2.0

Posted on August 30, 2023 by Jack Kelly
Tags: haskell, aws, coding

Last month, Brendan Hay and I released the 2.0 version of Amazonka, the de facto but unofficial AWS SDK for Haskell. Before that, Amazonka had seen intermittent commits and some pretty major improvements, but hadn’t managed an actual release in about four years. Because of the lack of visible progress, more serious industrial users maintained private forks instead of contributing to the main repository. It took about two years of work to pick up what was left behind, triage all the open issues, make several necessary major improvements, and get the whole project back into a shippable state.

I believe that many open source ecosystems have projects like Amazonka: projects which are large, important to their ecosystem, and stuck. These are my notes about how to unstick such a project, using Amazonka as a case study. It’s a fair amount of work, but a surprising amount of help can come out of the woodwork once someone makes the first move. That person could be you, and if it’s not you, then who’s it gonna be?

Getting Started

I started in almost the worst possible way, by barging onto the Hackage Trustees’ issue tracker and asking to do a non-maintainer upload of the whole amazonka-* package family. While this was a pretty rude thing to do, it at least got me and Brendan talking.

After that, I instead tried doing the actual work: looking for issues to close or PR, and classifying open PRs as “needs fixes”, “should merge”, or “should close”. When the project still has a somewhat responsive maintainer, helping to tame the bugtracker is a great way to move from being an occasional PR author and instead become an actual member of the project team. The biggest difficulty with resurrecting a large stuck project like Amazonka is working out what actually needs to be done: issues can become irrelevant with age, some things should be deferred until after the big cleanup, and the whole thing becomes a big unappealing tangle which the maintainer never quite gets to. Fixing this means making the maintainer’s job as easy as possible; my approach was to write more rather than less, and bring enough detail together that Brendan should be able to say “yep, closed” or “yep, merged”.

My experience with Amazonka and other Haskell libraries is that maintainers tend to care deeply about their code and feel very responsible for it, but are usually time-poor. Nearly every time I’ve shown up willing to do the work, people have bent over backwards to accommodate it. The best thing you can do as a contributor, then, is make the maintainer’s job as easy as possible by putting effort into commit messages, CHANGELOG.md entries, PR descriptions, and so on. A time-poor maintainer will have a much easier time approving things if you provide all the necessary context.

But sometimes maintainers just aren’t able to get the work done, for any number of reasons. Burnout is a real problem, and they could also be working in other languages, living in remote locations, or dealing with a major life event. In those instances you will need to take on more responsibility. My recommendation: take the minimum additional responsibility you need to get the job done, because that ruffles the fewest feathers. A handover or co-maintainership is better than a fork with maintainer blessing, which is much better than a hostile fork. I started opening PRs and posting recommendations to issues in early 2021. In September 2021 there was a flare-up on the issue tracker of the “just fork it, it’s never going to get done” type, so I suggested that Brendan lay out a road map and appoint additional maintainers. He offered me collaborator access, I accepted, and then the real work began.

Getting the Repo Under Control

The issue tracker is the map to the next release, and a map is no good if it doesn’t match the territory. After being made collaborator, I tagged every open issue and pull request (PR) with a new “needs triage” label, and read/triaged/split/closed all of them. This sounds intimidating, and it is, but it’s quite doable. Having the label made it easy to get a list of all untriaged items, and I would go through 5–10 issues over breakfast each morning instead of reading junk online. This proved to be extremely useful, and I recommend it to anyone picking up a dormant project. It gave me a handle on how the library worked, what was on Brendan’s road map, the pain points affecting real users, and so on.

For this reason, I consider stalebots harmful and recommend against summarily closing old issues until you’ve thought about them properly and understood exactly what’s been reported. Each comment, issue, and PR against a dormant project exists because somebody cared enough to write it up, and that’s worth taking seriously.

After reading and triaging the issues and PRs, I had some sense of what people needed, and was able to cluster the items into rough groupings, make guesses at what looked easier or harder, and start working through them in batches. This makes learning a large project much less intimidating, because you can go and learn how (say) service overrides work, clear off a bunch of those issues, and then move on to something else like request signing. A full mental model of the project then develops over time. (In hindsight, it may be possible to acquire this mental model more rapidly by using something like reflexion models.)

Handling New Contributors

Cleaning up the issue tracker had a surprising side-effect: a few people noticed the increased activity and came out of the woodwork to submit features they’d developed for their own use. A few major features came via PR, such as support for AWS SSO (since renamed IAM Identity Center) and the sts:AssumeRoleWithWebIdentity API call (used to assume IAM roles via OpenID Connect, as well as from other identity providers).

These PRs often worked well, but sometimes lacked understanding of Amazonka’s architectural direction because that context was never written down. In these cases, I had to politely request a total rewrite, which is feedback that must be delivered carefully. I therefore put extra effort into these reviews to spell out the library’s architecture and make the contributors’ job as easy as possible. In almost every instance, the authors happily rewrote and resubmitted their PRs, even though the rewrites were a decent amount of work. I’m very grateful for their contributions, and for their flexibility.

Calling Back More Users

To confidently make a final release, we needed more people using it. Many people were already using relatively recent versions of Amazonka from git, but we wanted people to move from their private forks to our version. The first release candidate was announced at the end of November 2021, to give people (especially industrial users) a chance to test it with real workloads.

As with maintainers, I think it’s really important to be respectful of your users’ time, and we needed to make it easy for them to try the release candidate. We therefore provided instructions for how to import Amazonka from git, for both Cabal and Stack. (For similar reasons, we also made sure to provide a migration guide from 1.6.1 to the 2.0 final release.)
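As a rough illustration of what importing from git can look like with Cabal (a sketch, not the exact instructions we published), a cabal.project file can point at the repository with a source-repository-package stanza. The commit hash below is a placeholder that must be replaced with a real one, and the subdir paths and the choice of amazonka-s3 as the example service package are assumptions about the repository layout, so check them against the repository before use.

```
-- cabal.project (illustrative sketch)
packages: .

source-repository-package
    type: git
    location: https://github.com/brendanhay/amazonka
    -- Placeholder: substitute the commit you want to test.
    tag: <commit-sha>
    -- Assumed layout: one entry per amazonka-* package you depend on.
    subdir: lib/amazonka lib/amazonka-core lib/services/amazonka-s3
```

Stack supports the same idea through a git entry under extra-deps in stack.yaml, with commit and subdirs keys playing the roles of tag and subdir.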

This had the desired effect — it brought a lot of reports out of the woodwork. Most said “yes, this is working great for us”, but it also caused a welcome flurry of bug reports. The proposed four-week stabilisation period turned out to be wildly optimistic, and it wasn’t until July 2023 that we were able to announce a second release candidate.

I would’ve preferred a smaller gap between RC1 and RC2, but a lot more work turned out to be necessary, and some of it needed long stretches of focused effort. One example: to bring Amazonka’s authentication support in line with official AWS SDKs, we needed to add support for unsigned requests to AWS and support several new authentication methods. Doing this properly and in an extensible way required a thorough rework of the authentication subsystem.

Thanks

This release would never have happened if not for the support and contributions of a great many people. Here is a partial list:

Final Thoughts

I never expected to be the one to do this: I got into cloud relatively recently, and Amazonka initially looked too intimidating to tackle. But work started leaning into more AWS-specific offerings, and the need for a better SDK became more pressing. I also completed my AWS Certified Solutions Architect — Associate certificate, and became the one on the team with strong AWS and Haskell knowledge. On top of that, some close friends got in my ear, saying things like, “you care a lot about Haskell and about cloud. Your community needs this, and you have the skills to do it. If it’s not you, then who’s it gonna be?”

And that’s my challenge to you. Find a stuck project that’s important to your part of the ecosystem, and see if you can unstick it. Because if it’s not you, then who’s it gonna be?

Copyright © 2024 Jack Kelly