2023-02-16

No New Infrastructure

How many times have you heard an engineering team complain about the lack of investment in Developer Experience? Every team I’ve worked with constantly bemoans the issue. Yet they’re often the same engineers who will spin up the newest AWS product at the drop of a hat. So many engineers I work with don’t appreciate how these two things are intertwined. As a default, you should assume that new infra comes at the cost of developer experience. Let me explain.

Every piece of infrastructure your code depends on requires that you can either run it locally or have some staging environment to develop against. Otherwise you can no longer run your app locally. Consider SQS. You read on a blog that it’s what you should be using to distribute work to your web worker. But then you onboard a new developer and you realize you can’t run it on localhost. So you have to provision a new SQS queue for each developer. It’s not the end of the world, but did you script that process? Does that new developer have the appropriate AWS permissions to spin one up himself? Maybe he has to ping his boss to get it set up.

This is a relatable and, I’ll admit, rather benign example of the general problem, but it demonstrates it clearly. Every time you add a new piece of infrastructure to the stack, you complicate your development story. Oh, and what about testing? Are you going to do away with end-to-end testing? Are you going to mock out SQS and just unit test? Are you going to provision an SQS queue just for your CI and use the developer queue when running locally? What if you have CI running on multiple branches concurrently? Wires crossed. Whoops!
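To make that bookkeeping concrete, here is a rough sketch of the kind of naming helper you end up maintaining once every developer and every CI run needs its own queue. The function name, the `app`/`env`/`owner`/`branch` scheme, and the example values are all hypothetical. The only outside fact it leans on is that SQS queue names are limited to 80 characters drawn from letters, digits, hyphens, and underscores.

```python
import re

# SQS queue names: at most 80 characters; letters, digits, "-" and "_" only.
MAX_SQS_NAME = 80


def queue_name(app, env, owner, branch=None):
    """Build a per-developer or per-CI-run queue name (hypothetical scheme)."""
    parts = [app, env, owner]
    if branch:
        parts.append(branch)
    raw = "-".join(parts)
    # Replace anything SQS would reject, e.g. the "/" in "feature/login".
    safe = re.sub(r"[^A-Za-z0-9_-]", "-", raw)
    # Truncation can make two long branch names collide; a real script
    # would need a hash suffix or similar to stay unique.
    return safe[:MAX_SQS_NAME]
```

Something like `queue_name("shop", "ci", "runner", "feature/login")` keeps concurrent branches apart, but notice that this is code, permissions, and cleanup you now own purely because the queue can’t run on localhost.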

Worse yet, we often delude ourselves into believing we need the specialized infrastructure under the guise of future scaling concerns. Can you confidently say you’ve done the math on that? Are you certain you know what Postgres is capable of? Are you aware of the intermediate steps available before you launch that memcached cluster?

Let’s consider some alternatives to two of the more common needs of a web developer, caching and worker queues.

For caching, everyone will jump to Redis. It’s just proven. But are you aware of Postgres UNLOGGED tables? They’re tables that skip writing to the WAL and will therefore lose their data when the node fails (just like Redis), but they give you a huge write-performance boost over a standard Postgres table. Hell, even if scale creates database contention, you can move your unlogged table to another Postgres instance in production, all while running everything against a single Postgres instance locally and in tests.
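As a minimal sketch of what that could look like, here is a tiny cache wrapper around an UNLOGGED table. It assumes a DB-API connection using `%s` placeholders (psycopg2, for example). The table name, columns, and the `PgCache` class are all made up for illustration.

```python
# Hypothetical schema: an UNLOGGED table skips the WAL, so its contents
# are lost after a crash. That is acceptable for a cache.
CREATE_CACHE_TABLE = """
CREATE UNLOGGED TABLE IF NOT EXISTS cache (
    key        text PRIMARY KEY,
    value      text NOT NULL,
    expires_at timestamptz NOT NULL
)
"""

# Upsert so that setting an existing key overwrites it and resets its TTL.
SET_SQL = """
INSERT INTO cache (key, value, expires_at)
VALUES (%s, %s, now() + %s * interval '1 second')
ON CONFLICT (key) DO UPDATE
SET value = EXCLUDED.value, expires_at = EXCLUDED.expires_at
"""

# Expired rows are simply ignored on read.
GET_SQL = "SELECT value FROM cache WHERE key = %s AND expires_at > now()"


class PgCache:
    """GET/SET with a TTL on top of a DB-API connection (assumed psycopg2-like)."""

    def __init__(self, conn):
        self.conn = conn

    def setup(self):
        with self.conn.cursor() as cur:
            cur.execute(CREATE_CACHE_TABLE)
        self.conn.commit()

    def set(self, key, value, ttl_seconds):
        with self.conn.cursor() as cur:
            cur.execute(SET_SQL, (key, value, ttl_seconds))
        self.conn.commit()

    def get(self, key):
        with self.conn.cursor() as cur:
            cur.execute(GET_SQL, (key,))
            row = cur.fetchone()
        self.conn.commit()  # close the implicit read transaction
        return row[0] if row else None
```

Expired rows are only hidden here, never deleted, so a real version would also want a periodic `DELETE FROM cache WHERE expires_at <= now()`. The point is that the same code runs unchanged whether `conn` points at your one local database or at a dedicated cache instance in production.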

For background workers, people will similarly jump to either Redis or SQS. But did you know Postgres has a pub/sub system built in (LISTEN/NOTIFY), which you can pair with triggers to get row-level notifications? Over a regular database connection, the database can notify your workers when there is a new task to compute, avoiding unnecessary polling. And just like in the caching example, you can split those tables out into a new database in production if you are truly hitting scale issues, before you have to migrate to a custom piece of infrastructure. Locally and in tests, you keep running against a single database.
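Here is a rough sketch of the listening side, assuming psycopg2 (whose connections expose `poll()`, `notifies`, and a file descriptor usable with `select`). The `tasks` table, the `new_task` channel, and the trigger are hypothetical names. One caveat: a listener holds its own connection open for as long as it listens, rather than borrowing one briefly from a pool.

```python
import select

CHANNEL = "new_task"  # hypothetical channel name

# Hypothetical schema: a trigger announces every inserted task row.
# (EXECUTE FUNCTION needs Postgres 11+; older versions use EXECUTE PROCEDURE.)
SETUP_SQL = """
CREATE TABLE IF NOT EXISTS tasks (
    id      bigserial PRIMARY KEY,
    payload text NOT NULL
);
CREATE OR REPLACE FUNCTION notify_new_task() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('new_task', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS tasks_notify ON tasks;
CREATE TRIGGER tasks_notify AFTER INSERT ON tasks
    FOR EACH ROW EXECUTE FUNCTION notify_new_task();
"""


def drain_notifications(conn):
    """Return the payloads of all notifications queued on the connection."""
    conn.poll()  # lets psycopg2 read pending messages into conn.notifies
    payloads = []
    while conn.notifies:
        payloads.append(conn.notifies.pop(0).payload)
    return payloads


def listen_forever(conn, handle, timeout=5.0):
    """Call handle(payload) for each notification instead of polling the table."""
    conn.autocommit = True  # notifications are delivered between transactions
    with conn.cursor() as cur:
        cur.execute(f"LISTEN {CHANNEL};")
    while True:
        # Sleep until the connection's socket becomes readable or we time out.
        readable, _, _ = select.select([conn], [], [], timeout)
        if not readable:
            continue
        for payload in drain_notifications(conn):
            handle(payload)
```

Notifications are not queued for listeners that are offline, so a worker built this way would still want to scan the table for unfinished tasks on startup.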

The best part is that you can leverage all the migration tools your developers already know from regular application development. Nothing new.

There are other ways to solve this problem. The Laravel community is big on creating mock implementations of infrastructure to run locally. They’re expensive to develop, but they’re very effective. You could imagine creating an in-memory GET/SET/EXPIRE type interface to mock out your cache locally. You don’t have to rely on Postgres like I mentioned above. I only suggest Postgres because you already have it in your stack. It’s necessary, so leverage it maximally.
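For a sense of scale, an in-memory GET/SET/EXPIRE mock along those lines might look like the sketch below. The class and method names are invented for illustration and don’t mirror any particular cache client. The clock is injectable so tests can move time forward without sleeping.

```python
import time

_MISSING = object()  # sentinel so a stored None is distinguishable from "absent"


class InMemoryCache:
    """Dict-backed stand-in for a cache, for local development and tests."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        """Store a value; with no ttl it never expires."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        """Return the value, or default if the key is absent or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value

    def expire(self, key, ttl):
        """Reset the TTL on a live key; return False if there is no such key."""
        value = self.get(key, _MISSING)
        if value is _MISSING:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True
```

The catch is fidelity: the mock is only as good as its agreement with the real thing, which is where the development cost comes from.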

Look, the sentiment here may be a bit hyperbolic, but it’s to illustrate a point. I’ve seen it so many times before. Rarely is the developer experience considered when new infrastructure is introduced. And if it’s not considered, it’s what ends up paying the cost.

With new infra, it’s best to default to no until proven otherwise.