2024-04-04

Buy your GPUs

You will own nothing and be happy.. Yeah.. right.. 🦅🇺🇸

Fly.io recently published an article, Easy at-home AI with Bumblebee and Fly GPUs. It’s a clever article that takes advantage of WireGuard and Distributed Erlang to allow you to develop code locally against powerful GPUs in the cloud. And while I don’t actually have any issue with this, I’m bored, and I feel like picking a fake fight.

The folks at Fly are playing the good guy. You have your cute little laptop, and think AI is inaccessible to you because you don’t have a spare kilowatt to run through a PCIe slot. “Come rent one from us,” they say. “You can develop your code against the latest and greatest GPUs and only pay by the hour. Elixir makes it easy. You too can code the next great AI-powered app.”

That’s nice of them, but let’s do some math.

Fly.io charges you a whopping $2.50/hr for an A100 40GB graphics card in their cloud. That’s $1825/month! A TinyBox Green with 6x4090 (144GB of VRAM) is 3x the TFLOPs and 3.5x the VRAM for $25k, aka a ~4.5 month payoff period against the equivalent 3xA100 GPUs. Sounds like a fair trade…
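If you want to check that napkin math yourself, here is a minimal sketch. It assumes a 730-hour average month (8760 hours / 12), round-the-clock rental, and nothing else: no electricity, no tax, no resale value. The function names are mine, not anything from Fly.

```python
HOURS_PER_MONTH = 8760 / 12  # 730 hours in an average month (assumption)


def monthly_rental(rate_per_hour: float, gpus: int = 1) -> float:
    """Cost of renting `gpus` cards 24/7 for one average month."""
    return rate_per_hour * HOURS_PER_MONTH * gpus


def payoff_months(purchase_price: float, rate_per_hour: float, gpus: int = 1) -> float:
    """Months of 24/7 rental that add up to the purchase price."""
    return purchase_price / monthly_rental(rate_per_hour, gpus)


a100_month = monthly_rental(2.50)                      # one A100 40GB at $2.50/hr
tinybox_payoff = payoff_months(25_000, 2.50, gpus=3)   # $25k box vs 3 rented A100s
vram_ratio = 144 / 40                                  # TinyBox VRAM vs one A100 40GB

print(f"One A100, 24/7: ${a100_month:,.0f}/month")
print(f"TinyBox vs 3x A100: {tinybox_payoff:.1f} months to pay off")
print(f"VRAM ratio: {vram_ratio:.1f}x")
```

With these inputs the single A100 comes to $1825 a month, and the TinyBox pays for itself somewhere between four and five months of renting three A100s.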

“Sure, but Thomas” you say, “they’re only advocating this for development, you’ll obviously shut them down after you’re done. Your bill won’t be that high.”

Sir, we’re in the business of shipping. If the story ends on localhost, it’s a bad story. What happens when you actually deploy that code? How does your cloud bill look now?

Let’s think this through a little bit. You spend a month developing an MVP for your new AI-powered app. You use the Fly cloud GPU for 7 hours a day. You’re ready to ship it to production, so you get another cloud GPU, but that one has to run 24/7 because while you may not have any users, you may just get one tomorrow at 3am.
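To put numbers on that scenario, here is a rough sketch of the bill. It assumes the $2.50/hr A100 rate from above, a 30-day development month with no days off, and 182 days standing in for six months of production. All of those are my simplifications.

```python
RATE = 2.50  # USD per GPU-hour, the A100 price quoted earlier


def dev_cost(days: int, hours_per_day: float, rate: float = RATE) -> float:
    """Bill for a GPU you only run while you are at the keyboard."""
    return days * hours_per_day * rate


def prod_cost(days: int, rate: float = RATE) -> float:
    """Bill for a GPU that has to stay up 24/7."""
    return days * 24 * rate


def days_until(budget: float, rate: float = RATE) -> float:
    """Days of 24/7 running before the bill reaches `budget`."""
    return budget / (24 * rate)


month_of_dev = dev_cost(30, 7)       # one month of 7-hour days
daily_prod = prod_cost(1)            # one day of production
six_months_prod = prod_cost(182)     # roughly six months of production
days_to_1k = days_until(1_000)       # how fast you get $1k in the hole

print(f"Dev month: ${month_of_dev:,.0f}")
print(f"Prod per day: ${daily_prod:,.0f}")
print(f"Prod for ~6 months: ${six_months_prod:,.0f}")
print(f"Days to $1k: {days_to_1k:.1f}")
```

The development month is the cheap part. The always-on production box is what keeps the meter running.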

The clock starts..

If we’re being honest, you’re a developer and you aren’t going to market your product very well for the first 6 months. Instead, you’re going to toil away on that next feature. We’re all guilty of this. Day by day that bill grows.

tick, tock..

Another day, another $60. Your conviction wanes. Is this idea really any good? I’m already $1k in the hole, I can’t justify spending money on ads. Maybe I should quit. I’m so stupid.

tick, tick, tick..

It doesn’t have to be this way, dear reader. There is a better way.

The Inverse Fly Stack

The Fly guys are slick, don’t get me wrong. But what if instead of having our dev environment use Erlang distribution to talk to a cloud GPU, we instead have our prod environment use Erlang distribution to talk to the gaming PC sitting in your closet?

Head on over to Maingear.com and buy yourself a prebuilt RTX 4090 rig for $4200. Since it’s a business write-off, you get to use pre-tax dollars. Pay-off period: 2.5 months of Fly GPUs. (Don’t worry, I bought one already, and yes, you can turn off the stupid RGB lights.)
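A quick sketch of where that pay-off figure comes from, and how it moves with usage. Like the comparison above, it prices the rig against the $2.50/hr A100 rental and ignores electricity and taxes. The function name is just for illustration.

```python
def breakeven_days(price: float, rate_per_hour: float, hours_per_day: float) -> float:
    """Days of rental at `hours_per_day` that add up to the purchase price."""
    return price / (rate_per_hour * hours_per_day)


RIG_PRICE = 4200   # prebuilt RTX 4090 rig
RATE = 2.50        # USD per GPU-hour for the rented A100

always_on = breakeven_days(RIG_PRICE, RATE, 24)   # production box, 24/7
dev_only = breakeven_days(RIG_PRICE, RATE, 7)     # 7-hour dev days only

print(f"24/7: {always_on:.0f} days")
print(f"7 h/day: {dev_only:.0f} days")
```

The short pay-off only holds for a GPU that never sleeps. That is exactly the production case this post is worried about.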

Develop in peace knowing you’re past the return window and if the app fails to get product-market fit, at least you OWN the machine and can start a new career as a Twitch streamer.

Then, when you’re ready to go to production, just as in Fly’s blog post, set up the WireGuard tunnel to your dream machine, but deploy the bumblebee-model-harness on your machine instead of the cloud. A web request goes to your cloud webserver, which delegates all machine learning tasks to the monster in your closet.

Inverse Fly Stack

Zero daily active users, zero marginal cost. That’s the scale-to-zero serverless we really want.

Best part is that while you have no users, you can still use your new machine to play video games as you contemplate whether you should start marketing your app. Better yet, you can finally put a cost to your gaming addiction. It’s about $2.50/hr.

Hell, since your app won’t fully utilize that 4090 for the first year anyways, you can host multiple projects’ models on it concurrently. See which one takes off, double down.

Local Power Outage? FLAME’s got you covered

As I said, the Fly guys are slick. They (Chris McCord) even built a library to give your homebrew hacker setup resiliency. Wrap up your bumblebee-model-harness in a FLAME pool with min: 1. If your dream machine ever goes offline, the FLAME pool manager running in your cloud web node will deploy a backup node on Fly GPUs automatically.

God willing, your app is a smash hit. Outscale that single 4090? Hopefully you monetized, because either way, FLAME will autoscale up those pricey cloud GPUs if you let it.
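Here is the policy from the last two paragraphs, modeled as a plain function so you can see when the cloud bill kicks in. To be loud about it: this is not FLAME’s API or its actual scheduling algorithm, and every name and capacity figure below is hypothetical. It only encodes three rules: use the home GPU first, keep one cloud runner up if home is offline, and rent more when the work overflows.

```python
import math
from dataclasses import dataclass


@dataclass
class PoolState:
    home_online: bool    # is the closet machine reachable over the tunnel?
    home_capacity: int   # concurrent jobs the home GPU can take (assumed figure)
    in_flight: int       # jobs that need a GPU right now


def cloud_runners_needed(state: PoolState, jobs_per_cloud_gpu: int = 4) -> int:
    """How many rented GPUs this toy policy wants right now."""
    home_slots = state.home_capacity if state.home_online else 0
    overflow = max(0, state.in_flight - home_slots)
    needed = math.ceil(overflow / jobs_per_cloud_gpu)
    if not state.home_online:
        # Mirrors the "min: 1" idea: always keep one runner alive somewhere.
        needed = max(needed, 1)
    return needed


quiet = cloud_runners_needed(PoolState(home_online=True, home_capacity=4, in_flight=0))
outage = cloud_runners_needed(PoolState(home_online=False, home_capacity=4, in_flight=0))
smash_hit = cloud_runners_needed(PoolState(home_online=True, home_capacity=4, in_flight=10))

print(quiet, outage, smash_hit)
```

No users and a healthy closet machine means zero rented GPUs. You only pay when the power goes out or the traffic shows up.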

Infinite scale unlocked.
Scale-to-zero* infrastructure.
Tax-free gaming PC.

Owning your shit is underrated.

I will own my GPUs. I will not eat the bugs. I will be happy.

postscript.

Fly guy is alright, I opportunistically chose them as my strawman so I could write the post I’ve always wanted to write.