
How I'm Building Multi-Cloud (Before Spending a Dollar)

·7 min read
George Pu



This is a follow-up to my last post about cloud lock-in. That piece was about the philosophy — why we don't go deep on any single provider's managed services.

This one is about what happens next. You've decided you don't want to be locked in. Great. Now what?

I'll be honest — I expected this part to be straightforward. Pick a few providers, compare prices, split the workload. Done.

It wasn't like that at all.

Some Context

I run a small company. Three people.

We're building AI models that need serious GPU compute — the kind of hardware that costs thousands of dollars a day to rent.

We're bootstrapped. Every dollar matters.

We don't have the luxury of picking the most expensive provider because it's "safe."

But we also can't afford to pick a cheap provider that turns out to be unreliable or can't actually deliver what it advertises.

And we've already lived through a month-long cloud migration that taught me what happens when you go all-in on one provider and then need to move. I wrote about that in the last post.

So this time, the plan was simple: build multi-cloud from day one. No single provider holds more than half our spend.

If one disappears tomorrow, we shift workloads to the others within hours.

That's the principle. Making it real is where it gets humbling.

What Nobody Tells You About Cloud Provider Research

I spent the better part of two weeks doing this research. Calls with sales teams. Digging through documentation. Spinning up test instances. Reading the fine print.

And the single biggest thing I learned is this: pricing pages lie by omission.

Every provider shows you headline GPU rates. Clean numbers. Nice comparison tables.

What almost none of them tell you — at least not anywhere obvious — is which GPUs are available in which data centers.

This nearly led to a five-figure mistake. More on that in a minute.

The Five Providers We Evaluated

We looked at five options. Here's what we found — the real version, not the marketing version.

DigitalOcean — The Surprise

I'll admit I had a bias going in. DigitalOcean, to me, was the place you host a side project in college. Small apps. Developer tools. Not serious AI compute.

I was wrong.

They launched GPU instances in late 2024, and their Toronto data center now has H100 GPUs — the same chips AWS and Google charge $6-10 an hour for — at $2.99 an hour.

I actually double-checked this. Then checked again. Then spun up an instance to make sure it was real.

It's real.

They also have cheaper options for experimentation.

L40S cards at $1.57 an hour. RTX 4000 Ada at $0.76 an hour — less than a dollar for a GPU you can prototype on. All in Toronto.

The catch is capacity. There are Hacker News threads from users reporting that DigitalOcean's Toronto GPUs sell out: when they tried to create a GPU instance, none was available.

This is the tradeoff with smaller providers. The price is incredible. The availability is uncertain.

We're testing this right now. If TOR1 has consistent capacity, DigitalOcean becomes our primary provider. If it doesn't, our entire cost model changes overnight.
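Part of that testing can be done programmatically rather than by trial and error. A minimal sketch, assuming DigitalOcean's public `/v2/sizes` REST endpoint, which lists each instance size along with the regions that offer it; the GPU size slug in the example is illustrative, so verify the current slugs against their docs:

```python
import json
import os
import urllib.request

def sizes_in_region(sizes, region):
    """Filter size records down to those offered in a given region slug."""
    return [s for s in sizes if region in s.get("regions", [])]

def fetch_sizes(token):
    """List every instance size (and its regions) from the DigitalOcean API."""
    req = urllib.request.Request(
        "https://api.digitalocean.com/v2/sizes?per_page=200",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["sizes"]

if __name__ == "__main__" and "DO_TOKEN" in os.environ:
    # Print every GPU-class size currently offered in Toronto (TOR1).
    for s in sizes_in_region(fetch_sizes(os.environ["DO_TOKEN"]), "tor1"):
        if "gpu" in s["slug"]:
            print(s["slug"], s["price_hourly"])
```

A size being listed for a region still doesn't guarantee capacity at launch time, so we pair this with an actual instance-create attempt before committing to a training schedule.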

Google Cloud — The Expensive Safety Net

Montreal data center. Massive capacity. Documentation so clean it makes accountants happy.

H100 at $9.80 an hour.

That's roughly 3x what DigitalOcean charges for the same chip.

I've had good conversations with the Google Cloud team. They know what they're offering and they know the premium is steep.

The pitch is: you're paying for scale, reliability, and the ability to spin up 32 or 64 GPUs working together across multiple machines for large training runs.

For a single-machine job, Google Cloud makes no financial sense for us.

But for the larger models we're planning to train later — the ones that need dozens of GPUs coordinating with high-speed interconnects — Google Cloud might be one of the only options.

That's the uncomfortable reality of multi-cloud at our scale.

The cheap provider handles the small jobs. The expensive provider handles the jobs that actually matter most.

AWS — Similar Story, Better People

AWS is in the same bucket as Google for us. Strong infrastructure, strong documentation, strong capacity. Pricey.

H100 at around $6.50 an hour. Cheaper than Google, more expensive than DigitalOcean.
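Put side by side, those list prices turn into very different bills. A back-of-envelope comparison (illustrative only: real invoices add storage, egress, and any committed-use discounts):

```python
# On-demand H100 list prices quoted above, USD per GPU-hour.
H100_RATES = {
    "DigitalOcean": 2.99,
    "AWS": 6.50,
    "Google Cloud": 9.80,
}

def run_cost(rate_per_hour, gpu_hours):
    """Cost of a training run at a flat hourly GPU rate."""
    return rate_per_hour * gpu_hours

if __name__ == "__main__":
    # A modest 1,000 GPU-hour fine-tuning run on a single H100.
    for provider, rate in H100_RATES.items():
        print(f"{provider:>13}: ${run_cost(rate, 1_000):,.0f}")
```

At those rates, the same 1,000-hour run is roughly $2,990 on DigitalOcean, $6,500 on AWS, and $9,800 on Google Cloud. Same chip, three very different invoices.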

What sets AWS apart, honestly, is the people. I mentioned in the last post that an AWS team member once told me my architecture was outdated and saved us thousands.

That wasn't a one-off. Their customer success team genuinely tries to help you optimize, not just upsell.

But the platform incentives are the same as Google's. The more services you touch, the harder it is to leave. So we treat AWS the same way — commodity infrastructure only, kept in reserve for large-scale jobs.

OVHcloud — The One That Almost Got Us

This is the story I want other founders to hear.

OVHcloud is a French company with data centers in Beauharnois, Quebec. On paper, great. Canadian data residency. Established company. Competitive pricing.

Their website shows H100 GPUs. The price looked reasonable. We were ready to allocate budget.

Then I dug into region-specific availability.

Their Canadian data center only has V100 and V100S GPUs. Previous-generation cards. Fine for light experimentation. Not viable for the kind of training we need to do.

The H100s on their pricing page? Gravelines, France.

Not Canada.

If we hadn't checked — if we'd just trusted the headline pricing and signed up — we would have committed budget to a provider that literally cannot deliver what we need in the country we need it in.

This is what I mean when I say pricing pages lie by omission. The prices are real. The GPUs are real. They're just not where you think they are.

OVHcloud is still useful to us. V100S at under a dollar an hour is fine for testing ideas cheaply before burning real money on H100s.


But they went from "potential primary provider" to "experimentation bench" in the time it took to read one help page.

ISAIC — The Question Mark

ISAIC is a non-profit in Edmonton, connected to the University of Alberta. They position themselves as an AI compute provider, and they've been mentioned in government funding contexts.

They claim H100 access at $2.50 CAD per hour. If real, that's the cheapest option by far.

I wanted this to work. A Canadian non-profit offering cheap AI compute? That's exactly the kind of thing that should exist.

But when I started actually evaluating them like a real provider, it fell apart.

No public pricing page. No SLA. No uptime guarantees. "Best effort" service — their words, not mine.

They explicitly state they don't do backups. Portal-based VM access instead of standard APIs. Their website runs on Wix.

I'm not saying this to be harsh. They might be great for what they're designed for — university researchers running proof-of-concept projects.

But we're planning to run training jobs that cost five figures and take weeks. "Best effort" and "no backups" don't work for that.

We sent them an email asking if they can support production workloads at scale.

I genuinely hope the answer is yes, because cheap Canadian compute is something the whole ecosystem needs. But I'm not planning around it until I see proof.

Three Things I Didn't Expect to Learn

Going through this process changed how I think about cloud decisions. Three things stood out.

The cheap option might not exist when you need it.

DigitalOcean's pricing is incredible. But if their Toronto GPUs are sold out the week you need to start a training run, the price is irrelevant.

You can't use what isn't available. This is the real reason multi-cloud matters — not just philosophical risk diversification, but practical "I need a GPU right now and my primary is full."

Small teams can't afford to be wrong.

A big company can absorb a bad cloud decision and fix it next quarter. We can't.

The $200 we spent on test instances across all five providers — spinning up a machine for an hour, downloading the invoice, checking the documentation — will save us from mistakes that cost orders of magnitude more.

It felt tedious at the time. Now I think it was the most valuable research we've done this year.

Multi-cloud is a tax you pay for freedom.

Managing multiple providers is genuinely more work.

Different dashboards. Different billing formats. Different support channels. Different quirks.

There's a real operational cost to not just picking one provider and going all-in. But I've lived through the alternative — three developers, one month, zero customer value — and the tax is worth it.

Where We're Starting

Based on everything we've found:

DigitalOcean is the primary — if capacity holds. Best price, confirmed Canadian hardware, clean invoicing. All single-machine training and experimentation starts here.

Google Cloud and AWS are the scale-up tier. When we need multi-node GPU clusters for larger models, they're likely the only realistic options. More expensive, but proven at scale.

OVHcloud is the cheap experimentation bench. Previous-gen GPUs for testing ideas before committing real money.

ISAIC is a maybe. Potentially the cheapest option, but needs to prove it can operate outside a university context.

We're running small test workloads on each provider this month.

A few hours of GPU time. Download the invoice. Check the documentation. Verify the experience matches the marketing.

Then we commit real budget.

Where I'll Be Honest

I want to be upfront about what this is and what it isn't.

We haven't run a massive multi-cloud training pipeline yet.

We haven't battle-tested failover between providers under real pressure. We haven't submitted invoices and gone through the full billing cycle with every provider on this list.

This is the research phase. The homework before the exam.

I'm sharing it because when I went looking for this kind of information — real pricing comparisons, real gotchas, real experiences from a small team trying to figure this out — I found almost nothing.

The content out there is either from enterprise teams that don't think about cost, or from cloud providers marketing themselves.

There's very little from founders actually doing the research with real constraints and real money on the line.

So this is that. Prices as of April 2026. Findings as of right now. More to come as we start spending.

The One Question That Changes Our Entire Budget

There's one question we still need answered, and it's the single biggest variable in our entire cloud strategy:

Can DigitalOcean support multi-node H100 training with high-speed interconnects in their Toronto data center?

If yes, we can run most of our training at $3 an hour instead of $7-10. Over a multi-year project spanning hundreds of thousands of GPU hours, that difference adds up to hundreds of thousands of dollars.

If no, the big training runs get pushed to Google Cloud and AWS at 2-3x the cost. DigitalOcean becomes a tool for smaller jobs only. And our budget shrinks accordingly.
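To put rough numbers on the stakes (illustrative arithmetic using the rates above, with 100,000 GPU hours as a stand-in for a multi-year training program):

```python
def training_delta(gpu_hours, cheap=3.00, expensive=(7.00, 10.00)):
    """Dollar range saved by the cheap rate over a given number of GPU hours."""
    lo, hi = expensive
    return ((lo - cheap) * gpu_hours, (hi - cheap) * gpu_hours)

if __name__ == "__main__":
    lo, hi = training_delta(100_000)
    print(f"${lo:,.0f} to ${hi:,.0f} saved")
```

At 100,000 GPU hours, the gap works out to $400,000 to $700,000, which for a bootstrapped three-person team is the difference between what we can and cannot afford to train at all.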

One question. One answer. Massive implications.

We're asking this week. I'll share what we find out.