A lawyer friend — call him Mark — called me this week.
He and another friend had spent the weekend trying to run an 8-billion-parameter language model on a laptop with 16 gigabytes of RAM.
Mark thought he was going to show his friend something impressive.
The output was gibberish. Incoherent strings of text that no junior associate would have signed off on.
He'd come to me because he wanted to know what hardware to buy next.
On the call, I didn't know.
What I did know was NVIDIA's software stack.
I'd worked with it enough to be confident that with the right GPUs and a properly quantized model, you could probably get something coherent out the other end.
So I told him exactly that.
"I can't tell you it'll work for sure," I told him. "But knowing what I know about CUDA and Hopper-class inference, I think it probably will. Let me look at the actual numbers and get back to you."
We hung up. I opened a spreadsheet.
The math was brutal.
I called him back and told him to buy nothing.
The pattern shows up everywhere now.
A small firm partner reads about sovereign AI.
They hear that owning your own infrastructure means controlling your data.
They Google "self-host LLM," watch a YouTube tutorial, buy or rent some hardware, and discover they've built something that costs 30-50x more than a managed Claude subscription and works worse.
The math has moved in the past twelve months in ways most professional services buyers haven't tracked.
The Numbers I Walked Him Through
Mark wanted to rent GPUs to serve his firm's lawyers. He'd been pricing this out.
Here's roughly what the market looks like in May 2026 for a small operator trying to serve a handful of users.
Renting 2 NVIDIA H200 GPUs from Telus: $9,000 per month.
Telus is Canada's main sovereign AI infrastructure provider.
Those two GPUs can run a 70 billion parameter model and serve maybe 10 lawyers under typical legal-research load.
A single NVIDIA L4 GPU: $800 per month.
The L4 can serve 3-5 lawyers running 7B-class models.
Quality is meaningfully below what those lawyers already see on Claude or ChatGPT.
Claude Team: $20 per lawyer per month.
That is not a typo. Claude Team standard seats are $20 per seat per month on annual billing.
They include a Data Processing Agreement that explicitly states customer data is not used to train models. They cover heavy day-to-day usage for a practicing lawyer.
So for the same firm, with the same lawyers, doing the same work:
Sovereign GPU rental: $9,000/month for 10 lawyers = $900/lawyer/month.
Self-hosted single GPU: $800/month for 4 lawyers = $200/lawyer/month, with worse model quality.
Managed frontier model with DPA: $20/lawyer/month.
That is a 30-50x premium to self-host, in exchange for sovereignty most of these firms don't legally need.
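The comparison above is simple enough to sanity-check in a few lines. This is a back-of-envelope sketch using the article's May 2026 figures, not live quotes:

```python
# Per-lawyer monthly cost for the three options, using the numbers above.

def per_lawyer(monthly_cost: float, lawyers: int) -> float:
    """Monthly infrastructure cost divided across the lawyers it serves."""
    return monthly_cost / lawyers

sovereign = per_lawyer(9_000, 10)   # two rented H200s, ~10 lawyers
single_gpu = per_lawyer(800, 4)     # one L4, ~4 lawyers, weaker models
claude_team = 20.0                  # Claude Team, per seat, annual billing

print(f"Sovereign rental: ${sovereign:.0f}/lawyer/month "
      f"({sovereign / claude_team:.0f}x Claude Team)")   # $900, 45x
print(f"Single GPU:       ${single_gpu:.0f}/lawyer/month "
      f"({single_gpu / claude_team:.0f}x Claude Team)")  # $200, 10x
```

The exact multiple depends on how many lawyers you can actually serve per GPU, which is why the range is quoted as 30-50x rather than a single number.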
Why The Gap Is Widening
Mark's question — "but what if I want to own it?" — would have been more defensible 18 months ago.
Today it is less defensible, because two trends are moving in opposite directions.
API prices are falling fast.
Claude 3 Opus, the flagship model in early 2024, was priced at $15 per million input tokens and $75 per million output tokens.
Today, Claude Opus 4.7, the far more capable successor I use every day, sits at $5 input and $25 output.
That is a 67 percent reduction on the flagship tier.
Layer in prompt caching and batch processing and the effective discount runs higher still.
The driver isn't generosity. It is model architecture efficiency, inference optimization, and competition. The trend is continuing.
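To see how caching and batching push the effective price below the list rate, here is a minimal sketch. The multipliers reflect Anthropic's published pricing as I understand it (cache reads at roughly 10 percent of base input price, batch requests at a 50 percent discount), and the traffic mix is invented for illustration:

```python
# Blended effective input price under caching and batching.
# cache_hit_mult and batch_mult are assumed from published pricing;
# the traffic shares are hypothetical.

base_input = 5.00        # $ per million input tokens (flagship tier)
cache_hit_mult = 0.10    # assumed: cache reads ~10% of base price
batch_mult = 0.50        # assumed: batch API ~50% discount

cache_hit_share = 0.60   # hypothetical: 60% of input tokens hit cache
batched_share = 0.30     # hypothetical: 30% of fresh traffic is batched

fresh = 1 - cache_hit_share
blended = base_input * (
    cache_hit_share * cache_hit_mult
    + fresh * (batched_share * batch_mult + (1 - batched_share))
)
print(f"Blended input price: ${blended:.2f}/M tokens")  # $2.00 under these shares
```

Under these assumed shares, the effective input price lands at $2 per million tokens, a 60 percent discount off the already-reduced list rate.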
GPU rental prices are not.
The SemiAnalysis H100 rental price index — which tracks one-year contract prices for production-grade GPUs — hit a low of $1.70 per hour per GPU in October 2025.
By March 2026, that same contract price had rebounded to $2.35 per hour, a roughly 38 percent increase in five months.
The driver is Blackwell-era demand pulling committed Hopper-generation capacity away from the rental market. This continues at least through 2026.
What it means in plain terms: if you sign a five-year contract today to operate your own sovereign GPU infrastructure, you are locking in a cost basis that will look more expensive every quarter as your API competitors get cheaper.
You will spend the next five years explaining to your firm's managing partner why your monthly bill is going up while ChatGPT's is going down.
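The lock-in dynamic is easy to model. This toy projection uses illustrative assumptions, not forecasts: the API price falls 25 percent a year (a conservative read of the 2024-2026 trajectory), while the rented-GPU cost is flat because the five-year contract fixed it at today's rate:

```python
# Toy projection of the widening gap over a five-year GPU contract.
# The 25%/year API decline is an illustrative assumption.

api_per_lawyer = 20.0      # $/lawyer/month today (managed service)
gpu_per_lawyer = 900.0     # $/lawyer/month, locked in by contract
api_annual_decline = 0.25  # assumed

for year in range(1, 6):
    api_per_lawyer *= 1 - api_annual_decline
    premium = gpu_per_lawyer / api_per_lawyer
    print(f"Year {year}: API ${api_per_lawyer:.2f} vs "
          f"locked-in GPU ${gpu_per_lawyer:.0f} -> {premium:.0f}x premium")
```

Even under these mild assumptions, a 45x premium at signing grows to roughly 190x by year five. The direction matters more than the exact multiple.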
When Owning Actually Makes Sense
There are buyers for whom this math reverses.
Most discussions of sovereign AI either pretend the use case is universal or pretend it doesn't exist at all. Both are wrong.
Defense-adjacent work.
If your firm's contracts with the federal government, the Department of National Defence, or analogous American agencies prohibit cloud APIs for processing controlled information, you cannot use Claude or ChatGPT regardless of price.
You must operate within an approved sovereign environment.
For this work, $9,000 a month for two H200s is genuinely cheaper than alternatives.
Classified or controlled-information contexts.
If data leaving your physical premises is itself the breach, no API is acceptable. You need on-premises or air-gapped deployment.
This is a much smaller market than people imagine.
Regulated industries with explicit processing-locality requirements.
Ontario's Personal Health Information Protection Act creates an effective processing-locality requirement for health information processed by Canadian-controlled entities.
Quebec's Law 25 creates similar requirements for personal information. Some financial sector regulations push in this direction, though OSFI has not mandated processing locality as a hard rule.
Outside these three groups, the sovereignty case is risk-management preference, not statutory requirement.
Canadian Bill C-27 and the Artificial Intelligence and Data Act died at parliamentary prorogation in 2025. As of May 2026, Canada has no AI-specific statute.
Private law firms have no hard regulatory mandate to use sovereign infrastructure for client data beyond what PIPEDA requires, which is "adequate safeguards" — a bar that Anthropic's commercial Data Processing Agreement clears.
If you are a private law firm partner reading this and you do not have classified contracts or PHIPA-regulated client work, you do not have a statutory sovereignty requirement.
You have a preference.
That preference might be worth paying a 30-50x premium for. It might not. But you should know which conversation you are having.
What About Cohere?
The reasonable next question is: what about Cohere?
Cohere is the Canadian frontier AI company. They build their own models, offer Canadian deployment, and their enterprise pricing is competitive with Anthropic and OpenAI.
For a Canadian firm whose sovereignty preference is "I want my AI vendor to be Canadian," Cohere is the closest match without taking on infrastructure yourself.
Cohere does not change the self-hosting math.
It is another managed service.
Pricing sits in the same order of magnitude as Claude Team or ChatGPT Enterprise.
Tens of dollars per user per month, not thousands.
If Canadian provenance matters more to you than absolute model capability, Cohere is the honest answer.
If you want the strongest model your data can legally touch, that's still Claude or GPT-5 with a DPA.
Either way, you are choosing between managed services.
The self-hosted option is still 30-50x more expensive than both.
The Decision Most Firms Should Actually Make
When Mark asked me what hardware to buy, the real answer was: nothing.
For most small and mid-size law firms in 2026, the right deployment is a managed frontier model — Claude, GPT-5, or Cohere — with a Data Processing Agreement and Canadian data residency where the provider offers it.
Cost is $15-50 per active user per month depending on usage.
Quality is best-in-class. Compliance posture is documented. Switching costs are near zero if a better option appears.
The firms that should self-host are the ones with hard sovereignty mandates.
That's a small fraction of the Canadian legal market — almost all of it concentrated in firms that already have IT departments capable of running infrastructure.
The firms that should not self-host are the firms reading articles about sovereign AI and wondering whether they should.
If you are wondering, the answer is probably no.
What Mark Actually Did
Mark put three of his lawyers on Claude Pro for two weeks as an evaluation. Total pilot cost: $60.
The output cleared the bar. He migrated the firm onto Claude Team to get the DPA in place — about $200 a month for ten seats.
The more interesting thing is what happened after.
Mark has a handful of other lawyer friends in small practices wrestling with the same question.
He has become the playbook for them. Two of them are running pilots now using the same setup. None of them are buying hardware.
The conversation Mark was trying to have on our first call — about GPUs and sovereign hardware and which model is closest to Claude — was the wrong conversation.
He didn't have an infrastructure problem.
He had a math problem.
Most small firms have a math problem, not an infrastructure problem. The math gets worse every month.
The firms that figure that out fastest will spend the next decade compounding the difference.

