

What fine-tuning actually costs (it's not what you think)

George Pu · 6 min read


Training an AI model is assumed to cost millions of dollars.

It's the single most common misconception in the space, and it's wrong by roughly two orders of magnitude for the activity most people actually want to do.

This post is a short, concrete breakdown of what fine-tuning actually costs in 2026, what it doesn't cost, and where the real spend lives.

I'm writing it now because 'how much does this cost' is the first question I get whenever I mention that I'm training a model, and the answer is important enough to be its own post rather than a footnote.

Three different activities, three different costs

When people say 'training an AI model,' they're usually blurring together three different things with wildly different price tags.

Pretraining a foundation model from scratch.

This is what OpenAI, Anthropic, Google, and Meta do. You take raw internet-scale data, trillions of tokens, and train a new model from a random starting point.

This genuinely costs $100 million or more for the current frontier, and it needs a real data center with thousands of GPUs running for weeks.

A handful of companies on earth can afford it. If you're a founder, you're not doing this. You don't need to.

Fine-tuning an existing open-weight model.

This is what I'm doing.

You take a model someone else already pretrained (Qwen 3.5, Llama, Mistral, Gemma) and teach it to be better at a specific thing.

You're not building a new brain. You're giving an existing brain a specialty.

This costs between $100 and $10,000 depending on model size, dataset size, and how many iterations you run. I'll get into the specific numbers below.

Running the model (inference).

Once you have a trained model, using it costs cents per query on consumer hardware, less on optimized servers.

Local inference on a Mac is effectively free after the electricity bill.

When a headline says 'training AI costs millions,' it's talking about the first category.

Almost no one reading this is doing the first category.

The second category is where the actual work happens, and the second category is credit-card money.

What fine-tuning actually costs, line by line

The compute cost is the one everyone fixates on, so let's start there.

A fine-tuning run on a 4-billion-parameter model, using modern techniques, on 50,000 training examples, takes roughly 8 to 15 hours of single-GPU time.

The GPU I'm using is an NVIDIA L40S on a Canadian cloud provider.

I'm keeping training infrastructure on Canadian soil because I want the data residency to match the story I'm telling about sovereign local AI.

The L40S rents for $1.57 per hour, which is similar to what American hyperscalers charge for the same hardware.

Do the math. A single full training run on my setup is $13 to $24. A smoke test uses a smaller subset to catch pipeline bugs. It costs about $2.

That is not a typo. The compute for a real fine-tuning run of a capable 4B model is cheaper than lunch.
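The arithmetic is simple enough to sketch. The hours and hourly rate below are the figures from this post; the function name is just for illustration:

```python
# Back-of-the-envelope compute cost for a single-GPU fine-tuning run.
# Figures from the post: 8 to 15 GPU-hours on an L40S at $1.57/hour.

L40S_RATE_PER_HOUR = 1.57  # Canadian cloud rental rate for one L40S

def run_cost(gpu_hours: float, rate: float = L40S_RATE_PER_HOUR) -> float:
    """Dollar cost of a single-GPU training run."""
    return gpu_hours * rate

low = run_cost(8)    # shortest expected full run
high = run_cost(15)  # longest expected full run
print(f"Full run: ${low:.2f} to ${high:.2f}")  # roughly $13 to $24
```

Swap in your own GPU's hourly rate and expected run length and the same one-liner gives you your own band.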

Where does the cost actually go? Four places, in descending order of how much they hurt.

Data preparation.

This is where most of the real spend lives.

Putting together a training dataset is real work: sourcing open data, filtering garbage, deduplicating, generating synthetic examples where needed, validating those examples, and spot-checking quality.

If you're doing it yourself, it's time.

If you're paying for human graders or domain experts to label data, it's money.

Realistically, for a serious fine-tuning project, data prep is $3,000 to $8,000 of labor equivalent.

Most of that isn't dollars on a cloud bill. It's the opportunity cost of however long you or your engineer spends in the weeds with a dataset.

Iteration and failed runs.

You won't get the model right on the first try.

You'll run it, evaluate it, notice it's weak on something, adjust the data mix or the training setup, and run it again.

Budget six real runs before you have something worth shipping.

At roughly $20 per run, that's about $120 in compute across the iteration cycle.

The time cost is larger than the dollar cost: each run plus its eval takes a day or two to analyze.
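The iteration budget is worth working out explicitly. The run count, per-run cost, and analysis time are the figures from this post; the variable names are illustrative:

```python
# Iteration-cycle budget from the post: ~6 runs before something shippable,
# at roughly $20 of compute per run, with 1-2 days of analysis per run.

runs = 6
cost_per_run = 20              # dollars, rough midpoint of the $13-$24 band
analysis_days_low, analysis_days_high = 1, 2  # per run

compute_total = runs * cost_per_run
print(f"Compute across the cycle: ${compute_total}")  # $120
print(f"Wall-clock analysis: {runs * analysis_days_low} to "
      f"{runs * analysis_days_high} days")            # 6 to 12 days
```

Note what the output shows: the compute line is trivial, but the wall-clock line is measured in weeks. That asymmetry is the point.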

Evaluation infrastructure.

To know whether your model is actually better, you need benchmarks.

Some are free and open-source.

Some require paid API access to a judge model or human graders.

Budget $1,000 to $2,000 here for a project that evaluates honestly.

Engineering debug time.

This is the silent line.


The first time you fine-tune a model, you will hit framework bugs, tokenizer mismatches, chat template drift, quantization failures, and half a dozen other landmines.

The compute cost of debugging is small.

The time cost can be two weeks if you're unlucky. I've been paying in research hours for the last three weeks specifically to avoid paying in debug hours next month.

What I'm actually spending

I'll publish exact numbers after each run, because this series is about showing the work. Here's what I know so far as of April 20.

Research, the last three weeks: about $200 in API credits and tool subscriptions, plus roughly 60 hours of my time.

Smoke test, next week: projected under $10. A few hours on an L40S, plus a small amount of debugging margin.

First real training run, early May: projected $15 to $30. Full dataset on the 4B model.

My best estimate for the full project through September: $8,000.

The breakdown is roughly $200 in total compute across all runs, $5,000 in data curation tools and labor equivalent, $1,500 in evaluation infrastructure and paid benchmarks, and $1,300 in margin for the things I haven't thought of.
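Written out as a sanity check, using exactly the line items from the breakdown above:

```python
# Projected budget through September, per the breakdown in this post.
budget = {
    "compute (all runs)": 200,
    "data curation tools and labor equivalent": 5_000,
    "evaluation infrastructure and paid benchmarks": 1_500,
    "margin for the things I haven't thought of": 1_300,
}

total = sum(budget.values())
print(f"Total: ${total:,}")
for item, dollars in budget.items():
    print(f"  {item}: {dollars / total:.0%}")
```

The share calculation makes the real point visible: compute is a rounding error of the total, and data curation is the majority of it.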

If I'm wrong by more than 50% in either direction, I'll write the post explaining what I missed.

The bands, for calibration

If you're trying to place a dollar figure on a fine-tuning project you're considering, here's the rough ladder.

  • Smoke test on a 4B model, 1,000 examples: Under $10 in compute. A few hours. The goal isn't a good model. The goal is to prove the pipeline works end to end.
  • Real run on a 4B model, full dataset: $15 to $30 in compute. A single day of wall-clock time.
  • Real run on a 9B model, full dataset: $40 to $80 per run in compute. Maybe $500 to $2,000 across an iteration cycle of five to ten runs.
  • Full fine-tuning project, including data prep, evaluation, and multiple iteration cycles: $5,000 to $15,000 all-in for a 4B. $10,000 to $30,000 for a 9B. Most of it is not the GPU bill.
  • Multi-node training of a 70B+ model: $50,000 to several hundred thousand, depending on how long you run it and across how many GPUs. Still not millions. But not weekend money.
  • Pretraining a new foundation model from scratch: $100 million and up. Do not try this.

The point of publishing these bands is that the jump from 'fine-tuning a useful 4B model' to 'pretraining GPT-5' is four orders of magnitude.
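The four-orders-of-magnitude claim is easy to verify from the bands themselves. I'm taking $10,000 as the all-in cost of a 4B fine-tuning project and $100 million as the pretraining floor, both from the ladder above:

```python
import math

fine_tune_project = 10_000    # all-in 4B fine-tuning project, upper end
pretraining = 100_000_000     # frontier pretraining, lower end

gap = math.log10(pretraining / fine_tune_project)
print(f"{gap:.0f} orders of magnitude")  # 4
```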

People collapse all of that into 'training AI costs a fortune' and give up before they start. It's wrong.

The real moat

If fine-tuning a useful small model costs thousands of dollars, and if anyone can rent the compute for the price of lunch, then the obvious follow-up question is: what's the moat?

It isn't compute. It stopped being compute around 2023.

The moat is data, taste, and patience.

Data, because the model is only as good as what you train it on.

Sourcing, cleaning, balancing the mix across capabilities, catching the biases in synthetic examples: this is where quality is actually decided.

Most teams underinvest here by a factor of three.

Taste, because fine-tuning is a series of judgment calls that don't show up in benchmarks.

Which base model fits your use case?

When a benchmark score drops two points, is that real regression or test-set noise? None of these have mechanical answers.

Patience, because the feedback loop is long.

A real training run is a day. A full eval pass is hours.

You iterate in days, not minutes.

If you're someone who needs to see a result every hour to stay motivated, fine-tuning will break you.

None of those three is something money buys you directly. Compute stopped being the bottleneck; quality became it.

This is the current plan. It might change.

The numbers above are my best calibration as of April 20, 2026. A few things could move them.

GPU prices are still moving.

If a new generation of consumer or enterprise hardware lands in the next six months, the per-hour rate could drop and the total shifts with it.

If demand for inference compute keeps growing faster than supply, rates could rise.

Training techniques are still improving.

Every few months, someone publishes a method that cuts memory usage or speeds up convergence, and the effective cost per useful-model-produced drops. That's been the trend for three years.

My own numbers will also move as I go. I might find data prep costs more than I projected.

I might find my first run worked and I don't need the full iteration budget. I'll update when I know.

The point of the broad bands isn't to lock in a specific dollar figure. It's to move the conversation off the assumption that this is a millionaire's hobby. It isn't. It's a credit card and a few weekends. That changes who gets to build.