Update: We're migrating Ghost Narrator's default TTS from Fish Speech to Qwen3-TTS (Apache 2.0) for full commercial licensing. The architecture and cost savings are identical — only the TTS model changes. We'll update this post when the swap is complete.
We publish about 200 blog posts a month on Founder Reality. Every post gets a narrated audio version — you can listen instead of read.
The obvious solution was ElevenLabs. Best-in-class voice cloning, simple API, great output. $330/month for the scale we needed.
We almost signed up. Then we tried something else.
We ran an open-source model on a laptop.
Qwen 3.5 14B for script rewriting, Fish Speech for voice cloning. The whole pipeline running locally. No API calls. No per-word billing. No rate limits.
It sounded fine. Not perfect. Fine. Good enough that nobody listening would think twice.
So we built it into a system, and today we're open-sourcing it.
It's called Ghost Narrator.
GitHub link: https://github.com/getsimpledirect/workos-mvp
What It Does
Ghost Narrator turns written blog posts into narrated audio — automatically.
No manual steps. No copy-pasting into a TTS tool.
Here's the flow:
- A new article gets published
- An automated workflow triggers
- A local LLM (Qwen 3.5 14B) rewrites the article into a natural narration script — removing headers, links, formatting artifacts, and anything that sounds wrong when spoken aloud
- An open-source voice cloner (Fish Speech) generates audio using a cloned voice
- The audio file gets attached to the post
The entire pipeline runs on hardware you own. Nothing leaves your machine.
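The flow above can be sketched in a few lines of Python. This is a minimal sketch, not Ghost Narrator's actual code: the names `narrate_post`, `rewrite`, `synthesize`, `attach`, and the `Post` type are all hypothetical, and the three stages are passed in as plain callables so the example doesn't depend on any particular LLM or TTS library.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Post:
    """Hypothetical stand-in for a published article."""
    slug: str
    body: str
    audio_path: Optional[str] = None


def narrate_post(
    post: Post,
    rewrite: Callable[[str], str],          # article text -> narration script
    synthesize: Callable[[str, str], str],  # (script, output path) -> audio path
    attach: Callable[[Post, str], None],    # attaches the audio to the post
) -> Post:
    """Run one article through the rewrite -> synthesize -> attach stages."""
    script = rewrite(post.body)
    audio_path = synthesize(script, f"{post.slug}.wav")
    attach(post, audio_path)
    return post


# Stub stages so the sketch runs without any model installed.
def fake_rewrite(text: str) -> str:
    return text.replace("#", "").strip()


def fake_synthesize(script: str, out_path: str) -> str:
    return out_path  # a real stage would write audio to out_path


def fake_attach(post: Post, audio_path: str) -> None:
    post.audio_path = audio_path


demo = narrate_post(
    Post("hello-world", "# Hello\nWelcome."),
    fake_rewrite,
    fake_synthesize,
    fake_attach,
)
print(demo.audio_path)
```

Passing the stages in as arguments keeps the orchestration separate from the models, so each stage can be replaced without touching the others.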
How It Works
Ghost Narrator has two core pieces working together.
The first is a local LLM — we use Qwen 3.5 14B running through Ollama. When a new article comes in, the LLM rewrites it into something that sounds natural when spoken.
Blog posts are written for eyes — they have headers, bullet points, hyperlinks, parenthetical asides. None of that works as audio.
The LLM strips all of it out and produces a clean narration script that flows like someone telling you a story.
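Here's roughly what the rewrite step looks like as a call to a local Ollama server. Treat it as a sketch under assumptions: the endpoint is Ollama's default local `/api/generate` route, but the model tag and the system prompt below are placeholders we made up for illustration, not the ones Ghost Narrator ships with.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "your-model-tag"  # assumption: use whatever tag `ollama list` shows

# Hypothetical instruction; not the prompt from the repository.
SYSTEM_PROMPT = (
    "Rewrite the article as a narration script to be read aloud. "
    "Remove headers, links, and formatting artifacts. Return only the script."
)


def build_request(article: str, model: str = MODEL_TAG) -> dict:
    """Build the JSON body for a single non-streaming generate call."""
    return {
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": article,
        "stream": False,  # ask for one complete JSON response
    }


def rewrite_for_narration(article: str, model: str = MODEL_TAG) -> str:
    """Send the article to the local Ollama server and return the script."""
    body = json.dumps(build_request(article, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `rewrite_for_narration(article_text)` requires Ollama to be running locally with the chosen model already pulled.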
The second is Fish Speech, an open-source voice cloner. You give it a few minutes of sample audio — in our case, my voice — and it generates speech that sounds like that person. It runs locally, no API, no cloud processing.
The cloned voice reads the narration script, and out comes an audio file.
Ghost Narrator ties these together into a single pipeline. Article goes in, audio file comes out. That's it.
The Cost Comparison
This is the part people keep asking about.
ElevenLabs costs $330/month on their Scale plan. That's $3,960 a year, billed per word, with rate limits on how fast you can generate. Your audio gets processed on their servers. You need an API key and an active subscription to keep it running.
Ghost Narrator costs you electricity. No per-word billing. No rate limits. Nothing leaves your machine. You set it up once — maybe an hour or two — and it runs on hardware you already own.
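The arithmetic is simple enough to check yourself. The sketch below uses only the two figures from this post ($330/month, 200 posts a month). The `months_to_break_even` helper is our own hypothetical addition for anyone who would need to buy hardware first; the hardware price and electricity cost are inputs you supply, not numbers we're claiming.

```python
import math

MONTHLY_PLAN_USD = 330  # plan price quoted in this post
POSTS_PER_MONTH = 200   # publishing volume quoted in this post

annual_cost = MONTHLY_PLAN_USD * 12
cost_per_post = MONTHLY_PLAN_USD / POSTS_PER_MONTH

print(f"Annual subscription: ${annual_cost:,}")
print(f"Cost per narrated post: ${cost_per_post:.2f}")


def months_to_break_even(hardware_usd: float, monthly_power_usd: float) -> int:
    """Months until a one-time hardware purchase beats the subscription.

    Both arguments are your own estimates; nothing here is measured data.
    """
    monthly_saving = MONTHLY_PLAN_USD - monthly_power_usd
    if monthly_saving <= 0:
        raise ValueError("no saving: running costs meet or exceed the plan price")
    return math.ceil(hardware_usd / monthly_saving)
```

At our volume that works out to $3,960 a year, or $1.65 per narrated post.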
The honest trade-off is voice quality. ElevenLabs sounds better right now. Their models are ahead of open-source on naturalness and emotion.
If you're producing audiobooks or premium podcasts, you probably still want a commercial solution.
But open-source TTS models are improving fast. A pipeline that sounds "good enough" today will likely sound noticeably better in six months, once newer models are swapped in.
That's the thing about self-hosting — the architecture stays the same. When a better model drops, you swap it in. The pipeline doesn't change. The quality just goes up.
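One way to get that swap-in property is to hide the TTS model behind a small interface. The sketch below is illustrative only: `TTSBackend`, the two placeholder backends, and the `BACKENDS` registry are hypothetical names, not the structure of the Ghost Narrator codebase.

```python
from typing import Dict, Protocol


class TTSBackend(Protocol):
    """Anything that can turn a narration script into audio bytes."""

    def synthesize(self, script: str) -> bytes: ...


class SilentBackend:
    """Placeholder backend; a real one would wrap a TTS model."""

    def synthesize(self, script: str) -> bytes:
        return b""


class EchoBackend:
    """Second placeholder, standing in for a newer model."""

    def synthesize(self, script: str) -> bytes:
        return script.encode("utf-8")


# Hypothetical registry: the pipeline looks up a backend by name from config.
BACKENDS: Dict[str, TTSBackend] = {
    "old-model": SilentBackend(),
    "new-model": EchoBackend(),
}


def narrate(script: str, backend_name: str) -> bytes:
    """The pipeline code stays the same; only backend_name changes."""
    return BACKENDS[backend_name].synthesize(script)


print(narrate("Hello there.", "new-model"))
```

With a layout like this, upgrading the voice model means adding one class and changing one config value.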
The other lever is hardware. Bigger machines can run bigger models. A 16GB M1 MacBook runs the lightweight voice models fine.
A Mac Mini M4 Pro with 64GB can run the larger, higher-quality models that produce noticeably better output. Same pipeline, same setup — just more room for the model to work with.
We plan to keep improving the voice output as better models get released and as we test on better hardware, and we'll document every upgrade on this blog.
If you set up Ghost Narrator today and follow along, your output quality improves over time without rebuilding anything. Better model, better hardware, better voice. The architecture stays the same.
For blog narration at scale — 200 posts a month, consistent voice, quality that's already past the threshold of "nobody notices" — self-hosted wins on economics and independence.
We stopped thinking about per-word costs. We stopped worrying about API rate limits during batch processing. And we stopped sending our content to someone else's servers.
What Hardware You Need
We run this on a MacBook Pro M3 with 36GB RAM. That's more than enough.
The Mac Mini M4 Pro (64GB) is the sweet spot if you want a dedicated narration server. It runs everything with room to spare. We're moving our production setup to one.
Minimum specs: an Apple Silicon Mac (M1 or later) with 16GB+ RAM, or a Linux box with a decent GPU (RTX 3090 or better). About 50GB of free disk space for models.
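If you want a quick sanity check before installing anything, a small preflight script can compare your machine against the RAM and disk numbers above. This is our own sketch, not part of the repository. It assumes a Unix-like system (macOS or Linux) for the memory lookup and doesn't check for a GPU.

```python
import os
import platform
import shutil

MIN_RAM_GB = 16        # from the minimum specs above
MIN_FREE_DISK_GB = 50  # from the minimum specs above


def meets_minimum(ram_gb: float, free_disk_gb: float) -> bool:
    """True if both the RAM and free-disk minimums are met."""
    return ram_gb >= MIN_RAM_GB and free_disk_gb >= MIN_FREE_DISK_GB


def detect_ram_gb() -> float:
    """Total physical memory in GB, via sysconf on Unix-like systems."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9


def detect_free_disk_gb(path: str = ".") -> float:
    """Free space in GB on the volume that contains `path`."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    ram, disk = detect_ram_gb(), detect_free_disk_gb()
    print(f"{platform.system()} {platform.machine()}: "
          f"{ram:.0f} GB RAM, {disk:.0f} GB free disk")
    print("OK" if meets_minimum(ram, disk) else "Below the minimum specs")
```

On an Apple Silicon Mac, `platform.machine()` reports `arm64`.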
This isn't a cloud GPU situation. This runs on hardware you'd buy once and keep on your desk.
Who This Is For
Content teams publishing at scale who are tired of per-word TTS pricing. Developers who want a working example of a self-hosted LLM + voice cloning pipeline running locally. Anyone who looked at their ElevenLabs bill and thought "there has to be another way."
If you publish 10 blog posts a month, ElevenLabs is probably fine. If you publish 100+, the math changes fast.
What This Isn't
This isn't a polished product. It's an open-source project released as-is.
There's no one-click installer. There's no support team. There's no guaranteed uptime. You'll need to be comfortable with Docker, command-line tools, and reading a README.
We built this for ourselves. We're releasing it because people kept asking how we did it. If it's useful to you, great. If it breaks, you get to keep both pieces.
MIT License. Do whatever you want with it.
Try It
Listen: Every article on Founder Reality has a narrated version generated by Ghost Narrator. Hit play on any post and you're hearing it.
New articles enter a narration queue when they're published, so the most recent posts may not have audio yet.
Clone it: https://github.com/getsimpledirect/workos-mvp
Run it: The README has everything you need. Prerequisites, setup, configuration.
What's Next
Ghost Narrator is the first project we're open-sourcing from SimpleDirect. More are coming.
We're building toward a full sovereign AI stack — models, tools, and infrastructure that run on hardware you own. No API dependencies. No monthly subscriptions. No one else's servers. As we replace more paid tools internally, we'll open-source the replacements and write about the process here.
If that interests you, follow along:
- Twitter: @TheGeorgePu
- GitHub: github.com/getsimpledirect
- Blog: founderreality.com/blog

