
I Don't Trust My Own AI-Generated Code (And Neither Should You)

George Pu

Honest question: If a repo is genuinely important to you, something you own, something you'd bet on—do you trust 100% AI-written code without manual review?

I don't. Not yet.

Maybe I'm wrong. But I'm not betting my business on "maybe."

Claude writes 3,000 lines of code. I haven't read all 3,000 lines, and that worries me.

Ship it anyway. It works.

But "it works" and "I understand it" are two different things.

That gap bothers me more than it should.

The AI Productivity Paradox

Everyone's celebrating AI coding:

  • "10x developer productivity with Claude!"
  • "Ship features in hours, not weeks!"
  • "AI wrote my entire app!"
  • "The future of development is here!"

The narrative: AI makes coding so fast that manual review is unnecessary overhead.

My reality: AI makes coding so fast that I can't keep up with understanding what it built.

The uncomfortable truth: I'm more productive than ever, and more anxious than ever.

My Daily AI Workflow

Morning reality check:

  • Open Cursor
  • Describe what I want to build
  • Watch Claude write 500-1,000 lines in 10 minutes
  • It compiles. It runs. Tests pass.
  • Ship to production

Evening anxiety:

  • What exactly did I ship?
  • How does that authentication flow work?
  • Why did it choose that database structure?
  • What happens if this edge case hits?
  • Do I actually own this codebase anymore?

The productivity is real. The understanding gap is also real.

The "It Works" Trap

Last month, Claude built our payment integration:

  • 3,200 lines of code
  • Stripe webhooks, error handling, retry logic
  • Database migrations, API endpoints, UI components
  • All generated in about 4 hours
  • Deployed the same day

The results:

  • Zero bugs in production (so far)
  • Customers can pay seamlessly
  • Processing $28K+ per month through it
  • Feature that would have taken me 3 weeks

The problem:

  • I understand maybe 60% of what it built
  • The webhook retry logic is sophisticated but opaque
  • Error handling covers cases I didn't even think of
  • Some design patterns I wouldn't have chosen

It works perfectly. I trust it partially.

What Bothers Me Most

It's not that AI code is bad. It's often better than what I would write.

It's not that it doesn't work. It usually works flawlessly.

It's that I feel like a passenger in my own codebase.

Specific anxieties:

1. The Bus Factor Problem: If something breaks at 2 AM, can I debug code I didn't write and don't fully understand?

2. The Technical Debt Question: AI optimizes for "working now," not "maintainable later." What's the long-term cost?

3. The Security Blind Spot: I can spot obvious vulnerabilities, but what about subtle ones in 3,000 lines of AI code?

4. The Architectural Drift: AI makes choices I wouldn't make. Over time, does the codebase become alien to me?

The core fear: Am I still a developer, or just a very sophisticated product manager for AI?

The Manual Review Dilemma

Here's my process now:

Step 1: AI Generation (10 minutes)

  • Describe feature to Claude
  • Review high-level approach
  • Generate initial implementation

Step 2: Manual Review (2-4 hours)

  • Read every line of generated code
  • Understand architectural decisions
  • Validate security implications
  • Check edge case handling
  • Modify anything that doesn't feel right

Step 3: Testing and Refinement (1-2 hours)

  • Write additional tests for edge cases
  • Stress test with real data
  • Document non-obvious decisions
  • Ship with confidence

Total time: 4-6 hours instead of 3 weeks

The question: Is Step 2 paranoia or prudence?

The Trust Spectrum

Code I trust 100%:

  • Simple CRUD operations
  • Basic UI components
  • Standard integrations I've done before
  • Anything under 100 lines that I can fully grok

Code I trust 80%:

  • Complex business logic with good test coverage
  • API integrations with proper error handling
  • Database migrations with rollback plans
  • Features I've manually reviewed line by line

Code I trust 60%:

  • Sophisticated algorithms I don't fully understand
  • Security-critical functions with many edge cases
  • Performance optimizations using patterns I'm unfamiliar with
  • Anything over 1,000 lines generated in one session

Code I don't trust:

  • AI-generated code I haven't reviewed
  • Complex integrations shipped without testing
  • Security features I can't explain
  • Anything I'd be embarrassed to explain to a senior developer

The problem: AI pushes everything toward the "don't trust" category by default.

Real Examples of AI Overconfidence

Example 1: The Elegant Disaster

Claude built a caching system with Redis that was architecturally beautiful. Sophisticated key invalidation, perfect cache hierarchies, elegant code.

Problem: It cached user permissions indefinitely. Security vulnerability I only caught during manual review.

AI was optimizing for performance. I needed to optimize for security.
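To make that concrete: the fix was a one-parameter kind of change, which is exactly what a line-by-line read catches. Here's a minimal sketch of the pattern, not our actual code; it assumes a node-redis v4 client, and the key name, TTL, and helper function are invented:

```typescript
// Sketch: permission caching with an explicit expiry instead of "cache forever".
// Assumes a node-redis v4 client; key names, TTL, and loadPermissionsFromDb are illustrative.
import { createClient } from "redis";

const redis = createClient();
await redis.connect();

const PERMISSIONS_TTL_SECONDS = 300; // short TTL: stale permissions expire on their own

async function loadPermissionsFromDb(userId: string): Promise<string[]> {
  // Stand-in for the real database query.
  return ["billing:read"];
}

async function getUserPermissions(userId: string): Promise<string[]> {
  const cacheKey = `perms:${userId}`;

  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const permissions = await loadPermissionsFromDb(userId);

  // Setting this key with no expiry is the bug described above: a revoked role
  // could stay "granted" in the cache indefinitely. The TTL bounds that window.
  await redis.set(cacheKey, JSON.stringify(permissions), { EX: PERMISSIONS_TTL_SECONDS });
  return permissions;
}

// Invalidate eagerly whenever permissions actually change.
async function onPermissionsChanged(userId: string): Promise<void> {
  await redis.del(`perms:${userId}`);
}
```

A missing expiry isn't something "it compiles and the tests pass" will ever flag.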

Example 2: The Premature Optimization

Asked for a simple search feature. Claude built an advanced full-text search with ranking algorithms, autocomplete, and fuzzy matching.

Problem: We had 47 total records. A basic SQL LIKE query was perfect. The sophisticated solution added 800 lines of unnecessary complexity.

AI was optimizing for "best practice." I needed to optimize for "right now."
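For comparison, the "right now" version is about ten lines. A rough sketch, assuming Postgres via node-postgres; the table and column names are invented:

```typescript
// Sketch: a parameterized ILIKE search, which is plenty for a 47-row table.
// Assumes Postgres via node-postgres; connection settings come from the usual PG* env vars.
import { Pool } from "pg";

const pool = new Pool();

async function searchCustomers(term: string) {
  const { rows } = await pool.query(
    "SELECT id, name, email FROM customers WHERE name ILIKE $1 ORDER BY name LIMIT 20",
    [`%${term}%`]
  );
  return rows;
}
```

It scans the whole table on every search, which is exactly as much engineering as 47 rows deserve.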

Example 3: The Perfect Pattern Wrong Place

Claude implemented a beautiful event-driven architecture for user notifications. Clean separation of concerns, proper abstractions, textbook implementation.

Problem: We send 12 notifications per day. The complexity overhead wasn't worth it.

AI was optimizing for "enterprise scale." I needed to optimize for "startup reality."
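To be concrete about what "startup reality" looks like here: a direct call at the point where the thing happens, nothing more. A sketch with invented names, not the actual implementation:

```typescript
// Sketch: the "12 notifications a day" version. No event bus, no handlers, no queue.
// Function and message names are invented.
async function sendEmail(to: string, subject: string, body: string): Promise<void> {
  // Stand-in for whatever email provider is actually in use.
  console.log(`send to ${to}: ${subject} - ${body}`);
}

async function onSubscriptionCancelled(userEmail: string): Promise<void> {
  await sendEmail(userEmail, "Your subscription was cancelled", "Sorry to see you go.");
}
```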

The pattern: AI defaults to sophisticated solutions. Sometimes simple is better.

When I Skip Manual Review (And Regret It)

The temptation is real:

  • Deadline pressure
  • "It's just a small feature"
  • "AI code usually works fine"
  • "I can fix bugs if they come up"

Recent skip that bit me:

Simple email template feature. Claude generated 200 lines. Looked reasonable. Shipped without full review.

Three days later: Customer emails weren't sending. Buried in line 127 was a hardcoded development email address.

Cost: 3 hours of debugging + customer confusion + my credibility

Time I would have saved by reviewing: 30 minutes

The math: Skipping review saves 30 minutes, costs 3 hours when it goes wrong.
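The fix itself is boring, which is the point: it's a 30-second catch during review. A sketch of the pattern I use now; the variable and function names are illustrative, not our actual code:

```typescript
// Sketch: read addresses from the environment and fail fast if they're missing,
// instead of hardcoding a developer address as a silent fallback.
// Names are illustrative.
function requiredEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

const FROM_ADDRESS = requiredEnv("EMAIL_FROM_ADDRESS");

interface Mailer {
  send(msg: { from: string; to: string; subject: string; html: string }): Promise<void>;
}

async function sendTemplateEmail(mailer: Mailer, to: string, subject: string, html: string) {
  await mailer.send({ from: FROM_ADDRESS, to, subject, html });
}
```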

The Contrarian Take

Everyone says: "AI makes developers 10x faster, just trust it and ship"

I say: "AI makes developers 10x faster, which makes manual review 10x more important"

The reasoning:

1. Velocity Amplifies Impact: When you can ship 10x faster, your mistakes also compound 10x faster.

2. Complexity Grows Silently: AI can build sophisticated systems you couldn't build alone. But you still have to maintain them alone.

3. Understanding Enables Evolution: Code you don't understand is code you can't improve. You become dependent on AI for every change.

4. Debugging Requires Comprehension: When AI code breaks (and it will), you need to understand what it was trying to do before you can fix it.

The uncomfortable truth: The faster AI makes you, the more important it becomes to slow down.

My Current Compromise

For features that matter (80% of work):

  • Use AI for initial implementation
  • Mandatory line-by-line review
  • Refactor anything I don't understand
  • Add comments explaining non-obvious decisions
  • Write additional tests for edge cases

For throwaway code (20% of work):

  • Let AI write it
  • Ship with minimal review
  • Accept that I might not understand it fully
  • Plan to rewrite if it becomes important

The rule: Never ship AI code I wouldn't be comfortable debugging at 2 AM.

What This Means Long-Term

The optimistic view: AI gets better at explaining its decisions. Code becomes more self-documenting. Trust gap closes over time.

The pessimistic view: Developers become AI prompt engineers. Deep coding skills atrophy. Technical debt accumulates faster than understanding.

My view: AI is an incredible force multiplier, but it's still a tool. The developer's job is to wield it responsibly.

The skill that matters most now isn't writing code. It's reading code—especially code you didn't write.

Questions I Ask Myself

Before shipping AI-generated code:

  1. Can I explain this to a junior developer? If not, I don't understand it well enough.
2. Would I be comfortable debugging this at 2 AM? If not, I need more review.
  3. If this breaks, will I know where to look first? If not, I need better documentation.
  4. Is this the simplest solution that works? If not, AI might be over-engineering.
  5. Do I trust this with customer data? If not, security review required.

If any answer is "no," I spend more time with the code before shipping.

The Practical Middle Ground

What I've learned after 18 months of AI-first development:

Do trust AI for:

  • Boilerplate and repetitive patterns
  • Well-established integrations
  • Code following familiar patterns
  • Non-critical features with good test coverage

Don't trust AI for:

  • Security-critical implementations
  • Novel architectural decisions
  • Performance-critical code
  • Anything touching customer data without review

Always review:

  • Authentication and authorization
  • Payment processing
  • Data validation and sanitization
  • Error handling for edge cases
  • Database schema changes

The goal: Use AI to go faster while maintaining the same quality standards I'd have for hand-written code.

What Other Developers Say

When I share this anxiety, responses split:

The "Just Ship It" Camp: "You're overthinking it. AI code works fine. Manual review is waste."

The "Trust But Verify" Camp:
"Same feelings. I review everything critical, ship everything else."

The "AI Skeptic" Camp: "This is why I don't use AI. I want to understand every line."

Interesting pattern: The developers I respect most are in the middle camp. They use AI heavily but maintain high standards for understanding their codebase.

The Uncomfortable Questions

For the industry:

If everyone ships AI code they don't fully understand, what happens when AI gets something fundamentally wrong?

If debugging AI code requires understanding how AI thinks, are we training a generation of developers who can't debug?

If technical debt compounds faster than human understanding, do we end up with unmaintainable codebases?

For individuals:

Am I still improving as a developer if AI is writing most of my code?

What's the difference between being productive and being dependent?

How much code can I not understand before I'm no longer really a developer?

I don't have good answers. But I think asking these questions matters.

My Current Rules

1. The 1,000-Line Rule: If AI generates more than 1,000 lines in one session, mandatory full review before shipping.

2. The Critical Path Rule: Anything touching payments, auth, or customer data gets line-by-line review regardless of size.

3. The Explain-It Test: If I can't explain the architectural decisions to a teammate, I need to understand it better.

4. The 2 AM Rule: Never ship code I wouldn't be comfortable debugging at 2 AM when I'm tired.

5. The Six-Month Rule: Could I understand and modify this code if I came back to it in six months? If not, add comments.

Why I'm Sharing This

Because everyone's talking about AI productivity gains, but nobody's talking about AI comprehension gaps.

Because "it works" isn't the same as "I trust it."

Because the developer community needs to have honest conversations about the tradeoffs.

Because speed without understanding might be creating technical debt we don't even recognize yet.

I might be wrong. Maybe full trust in AI code is the future.

But for now, I'm betting my business on code I understand, not just code that works.

The gap between those two things bothers me. And I think it should.