Testing AI-Generated Code: A Minimum Viable Test Strategy
AI tools generate code but almost never generate tests. Here's the practical, minimum-effort testing strategy that catches the bugs that matter before your users find them.
You just built an entire app with Cursor over the weekend. Auth works. Dashboard pulls real data. Payments are wired up. It looks and feels like a product. You deploy it Monday morning, feeling genuinely proud.
Tuesday afternoon, you push a "quick fix": renaming a variable, adjusting a redirect. Five minutes of work. You don't think twice.
Wednesday morning, you wake up to an email from a user: "I can't log in." You check. The auth flow is broken. Your redirect fix clobbered the callback URL. The app has been silently broken for 18 hours, and you had no idea because you have zero tests. Zero safety net. Zero way to know that a two-line change just broke the most critical flow in your app.
This isn't a hypothetical. It's the default outcome when you ship AI-generated code without tests.
The Zero-Test Default
Here's the irony that nobody talks about. AI coding tools (Cursor, Copilot, Bolt, Lovable) are extraordinary at generating features. They'll scaffold an entire CRUD app in minutes. But they almost never generate tests alongside those features.
Think about that for a second. The code that most needs testing (code you didn't write yourself, with internals you don't fully understand) gets the fewest tests. If a human engineer handed you 2,000 lines of code with zero tests and said "trust me, it works," you'd push back. But when AI does it, we just deploy it.
The data backs this up. A 2025 developer survey by Pieces found that 63% of developers reported spending more time debugging AI-generated code than they would have spent writing it manually. That's not an indictment of AI tools. It's an indictment of skipping the verification step. Debugging without tests is detective work. Debugging with tests is reading a report.
But here's the good news: you don't need 100% coverage. You don't need to become a testing purist. You need roughly 15 well-placed tests. That's it. Fifteen tests across three layers, and you'll catch the vast majority of regressions before your users do.
The Three-Layer Minimum Viable Test Strategy
Forget the testing pyramid diagrams from university. Here's the version that matters for a solo builder or small team shipping AI-built code. Three layers, each serving a distinct purpose, each easy to implement.
Layer 1: Smoke Tests (Does the App Even Start?)
This is the lowest bar. Before you worry about whether your features work, verify that your app boots. You'd be surprised how often an AI-generated import, a missing env var, or a circular dependency causes the entire app to crash silently in production.
Smoke tests answer one question: does the app render without throwing?
// __tests__/smoke.test.ts
import { describe, it, expect } from 'vitest'

describe('Smoke tests', () => {
  it('renders the home page without crashing', async () => {
    const response = await fetch('http://localhost:3000/')
    expect(response.status).toBe(200)
    const html = await response.text()
    expect(html).toContain('</html>')
  })

  it('returns 200 from the health check endpoint', async () => {
    const response = await fetch('http://localhost:3000/api/health')
    expect(response.status).toBe(200)
    const data = await response.json()
    expect(data).toHaveProperty('status', 'ok')
  })

  it('returns 401 from protected API routes without auth', async () => {
    const response = await fetch('http://localhost:3000/api/projects')
    expect(response.status).toBe(401)
  })
})

Three tests. Two minutes to write. They'll catch broken builds, missing environment variables, and misconfigured routes before you deploy. That last test is sneaky useful because it verifies that your auth middleware is actually running, not silently bypassed.
Layer 2: Integration Tests (Do the Critical Flows Work?)
This is where you get the real return on investment. Integration tests verify that your core business logic works end to end. Not just that individual functions return the right value, but that the pieces fit together.
Start with auth. If users can't sign in, nothing else matters.
// __tests__/integration/auth.test.ts
import { describe, it, expect } from 'vitest'
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY!
)

describe('Auth flow', () => {
  const testEmail = `test-${Date.now()}@example.com`
  const testPassword = 'TestPassword123!'

  it('creates a new account', async () => {
    const { data, error } = await supabase.auth.signUp({
      email: testEmail,
      password: testPassword,
    })
    expect(error).toBeNull()
    expect(data.user).toBeDefined()
    expect(data.user?.email).toBe(testEmail)
  })

  it('signs in with valid credentials', async () => {
    const { data, error } = await supabase.auth.signInWithPassword({
      email: testEmail,
      password: testPassword,
    })
    expect(error).toBeNull()
    expect(data.session).toBeDefined()
    expect(data.session?.access_token).toBeTruthy()
  })

  it('rejects invalid credentials', async () => {
    const { data, error } = await supabase.auth.signInWithPassword({
      email: testEmail,
      password: 'wrong-password',
    })
    expect(error).not.toBeNull()
    expect(data.session).toBeNull()
  })

  it('accesses protected API with valid session', async () => {
    const { data: authData } = await supabase.auth.signInWithPassword({
      email: testEmail,
      password: testPassword,
    })
    const response = await fetch('http://localhost:3000/api/projects', {
      headers: {
        Authorization: `Bearer ${authData.session!.access_token}`,
      },
    })
    expect(response.status).toBe(200)
    const body = await response.json()
    expect(Array.isArray(body.data)).toBe(true)
  })
})

Then test your core API endpoints. Not every endpoint. Just the ones that create, modify, or delete data. The ones where a bug means data loss or corruption.
// __tests__/integration/projects-api.test.ts
import { describe, it, expect, beforeAll } from 'vitest'

let authToken: string

beforeAll(async () => {
  // Sign in and grab a token from a test-only helper endpoint
  const response = await fetch('http://localhost:3000/api/auth/test-token')
  const data = await response.json()
  authToken = data.token
})

describe('Projects API', () => {
  let projectId: string

  it('creates a project with valid input', async () => {
    const response = await fetch('http://localhost:3000/api/projects', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${authToken}`,
      },
      body: JSON.stringify({
        name: 'Test Project',
        provider: 'github',
        repo_url: 'https://github.com/test/repo',
      }),
    })
    expect(response.status).toBe(201)
    const body = await response.json()
    expect(body.data).toHaveProperty('id')
    expect(body.data.name).toBe('Test Project')
    projectId = body.data.id
  })

  it('rejects a project with missing required fields', async () => {
    const response = await fetch('http://localhost:3000/api/projects', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${authToken}`,
      },
      body: JSON.stringify({ name: '' }),
    })
    expect(response.status).toBe(400)
    const body = await response.json()
    expect(body).toHaveProperty('error')
  })

  it('returns the created project by ID', async () => {
    const response = await fetch(
      `http://localhost:3000/api/projects/${projectId}`,
      {
        headers: { Authorization: `Bearer ${authToken}` },
      }
    )
    expect(response.status).toBe(200)
    const body = await response.json()
    expect(body.data.id).toBe(projectId)
  })
})

That's about 7-8 tests covering auth and your primary CRUD operations. Combined with the smoke tests, you're already at 10-11. You can feel the safety net forming.
Layer 3: End-to-End Tests (Does the User Journey Work?)
E2E tests are the heavy artillery. They open a real browser, click real buttons, and verify the full user experience. You only need one or two of these, covering your core user journey.
// e2e/core-journey.spec.ts
import { test, expect } from '@playwright/test'

test('core user journey: sign up, create project, view dashboard', async ({
  page,
}) => {
  // Step 1: Navigate to the app
  await page.goto('/')
  await expect(page).toHaveTitle(/FinishKit/)

  // Step 2: Open auth modal and sign up
  await page.getByRole('button', { name: /get started/i }).click()
  await page.getByPlaceholder('Email').fill(`e2e-${Date.now()}@test.com`)
  await page.getByPlaceholder('Password').fill('TestPassword123!')
  await page.getByRole('button', { name: /sign up/i }).click()

  // Step 3: Verify redirect to dashboard
  await page.waitForURL('/dashboard')
  await expect(page.getByText(/mission control/i)).toBeVisible()

  // Step 4: Create a new project
  await page.getByRole('button', { name: /new project/i }).click()
  await page.getByLabel('Project name').fill('E2E Test Project')
  await page.getByLabel('Repository URL').fill('https://github.com/test/repo')
  await page.getByRole('button', { name: /create/i }).click()

  // Step 5: Verify project appears in dashboard
  await expect(page.getByText('E2E Test Project')).toBeVisible()
})

test('unauthenticated user cannot access dashboard', async ({ page }) => {
  await page.goto('/dashboard')
  // Should redirect to home or show auth modal
  await expect(page).not.toHaveURL('/dashboard')
})

Two E2E tests. One for the happy path, one for the auth boundary. That's all you need to start.
Total count: roughly 15 tests across three layers. Smoke tests catch catastrophic failures. Integration tests catch logic bugs. E2E tests catch UX regressions. Together, they form a safety net that lets you push code without holding your breath.
Using AI to Write Your Tests
Here's the meta move: use AI to write tests for AI-generated code. It works surprisingly well, with a few caveats.
The key is giving the AI enough context. Don't just say "write tests for my app." Show it the actual code and be specific about what you want verified.
The best prompt pattern for test generation: paste the source code, then ask for tests that verify behavior, not implementation. AI will default to testing internal details unless you redirect it toward inputs and outputs.
Good prompt:
Here's my API route for creating projects (paste code).
Write Vitest integration tests that verify:
1. A valid request returns 201 with the correct response shape
2. Missing required fields return 400 with an error message
3. An unauthenticated request returns 401
4. Duplicate project names are handled gracefully
Test the actual HTTP endpoint, not the internal functions.
Bad prompt:
Write tests for my project creation code.
The bad prompt produces tests that mock everything, test nothing real, and pass regardless of whether your code works. The good prompt produces tests that actually hit your API and verify real behavior.
Watch out for AI-generated tests that always pass. This is the most common pitfall. The AI writes an assertion like expect(result).toBeDefined(), which passes even when result is an error object. Always verify that your tests can fail. Comment out the code they're testing and run them. If they still pass, they're not testing anything.
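Here's a minimal standalone sketch of the trap. The result object is hypothetical, mimicking the { data, error } shape that Supabase-style clients return instead of throwing:

```typescript
// Hypothetical failure result: the sign-in failed, but the client
// returned an error object rather than throwing.
type AuthResult = { data: { id: string } | null; error: string | null }
const result: AuthResult = { data: null, error: 'Invalid credentials' }

// What expect(result).toBeDefined() actually checks:
const isDefined = (x: unknown) => x !== undefined

// Weak assertion: an error object is still "defined", so the test
// passes even though the operation failed.
console.log(isDefined(result)) // true

// Strong assertion: check the success shape you actually expect.
// This one correctly fails on the error result.
console.log(result.error === null && result.data !== null) // false
```

The weak check is green no matter what your code does; the strong check only goes green when the operation actually succeeded.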
Another trap: AI loves to test implementation details. It'll assert that a specific internal function was called with specific arguments. These tests break every time you refactor, even when the behavior hasn't changed. Test what goes in and what comes out. Ignore the middle.
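To make the distinction concrete, here's a sketch using a hypothetical helper (toSlug is invented for the example):

```typescript
// Hypothetical helper the AI might have generated for you.
function toSlug(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, '-')
}

// Behavior test: input in, output out. This survives any internal
// rewrite, as long as the function still does its job.
console.log(toSlug('  My New Project  ')) // my-new-project

// An implementation-detail test would instead spy on the internal
// replace() call and assert its exact arguments. It breaks the moment
// you reimplement toSlug, even though the output above never changes.
```

If the assertion mentions anything the caller can't observe, it's testing the middle, not the behavior.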
Setting Up CI in 10 Minutes
Tests that only run on your machine are tests that stop running the moment you get busy. You need CI (continuous integration) that runs your tests on every push automatically.
GitHub Actions makes this trivial. Here's a complete workflow that runs lint, type checking, unit and integration tests, and build verification on every push:
# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  ci:
    runs-on: ubuntu-latest
    env:
      NEXT_PUBLIC_SUPABASE_URL: ${{ secrets.NEXT_PUBLIC_SUPABASE_URL }}
      NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY: ${{ secrets.NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY }}
      SUPABASE_SECRET_KEY: ${{ secrets.SUPABASE_SECRET_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Lint
        run: npm run lint
      - name: Type check
        run: npx tsc --noEmit
      - name: Run tests
        run: npx vitest run
      - name: Build
        run: npm run build

That's it. Paste this file into your repo, add your env vars as GitHub secrets, and you have CI. Every push now runs through four gates: lint, types, tests, and build.
The build is your first test. If your Next.js app doesn't build, nothing else matters. A surprising number of AI-generated apps break on npm run build because the AI wrote code that works in dev mode but fails under the stricter production compiler. TypeScript errors, missing imports, server/client boundary violations. The build catches all of these.
If your next.config has typescript: { ignoreBuildErrors: true }, remove it. That flag was a crutch during prototyping. In CI, you want the build to fail loudly on type errors. A type error caught in CI is infinitely cheaper than a runtime crash caught by a user.
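If you're not sure what to look for, the offending flag lives in your Next.js config and looks roughly like this (a sketch, not your actual config; your real file will have more fields):

```typescript
// next.config.ts (sketch)
const nextConfig = {
  // Delete this block if present: it silences type errors during
  // `next build`, which is exactly the failure you want CI to catch.
  // typescript: { ignoreBuildErrors: true },
}

export default nextConfig
```

The same goes for eslint: { ignoreDuringBuilds: true } if the AI snuck that in too. Let the build be strict; that's the whole point of the gate.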
What to Test First
If you're staring at an untested codebase and wondering where to start, here's the priority order. This is based on where production bugs actually cluster, not on theoretical purity:
- Auth routes. If auth breaks, your entire app is inaccessible or, worse, insecure.
- Payment flows. If billing breaks, you lose money or charge users incorrectly.
- Data mutations. Create, update, delete operations where bugs mean data loss.
- Public API endpoints. Anything exposed to the internet that could be abused.
- Core UI journeys. The primary path a user takes through your app.
Notice what's not on the list: utility functions, formatting helpers, static pages, settings screens. Those matter, but they don't matter first.
The 80/20 of testing: 15 well-placed tests across these five priorities will catch roughly 80% of production regressions. The remaining 20% are edge cases that you'll discover over time and add tests for as they come up. That's fine. Testing is iterative. Start with the high-value targets and expand from there.
FinishKit automates this prioritization. It scans your codebase, identifies the critical paths that lack test coverage, and generates a prioritized list of what to test first, so you're not guessing.
The Compound Effect of Tests
Here's what changes once you have a basic test suite in place.
You push code faster, not slower. It sounds counterintuitive, but developers with tests deploy more frequently because they're not afraid of breaking things. The tests tell you immediately if something is wrong. No more manual click-through-the-whole-app verification before every deploy.
You refactor with confidence. AI-generated code often needs restructuring as your app grows. Without tests, every refactor is a gamble. With tests, you rename, move, and restructure knowing that your safety net will catch regressions.
You sleep through the night. Not a metaphor. When you have CI running on every push and tests covering your critical paths, you stop waking up at 2am wondering if the deploy you pushed at 6pm broke something. You know it didn't. The tests told you.
Start Today, Not Tomorrow
The most common mistake is treating tests as a "later" task. Later never comes. You'll always have a more exciting feature to build, a more urgent bug to fix, a more interesting problem to solve. And then one day your "quick fix" breaks auth, and you find out 18 hours later from a frustrated user.
Here's your assignment. Today. Not tomorrow.
- Install Vitest: npm install -D vitest
- Write three smoke tests (app renders, health check returns 200, protected route returns 401)
- Write one integration test for your auth flow
- Add the GitHub Actions workflow above
- Push it all and watch CI go green
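Vitest finds *.test.ts files with near-zero setup, but if you want to be explicit about where tests live, a minimal config sketch looks like this (the include glob is an assumption; adjust it to your layout):

```typescript
// vitest.config.ts — minimal sketch
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    // Assumed location; point this at wherever your tests live.
    include: ['__tests__/**/*.test.ts'],
    // API and integration tests hit HTTP endpoints, so no DOM needed.
    environment: 'node',
  },
})
```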
That's an hour of work. Maybe two. And from that point forward, every push to your repo is verified automatically. Every change runs through lint, types, tests, and build. Every broken auth flow gets caught before it reaches a user.
Tests aren't about perfection. They're not about satisfying some engineering ideal. They're about sleeping through the night after you push to production. Start with 15. Expand from there. Your future self, the one who pushes a "quick fix" at 11pm on a Tuesday, will thank you.