What a FinishKit Scan Actually Finds (Real Results from 5 AI-Built Apps)
We ran FinishKit on five real AI-built apps, from a Cursor SaaS to a Lovable marketplace to a Bolt landing page. Here's every finding, ranked by severity.
We talk a lot about the gaps AI coding tools leave behind. The missing auth checks. The zero tests. The hardcoded secrets sitting in committed files. But what does that actually look like across real projects?
We ran FinishKit scans on five AI-built apps spanning different tools, different stacks, and different builder experience levels. A SaaS dashboard. A two-sided marketplace. A waitlist landing page. An internal tool. A personal portfolio. Here's what we found: the good, the bad, and the "how is this in production?"
How a FinishKit Scan Works
Before we get into the results, here's what a scan actually does.
FinishKit clones your repo, detects your stack and dependencies, then runs a multi-pass LLM analysis across your entire codebase. It's not looking at files in isolation. It traces data flows across routes, components, and database calls to find the gaps that single-file linting misses.
The scan produces findings across five categories:
- Security. Auth gaps, exposed secrets, injection vulnerabilities, missing RLS.
- Testing. Test coverage, critical paths without verification, missing assertions.
- Error handling. Unhandled exceptions, missing loading/error states, silent failures.
- Deployment. Environment config, build issues, missing migrations, CI gaps.
- UX. Accessibility, meta tags, performance, user-facing polish.
Each finding gets a severity level:
- Critical. Immediate risk. Data exposure, financial manipulation, or total app failure.
- High. Fix before launch. Significant gaps that will cause problems under real usage.
- Medium. Fix soon. Quality issues that degrade the experience or create maintenance debt.
- Low. Nice to have. Polish items that improve the product but don't block shipping.
The output is a prioritized Finish Plan: specific findings with file paths, code suggestions, and estimated fix effort. Not a vague list of best practices. A concrete, ordered to-do list for your actual codebase.
App 1: SaaS Dashboard Built with Cursor
Stack: Next.js 15, Supabase, Stripe, Tailwind
Builder: Experienced developer, built over two weekends
Total findings: 30 (4 critical, 8 high, 12 medium, 6 low)
This was the most competently built app in the batch. The developer clearly knew what they were doing. Clean component architecture, sensible file structure, good use of server components. The kind of project where you open the repo and think "this person ships."
And yet.
Critical findings:
Three API routes had no authentication checks whatsoever. The developer had built auth into the middleware for most routes but missed three endpoints that handled user settings, subscription data, and usage metrics. Anyone with the URL could read any user's data. No auth header required. No session check. Just a clean GET that returns everything.
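The missing check is a small, repeatable pattern: verify the session before touching any data. Here's a framework-free sketch of the guard-every-route idea (function names and the token check are illustrative, not the app's actual code):

```javascript
// Sketch of a per-route auth guard. In a real Next.js/Supabase app this
// would live in middleware or a session helper; names here are hypothetical.
function requireAuth(handler, getSession) {
  return async (req) => {
    const session = await getSession(req);
    if (!session) {
      // Reject before any data access happens.
      return { status: 401, body: { error: "Unauthorized" } };
    }
    // Pass the verified user along so the handler never trusts client input.
    return handler(req, session.user);
  };
}

// Hypothetical usage: wrap every data-returning route, not just the
// obviously sensitive ones.
const getSettings = requireAuth(
  async (req, user) => ({ status: 200, body: { userId: user.id } }),
  async (req) =>
    req.headers.authorization === "Bearer valid-token"
      ? { user: { id: "u1" } }
      : null
);
```

The safer default is a middleware matcher that covers every `/api` route, with an explicit allowlist for the few public ones, so a forgotten route fails closed instead of open.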
The Stripe webhook endpoint accepted payloads without verifying the signature. An attacker could forge webhook events to grant themselves a premium subscription, modify billing amounts, or trigger arbitrary state changes in the app.
A database connection string was hardcoded in a utility file. Not in .env. In the source. Committed to git. Full read/write access to the production database.
And the user_subscriptions table had no RLS policies. The Supabase client was initialized with the anon key on the frontend, which meant any authenticated user could query any other user's subscription status, payment history, and plan details.
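What the missing policies look like, sketched in SQL. This assumes a `user_id` column referencing `auth.users`; the exact statements depend on the app's schema:

```sql
-- Enable RLS, then allow each user to read only their own row.
-- (Illustrative: adapt column names and policies to your schema.)
alter table user_subscriptions enable row level security;

create policy "Users can read own subscription"
  on user_subscriptions for select
  using (auth.uid() = user_id);
```

With RLS enabled and no matching policy, queries through the anon key return nothing, which is the failure mode you want.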
High findings included: zero error boundaries (a single component crash takes down the entire dashboard), no tests of any kind, and console.log statements exposing internal application state in the browser console.
Estimated fix time for critical items: ~8 hours.
The takeaway here is instructive. This wasn't a novice mistake. This was an experienced developer who built a solid app but missed the finishing pass. Cursor generated clean, functional code for everything the developer asked for. The developer just didn't ask for auth on every route, webhook verification, or RLS policies. The tool did exactly what it was told. The gaps were in what it wasn't told.
App 2: Marketplace Built with Lovable
Stack: React, Supabase, Stripe Connect
Builder: Non-technical founder, built in 3 days
Total findings: 36 (6 critical, 11 high, 15 medium, 4 low)
This is the one that made us wince.
Six critical findings in a single app. Every one of them represented an actively exploitable vulnerability. This app was live, accepting real user signups and processing real payments through Stripe Connect.
Critical findings:
Zero RLS policies on any table. Not "some tables were missing RLS." All of them. Every table in the database was fully readable and writable by any authenticated user. User profiles, transaction records, payment details, private messages between buyers and sellers. All of it accessible to anyone who signed up and opened the browser dev tools.
Payment logic was executed entirely client-side. The app calculated order totals, applied discounts, and validated pricing in React components. The server accepted whatever the client sent. An attacker could modify the price of any listing to zero, submit the order, and the backend would process it without question.
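The fix is to treat every number the client sends as untrusted and recompute the total server-side from the server's own price data. A minimal sketch (names are illustrative; `priceLookup` stands in for a database query):

```javascript
// Sketch: the server recomputes the total from its own price table and
// ignores whatever total the client sent.
function computeOrderTotal(items, priceLookup) {
  return items.reduce((total, item) => {
    const unitPrice = priceLookup.get(item.listingId);
    if (unitPrice === undefined) {
      throw new Error(`Unknown listing: ${item.listingId}`);
    }
    if (!Number.isInteger(item.quantity) || item.quantity < 1) {
      throw new Error("Invalid quantity");
    }
    // Price comes from the server's data, never from the request body.
    return total + unitPrice * item.quantity;
  }, 0);
}
```

The client-sent total is useful only for display; if it disagrees with the server's number, the server's number wins.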
The admin panel was accessible without authentication. It was a separate route (/admin) with no auth check. Anyone who guessed the URL could access user management, transaction oversight, and platform settings.
User passwords were stored with MD5 hashing. Not bcrypt. Not argon2. MD5, a checksum algorithm that a modern GPU cracks at billions of hashes per second. (To be fair, Lovable's default path, Supabase Auth, handles password hashing correctly. But this app had a separate "internal accounts" system that bypassed Supabase Auth entirely and rolled its own password storage.)
File uploads accepted any file type and any file size with no validation. Users could upload executable files, files exceeding storage limits, or files with malicious content.
The Supabase service role key was exposed in the client-side JavaScript bundle. This key bypasses all RLS policies and provides full admin access to the database. It was imported directly into a React component instead of being restricted to server-side code.
High findings included: no input validation on any form field (names, emails, addresses, payment details all accepted raw), and a search endpoint vulnerable to SQL injection through string concatenation.
Estimated fix time for critical items: ~16 hours.
This pattern matches the broader data. A 2025 audit found that 170 out of 1,645 Lovable-created applications had security vulnerabilities that exposed personal data. The issue isn't that Lovable is a bad tool. It's that app generation without a finishing pass produces apps with predictable, exploitable gaps.
App 3: Landing Page + Waitlist Built with Bolt
Stack: Next.js, Supabase (just for waitlist), Resend for emails
Builder: Designer, first coding project
Total findings: 15 (1 critical, 3 high, 7 medium, 4 low)
The simplest app in the batch, and correspondingly the fewest findings. This is not a coincidence. Scope is the single best predictor of vulnerability count. Less code means less surface area. No auth system means no auth bugs. No payment processing means no payment manipulation. No user-generated content means no XSS.
Critical finding:
The email validation on the waitlist signup form was missing entirely. The form accepted any string: empty submissions, strings without @ symbols, strings with script tags. This matters because the collected emails were sent directly to Resend for a drip campaign. Malformed entries could break the email pipeline, and injected content could potentially be rendered in the admin view of collected signups.
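A small server-side check would have caught all of it. Here's a sketch: a pragmatic pattern match, not a full RFC 5322 parser, best combined with a confirmation email for real accuracy:

```javascript
// Minimal server-side email validation sketch (illustrative).
function isPlausibleEmail(input) {
  if (typeof input !== "string") return false;
  const email = input.trim();
  // Reject empty strings and anything beyond the practical length limit.
  if (email.length === 0 || email.length > 254) return false;
  // One @, non-empty local part, and a domain with at least one dot.
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}
```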
High findings:
No rate limiting on the waitlist signup endpoint. An attacker (or a bot) could submit thousands of fake signups per minute, polluting the waitlist data and potentially hitting Resend's API limits, which would block legitimate confirmation emails.
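A fixed-window limiter is a few lines. This sketch is in-memory and per-process (illustrative; a deployment with multiple instances would need a shared store such as Redis):

```javascript
// Sketch of a fixed-window rate limiter keyed by, e.g., client IP.
function createRateLimiter({ windowMs, max }) {
  const hits = new Map(); // key -> { count, windowStart }
  return function allow(key, now = Date.now()) {
    const entry = hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      // First request in a fresh window.
      hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= max; // false -> respond with 429
  };
}
```

The signup route checks `allow(clientIp)` before touching Supabase or Resend, so a bot flood never reaches the email pipeline.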
Meta tags were incomplete. The page had a title and basic description, but no Open Graph image, no Twitter card tags, and the description was truncated at 80 characters. For a landing page whose entire purpose is to be shared and discovered, this is a significant miss.
No error state on form submission failure. If the API returned an error (Supabase down, Resend rate limited, network timeout), the user saw nothing. No message. No retry button. The form just sat there looking functional while silently failing.
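The shape of the fix is a submit path that always resolves to an explicit success or error state the UI can render. A sketch (`sendRequest` is an injectable stand-in for the real fetch call):

```javascript
// Sketch of a submit helper that surfaces failure instead of swallowing it.
// The UI renders `message` and a retry button whenever status is "error".
async function submitWaitlist(email, sendRequest) {
  try {
    const res = await sendRequest(email);
    if (!res.ok) {
      return { status: "error", message: "Signup failed. Please try again." };
    }
    return { status: "success" };
  } catch {
    // Network failure, timeout, etc. -- never leave the form silent.
    return { status: "error", message: "Network error. Please retry." };
  }
}
```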
Medium findings included: a 3MB unoptimized hero image (the single biggest performance drain on the page), no loading state during form submission, and missing alt text on decorative images.
Estimated fix time for critical items: ~3 hours.
The lesson: if you're building something simple, keep it simple. This app had minimal attack surface because it did minimal things. That's not a weakness. That's good architecture.
App 4: Internal Tool Built with Replit
Stack: Express.js, PostgreSQL, React frontend
Builder: Ops manager, built for their team
Total findings: 21 (3 critical, 6 high, 9 medium, 3 low)
Internal tools are the most dangerous category of AI-built apps, because "internal" creates a false sense of security. The logic goes: "Only our team uses this, so we don't need auth." That logic holds right up until someone finds the URL, a disgruntled employee decides to exfiltrate data, or the "internal" tool gets shared with a contractor who shares it with their contractor.
Critical findings:
SQL injection in the search endpoint. The query was built with string concatenation:
```js
const result = await pool.query(
  `SELECT * FROM inventory WHERE name LIKE '%${searchTerm}%'`
);
```

An attacker sends `'; DROP TABLE inventory; --` and the table is gone. Or, more subtly, they use a `UNION SELECT` to extract data from other tables. This isn't theoretical. It's the most well-understood attack vector in web security, and the AI generated it without hesitation because string concatenation is the most common pattern in its training data.
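The fix is a parameterized query: the driver sends the search term as data, never as SQL. A sketch shaped for node-postgres (the helper name is ours, not the app's), where the result feeds `pool.query(q.text, q.values)`:

```javascript
// Sketch: build the query as text + bound values so the driver can never
// interpret the search term as SQL.
function buildInventorySearch(searchTerm) {
  return {
    text: "SELECT * FROM inventory WHERE name LIKE $1",
    // The wildcard wrapping happens in the value, not in the SQL string.
    values: [`%${searchTerm}%`],
  };
}
```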
No authentication on admin routes. The app had a concept of "admin" pages for managing users and viewing audit logs, but no actual auth check. Every route was publicly accessible to anyone who knew the URL structure.
Database credentials were committed in a .env file that was tracked by git. The .gitignore didn't include .env. Every developer who cloned the repo got production database credentials. Every fork, every backup, every CI log that printed environment state. All of them contained the keys to the production database.
High findings included: no CORS configuration (the API accepted requests from any origin, meaning any website could make authenticated requests to the backend), no input sanitization on user-generated content displayed in the UI, and the Express error handler returning full stack traces in production (leaking internal file paths, dependency versions, and database connection details to anyone who triggered an error).
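For the stack-trace leak specifically, the fix is an error handler that logs full detail server-side and returns only a generic message in production. A sketch in Express's four-argument error-handler shape (illustrative):

```javascript
// Sketch of an Express-style error handler. Express recognizes error
// middleware by its four-argument signature; `next` is unused but required.
function errorHandler(isProduction) {
  return (err, req, res, next) => {
    console.error(err); // full detail stays in server logs
    const body = isProduction
      ? { error: "Internal server error" }
      : { error: err.message, stack: err.stack };
    res.status(500).json(body);
  };
}
```

Mounted last with `app.use(errorHandler(process.env.NODE_ENV === "production"))`, it replaces the default handler that was echoing stack traces to callers.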
Estimated fix time for critical items: ~6 hours.
App 5: Portfolio + Blog Built with v0
Stack: Next.js static site, MDX blog, minimal backend
Builder: Junior developer, personal project
Total findings: 10 (0 critical, 2 high, 5 medium, 3 low)
Zero critical findings. The cleanest scan of the batch. And it wasn't because v0 generates better code than the other tools. It's because a static site with no authentication, no database, and no API has almost no attack surface.
High findings:
Missing meta tags. No Open Graph tags, no canonical URLs, no structured data. For a portfolio site that exists to be discovered and shared, these gaps directly reduce visibility. A link shared on Twitter or LinkedIn would show a generic preview instead of the developer's name, role, and a compelling image.
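What "complete" looks like, sketched as a plain object in the shape of Next.js's `Metadata` export (titles, descriptions, and image paths are placeholders, not the site's real values):

```javascript
// Sketch of the metadata a shared link needs. In a Next.js app this object
// would be exported as `export const metadata` from the root layout.
const metadata = {
  title: "Jane Doe – Product Engineer",
  description: "Portfolio and writing on frontend architecture.",
  openGraph: {
    title: "Jane Doe – Product Engineer",
    description: "Portfolio and writing on frontend architecture.",
    // 1200x630 is the conventional size for link-preview images.
    images: [{ url: "/og-image.png", width: 1200, height: 630 }],
  },
  twitter: { card: "summary_large_image" },
};
```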
No custom 404 page. The site served the default Next.js 404 when users hit a bad URL. Not a security issue, but a missed opportunity. A custom 404 with navigation back to the portfolio and blog would keep visitors on the site instead of losing them.
Medium findings included: images without alt text (accessibility failure and SEO penalty), broken heading hierarchy with multiple H1 tags on a single page, and no sitemap.xml for search engine crawling.
Estimated fix time for all items: ~2 hours.
Static sites are inherently safer. No auth means no auth bypass. No database means no injection. No API means no unauthorized access. If your project can be static, that simplicity is a feature, not a limitation.
The Pattern Across All Five Apps
Here's the summary view:
| App | Tool | Critical | High | Medium | Low | Total | Est. Fix Time |
|---|---|---|---|---|---|---|---|
| SaaS Dashboard | Cursor | 4 | 8 | 12 | 6 | 30 | ~8 hours |
| Marketplace | Lovable | 6 | 11 | 15 | 4 | 36 | ~16 hours |
| Landing Page | Bolt | 1 | 3 | 7 | 4 | 15 | ~3 hours |
| Internal Tool | Replit | 3 | 6 | 9 | 3 | 21 | ~6 hours |
| Portfolio | v0 | 0 | 2 | 5 | 3 | 10 | ~2 hours |
More complex apps produced more findings, but not linearly. A marketplace with 2x the features didn't have 2x the issues; it had more like 1.2x, because many issues (no tests, no error boundaries) are binary. You either have them or you don't.
A few things stand out:
Auth and RLS gaps appeared in every app with a backend. The SaaS dashboard missed auth on three routes. The marketplace had no RLS at all. The internal tool had no auth anywhere. Three different tools, three different builders, same category of failure. AI tools don't think adversarially. They build what works, not what's safe.
Zero tests across all five apps. Not a single test file in any of the five repos. No unit tests. No integration tests. No end-to-end tests. This is the norm for AI-generated code, and it's the norm for a reason. None of these tools generate tests unless you explicitly ask for them.
Error handling was missing in all five. No error boundaries. No retry logic. No graceful degradation. No loading states in most cases. When things go wrong (and they will), these apps either crash, hang, or silently fail.
Complexity correlates with risk, but scope matters more. The marketplace (36 findings) was the most complex app. The portfolio (10 findings) was the simplest. The internal tool (21 findings) was architecturally simpler than the SaaS dashboard (30 findings), yet had worse defaults, because Replit's Express.js scaffold doesn't include security middleware out of the box.
Every app was fixable. None of the five apps needed a rewrite. The critical issues in each case could be resolved in a single focused session. The total fix time across all five apps was approximately 35 hours, less than a single work week. The gap between "risky prototype" and "shippable product" is real, but it's not insurmountable.
The Finish Plan in Action
What does the prioritized fix list actually look like? Here's a simplified version of what the SaaS dashboard's Finish Plan contained:
P0, Critical (fix immediately):
- Add auth middleware to `/api/settings`, `/api/usage`, `/api/subscription`. File paths, expected auth pattern, 15 min estimated fix.
- Add Stripe webhook signature verification to `/api/webhooks/stripe`. Code example using `stripe.webhooks.constructEvent`, 20 min.
- Move database connection string from `lib/db.ts` to `.env` and rotate credentials. Step-by-step instructions, 10 min.
- Add RLS policies to `user_subscriptions` table. Exact SQL statements, 15 min.

P1, High (fix before launch):
- Add React error boundary wrapping dashboard layout. Component code, placement guidance, 30 min.
- Add baseline tests for auth flow and critical API routes. Test file scaffolding with specific assertions, 2 hours.
- Remove or gate `console.log` statements in production. File list with line numbers, 15 min.
Each finding includes four things: what's wrong, where it is (file path and line number), why it matters (the actual risk, not a generic warning), and how to fix it (specific code changes or commands). The goal is to turn what would normally be a weekend of detective work into a clear, ordered checklist you can work through methodically.
Most apps go from "risky" to "shippable" in a single focused session. Not perfect. Not fully hardened against nation-state attackers. But safe enough to handle real users and real data without the kinds of failures that end up as cautionary tweets.
What This Means for Your App
Every AI-built app has gaps. The five apps in this post were built by different people, with different tools, at different skill levels, for different purposes. They all had issues. The 47-point finishing checklist covers the categories. The security deep dive explains why the vulnerabilities exist. The Supabase RLS guide walks through one of the most common gaps in detail.
But the question isn't whether your app has gaps. It does. The question is whether you find them before your users do.
The findings in this post weren't discovered by attackers. They were discovered by a scan that took minutes. The fixes weren't heroic engineering efforts. They were focused sessions with a clear list. The difference between an app that leaks data and one that doesn't wasn't talent or budget. It was the decision to look.
Run a scan on your app. See what it finds. Fix the critical items. Then ship with confidence.