How to validate an AI startup idea before you write any code
The most expensive thing a founder builds is the wrong product. Not because of the engineering cost, but because of the twelve months you cannot get back. Validation done well costs a weekend. Validation skipped costs a year.
This essay lays out the 5-axis rubric we use to score every problem in our directory, plus the weekend workflow that turns a vague idea into a go/no-go decision before you commit a single sprint.
The 5-axis rubric
Each axis is scored from 1 to 10. A score of 7+ on all five is rare; that is the bar for "build it." An average around 6 means "interesting, keep looking." A score below 5 on any axis is a kill signal.
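The three thresholds above can be read as a small decision rule. The Python sketch below is one way to encode it. The axis names are shorthand chosen here. The essay does not say what happens to a profile that clears 5 everywhere but misses the 7+ bar with an average well away from 6, so this sketch assumes every such case lands in "keep looking".

```python
# Shorthand names for the five axes, in the order the rubric lists them.
AXES = ("severity", "feasibility", "market", "gap", "defensibility")


def verdict(scores: dict[str, int]) -> str:
    """Map five 1-10 axis scores to one of the rubric's three outcomes.

    Assumption: anything that is neither "build" nor "kill" is treated
    as "keep looking", whatever its exact average.
    """
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"missing axes: {missing}")
    values = [scores[axis] for axis in AXES]
    if any(not 1 <= v <= 10 for v in values):
        raise ValueError("each axis is scored from 1 to 10")
    if any(v < 5 for v in values):
        return "kill"          # below 5 on any axis is a kill signal
    if all(v >= 7 for v in values):
        return "build"         # 7+ on all five is the bar
    return "keep looking"      # e.g. an average around 6
```

For example, scores of 8, 9, 7, 7 and 7 return "build", while the same profile with a 4 on market signal returns "kill". The kill check runs first so that a single weak axis always dominates, which matches the essay's "below 5 on any axis" wording.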
Axis 1: problem severity
How much does this hurt? Measure in hours per week or dollars per month, not in subjective complaint volume. A 30-minute weekly frustration is a 4. A $300 surprise charge a month is a 7. A six-month sales cycle stalled by a single workflow is a 9.
The trap: founders confuse the volume of complaints with the severity of the pain. A Reddit thread with 8K upvotes can describe a severity-5 annoyance. Look for the language of urgency: "I had to," "I lost," "I cancelled."
Axis 2: AI feasibility today
Not "is AI plausibly going to solve this someday." Can a competent engineer ship a working v1 in 8-12 weeks with current models? Categorising bank transactions is a 9. Diagnosing a rare disease from a photo is a 4. Reading a legal redline is an 8.
The trap: scoring on the future capabilities of GPT-6 or Claude 5. You are not building for future models; you are building for the models you can call from your laptop today.
Axis 3: market signal
How many real people credibly experience this problem? "Tech founders" is a 5. "Anyone with a credit card" is a 9. Read the platform numbers, not the post count: App Annie, the US Census, Stripe Atlas data, Bureau of Labor Statistics (BLS) data. These sources are dull, and they are authoritative.
Axis 4: competition gap
How wide is the space between what the leading existing solution does and what the user actually wants? A big gap is good. If a category leader already does 80% of the job, the gap is a 3 and you are entering a re-skin fight you will lose.
To score this honestly, install the top three competitors and use them on a real task. Note what frustrated you in the first 20 minutes. Those frustrations are the gap.
Axis 5: defensibility (the axis we add for AI products)
What stops a competitor with twice your funding from copying you in six weeks? This is where AI-era validation diverges from classic SaaS. Possible moats include:
- Proprietary data. You have a corpus nobody else has access to.
- Workflow lock-in. You are inside a workflow that costs the user real switching pain.
- Distribution. You have a community or audience competitors cannot rent.
- Speed of iteration. You can ship faster than a bigger team can decide.
"We use Claude" is not a moat. Neither is "we fine-tuned on internal data" unless that data is genuinely hard to acquire.
The weekend workflow
Friday evening: collect
Pick one habitat (a subreddit, the IndieHackers community, the Hacker News "Ask HN" archive). Read 50 posts. Note every concrete complaint in the user's exact words. Goal: 8-12 candidate problems.
Saturday morning: score
Run each candidate through the 5-axis rubric. Be honest. Anything below 5 on any axis falls off the list. You should have 2-4 candidates left.
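The Saturday cut can be sketched the same way. In this minimal Python example the candidate names and scores are hypothetical placeholders, not real data. Ordering the survivors by their weakest axis and then by average is also an assumption of the sketch; the essay only specifies dropping anything below 5.

```python
from statistics import mean

# Hypothetical candidates from Friday's collection; each list holds the
# five axis scores (severity, feasibility, market, gap, defensibility).
candidates = {
    "problem-a": [8, 9, 6, 5, 6],
    "problem-b": [7, 8, 6, 6, 5],
    "problem-c": [9, 4, 7, 8, 7],
    "problem-d": [6, 7, 6, 6, 7],
    "problem-e": [6, 8, 3, 7, 5],
}


def shortlist(cands: dict[str, list[int]]) -> list[str]:
    """Drop any candidate below 5 on any axis, then rank the survivors.

    Ranking key (assumed): weakest axis first, then the average score,
    both descending.
    """
    survivors = {name: s for name, s in cands.items() if min(s) >= 5}
    return sorted(
        survivors,
        key=lambda name: (min(survivors[name]), mean(survivors[name])),
        reverse=True,
    )


print(shortlist(candidates))  # ['problem-d', 'problem-a', 'problem-b']
```

With these placeholder numbers, problem-c and problem-e fall off (a 4 and a 3), leaving three candidates, within the 2-4 the workflow expects. Ranking by the weakest axis first keeps one soft spot from hiding behind a high average.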
Saturday afternoon: install competitors
For each remaining candidate, install the top 2-3 existing solutions and use them on a real task for 30 minutes each. This is where the competition-gap score gets refined into a number you trust.
Sunday morning: five-name test
For your top candidate, find five real people with this problem. Not five hypothetical buyers: five named humans with email addresses. Find them through direct messages, IndieHackers, Twitter/X, or friends of friends.
Sunday evening: decide
You have one candidate, five names and a 5-axis score. If the score is 7+ across all five axes and three of the five names confirm the problem in the next week, you build. Otherwise, you go back to Friday with a different habitat.
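The Sunday rule combines the rubric score with the five-name test. The sketch below is hypothetical: the Contact type and its fields are illustrative, and it assumes each of the five people gives a simple yes or no on whether they have the problem.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """One of the five named people from Sunday morning (hypothetical fields)."""
    name: str
    email: str
    confirmed_problem: bool  # did they confirm the problem within the week?


def go_no_go(scores: list[int], contacts: list[Contact]) -> bool:
    """Return True ("build") only if every axis scores 7+ and at least
    three of the five contacts confirm the problem. False means: go back
    to Friday with a different habitat."""
    if len(scores) != 5:
        raise ValueError("expected one score per axis (five)")
    if len(contacts) != 5:
        raise ValueError("the test calls for exactly five named people")
    confirmations = sum(c.confirmed_problem for c in contacts)
    return all(s >= 7 for s in scores) and confirmations >= 3
```

Both conditions must hold: a 7+ profile with only two confirmations is still a no-go, and so is three confirmations with a 6 on any axis.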
What this saves you from
Three failure patterns the rubric catches early:
The interesting-but-tiny market. AI tooling for music producers is interesting, but the market is too small to support a venture-backable business. The rubric flags this on axis 3.
The valuable-but-saturated market. AI for sales emails is valuable, but there are forty funded players. Axis 4 flags this immediately.
The valuable-but-not-yet-feasible market. AI medical diagnosis sounds high-impact. In practice, regulatory and feasibility blockers fail it on axis 2. Founders waste 18 months learning this the slow way.
The rubric is boring. It is also the reason 200+ problems in our directory have ten different scores per problem, and why the top decile actually does carry the highest opportunity.
See 200+ AI startup ideas already scored on this exact rubric, with sources, personas and the missing wedge for each.