How to validate an AI startup idea before you write any code

2026-05-27 · 10 min read · By Real Problem AI

The most expensive thing a founder builds is the wrong product. Not because of the engineering cost, but because of the twelve months you cannot get back. Validation done well costs a weekend. Validation skipped costs a year.

This essay is the 5-axis rubric we use to score every problem in our directory, plus the weekend workflow that turns a vague idea into a go/no-go decision before you commit a single sprint.

The 5-axis rubric

Each axis is scored 1-10. A score of 7+ on all five is rare; that is the bar for "build it." An average around 6 is "interesting, keep looking." Below 5 on any axis is a kill signal.
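The thresholds above can be written as a small decision function. This is a minimal sketch, not the scoring code behind the directory: the function name `verdict`, the axis keys and the label strings are assumptions made for illustration, and the loosely defined "average around 6" band is treated here as everything that is neither a kill nor a build.

```python
# Axis keys are hypothetical names for the five axes described in this essay.
AXES = ("severity", "feasibility", "market", "competition_gap", "defensibility")


def verdict(scores: dict[str, int]) -> str:
    """Map five 1-10 axis scores to a rubric verdict."""
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"missing axes: {missing}")

    values = [scores[axis] for axis in AXES]
    if any(not 1 <= v <= 10 for v in values):
        raise ValueError("each axis is scored 1-10")

    if min(values) < 5:       # below 5 on any axis is a kill signal
        return "kill"
    if min(values) >= 7:      # 7+ on all five is the bar for "build it"
        return "build"
    return "keep looking"     # assumption: the middle band covers everything else
```

Checking the weakest axis first means a single low score cannot be averaged away by four strong ones, which is the point of the kill rule.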

Axis 1: problem severity

How much does this hurt? Measure in hours per week or dollars per month, not in subjective complaint volume. A 30-minute weekly frustration is a 4. A $300 surprise charge every month is a 7. A six-month sales cycle stalled by a single workflow is a 9.

The trap: founders confuse the volume of complaint with the severity of pain. Reddit threads with 8K upvotes can describe a 5-severity annoyance. Look for the language of urgency: "I had to," "I lost," "I cancelled."

Axis 2: AI feasibility today

Not "is AI plausibly going to solve this someday." Can a competent engineer ship a working v1 in 8-12 weeks with current models? Categorising bank transactions is a 9. Diagnosing a rare disease from a photo is a 4. Reading a legal redline is an 8.

The trap: scoring on the future capability of GPT-6 or Claude 5. You are not building for future models; you are building for the model on your laptop today.

Axis 3: market signal

How many real people credibly experience this problem? "Tech founders" is a 5. "Anyone with a credit card" is a 9. Read the platform numbers, not the post count. Good sources: data.ai (formerly App Annie), US Census, Stripe Atlas data, BLS labor stats. These are dull and they are authoritative.

Axis 4: competition gap

How wide is the space between what the leading existing solution does and what the user actually wants? A big gap is good. If a category leader already does 80% of the job, the gap is a 3 and you are entering a re-skin fight you will lose.

To score this honestly, install the top three competitors and use them on a real task. Note what frustrated you in the first 20 minutes. Those frustrations are the gap.

Axis 5: defensibility (the axis we add for AI products)

What stops a competitor with twice your funding from copying you in six weeks? This is where AI-era validation diverges from classic SaaS. Possible moats include:

Proprietary data. You have a corpus nobody else has access to.
Workflow lock-in. You are inside a workflow that costs the user real switching pain.
Distribution. You have a community or audience competitors cannot rent.
Speed of iteration. You can ship faster than a bigger team can decide.

"We use Claude" is not a moat. Neither is "we fine-tuned on internal data" unless that data is genuinely hard to acquire.

The weekend workflow

Friday evening: collect

Pick one habitat (subreddit, IndieHackers community, HN "Ask HN" archive). Read 50 posts. Note every concrete complaint with the user's exact words. Goal: 8-12 candidate problems.

Saturday morning: score

Run each candidate through the 5-axis rubric. Be honest. Anything below 5 on any axis falls off the list. You should have 2-4 candidates left.
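The Saturday-morning cut can be sketched as a filter over the candidate list. Everything named here is hypothetical: the function `surviving`, the candidate labels and the scores are placeholders made up for illustration. Ranking survivors by their weakest axis is also an assumption; the rubric itself only specifies the cut.

```python
KILL_BELOW = 5  # from the rubric: anything below 5 on any axis falls off the list


def surviving(candidates: dict[str, list[int]]) -> list[str]:
    """Return names of candidates with no axis below 5, strongest first."""
    kept = {name: s for name, s in candidates.items() if min(s) >= KILL_BELOW}
    # Assumed ordering: weakest axis first, total score as a tie-breaker.
    return sorted(kept, key=lambda n: (min(kept[n]), sum(kept[n])), reverse=True)


# Placeholder scores in axis order 1-5; not real data.
candidates = {
    "candidate_a": [8, 7, 7, 7, 8],
    "candidate_b": [9, 8, 4, 7, 6],  # axis 3 is below 5, so it is dropped
    "candidate_c": [6, 7, 6, 5, 6],
}

print(surviving(candidates))  # ['candidate_a', 'candidate_c']
```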

Saturday afternoon: install competitors

For each remaining candidate, install the top 2-3 existing solutions and use them on a real task for 30 minutes each. This is where the competition-gap score gets refined into a number you trust.

Sunday morning: five-name test

For your top candidate, find five real people with this problem. Not five hypothetical buyers. Five named humans with email addresses. Find them through direct messages, IndieHackers, Twitter/X and friends of friends.

Sunday evening: decide

You have one candidate, five names and a 5-axis score. If the score is 7+ across all five axes and three of the five names confirm the problem in the next week, you build. Otherwise, you go back to Friday with a different habitat.
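The Sunday-evening decision combines the two conditions above: the score bar and the confirmations. A minimal sketch, assuming a hypothetical function `go_no_go` with illustrative parameter names:

```python
def go_no_go(scores: list[int], names_found: int, names_confirmed: int) -> bool:
    """Return True only if both the score bar and the confirmation bar are met."""
    if len(scores) != 5:
        raise ValueError("expected one score per axis (five in total)")
    if names_confirmed > names_found:
        raise ValueError("cannot confirm more people than were found")

    score_ok = min(scores) >= 7                              # 7+ across all five axes
    names_ok = names_found >= 5 and names_confirmed >= 3     # three of five confirm
    return score_ok and names_ok
```

A False result is not a verdict on the founder, only on this candidate: the workflow sends you back to Friday with a different habitat.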

The rule: never start engineering before you have five named humans, at least three of whom have confirmed the problem in their own words. The rule has saved us from three different products that "felt right."

What this saves you from

Three failure patterns the rubric catches early:

The interesting-but-tiny market. AI tooling for music producers is interesting. The market is too small to support a venture-backable business. The 5-axis rubric flags this on axis 3.

The valuable-but-saturated market. AI for sales emails is valuable. There are forty funded players. Axis 4 flags this immediately.

The valuable-but-not-yet-feasible market. AI medical diagnosis sounds high-impact. In reality, regulatory and feasibility blockers make it fail axis 2. Founders waste 18 months learning this the slow way.

The rubric is boring. It is also the reason 100 problems in our directory have ten different scores per problem, and why the top decile actually does carry the highest opportunity.

See 100 AI startup ideas already scored on this exact rubric, with sources, personas and the missing wedge for each.
