How We Stopped AI Design Tools from Inventing Features

Key Takeaways 7

Clever prompts can't fix missing context. Give AI design tools the same artifacts you'd brief a senior contractor with.
Adapt corporate-world tools — storymaps, user stories, glossaries — into living specs that live next to the code.
Every brief needs a standing preamble: load these canonical docs first, every single time, no exceptions.
Save the AI's design output as a pull request. Bundle review beats Slack-thread review by a wide margin.
Append brief revisions in place. Don't fragment — the arc of the conversation is the value.
Let the AI designer flag back. The highest-signal line in any spec is what it tells you you didn't ask correctly.
A version-locked glossary will save you a quarter. Vocabulary drift between product, design, and code is the silent budget killer.

The First Spec Came Back Beautiful

The first time I asked Claude Design to mock up a screen, the result was beautiful. It was also wrong. It invented two features we hadn't built, named one of our core entities by the wrong word, and confidently included a tab that didn't exist in our information architecture.

The work wasn't sloppy. It was the opposite — every pixel was intentional. The AI just didn't have what it needed to be intentional about the right things. So it filled in the gaps with what beautiful design usually contains.

I tried a better prompt. Then a longer prompt. Then a prompt with examples. The mocks got prettier and the inventions got more elaborate.

What Actually Works: The Boring Stuff from the Corporate World

I spent most of my career in product roles at companies large enough to have storymaps, user stories, glossaries, and architecture decision records. The artifacts that nobody loves but that quietly keep five teams aligned across a multi-quarter release.

I'd assumed the AI tooling would let us skip past those. We'd just describe what we wanted and Claude would generate it.

That wasn't wrong, exactly. It was incomplete. The AI is brilliant at the act of design. It's terrible at knowing what it doesn't know — what your terms mean, which constraints are load-bearing, which decisions are already locked in. You have to tell it. Not in the prompt — in documents it can read.

The fix was to recreate the same artifacts, but treat them as living specs: documents that live next to the code, get versioned with it, and feed every conversation with the AI design tool.

The Filing Cabinet

Here's what we settled on. Five folders, one rule per folder:

Designer Briefs — one outbound brief per ticket, named after the ticket ID (BIT-686-story-view-screen.md). This is what we send to the AI design tool.
Designer Output — what comes back: the HTML spec, the JSX components, the screenshots. Saved into a PR like any other code change.
Storymap & Documents — canonical vocabulary (glossary.md) and the flow docs that describe how features hang together.
Working Notes — analysis docs, decisions in progress, things-we-haven't-figured-out-yet.
ADRs — architecture decision records, one per decision, dated and linked.
None of it is fancy. It's a folder structure, a few naming conventions, and the discipline to actually use them.

Folder	What Lives There	Rule
Designer Briefs	One outbound brief per ticket, sent to the AI design tool	One file per ticket, named after the ticket ID
Designer Output	The AI's returned spec — HTML, JSX, screenshots	Saved as a PR with a README
Storymap & Documents	Canonical vocabulary, flow docs	Versioned; referenced from every brief
Working Notes	Analysis, in-progress decisions, audits	Plain-text, editable, not yet ADR-worthy
ADRs	Architecture decision records	One per decision, dated, linked, signed

What changed wasn't the tooling. It was that the next brief we sent to the AI assistant started with a five-line preamble: "Load these documents first. Reference them by name." From that round on, the mocks stopped inventing things.

Seven Tips That Made It Stick

One brief per ticket, kept in the repo
Not in Slack. Not in a Notion page that lives somewhere else. The brief is a markdown file named after the ticket and committed to git. When a contributor asks "what did we decide about the dateline?", the answer is one git grep away.
The ticket ID becomes the join key — to your project tracker, to the eventual PR, to the designer's output bundle. Future you will silently thank past you for this.
Use a standing preamble to load the canonical docs
Every brief opens with the same five-line block: "Load these before designing." The glossary. The relevant flow doc. The ADR that governs the system. The brief for any sibling work this depends on.
It feels redundant the third time you write it. It saves you the week when the AI mocks against a stale concept and you have to redo the entire round.
Append revisions in place — don't fragment
When you want to refine a brief, you'll be tempted to start BIT-686-v2.md. Don't. The whole *conversation arc* is the value — what you asked, what the designer pushed back on, what you ended up at.
Keep it in one file. Append a ## v2 — code audit follow-ups (YYYY-MM-DD) section. Leave v1 intact above it. A reader scrolling down sees the discussion, not the final state.

Screenshot of an existing designer brief, showing an incremental version adding and refining details — Multiple rounds of revision in one file. The conversation arc is the value.

Let Claude Design flag things back to you
The single highest-signal section in any designer output is the Flag back to product block — things the AI noticed that your brief got wrong, ambiguous, or stale. Treat that as free QA from the smartest contractor you've ever worked with.
Our most recent brief came back with three flags: a stale vocabulary file in the AI's sandbox (it had v1.0 of the glossary; the canonical was v2.0), a sibling brief we should split off, and a chip on a dateline that mixed two layers of our information model. All three were the kind of thing that would have shipped wrong.
Make addressing the flags part of your "design returned" checklist. Don't just review the mocks — review what the AI told you you didn't ask correctly.
A version-locked glossary is an investment in future time savings
Your shared vocabulary is the highest-leverage document in the repo. If "Collection" means three slightly different things to product, design, and engineering, every brief and every PR pays interest on that confusion forever.
Pick a version. Stamp it (glossary.md v2.0). Reference it from every brief preamble. When the term evolves — and it will — bump the version, write a short rationale into the file itself, and require the next brief to verify the bump landed in the AI's sandbox.
Audit your code between design revisions
Between v1 and v2 of a brief, walk the actual code that implements the surface. Note where the brief is silent, where the code drifted, where edge cases live. Fold those into v2 as additional asks.
This sounds like it doubles your work. It does — once. After that, the brief and the code stay in lockstep because you trained the loop to surface the gaps before the AI wastes a round mocking against them.
Save the AI's output as a pull request
When the design assistant ships you a bundle — HTML, CSS, JSX, screenshots — don't just download and skim. Save it into Designer Output/BIT-XXX/ as its own PR. Add a short README pointing at the entry file and noting any gaps you noticed.

Three benefits stack:

The bundle is reviewable in code review like everything else.
A future contributor can `git checkout` the moment in history when a decision was made.
Writing the README forces you to read the spec carefully enough to find the flag-backs.

What We're Still Figuring Out

We caught the glossary drift this time because the AI flagged it. A scheduled diff between canonical docs and any copies in tooling sandboxes would catch this earlier — but we haven't built that yet.
Ticket-to-brief sync is still mostly manual. We have ticket IDs in filenames, but no automated cross-check that every active ticket has a brief if it needs one.
And we use the term "Project" internally when reasoning about what users call "Collections" — it's crisper for ADRs and decisions, even though the UI never shows the word. That works for us at our size. We're not sure it'd scale to a larger team where the framing might leak into customer support tickets.

The Boring Truth

The Document Vault is a folder. The standing preamble is five lines. The brief-revision convention is a Markdown heading. Nothing about any of this is sophisticated.
What's sophisticated is the realization that AI design tools are a force multiplier on the artifacts you already have. If your decisions live in Slack threads, AI will multiply confusion. If your decisions live in versioned, linkable, durable documents — AI will multiply those decisions into shippable work, fast.