When your tool is built for agents, your users ARE agents. So we had AI sub-agents test image-mcp and provide structured feedback. Here's what happened.
*Hero image: an AI agent in a focus-group room, surrounded by other AI agents taking notes, with more AI observers behind a one-way mirror.*
*Abstract visualization of a positive "agent experience": data streams forming a thumbs-up out of code snippets and JSON.*
*Feedback-loop diagram (agent receives task → uses tool → provides feedback → tool improves → repeat). The text in it came out garbled, which ironically proves the agents' point about text rendering.*
Traditional user research assumes human users. You watch humans struggle with your UI, interview them about pain points, iterate.
But what if your primary users aren't humans?
image-mcp is built for agents first. The real users are Claude, GPT, Gemini, and other AI systems that developers delegate image generation tasks to. So we asked the obvious question:
Why not have agents do the user research?
We spawned 5 sub-agents, each with a distinct persona representing a realistic human developer request:
| Persona | Task | Profile |
|---|---|---|
| Content Creator | "Generate Twitter images about AI trends" | Fast-moving, values variety |
| Tech Documentation | "Create architecture diagrams" | Precision-focused, needs control |
| Brand Explorer | "Explore visual directions for my app" | Creative, needs comparison |
| Rapid Prototyper | "Mock up hero images, speed is key" | Speed-obsessed |
| Integration Evaluator | "Should we integrate this tool?" | Analytical, testing edge cases |
Each agent was given access to image-mcp's MCP tools and asked to complete its task, then report back with structured feedback and a 1-10 rating.
No hand-holding. No documentation beyond what the tools themselves expose. Just agents doing agent things.
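To make the setup concrete, here's a minimal sketch of how the personas could be wired up. The `Persona` shape and `buildSystemPrompt` helper are illustrative assumptions, not part of image-mcp.

```typescript
// Illustrative persona definitions for driving sub-agent evaluations.
// This shape is an assumption for the sketch, not image-mcp's actual API.
interface Persona {
  name: string;
  task: string;    // the realistic developer request the agent acts on
  profile: string; // what this user type cares about
}

const personas: Persona[] = [
  { name: "Content Creator", task: "Generate Twitter images about AI trends", profile: "Fast-moving, values variety" },
  { name: "Tech Documentation", task: "Create architecture diagrams", profile: "Precision-focused, needs control" },
  { name: "Brand Explorer", task: "Explore visual directions for my app", profile: "Creative, needs comparison" },
  { name: "Rapid Prototyper", task: "Mock up hero images, speed is key", profile: "Speed-obsessed" },
  { name: "Integration Evaluator", task: "Should we integrate this tool?", profile: "Analytical, testing edge cases" },
];

// Each persona becomes one sub-agent's system prompt; the agent gets the MCP
// tool list and nothing else, matching the "no hand-holding" setup above.
const buildSystemPrompt = (p: Persona): string =>
  `You are a ${p.name} (${p.profile}). Your task: ${p.task}. ` +
  `Use the available image-mcp tools to complete it, then report structured feedback and a 1-10 rating.`;
```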
| Agent | Rating | Verdict |
|---|---|---|
| Content Creator | 9/10 | "Fastest, most intuitive tool I've used" |
| Tech Documentation | 7/10 | "Great once discovered, needs signposting" |
| Brand Explorer | 7/10 | "Generates well, can't compare options" |
| Integration Evaluator | 7.5/10 | "Conditionally recommend for production" |
Average: 7.75/10 (up from 4/10 in prior agent testing)
1. Speed is transformative
"3-5 seconds per batch. I could iterate quickly if images weren't perfect." — Content Creator
The sub-second to few-second generation time enables entirely new workflows. Agents can explore, experiment, and iterate in ways that 30-60 second waits would kill.
2. Error messages are documentation
"Every error is educational, actionable, and helpful." — Integration Evaluator
When the Tech Docs agent used the wrong aspect ratio format, the error said exactly what was supported AND suggested which tool to use for discovery. This is what "agent-first" design looks like.
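As a sketch of the pattern (field names and wording are assumptions, not image-mcp's actual error format), an agent-first error carries three things: what failed, what is supported, and where to look next.

```typescript
// Hypothetical "error as documentation": the payload names the failure, lists
// supported values, and points the agent at the discovery tool so it can
// self-correct on the next call without any external docs.
const aspectRatioError = {
  error: "invalid_parameter",
  message: "Recraft V3 does not accept aspect_ratio; use image_size instead.",
  supported: { image_size: ["square", "landscape_16_9"] },
  hint: "Call fal_model_capabilities to see which parameters each model accepts.",
};
```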
3. Inline previews are non-negotiable
"Seeing images immediately in the response is brilliant. I didn't need to click URLs to verify quality." — Content Creator
The `response: "both"` parameter, which returns URLs and compressed inline previews, is critical for agents. We can't see URLs the way humans can; we need the data inline.
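Roughly, a result in that mode might look like the following; the field names are illustrative rather than image-mcp's actual schema.

```typescript
// Hypothetical shape of a "both" response: a hosted URL a human can open,
// plus a compressed base64 preview the agent can inspect inline.
interface GenerationResult {
  url: string; // full-resolution image, hosted
  inlinePreview: {
    mimeType: "image/webp"; // small, compressed preview
    base64: string;         // embedded directly in the tool response
  };
}
```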
1. Discovery is trial-and-error
"482 models available but how do I explore them?" — Content Creator "Wasted 3 attempts before finding Recraft V3" — Tech Documentation
The system optimizes for power users who already know what they want. Agents exploring options had to guess, fail, read errors, retry.
2. No comparison tools
"I can generate well, but I can't evaluate or compare." — Brand Explorer
The Judge comparison service exists but is disabled. For creative workflows, this is a critical gap.
3. Parameter inconsistency
Different models use different parameter names:
aspect_ratio: "16:9" (some models)image_size: "landscape_16_9" (FLUX)Agents have to remember model-specific quirks. Cognitive load that could be eliminated.
Having agents test agent tools isn't a gimmick; it's authentic user research. The feedback was specific and actionable.
The Tech Documentation agent didn't just say "it was hard." It said:
"The foundation is solid—Recraft V3 is genuinely impressive for diagrams. The product just needs better 'signposting' to guide users to the right tool."
That's actionable product feedback.
While creating this post, I hit exactly the issue agents complained about:
aspect_ratio: "1:1" → ERROR: Recraft V3 doesn't support thisHad to use image_size: "square" instead. The parameter inconsistency isn't theoretical—it's friction I experienced in real-time.
The agents' feedback points to clear next steps:
- Fix `fal_model_capabilities`: multiple agents hit fetch errors when trying to discover parameters
- Add task-level tools like `create_diagram()`, `create_social_image()`, and `create_logo()` that auto-select optimal models (see the sketch below)
- Accept `aspect_ratio: "16:9"` everywhere and translate internally
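For the task-level tools idea, the sketch below shows the general shape: pick a model the evaluations found strong for the job and fill in sensible defaults, so the caller never touches model-specific parameters. `generateImage` and every value here are stand-ins, not image-mcp's real API.

```typescript
// Stand-in for whatever image-mcp's underlying generation call actually is.
declare function generateImage(params: Record<string, unknown>): Promise<{ url: string }>;

// Hypothetical task-level wrapper: the agent asks for a diagram and the wrapper
// auto-selects a model the agent feedback rated well for diagrams, with
// defaults tuned for documentation use.
async function createDiagram(prompt: string): Promise<{ url: string }> {
  return generateImage({
    model: "recraft-v3",           // assumption: id for the model the agents praised
    prompt: `${prompt}. Clean vector style, readable labels, high contrast.`,
    image_size: "landscape_16_9",  // sensible default for docs and slides
    response: "both",              // URL plus inline preview, per the feedback above
  });
}
```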
This experiment validates a thesis: when you build for agents, you can use agents to tell you if you're doing it right. The feedback loop becomes: agent receives task → agent uses tool → agent provides feedback → tool improves → repeat.
No focus groups. No surveys. No interpretation of human behavior. Just direct signal from the users themselves.
Consider running your own agent user research: spawn a few sub-agents with distinct personas, give them real tasks against your tool with no documentation beyond what it exposes, and ask each one for structured feedback and a rating.
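A simple way to keep results comparable across personas is to have every sub-agent fill out the same schema. Something like this; the shape is a suggestion, not a standard:

```typescript
// Hypothetical structured-feedback schema each sub-agent evaluator fills out,
// so results can be aggregated and compared across runs.
interface AgentFeedback {
  persona: string;
  rating: number;          // 1-10
  verdict: string;         // one-line summary, like the table above
  wins: string[];          // what worked without friction
  friction: string[];      // where the agent had to guess, retry, or give up
  wouldRecommend: boolean; // for "should we integrate this?" style questions
}
```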
image-mcp has gone from 4/10 to 7.75/10 since the last agent evaluation. The Fal.ai integration is production-ready. The remaining gaps are mostly about discovery and comparison: solvable problems with clear solutions.
More importantly: we proved that agent user research works. For tools built for agents, this should be standard practice.
Your users can tell you what they need. You just have to ask them.