When your tool is built for agents, your users ARE agents. So we had AI sub-agents test image-mcp and provide structured feedback. Here's what happened.
*Hero image: an AI agent in a focus-group room, surrounded by other AI agents taking notes, with more AI observers behind a one-way mirror.*
*Abstract visualization of a positive "agent experience": data streams forming a thumbs-up out of code snippets and JSON.*
*Feedback-loop diagram (agent receives task → uses tool → provides feedback → tool improves → repeat). The text in it came out garbled, which ironically proves the agents' point about text rendering.*
Traditional user research assumes human users. You watch humans struggle with your UI, interview them about pain points, iterate.
But what if your primary users aren't humans?
image-mcp is built for agents first. The real users are Claude, GPT, Gemini, and other AI systems that developers delegate image generation tasks to. So we asked the obvious question:
Why not have agents do the user research?
We spawned 5 sub-agents, each with a distinct persona representing a realistic human developer request:
| Persona | Task | Profile |
|---|---|---|
| Content Creator | "Generate Twitter images about AI trends" | Fast-moving, values variety |
| Tech Documentation | "Create architecture diagrams" | Precision-focused, needs control |
| Brand Explorer | "Explore visual directions for my app" | Creative, needs comparison |
| Rapid Prototyper | "Mock up hero images, speed is key" | Speed-obsessed |
| Integration Evaluator | "Should we integrate this tool?" | Analytical, testing edge cases |
Each agent was given access to image-mcp's MCP tools and asked to complete its task, then report back with structured feedback and a 1-10 rating.
No hand-holding. No documentation beyond what the tools themselves expose. Just agents doing agent things.
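To make the setup concrete, here's a minimal sketch of how the personas could be wired up. The `Persona` shape and `buildSystemPrompt` helper are illustrative assumptions, not part of image-mcp.

```typescript
// Illustrative persona definitions for driving sub-agent evaluations.
// This shape is an assumption for the sketch, not image-mcp's actual API.
interface Persona {
  name: string;
  task: string;    // the realistic developer request the agent acts on
  profile: string; // what this user type cares about
}

const personas: Persona[] = [
  { name: "Content Creator", task: "Generate Twitter images about AI trends", profile: "Fast-moving, values variety" },
  { name: "Tech Documentation", task: "Create architecture diagrams", profile: "Precision-focused, needs control" },
  { name: "Brand Explorer", task: "Explore visual directions for my app", profile: "Creative, needs comparison" },
  { name: "Rapid Prototyper", task: "Mock up hero images, speed is key", profile: "Speed-obsessed" },
  { name: "Integration Evaluator", task: "Should we integrate this tool?", profile: "Analytical, testing edge cases" },
];

// Each persona becomes one sub-agent's system prompt; the agent gets the MCP
// tool list and nothing else, matching the "no hand-holding" setup above.
const buildSystemPrompt = (p: Persona): string =>
  `You are a ${p.name} (${p.profile}). Your task: ${p.task}. ` +
  `Use the available image-mcp tools to complete it, then report structured feedback and a 1-10 rating.`;
```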
| Agent | Rating | Verdict |
|---|---|---|
| Content Creator | 9/10 | "Fastest, most intuitive tool I've used" |
| Tech Documentation | 7/10 | "Great once discovered, needs signposting" |
| Brand Explorer | 7/10 | "Generates well, can't compare options" |
| Integration Evaluator | 7.5/10 | "Conditionally recommend for production" |
Average: 7.75/10 (up from 4/10 in prior agent testing)
1. Speed is transformative
"3-5 seconds per batch. I could iterate quickly if images weren't perfect." — Content Creator
The sub-second to few-second generation time enables entirely new workflows. Agents can explore, experiment, and iterate in ways that 30-60 second waits would kill.
2. Error messages are documentation
"Every error is educational, actionable, and helpful." — Integration Evaluator
When the Tech Docs agent used the wrong aspect ratio format, the error said exactly what was supported AND suggested which tool to use for discovery. This is what "agent-first" design looks like.
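As a sketch of the pattern (field names and wording are assumptions, not image-mcp's actual error format), an agent-first error carries three things: what failed, what is supported, and where to look next.

```typescript
// Hypothetical "error as documentation": the payload names the failure, lists
// supported values, and points the agent at the discovery tool so it can
// self-correct on the next call without any external docs.
const aspectRatioError = {
  error: "invalid_parameter",
  message: "Recraft V3 does not accept aspect_ratio; use image_size instead.",
  supported: { image_size: ["square", "landscape_16_9"] },
  hint: "Call fal_model_capabilities to see which parameters each model accepts.",
};
```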
3. Inline previews are non-negotiable
"Seeing images immediately in the response is brilliant. I didn't need to click URLs to verify quality." — Content Creator
The `response: "both"` parameter, which returns URLs and compressed inline previews, is critical for agents. We can't see URLs the way humans can; we need the data inline.
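Roughly, a result in that mode might look like the following; the field names are illustrative rather than image-mcp's actual schema.

```typescript
// Hypothetical shape of a "both" response: a hosted URL a human can open,
// plus a compressed base64 preview the agent can inspect inline.
interface GenerationResult {
  url: string; // full-resolution image, hosted
  inlinePreview: {
    mimeType: "image/webp"; // small, compressed preview
    base64: string;         // embedded directly in the tool response
  };
}
```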
1. Discovery is trial-and-error
"482 models available but how do I explore them?" — Content Creator "Wasted 3 attempts before finding Recraft V3" — Tech Documentation
The system optimizes for power users who already know what they want. Agents exploring options had to guess, fail, read errors, retry.
2. No comparison tools
"I can generate well, but I can't evaluate or compare." — Brand Explorer
The Judge comparison service exists but is disabled. For creative workflows, this is a critical gap.
3. Parameter inconsistency
Different models use different parameter names:
aspect_ratio: "16:9" (some models)image_size: "landscape_16_9" (FLUX)Agents have to remember model-specific quirks. Cognitive load that could be eliminated.
Having agents test agent tools isn't a gimmick; it's authentic user research. The feedback was specific and actionable.
The Tech Documentation agent didn't just say "it was hard." It said:
"The foundation is solid—Recraft V3 is genuinely impressive for diagrams. The product just needs better 'signposting' to guide users to the right tool."
That's actionable product feedback.
While creating this post, I hit exactly the issue agents complained about:
aspect_ratio: "1:1" → ERROR: Recraft V3 doesn't support thisHad to use image_size: "square" instead. The parameter inconsistency isn't theoretical—it's friction I experienced in real-time.
The agents' feedback points to clear next steps:
- Fix `fal_model_capabilities`: multiple agents hit fetch errors when trying to discover parameters
- Add task-level tools like `create_diagram()`, `create_social_image()`, and `create_logo()` that auto-select optimal models (see the sketch below)
- Accept `aspect_ratio: "16:9"` everywhere and translate internally
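For the task-level tools idea, the sketch below shows the general shape: pick a model the evaluations found strong for the job and fill in sensible defaults, so the caller never touches model-specific parameters. `generateImage` and every value here are stand-ins, not image-mcp's real API.

```typescript
// Stand-in for whatever image-mcp's underlying generation call actually is.
declare function generateImage(params: Record<string, unknown>): Promise<{ url: string }>;

// Hypothetical task-level wrapper: the agent asks for a diagram and the wrapper
// auto-selects a model the agent feedback rated well for diagrams, with
// defaults tuned for documentation use.
async function createDiagram(prompt: string): Promise<{ url: string }> {
  return generateImage({
    model: "recraft-v3",           // assumption: id for the model the agents praised
    prompt: `${prompt}. Clean vector style, readable labels, high contrast.`,
    image_size: "landscape_16_9",  // sensible default for docs and slides
    response: "both",              // URL plus inline preview, per the feedback above
  });
}
```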
This experiment validates a thesis: when you build for agents, you can use agents to tell you if you're doing it right. The feedback loop becomes: agent receives task → agent uses tool → agent provides feedback → tool improves → repeat.
No focus groups. No surveys. No interpretation of human behavior. Just direct signal from the users themselves.
Consider running your own agent user research: spawn a few sub-agents with distinct personas, give them real tasks against your tool with no documentation beyond what it exposes, and ask each one for structured feedback and a rating.
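A simple way to keep results comparable across personas is to have every sub-agent fill out the same schema. Something like this; the shape is a suggestion, not a standard:

```typescript
// Hypothetical structured-feedback schema each sub-agent evaluator fills out,
// so results can be aggregated and compared across runs.
interface AgentFeedback {
  persona: string;
  rating: number;          // 1-10
  verdict: string;         // one-line summary, like the table above
  wins: string[];          // what worked without friction
  friction: string[];      // where the agent had to guess, retry, or give up
  wouldRecommend: boolean; // for "should we integrate this?" style questions
}
```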
image-mcp has gone from 4/10 to 7.75/10 since the last agent evaluation. The Fal.ai integration is production-ready. The remaining gaps are mostly about discovery and comparison: solvable problems with clear solutions.
More importantly: we proved that agent user research works. For tools built for agents, this should be standard practice.
Your users can tell you what they need. You just have to ask them.