
Generative UI Doesn't Move the Needle—Steering Does

10 min read

After working on multiple generative UI projects over the past year, I’ve reached a somewhat counterintuitive conclusion: the sophistication of AI-generated interfaces often doesn’t translate to meaningful user benefit.

When OpenAI released Canvas in late 2024 and Google demoed Gemini generating Flutter UIs on the fly, many of us (myself included) believed we had reached the inflection point for dynamic, AI-generated interfaces. The vision was compelling: interfaces that adapt in real-time to user needs, powered by LLMs that understand context and intent. But after shipping several generative UI features to production and observing real user behavior, I’ve become skeptical of this vision—at least in its current form.


Three Approaches to Generative UI

From my experience and observing the industry, generative UI implementations fall into three main categories:

1. Full HTML/CSS/JS Generation

The most flexible approach: the LLM generates complete web pages or interactive widgets from scratch. Think of the rich, interactive outputs you get from Perplexity’s research reports or ChatGPT’s data analysis.

Pros:

  - Maximum flexibility: the model can produce any layout or interaction it can express in code.

Cons:

  - Runtime generation adds noticeable latency compared with selecting pre-built UI.
  - Untrusted generated markup and scripts need careful sandboxing before they can be rendered.
  - Quality is less predictable than UI that has been reviewed ahead of time.
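
To make the trade-off concrete, here is a minimal sketch of rendering fully generated markup inside a sandboxed iframe. The generateHtml function is a hypothetical stand-in for whatever model call you use; the point is that untrusted output needs an isolation boundary before it touches your page.

```typescript
// Minimal sketch: render LLM-generated HTML in a sandboxed iframe.
// `generateHtml` is a hypothetical placeholder for your model call.
async function generateHtml(prompt: string): Promise<string> {
  // In a real system this would call your model API; hard-coded for illustration.
  return `<h1>Report</h1><p>Generated for: ${prompt}</p>`;
}

function renderUntrustedHtml(html: string, container: HTMLElement): void {
  const frame = document.createElement("iframe");
  // An empty sandbox attribute applies all restrictions (no scripts, no forms,
  // no same-origin access); loosen individual restrictions only if you must.
  frame.setAttribute("sandbox", "");
  frame.srcdoc = html;
  frame.style.width = "100%";
  frame.style.border = "none";
  container.appendChild(frame);
}

async function main() {
  const html = await generateHtml("quarterly sales summary");
  renderUntrustedHtml(html, document.getElementById("app")!);
}

void main();
```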

2. Constrained Widget Composition

Instead of generating raw code, the LLM selects and composes from a predefined catalog of trusted UI components. This is the approach used by Vercel’s AI SDK for React, and more recently formalized by Google’s A2UI (Agent-to-UI) project.

A2UI represents the most mature thinking on this approach. Key design principles:

  - The agent describes the UI declaratively by composing from a predefined catalog of trusted components, rather than emitting raw code.
  - The same declarative description can be rendered by different clients and frameworks, which is what lets one protocol serve web, Flutter, and agent-framework integrations.

This architecture is now being adopted across Google products (Gemini Enterprise, Flutter’s GenUI SDK) and has integrations with AG UI/CopilotKit. If you’re building constrained widget composition, you’ll likely end up with something similar to A2UI’s architecture—it solves the right problems.
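
To make constrained composition concrete, here is a minimal sketch of the general shape rather than A2UI’s or the Vercel AI SDK’s actual API: the model emits a declarative spec, and only components from a predefined catalog ever get rendered. All names are illustrative.

```typescript
// Illustrative sketch of constrained widget composition (not A2UI's real schema).
type WidgetSpec =
  | { type: "text"; content: string }
  | { type: "button"; label: string; action: string }
  | { type: "card"; title: string; children: WidgetSpec[] };

// The catalog of trusted renderers; the model may only reference these types.
function render(spec: WidgetSpec): string {
  switch (spec.type) {
    case "text":
      return `<p>${spec.content}</p>`;
    case "button":
      return `<button data-action="${spec.action}">${spec.label}</button>`;
    case "card":
      return `<section><h3>${spec.title}</h3>${spec.children.map(render).join("")}</section>`;
    default:
      // Anything outside the catalog is rejected rather than rendered.
      throw new Error(`Unknown widget type: ${(spec as { type: string }).type}`);
  }
}

// In production the spec would come from the model (e.g. via structured output)
// and be schema-validated before it reaches render().
const specFromModel: WidgetSpec = {
  type: "card",
  title: "Book a table",
  children: [
    { type: "text", content: "Pick a time for Friday evening:" },
    { type: "button", label: "7:00 PM", action: "book_19_00" },
    { type: "button", label: "8:30 PM", action: "book_20_30" },
  ],
};

console.log(render(specFromModel));
```

The difference from full code generation is that a malformed or malicious spec can only fail validation; it can never inject arbitrary markup or script.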

Pros:

  - Only trusted, pre-built components ever render, so safety and visual consistency come largely for free.
  - Generated UI stays consistent with your existing design system.

Cons:

  - Expressiveness is bounded by the component catalog.
  - As discussed below, the widgets are often no more helpful than well-formatted text, and can end up constraining the user journey.

3. Offline Template Generation + Runtime Selection

The most latency-optimized approach: use LLMs during development to generate and refine templates offline (with as many agent passes, human reviews, and iterations as needed), then at runtime simply select the appropriate template based on user intent.

Pros:

  - Lowest runtime latency; selected templates can be cached at the CDN, edge, or client.
  - Production-quality polish, because every template can be reviewed and refined before it ships.
  - Predictable, consistent behavior for users.

Cons:

  - No runtime adaptability: the UI is limited to the intents and variants you anticipated.
  - The template library has to be built and maintained up front.


What I Learned After Deploying These Systems

Here’s what I’ve observed: runtime generation often doesn’t benefit users as much as we expected—but the value shows up in different places than anticipated.

For Information-Seeking Tasks

Yes, an interactive visualization of bubble sort can help users understand the algorithm better than a text explanation. But in practice, users often find that a simple Python code snippet in a code block—which they can copy, modify, and run themselves—is more valuable. The “wow factor” of a generated animation wears off quickly, and users gravitate toward the most useful format, not the most impressive one.

Research supports this observation. Studies on multimedia learning (Mayer, 2009) show that additional visual elements only help when they reduce cognitive load—extraneous visuals can actually increase cognitive load and hurt comprehension.

For Widget-Based Interaction

When we deployed constrained widget UIs, we expected users would prefer tapping buttons and interacting with cards over typing. What we found: the generated widgets were rarely more helpful than well-formatted markdown text.

Why? Users can always refine their request through natural language. The flexibility of saying “actually, make it for next Tuesday instead” beats clicking through a date picker that the AI happened to generate. The widget becomes a constraint on the user journey (“you should pick from these three options”) rather than an enhancement. This aligns with findings from Nielsen Norman Group’s research on chatbot UX: users often prefer open-ended text input over constrained choices because it gives them more control over the conversation.

To be fair, constraints aren’t inherently bad. If you’re building a gardening app, you probably don’t want users asking about house renovation—constrained widgets can helpfully guide users toward supported workflows. But this is a product decision about scope, not a user experience win from generative UI. You could achieve the same guardrails with a well-designed traditional interface or prompt engineering on the backend.

For Template-Based UIs

The template approach isn’t “generative” at runtime, but that’s precisely the point. By front-loading the generation to development time—where you can iterate with LLMs, run aesthetic agents, add human review, and refine until perfect—you get production-quality templates without runtime latency.

This significantly shortens the work for UX researchers and front-end engineers: instead of hand-crafting every variant, they can generate candidates and curate the best ones. At serving time, you simply select the right template for the user’s intent. The result can be cached at multiple layers (CDN, edge, client) for the lowest possible latency.
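
As a sketch of what the serving path can look like under this approach (the intent labels, classifier, and template shapes are all illustrative assumptions, not any specific product’s API): runtime work reduces to classification plus lookup, which is why it caches so well.

```typescript
// Minimal sketch of runtime template selection (illustrative names).
// Templates are authored and reviewed offline; runtime only classifies and selects.
type Intent = "track_order" | "compare_products" | "fallback";

const templates: Record<Intent, (slots: Record<string, string>) => string> = {
  track_order: (s) =>
    `<div class="order-tracker" data-order-id="${s.orderId ?? ""}"></div>`,
  compare_products: (s) =>
    `<table class="comparison" data-skus="${s.skus ?? ""}"></table>`,
  fallback: () => `<div class="plain-answer"></div>`,
};

// Stand-in for a lightweight intent classifier (could be a small model or rules).
function classifyIntent(userMessage: string): Intent {
  if (/where.*order|tracking/i.test(userMessage)) return "track_order";
  if (/compare|vs\.?|versus/i.test(userMessage)) return "compare_products";
  return "fallback";
}

function selectTemplate(userMessage: string, slots: Record<string, string>): string {
  const intent = classifyIntent(userMessage);
  // The rendered result (or even just the selection) can be cached at the
  // CDN, edge, or client, since nothing is generated at runtime.
  return templates[intent](slots);
}

console.log(selectTemplate("Where is my order?", { orderId: "A-1042" }));
```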

Users experience predictable, polished UI—and that’s often exactly what they want. The “wow factor” of runtime generation matters less than speed and reliability.


What’s Actually Valuable: Steering

If we step back and ask why we even need UI beyond text, voice, and video, the answer isn’t “to make things prettier.” It’s to enable better steering of the AI.

Consider this scenario: An LLM processes your request through 15 reasoning steps. At step 7, it makes a subtle mistake—perhaps misinterpreting an ambiguity in your request. The final output is wrong, but you can’t tell where it went wrong by reading the final answer.

What you need is a way to:

  1. Inspect intermediate steps easily
  2. Point to a specific step and say “this is where you went wrong”
  3. Redirect the model from that point

This is the kind of interaction that text alone struggles with. You want to click on step 7, highlight the problematic assumption, and tell the model to reconsider. This is steering—and it’s where UI can genuinely add value.
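
Here is a minimal sketch of what that interaction could look like as a data model (illustrative types, not any product’s actual API): the trace is a list of inspectable steps, and steering means keeping everything before the flagged step and re-running with the user’s correction attached.

```typescript
// Minimal sketch of steering over an inspectable reasoning trace (illustrative).
interface ReasoningStep {
  index: number;
  summary: string;     // what the model did at this step
  assumption?: string; // any assumption it made, surfaced for inspection
}

interface SteeringFeedback {
  stepIndex: number;   // "this is where you went wrong"
  correction: string;  // what the user wants reconsidered
}

// Placeholder for the actual model call; assumed to accept a partial trace
// plus a correction and continue from there.
async function continueFromStep(
  keptSteps: ReasoningStep[],
  feedback: SteeringFeedback
): Promise<ReasoningStep[]> {
  // A real system would re-prompt the model with the kept steps and the
  // user's correction; stubbed out here.
  return [
    ...keptSteps,
    { index: keptSteps.length, summary: `Reconsidered: ${feedback.correction}` },
  ];
}

// The UI's job: let the user click a step, attach a correction, and redirect.
async function steer(
  trace: ReasoningStep[],
  feedback: SteeringFeedback
): Promise<ReasoningStep[]> {
  const kept = trace.slice(0, feedback.stepIndex); // drop the flagged step onward
  return continueFromStep(kept, feedback);
}
```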

OpenAI’s Canvas is actually a good example of steering-focused UI: you can select text, request specific changes, and iterate on portions of the output. Anthropic’s Claude artifacts serve a similar purpose—letting you work with code or documents as objects you can manipulate, not just linear chat responses.


The Steering Test for Generative UI

Based on this experience, I now apply a simple heuristic when evaluating generative UI features:

Does this UI help the user steer the model more effectively?

If the answer is no—if the UI is purely cosmetic or just a fancier way to display information that text would convey equally well—then it’s unlikely to deliver lasting user value, no matter how impressive the technology behind it.

Examples of UI that passes the steering test:

  - Canvas-style editors where you can select a specific span of text or code and request a targeted change.
  - Inspectable reasoning traces where you can click a step, flag the faulty assumption, and redirect the model from that point.
  - Artifact views that let you manipulate code or documents as objects instead of re-reading a linear chat.

Examples of UI that fails the steering test:

  - Generated animations or visualizations that look impressive but convey nothing a code block or well-formatted text would not.
  - Widgets that lock the user into the handful of options the model happened to generate, when free-form text would give more control.
  - Purely cosmetic restyling of information that gives the user no new way to inspect or redirect the model.


Conclusion

Generative UI is a technically fascinating capability, but technical sophistication isn’t the same as user value. After working in this space, I’ve concluded that the right question isn’t “can we generate this UI?” but “does this UI help users communicate with and steer the AI more effectively?”

The most impactful AI interfaces won’t be the most visually impressive ones—they’ll be the ones that make the collaboration between human and AI more effective. Sometimes that means a simple text box with good affordances for editing and refinement. Sometimes it means structured views that make intermediate reasoning inspectable.

In the end, this isn’t a technology question. It’s a question about communication: what interface makes it easiest for users to express intent and for the AI to respond appropriately? When we keep that question at the center, we build better products—even if they’re less flashy than what pure technological capability would allow.


References

  - Mayer, R. E. (2009). Multimedia Learning (2nd ed.). Cambridge University Press.
  - Budiu, R. (2018). The User Experience of Chatbots. Nielsen Norman Group.

