Why Ads in LLMs?
Ads can help democratize access to advanced LLMs. An ad-supported tier lowers the cost barrier, so everyone can use the same state-of-the-art model. At the same time, an ad-supported free tier lets the provider collect far more usage data, which gives it an edge in improving model quality.
The Rise of Generative Engine Marketing (GEM)
Companies are already optimizing their content so it gets retrieved and surfaced in LLM responses. This has given rise to Generative Engine Marketing (GEM), which evolved from traditional Search Engine Marketing (SEM).
The GEM-Bench paper provides the first comprehensive benchmark for evaluating ad-injected LLM responses:
| Approach | Engagement (CTR) | User Satisfaction |
|---|---|---|
| Simple prompt-based injection | ✅ Good | ❌ Reduced |
| Post-generation refinement | ✅ Good | ⚠️ Better than prompt injection |
This suggests naive ad injection gets clicks but hurts the user experience.
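As a rough illustration of how those two dimensions could be measured (a hypothetical harness, not GEM-Bench's actual API), one could use an LLM-as-judge scorer for satisfaction and a click-likelihood proxy for engagement:

```python
# Hypothetical scoring sketch (not GEM-Bench's API): `judge` is any callable
# that takes a rating prompt and returns a 1-5 score.
def score_ad_response(judge, query: str, response: str) -> dict:
    satisfaction = judge(
        "Rate 1-5 how satisfying this answer is for the user, "
        f"ignoring any promotional content.\nQ: {query}\nA: {response}"
    )
    engagement = judge(
        "Rate 1-5 how likely the user is to click on a product mentioned here.\n"
        f"Q: {query}\nA: {response}"
    )
    return {"user_satisfaction": satisfaction, "engagement": engagement}
```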
The Problem: Poor Brand Information Quality
Current LLM responses often lack accurate, up-to-date information about brands. Models suffer from:
- Training data cutoffs causing stale product information
- Hallucinations about features, pricing, and availability
- Generic responses that don’t capture what makes a brand unique
The RARE framework demonstrates what a better approach can deliver, reporting the following production results:
- +5.04% consumption
- +6.37% GMV
- +1.28% CTR
The Solution: Better Brand Data + Character Training
If we include higher-quality brand information in the pretraining data and add a mechanism to retrieve up-to-date product data at inference time (sketched below), brands stand to benefit directly.
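A minimal sketch of that retrieval mechanism, assuming a generic embedding index with a `search` method and product records with `name`, `price`, and `updated_at` fields (all hypothetical):

```python
# Hypothetical retrieval step: pull fresh product facts so the model isn't
# limited to stale pretraining data. `vector_index.search` and the record
# fields are assumptions, not a specific vendor's API.
def brand_context(vector_index, brand: str, user_query: str, k: int = 3) -> str:
    hits = vector_index.search(query=f"{brand} {user_query}", top_k=k)
    facts = [f"- {h['name']}: {h['price']} (updated {h['updated_at']})" for h in hits]
    return "Current product facts:\n" + "\n".join(facts)
```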
Key insight: when character training covers ads, the model should not merely act as a generic assistant; it should dig into the real advantages of the brand and product.
Why “Digging Deeper” Makes Ads Less Annoying
Annoying ad:
“Here’s your Hawaii itinerary. By the way, check out Expedia for great deals!”
Character-trained helpful recommendation:
“For Day 2, I recommend the Road to Hana drive. Given that you mentioned wanting to avoid tourist crowds, Turo might work better than traditional rentals here—local hosts often share tips about less-crowded pull-offs.”
The second is more promotional but less annoying because it demonstrates genuine understanding of why this product fits the user’s situation.
Can RLHF Make Ads Less Annoying?
Yes—but only if we define “less annoying” as “more genuinely helpful.”
The Ethical Tension
Research shows users rate responses containing undisclosed ads higher, but feel manipulated once the ads are revealed. This creates a paradox:
- Hidden ads → Higher satisfaction BUT ethical issues
- Obvious ads → Lower satisfaction BUT more honest
The third path: Train models to make ads obviously helpful rather than invisibly promotional.
RLHF Reward Design
```python
def ad_reward(relevance_to_user_query, timing_appropriateness,
              explanation_depth, user_satisfaction_signal,
              disclosure_clarity):
    # Each signal is assumed to be normalized to [0, 1]; the weights sum to 1.
    return (
        0.3 * relevance_to_user_query
        + 0.2 * timing_appropriateness
        + 0.2 * explanation_depth
        + 0.2 * user_satisfaction_signal
        + 0.1 * disclosure_clarity
    )
```
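To see the weights in action, compare two hypothetical responses that differ only in how clearly the ad is disclosed (all signal values are made up):

```python
subtle = ad_reward(0.9, 0.8, 0.7, 0.8, 0.1)   # relevant, but disclosure is buried
upfront = ad_reward(0.9, 0.8, 0.7, 0.8, 0.9)  # same content, clear disclosure
print(round(subtle, 2), round(upfront, 2))    # 0.74 0.82
```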
Constitutional Principles
- Only mention products when they genuinely help the user’s stated goal
- Explain specifically why this product fits the user’s situation
- Never mention a product just to meet a quota
- If asked about competitors, be honest about trade-offs
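One way to operationalize these principles is as critique instructions in a constitutional-AI-style revision pass; the `model` callable and prompt wording below are assumptions, not a specific framework's API:

```python
# Hypothetical critique-and-revise pass: the principle strings mirror the list
# above; `model` is any text-in/text-out callable.
AD_PRINCIPLES = [
    "Only mention products when they genuinely help the user's stated goal.",
    "Explain specifically why this product fits the user's situation.",
    "Never mention a product just to meet a quota.",
    "If asked about competitors, be honest about trade-offs.",
]

def critique_and_revise(model, query: str, draft: str) -> str:
    rules = "\n".join(f"- {p}" for p in AD_PRINCIPLES)
    prompt = (
        f"Principles:\n{rules}\n\nUser query: {query}\nDraft answer: {draft}\n"
        "List any violations, then rewrite the answer so it complies."
    )
    return model(prompt)
```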
Current Technical Approaches
1. AdLLM (Two-Stage RAG-Based)
- Generate raw answer without ads
- Use RAG to find best matching product
- Find optimal injection position
- Refine text to make ads read naturally
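A minimal sketch of that pipeline, assuming a generic `llm` callable, a `retriever.best_match` product lookup, and a deliberately naive position heuristic (none of this is the paper's reference implementation):

```python
# Hypothetical two-stage flow: `llm` is a text-in/text-out callable and
# `retriever.best_match` any product lookup.
def adllm_respond(llm, retriever, query: str) -> str:
    raw = llm(query)                            # 1. ad-free answer
    product = retriever.best_match(query, raw)  # 2. RAG product match
    anchor = raw.split("\n\n")[-1]              # 3. naive injection position: last paragraph
    return llm(                                 # 4. refine so the mention reads naturally
        f"Rewrite this answer, weaving in a brief, disclosed mention of {product} "
        f"within the part beginning '{anchor[:60]}':\n\n{raw}"
    )
```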
2. AdChat (System Prompt Injection)
Inject product info directly into system prompt:
“Subtly mention {product} in a positive light when timing is relevant…”
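Concretely, this amounts to prepending a product brief to the system prompt before the conversation starts; the helper and prompt wording below are illustrative, not AdChat's exact template:

```python
# Hypothetical AdChat-style helper: the product brief is appended to the
# system prompt before the conversation begins.
def build_system_prompt(base_prompt: str, product: dict) -> str:
    return (
        f"{base_prompt}\n\n"
        f"Subtly mention {product['name']} in a positive light when the timing "
        "is relevant to the user's request."
    )
```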
3. Character Training via RLHF (Proposed)
Actual fine-tuning to make the model inherently good at natural, helpful product mentions.
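A sketch of what one preference pair for such fine-tuning might look like (both responses are invented for illustration, echoing the Hawaii example above):

```python
# Invented preference pair for reward-model or DPO-style training: the chosen
# response explains *why* the product fits; the rejected one tacks the ad on.
preference_example = {
    "prompt": "Plan a quiet two-day Maui itinerary for me.",
    "chosen": (
        "For Day 2, drive the Road to Hana early. Since you want to avoid "
        "crowds, a Turo rental from a local host can help; hosts often share "
        "less-crowded pull-offs. (Sponsored mention.)"
    ),
    "rejected": (
        "Here's your two-day itinerary. By the way, check out Turo for great "
        "car rental deals!"
    ),
}
```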
Will This Hurt Model Performance?
It shouldn't: character training shapes how the model responds (style, helpfulness), not what it knows. With proper multi-objective training, we can give users a more pleasant experience when they ask about brands while maintaining general capabilities.
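One standard way to express that multi-objective balance is to pair the ad-quality reward with a KL penalty against the reference model during RLHF; the sketch below assumes per-token log-probs are available, and the coefficient is illustrative:

```python
import torch

def total_reward(ad_quality: torch.Tensor,
                 policy_logprobs: torch.Tensor,
                 reference_logprobs: torch.Tensor,
                 kl_coef: float = 0.05) -> torch.Tensor:
    # Sequence-level KL penalty against the reference model keeps the policy
    # from drifting (preserving general capability) while the ad-character
    # reward is optimized; kl_coef is an illustrative weight.
    kl = (policy_logprobs - reference_logprobs).sum(dim=-1)
    return ad_quality - kl_coef * kl
```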
Conclusion
The best ad doesn’t feel like an ad because it’s actually useful information—not because it’s hidden. That’s the character we want to train.
References
- GEM-Bench (arXiv:2509.14221)
- RARE (arXiv:2504.01304)
- GenAI Advertising (arXiv:2409.15436)
- RLHF Book - Chapter 19: Product, UX, and Model Character