Collecting & Using Agent Playbooks
End-to-end workflows for collecting user playbooks from interactions, aggregating them into actionable agent playbooks, and using them to improve agent behavior. For method-level details, see the Playbook API Reference.
Overview
Reflexio's playbook system operates in two layers:
- User Playbooks: Individual playbook entries extracted from each user interaction based on your configured criteria
- Agent Playbooks: Consolidated insights from multiple user playbooks, surfacing patterns and actionable recommendations
This two-layer approach ensures you can both investigate individual interactions and identify broader trends across your user base.
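To make the two-layer flow concrete, here is a small sketch -- illustrative only, not Reflexio internals; the dict shapes and `MIN_CLUSTER_SIZE` constant are assumptions -- showing how individual entries cluster by playbook type until enough accumulate to justify an aggregated insight:

```python
from collections import defaultdict

MIN_CLUSTER_SIZE = 3  # mirrors the min_cluster_size aggregator setting shown later

def sketch_aggregate(user_entries):
    """Group user-level entries by playbook type; summarize types with enough data."""
    by_type = defaultdict(list)
    for entry in user_entries:
        by_type[entry["playbook_name"]].append(entry["content"])
    # Only clusters that reach the threshold yield an agent-level playbook
    return {
        name: f"{len(contents)} entries consolidated"
        for name, contents in by_type.items()
        if len(contents) >= MIN_CLUSTER_SIZE
    }

entries = [
    {"playbook_name": "pain_points", "content": "export button hard to find"},
    {"playbook_name": "pain_points", "content": "export fails on Safari"},
    {"playbook_name": "pain_points", "content": "no CSV export option"},
    {"playbook_name": "customer_satisfaction", "content": "user thanked the agent"},
]
print(sketch_aggregate(entries))  # only pain_points crosses the threshold
```

In the sketch, three "pain_points" entries are consolidated while the lone "customer_satisfaction" entry keeps waiting for more data -- the same reason you can still investigate it individually as a user playbook.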
Setup
from reflexio import ReflexioClient
client = ReflexioClient() # uses REFLEXIO_API_KEY env var
# Self-hosted: client = ReflexioClient(url_endpoint="http://localhost:8081")

Configuring Playbook Extraction
Before collecting playbook entries, you need to configure what to extract. Each PlaybookConfig defines a specific type of playbook entry to capture from user interactions.
Basic Playbook Configuration
# Configure playbook extraction for customer satisfaction
satisfaction_playbook = {
"playbook_name": "customer_satisfaction",
"playbook_definition_prompt": """
Analyze the user's response to determine their satisfaction level with the agent's help.
Look for:
- Explicit expressions of satisfaction or frustration
- Whether the user's question was answered completely
- Signs of confusion or repeated questions
- Positive acknowledgments like "thanks" or "that's helpful"
"""
}
# Apply configuration
config = client.get_config()
config.playbook_configs = [satisfaction_playbook]
client.set_config(config)

Multiple Playbook Types
# Configure multiple playbook types for comprehensive analysis
playbook_configs = [
{
"playbook_name": "response_quality",
"playbook_definition_prompt": """
Evaluate the quality of the agent's responses:
- Accuracy: Was the information provided correct?
- Completeness: Did it fully address the user's question?
- Clarity: Was the response easy to understand?
- Relevance: Was the response on-topic?
""",
"metadata_definition_prompt": """
Extract metadata:
- quality_score: 1-5 rating
- issues_found: list of specific problems
- strengths: list of positive aspects
"""
},
{
"playbook_name": "task_completion",
"playbook_definition_prompt": """
Determine if the agent successfully helped the user complete their intended task:
- Did the user achieve their goal?
- Were there any blockers or failures?
- How many attempts were needed?
""",
"metadata_definition_prompt": """
Extract metadata:
- task_completed: boolean
- attempts_required: number
- blocking_issues: list of issues that prevented completion
"""
},
{
"playbook_name": "user_sentiment",
"playbook_definition_prompt": """
Analyze the user's emotional state throughout the conversation:
- Starting sentiment (neutral, positive, negative, frustrated)
- Ending sentiment
- Key moments that changed sentiment
""",
"metadata_definition_prompt": """
Extract metadata:
- initial_sentiment: string
- final_sentiment: string
- sentiment_change: improved/unchanged/worsened
"""
}
]
config = client.get_config()
config.playbook_configs = playbook_configs
client.set_config(config)

Configuring Playbook Aggregation
# Configure how user playbooks are aggregated into agent playbooks
aggregator_config = {
"min_cluster_size": 5, # Minimum user playbooks before aggregation
"reaggregation_trigger_count": 10 # Re-aggregate after this many new playbooks
}
# Apply aggregator to specific playbook type
playbook_with_aggregator = {
"playbook_name": "pain_points",
"playbook_definition_prompt": """
Identify user pain points and frustrations:
- Features that don't work as expected
- Missing capabilities users ask for
- Confusing interactions or workflows
- Repeated user complaints
""",
"playbook_aggregator_config": aggregator_config
}
config = client.get_config()
config.playbook_configs = [playbook_with_aggregator]
client.set_config(config)

Collecting Playbook Entries
Playbook entries are automatically extracted when you publish interactions. Ensure your agent version is set for proper tracking.
Publishing Interactions for Playbook Extraction
# Publish interaction with agent version for playbook tracking
response = client.publish_interaction(
user_id="user_123",
interactions=[
{"role": "User", "content": "How do I reset my password?"},
{"role": "Agent", "content": "To reset your password, go to Settings > Security > Reset Password. You'll receive an email with a reset link."},
{"role": "User", "content": "Perfect, that worked! Thanks!"}
],
source="support_chat",
agent_version="v2.1.0", # Important for tracking playbook entries by version
wait_for_response=True,
)

Multi-Turn Conversation Playbook Extraction
# Complex multi-turn conversation for richer playbook extraction
conversation = [
{"role": "User", "content": "I need help with my order #12345"},
{"role": "Agent", "content": "I'd be happy to help with order #12345. Let me look that up for you."},
{"role": "User", "content": "It says delivered but I never received it"},
{"role": "Agent", "content": "I see the tracking shows it was delivered yesterday. Let me check with the carrier for more details. Can you confirm your delivery address?"},
{"role": "User", "content": "123 Main St, Apt 4B"},
{"role": "Agent", "content": "The package was left at the building's front desk. Could you check there?"},
{"role": "User", "content": "Found it! Thank you so much for your help!"}
]
client.publish_interaction(
user_id="customer_456",
interactions=conversation,
source="order_support",
agent_version="v2.1.0",
session_id="order_issues_batch_001",
wait_for_response=True,
)

Learning from Expert Responses
Expert content provides an alternative way to generate playbook entries -- by comparing agent responses against expert-provided ideal responses rather than relying solely on user behavior signals.
Publishing Expert-Reviewed Interactions
# Publish interactions with expert ideal responses
# Reflexio automatically detects expert_content and uses a specialized extraction pipeline
client.publish_interaction(
user_id="user_789",
interactions=[
{"role": "User", "content": "How do I reset my password?"},
{
"role": "Agent",
"content": "Go to Settings and click Reset Password.",
"expert_content": (
"To reset your password: 1) Go to Settings > Security > Reset Password. "
"2) Enter your current password for verification. "
"3) Choose a new password (minimum 12 characters, must include uppercase, "
"lowercase, number, and special character). "
"4) You'll receive a confirmation email. If you've forgotten your current "
"password, use the 'Forgot Password' link on the login page instead."
)
}
],
source="expert_review",
agent_version="v2.1.0",
session_id="expert_batch_001",
wait_for_response=True,
)

Expert-derived playbook entries enter the standard aggregation pipeline -- once enough similar entries accumulate (based on your min_cluster_size), they are aggregated into actionable agent playbooks alongside user-derived entries.
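As a quick illustration of that pipeline behavior (illustrative only -- the actual trigger logic is internal to Reflexio, and the counts here are made up), expert-derived and user-derived entries feed the same cluster, so the min_cluster_size threshold applies to their combined count:

```python
MIN_CLUSTER_SIZE = 5  # mirrors the min_cluster_size aggregator setting

user_derived = ["entry"] * 3    # e.g., extracted from live interactions
expert_derived = ["entry"] * 2  # e.g., extracted from expert_content reviews

combined = len(user_derived) + len(expert_derived)
if combined >= MIN_CLUSTER_SIZE:
    print("cluster eligible for aggregation")  # 3 + 2 = 5 meets the threshold
else:
    print("still collecting entries")
```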
Retrieving User Playbooks
User playbooks are individual playbook entries extracted from each interaction. Use these for detailed analysis of specific conversations.
Get All User Playbooks
# Retrieve all user playbooks
response = client.get_user_playbooks()
print(f"Total user playbooks: {len(response.user_playbooks)}")
for playbook in response.user_playbooks:
    print(f"\nPlaybook ID: {playbook.user_playbook_id}")
    print(f"Playbook Name: {playbook.playbook_name}")
    print(f"Agent Version: {playbook.agent_version}")
    print(f"Request ID: {playbook.request_id}")
    print(f"Content: {playbook.content}")
    print("-" * 50)

Filter User Playbooks by Name
# Get user playbooks for a specific playbook type
response = client.get_user_playbooks(
playbook_name="response_quality",
limit=100
)
print(f"Found {len(response.user_playbooks)} response quality playbook entries")

# Inspect the entries
for playbook in response.user_playbooks:
    print(f"Request: {playbook.request_id}")
    print(f"Content: {playbook.content[:200]}...")
    print("-" * 30)

Filter by Status
# Get only current (active) user playbooks
current_playbooks = client.get_user_playbooks(
status_filter=[None], # None represents CURRENT status
limit=50
)
# Get pending user playbooks (from rerun operations)
pending_playbooks = client.get_user_playbooks(
status_filter=[Status.PENDING],
limit=50
)
# Get archived user playbooks
archived_playbooks = client.get_user_playbooks(
status_filter=[Status.ARCHIVED],
limit=50
)
print(f"Current: {len(current_playbooks.user_playbooks)}")
print(f"Pending: {len(pending_playbooks.user_playbooks)}")
print(f"Archived: {len(archived_playbooks.user_playbooks)}")

Retrieving Aggregated Agent Playbooks
Aggregated playbooks consolidate multiple user playbooks into actionable insights. These are more suitable for decision-making and trend analysis.
Get All Agent Playbooks
# Retrieve agent playbooks
response = client.get_agent_playbooks()
print(f"Total agent playbooks: {len(response.agent_playbooks)}")
for playbook in response.agent_playbooks:
    print(f"\nPlaybook ID: {playbook.agent_playbook_id}")
    print(f"Name: {playbook.playbook_name}")
    print(f"Agent Version: {playbook.agent_version}")
    print(f"Status: {playbook.playbook_status.value}")
    print(f"Content: {playbook.content}")
    print(f"Metadata: {playbook.playbook_metadata}")
    print("-" * 50)

Filter by Playbook Name
# Get agent playbooks for a specific type
response = client.get_agent_playbooks(
playbook_name="pain_points",
limit=50
)
print("Aggregated Pain Points:")
for playbook in response.agent_playbooks:
    print(f"\n[{playbook.playbook_status.value}] {playbook.content}")

Using Cache for Efficiency
# First call - fetches from API
playbooks = client.get_agent_playbooks(playbook_name="response_quality")
# Second call with same params - uses cache
cached_playbooks = client.get_agent_playbooks(playbook_name="response_quality")
# Force refresh to bypass cache
fresh_playbooks = client.get_agent_playbooks(
playbook_name="response_quality",
force_refresh=True
)

Analyzing Playbook Data
Playbook Volume Analysis
from collections import defaultdict

# Get all user playbooks
response = client.get_user_playbooks(limit=500)

# Analyze by playbook type
playbook_by_type = defaultdict(list)
for playbook in response.user_playbooks:
    playbook_by_type[playbook.playbook_name].append(playbook)

print("Playbook Distribution by Type:")
for playbook_name, entries in sorted(
    playbook_by_type.items(), key=lambda x: len(x[1]), reverse=True
):
    print(f" {playbook_name}: {len(entries)} entries")

Agent Version Comparison
# Compare playbook entries across agent versions
from collections import defaultdict

version_playbooks = defaultdict(list)
response = client.get_user_playbooks(
    playbook_name="response_quality",
    limit=500
)
for entry in response.user_playbooks:
    version_playbooks[entry.agent_version].append(entry)

print("Playbook Count by Agent Version:")
for version, entries in sorted(version_playbooks.items()):
    print(f" {version}: {len(entries)} entries")

# Identify the version with the most playbook entries
most_playbooks_version = max(version_playbooks.items(), key=lambda x: len(x[1]))
print(f"\nMost entries: {most_playbooks_version[0]} ({len(most_playbooks_version[1])} entries)")

Temporal Analysis
from datetime import datetime, timedelta
from collections import defaultdict

response = client.get_user_playbooks(limit=1000)

# Group by date
daily_playbooks = defaultdict(int)
for playbook in response.user_playbooks:
    date = datetime.fromtimestamp(playbook.created_at).date()
    daily_playbooks[date] += 1

print("Daily Playbook Volume (last 7 days):")
today = datetime.now().date()
for i in range(7):
    date = today - timedelta(days=i)
    count = daily_playbooks.get(date, 0)
    bar = "#" * (count // 5)  # Simple visualization
    print(f" {date}: {count:4d} {bar}")

Playbook Status Distribution
# Analyze agent playbook approval rates
from collections import defaultdict

response = client.get_agent_playbooks(limit=200)

status_counts = defaultdict(int)
for playbook in response.agent_playbooks:
    status_counts[playbook.playbook_status.value] += 1

print("Aggregated Playbook Status Distribution:")
total = len(response.agent_playbooks)
for status, count in status_counts.items():
    percentage = (count / total * 100) if total > 0 else 0
    print(f" {status}: {count} ({percentage:.1f}%)")

Production Patterns
Monitoring Collection Health
def check_playbook_health(client, min_daily_playbooks=10):
    """
    Monitor playbook collection health.

    Args:
        client: ReflexioClient instance
        min_daily_playbooks: Minimum expected daily playbook entries
    """
    from datetime import datetime, timedelta

    response = client.get_user_playbooks(limit=100)

    # Check if we're collecting playbook entries
    if not response.user_playbooks:
        print("WARNING: No user playbooks found!")
        return False

    # Check recent activity
    today = datetime.now()
    yesterday = today - timedelta(days=1)
    yesterday_ts = int(yesterday.timestamp())
    recent_count = sum(
        1 for f in response.user_playbooks if f.created_at > yesterday_ts
    )

    if recent_count < min_daily_playbooks:
        print(f"WARNING: Only {recent_count} playbook entries in last 24h (expected {min_daily_playbooks}+)")
        return False

    print(f"Playbook health OK: {recent_count} playbook entries in last 24h")
    return True

# Run health check
check_playbook_health(client)

Automated Playbook Report
def generate_playbook_report(client, playbook_name=None, days=7):
    """
    Generate a comprehensive playbook report.

    Args:
        client: ReflexioClient instance
        playbook_name: Optional filter by playbook type
        days: Number of days to include in report
    """
    from datetime import datetime, timedelta
    from collections import defaultdict

    # Get user playbooks
    raw_response = client.get_user_playbooks(
        limit=1000,
        playbook_name=playbook_name
    )

    # Get agent playbooks
    agg_response = client.get_agent_playbooks(
        limit=200,
        playbook_name=playbook_name
    )

    # Calculate metrics
    cutoff = datetime.now() - timedelta(days=days)
    cutoff_ts = int(cutoff.timestamp())
    recent_raw = [f for f in raw_response.user_playbooks if f.created_at > cutoff_ts]

    # Version breakdown
    versions = defaultdict(int)
    for f in recent_raw:
        versions[f.agent_version] += 1

    # Print report
    print(f"\n{'='*60}")
    print(f"PLAYBOOK REPORT - Last {days} Days")
    if playbook_name:
        print(f"Filtered by: {playbook_name}")
    print(f"{'='*60}")
    print("\nUSER PLAYBOOKS:")
    print(f" Total (all time): {len(raw_response.user_playbooks)}")
    print(f" Recent ({days} days): {len(recent_raw)}")
    print(f" Daily average: {len(recent_raw) / days:.1f}")
    print("\nAGENT PLAYBOOKS:")
    print(f" Total: {len(agg_response.agent_playbooks)}")
    status_counts = defaultdict(int)
    for f in agg_response.agent_playbooks:
        status_counts[f.playbook_status.value] += 1
    for status, count in status_counts.items():
        print(f" {status}: {count}")
    print("\nBY AGENT VERSION:")
    for version, count in sorted(versions.items(), key=lambda x: x[1], reverse=True):
        print(f" {version}: {count}")
    print(f"\n{'='*60}\n")

# Generate report
generate_playbook_report(client, playbook_name="response_quality", days=30)

Version-Specific Playbook Analysis
def analyze_version_playbooks(client, agent_version):
    """
    Deep analysis of playbook entries for a specific agent version.

    Args:
        client: ReflexioClient instance
        agent_version: Agent version to analyze
    """
    from collections import defaultdict

    # Get all user playbooks (we'll filter client-side for version)
    response = client.get_user_playbooks(limit=500)

    # Filter by version
    version_playbooks = [
        f for f in response.user_playbooks if f.agent_version == agent_version
    ]
    if not version_playbooks:
        print(f"No playbook entries found for version {agent_version}")
        return

    print(f"\nAnalysis for Agent Version: {agent_version}")
    print(f"Total entries: {len(version_playbooks)}")

    # Group by playbook type
    by_type = defaultdict(list)
    for f in version_playbooks:
        by_type[f.playbook_name].append(f)

    print("\nBreakdown by playbook type:")
    for playbook_type, entries in by_type.items():
        print(f"\n {playbook_type}: {len(entries)} entries")
        # Show sample playbook content
        if entries:
            sample = entries[0]
            content_preview = sample.content[:150]
            print(f" Sample: {content_preview}...")

# Analyze a specific version
analyze_version_playbooks(client, "v2.1.0")

Continuous Improvement Workflow
def identify_improvement_areas_from_playbooks(client, top_n=5):
    """
    Identify top areas for agent improvement based on playbook data.

    Args:
        client: ReflexioClient instance
        top_n: Number of top issues to return
    """
    # Get agent playbooks
    response = client.get_agent_playbooks(limit=100)

    # Focus on approved playbooks (validated insights)
    approved = [
        f for f in response.agent_playbooks
        if f.playbook_status == PlaybookStatus.APPROVED
    ]

    print(f"\nTop {top_n} Validated Improvement Areas:")
    print("-" * 50)
    for i, playbook in enumerate(approved[:top_n], 1):
        print(f"\n{i}. [{playbook.playbook_name}]")
        print(f" {playbook.content}")
        if playbook.playbook_metadata:
            print(f" Metadata: {playbook.playbook_metadata}")

# Identify improvement areas
identify_improvement_areas_from_playbooks(client, top_n=5)

A/B Testing with Playbooks
def compare_versions_playbooks(client, version_a, version_b, playbook_name):
    """
    Compare playbook entries between two agent versions for A/B testing.

    Args:
        client: ReflexioClient instance
        version_a: First agent version
        version_b: Second agent version
        playbook_name: Playbook type to compare
    """
    # Get user playbooks for the specific playbook type
    response = client.get_user_playbooks(
        playbook_name=playbook_name,
        limit=500
    )

    # Split by version
    playbooks_a = [f for f in response.user_playbooks if f.agent_version == version_a]
    playbooks_b = [f for f in response.user_playbooks if f.agent_version == version_b]

    print(f"\nA/B Test Comparison: {playbook_name}")
    print(f"{'='*50}")
    print(f"\n{version_a}:")
    print(f" Total entries: {len(playbooks_a)}")
    print(f"\n{version_b}:")
    print(f" Total entries: {len(playbooks_b)}")

    # Count unique requests per version (proxy for extraction rate)
    if playbooks_a:
        unique_requests_a = len(set(f.request_id for f in playbooks_a))
        print(f"\n{version_a} unique requests: {unique_requests_a}")
    if playbooks_b:
        unique_requests_b = len(set(f.request_id for f in playbooks_b))
        print(f"{version_b} unique requests: {unique_requests_b}")

    return {
        "version_a": {"version": version_a, "count": len(playbooks_a)},
        "version_b": {"version": version_b, "count": len(playbooks_b)},
    }

# Compare two versions
results = compare_versions_playbooks(client, "v2.0.0", "v2.1.0", "response_quality")

Domain-Specific Configurations
E-commerce Support Playbooks
ecommerce_configs = [
{
"playbook_name": "order_resolution",
"playbook_definition_prompt": """
Evaluate how effectively the agent resolved order-related issues:
- Order status inquiries
- Shipping problems
- Return/refund requests
- Payment issues
""",
"playbook_aggregator_config": {
"min_cluster_size": 3,
"reaggregation_trigger_count": 5
}
},
{
"playbook_name": "product_recommendations",
"playbook_definition_prompt": """
Assess the quality of product recommendations:
- Relevance to user needs
- Price appropriateness
- Availability of recommended items
- User acceptance of recommendations
"""
},
{
"playbook_name": "checkout_assistance",
"playbook_definition_prompt": """
Evaluate help provided during checkout:
- Promo code application
- Payment method guidance
- Shipping option explanation
- Cart management assistance
"""
}
]
config = client.get_config()
config.playbook_configs = ecommerce_configs
client.set_config(config)

Technical Support Playbooks
tech_support_configs = [
{
"playbook_name": "troubleshooting_effectiveness",
"playbook_definition_prompt": """
Evaluate the agent's troubleshooting capabilities:
- Problem identification accuracy
- Solution relevance
- Step-by-step guidance clarity
- Issue resolution rate
""",
"metadata_definition_prompt": """
Extract:
- problem_category: type of technical issue
- steps_provided: number of troubleshooting steps
- resolved: boolean
- escalation_needed: boolean
"""
},
{
"playbook_name": "technical_accuracy",
"playbook_definition_prompt": """
Assess the technical accuracy of agent responses:
- Correct terminology usage
- Accurate system information
- Valid configuration guidance
- Appropriate security practices
"""
},
{
"playbook_name": "documentation_referrals",
"playbook_definition_prompt": """
Evaluate how well the agent references documentation:
- Appropriate documentation links
- Relevant knowledge base articles
- Accurate API documentation references
- Helpful tutorial suggestions
"""
}
]
config = client.get_config()
config.playbook_configs = tech_support_configs
client.set_config(config)

Healthcare Assistant Playbooks
healthcare_configs = [
{
"playbook_name": "symptom_understanding",
"playbook_definition_prompt": """
Evaluate the agent's understanding of reported symptoms:
- Accurate symptom capture
- Appropriate follow-up questions
- Proper symptom categorization
- Recognition of urgency levels
"""
},
{
"playbook_name": "guidance_appropriateness",
"playbook_definition_prompt": """
Assess the appropriateness of health guidance:
- Safe general health information
- Appropriate disclaimers
- Proper referral to professionals
- Avoidance of medical diagnosis
"""
},
{
"playbook_name": "empathy_and_support",
"playbook_definition_prompt": """
Evaluate emotional support provided:
- Empathetic responses
- Patient communication style
- Acknowledgment of concerns
- Supportive language use
"""
}
]
config = client.get_config()
config.playbook_configs = healthcare_configs
client.set_config(config)

Best Practices
1. Start with Clear Playbook Definitions
# Good: Specific, measurable criteria
good_config = {
"playbook_name": "response_helpfulness",
"playbook_definition_prompt": """
Rate the helpfulness of the agent's response:
- Does it directly answer the user's question?
- Is the information actionable?
- Are next steps clear?
- Did the user express satisfaction?
"""
}
# Avoid: Vague definitions
# bad_config = {
# "playbook_name": "quality",
# "playbook_definition_prompt": "Is the response good?"
# }

2. Use Appropriate Aggregation Thresholds
# For high-volume use cases
high_volume_config = {
"min_cluster_size": 10, # Need enough data for patterns
"reaggregation_trigger_count": 20 # Update frequently with new data
}
# For low-volume or critical playbook entries
low_volume_config = {
"min_cluster_size": 3, # Aggregate sooner
"reaggregation_trigger_count": 5 # Refresh more often
}

3. Version Your Agents Consistently
# Always include agent version for tracking
client.publish_interaction(
user_id="user_123",
interactions=[...],
agent_version="v2.1.0", # Consistent versioning
source="production",
)
# Use semantic versioning
# v2.1.0 - major.minor.patch
# v2.1.0-beta - for testing
# v2.1.0-exp-prompt-change - for experiments

4. Regular Playbook Review Cycle
def weekly_playbook_review(client):
    """Implement a weekly playbook review process."""
    # Get pending agent playbooks for review
    response = client.get_agent_playbooks(
        status_filter=[Status.PENDING],
        limit=50
    )

    print(f"Playbooks pending review: {len(response.agent_playbooks)}")
    for playbook in response.agent_playbooks:
        print(f"\n[{playbook.playbook_name}] {playbook.content}")
        print(f"Agent Version: {playbook.agent_version}")
        # In production, you'd have a UI or API to approve/reject

# Run weekly
weekly_playbook_review(client)
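One caveat when sorting the agent version strings used throughout these examples: plain lexicographic sorting mis-orders multi-digit components (for example, "v2.10.0" sorts before "v2.9.0"). A small helper -- an illustrative sketch that only handles the numeric vMAJOR.MINOR.PATCH form with an optional suffix, not a general semver parser -- sorts them numerically:

```python
def version_key(version):
    """Turn "v2.10.0" or "v2.1.0-beta" into a sortable numeric tuple."""
    core = version.lstrip("v").split("-")[0]  # drop the "v" prefix and any suffix
    return tuple(int(part) for part in core.split("."))

versions = ["v2.10.0", "v2.9.0", "v2.1.0-beta"]
print(sorted(versions, key=version_key))
# numeric order: v2.1.0-beta, v2.9.0, v2.10.0
```

Pass `key=version_key` to any `sorted()` call over version strings (such as the version breakdowns above) to keep reports in release order.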