Reflexio Docs
Workflow Examples

Collecting & Using Agent Playbooks

End-to-end workflows for collecting user playbooks from interactions, aggregating them into actionable agent playbooks, and using them to improve agent behavior. For method-level details, see the Playbook API Reference.

Overview

Reflexio's playbook system operates in two layers:

  1. User Playbooks: Individual playbook entries extracted from each user interaction based on your configured criteria
  2. Agent Playbooks: Consolidated insights from multiple user playbooks, surfacing patterns and actionable recommendations

This two-layer approach ensures you can both investigate individual interactions and identify broader trends across your user base.
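The two layers can be pictured with a toy sketch. The class and function names below are illustrative only (the real SDK returns its own models); the point is that layer 1 holds one entry per interaction and layer 2 groups entries of the same playbook type once enough of them exist:

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative types only -- the SDK exposes its own response models.
@dataclass
class UserPlaybook:          # layer 1: one entry per interaction
    playbook_name: str
    content: str

@dataclass
class AgentPlaybook:         # layer 2: consolidated across interactions
    playbook_name: str
    contents: list

def aggregate(entries, min_cluster_size=2):
    """Toy aggregation: group user entries by playbook type and keep
    only the types with at least min_cluster_size entries."""
    clusters = defaultdict(list)
    for e in entries:
        clusters[e.playbook_name].append(e.content)
    return [
        AgentPlaybook(name, contents)
        for name, contents in clusters.items()
        if len(contents) >= min_cluster_size
    ]

entries = [
    UserPlaybook("customer_satisfaction", "user thanked the agent"),
    UserPlaybook("customer_satisfaction", "user confirmed issue resolved"),
    UserPlaybook("pain_points", "promo code failed to apply"),
]
print([p.playbook_name for p in aggregate(entries)])
# only customer_satisfaction has enough entries to aggregate
```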

Setup

from reflexio import ReflexioClient

client = ReflexioClient()  # uses REFLEXIO_API_KEY env var
# Self-hosted: client = ReflexioClient(url_endpoint="http://localhost:8081")

Configuring Playbook Extraction

Before collecting playbook entries, you need to configure what to extract. Each PlaybookConfig defines a specific type of playbook entry to capture from user interactions.

Basic Playbook Configuration

# Configure playbook extraction for customer satisfaction
satisfaction_playbook = {
    "playbook_name": "customer_satisfaction",
    "playbook_definition_prompt": """
    Analyze the user's response to determine their satisfaction level with the agent's help.
    Look for:
    - Explicit expressions of satisfaction or frustration
    - Whether the user's question was answered completely
    - Signs of confusion or repeated questions
    - Positive acknowledgments like "thanks" or "that's helpful"
    """
}

# Apply configuration
config = client.get_config()
config.playbook_configs = [satisfaction_playbook]
client.set_config(config)

Multiple Playbook Types

# Configure multiple playbook types for comprehensive analysis
playbook_configs = [
    {
        "playbook_name": "response_quality",
        "playbook_definition_prompt": """
        Evaluate the quality of the agent's responses:
        - Accuracy: Was the information provided correct?
        - Completeness: Did it fully address the user's question?
        - Clarity: Was the response easy to understand?
        - Relevance: Was the response on-topic?
        """,
        "metadata_definition_prompt": """
        Extract metadata:
        - quality_score: 1-5 rating
        - issues_found: list of specific problems
        - strengths: list of positive aspects
        """
    },
    {
        "playbook_name": "task_completion",
        "playbook_definition_prompt": """
        Determine if the agent successfully helped the user complete their intended task:
        - Did the user achieve their goal?
        - Were there any blockers or failures?
        - How many attempts were needed?
        """,
        "metadata_definition_prompt": """
        Extract metadata:
        - task_completed: boolean
        - attempts_required: number
        - blocking_issues: list of issues that prevented completion
        """
    },
    {
        "playbook_name": "user_sentiment",
        "playbook_definition_prompt": """
        Analyze the user's emotional state throughout the conversation:
        - Starting sentiment (neutral, positive, negative, frustrated)
        - Ending sentiment
        - Key moments that changed sentiment
        """,
        "metadata_definition_prompt": """
        Extract metadata:
        - initial_sentiment: string
        - final_sentiment: string
        - sentiment_change: improved/unchanged/worsened
        """
    }
]

config = client.get_config()
config.playbook_configs = playbook_configs
client.set_config(config)

Configuring Playbook Aggregation

# Configure how user playbooks are aggregated into agent playbooks
aggregator_config = {
    "min_cluster_size": 5,  # Minimum user playbooks before aggregation
    "reaggregation_trigger_count": 10  # Re-aggregate after this many new playbooks
}

# Apply aggregator to specific playbook type
playbook_with_aggregator = {
    "playbook_name": "pain_points",
    "playbook_definition_prompt": """
    Identify user pain points and frustrations:
    - Features that don't work as expected
    - Missing capabilities users ask for
    - Confusing interactions or workflows
    - Repeated user complaints
    """,
    "playbook_aggregator_config": aggregator_config
}

config = client.get_config()
config.playbook_configs = [playbook_with_aggregator]
client.set_config(config)
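To internalize the two thresholds, here is a toy counter, under the assumption that aggregation first fires once min_cluster_size entries exist and then re-fires after every reaggregation_trigger_count new entries (consult the aggregator reference for the exact semantics):

```python
def aggregation_events(total_entries, min_cluster_size=5,
                       reaggregation_trigger_count=10):
    """Return the entry counts at which (re)aggregation would fire,
    under the assumption stated above."""
    events = []
    for n in range(1, total_entries + 1):
        if n == min_cluster_size:
            events.append(n)  # first aggregation
        elif (n > min_cluster_size
              and (n - min_cluster_size) % reaggregation_trigger_count == 0):
            events.append(n)  # re-aggregation
    return events

print(aggregation_events(30))  # with min_cluster_size=5, trigger=10
```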

Collecting Playbook Entries

Playbook entries are automatically extracted when you publish interactions. Ensure your agent version is set for proper tracking.

Publishing Interactions for Playbook Extraction

# Publish interaction with agent version for playbook tracking
response = client.publish_interaction(
    user_id="user_123",
    interactions=[
        {"role": "User", "content": "How do I reset my password?"},
        {"role": "Agent", "content": "To reset your password, go to Settings > Security > Reset Password. You'll receive an email with a reset link."},
        {"role": "User", "content": "Perfect, that worked! Thanks!"}
    ],
    source="support_chat",
    agent_version="v2.1.0",  # Important for tracking playbook entries by version
    wait_for_response=True,
)

Multi-Turn Conversation Playbook Extraction

# Complex multi-turn conversation for richer playbook extraction
conversation = [
    {"role": "User", "content": "I need help with my order #12345"},
    {"role": "Agent", "content": "I'd be happy to help with order #12345. Let me look that up for you."},
    {"role": "User", "content": "It says delivered but I never received it"},
    {"role": "Agent", "content": "I see the tracking shows it was delivered yesterday. Let me check with the carrier for more details. Can you confirm your delivery address?"},
    {"role": "User", "content": "123 Main St, Apt 4B"},
    {"role": "Agent", "content": "The package was left at the building's front desk. Could you check there?"},
    {"role": "User", "content": "Found it! Thank you so much for your help!"}
]

client.publish_interaction(
    user_id="customer_456",
    interactions=conversation,
    source="order_support",
    agent_version="v2.1.0",
    session_id="order_issues_batch_001",
    wait_for_response=True,
)

Learning from Expert Responses

Expert content provides an alternative way to generate playbook entries -- by comparing agent responses against expert-provided ideal responses rather than relying solely on user behavior signals.

Publishing Expert-Reviewed Interactions

# Publish interactions with expert ideal responses
# Reflexio automatically detects expert_content and uses a specialized extraction pipeline
client.publish_interaction(
    user_id="user_789",
    interactions=[
        {"role": "User", "content": "How do I reset my password?"},
        {
            "role": "Agent",
            "content": "Go to Settings and click Reset Password.",
            "expert_content": (
                "To reset your password: 1) Go to Settings > Security > Reset Password. "
                "2) Enter your current password for verification. "
                "3) Choose a new password (minimum 12 characters, must include uppercase, "
                "lowercase, number, and special character). "
                "4) You'll receive a confirmation email. If you've forgotten your current "
                "password, use the 'Forgot Password' link on the login page instead."
            )
        }
    ],
    source="expert_review",
    agent_version="v2.1.0",
    session_id="expert_batch_001",
    wait_for_response=True,
)

Expert-derived playbook entries enter the standard aggregation pipeline -- once enough similar entries accumulate (based on your min_cluster_size), they get aggregated into actionable agent playbooks alongside user-derived entries.
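If you want a quick client-side view of which playbook types have accumulated enough entries (expert- and user-derived alike) to clear your min_cluster_size, a small helper over already-fetched entries works. The Entry class below is a stand-in with the same playbook_name field the SDK's entries expose:

```python
from collections import Counter

def clusters_ready(user_playbooks, min_cluster_size):
    """Count entries per playbook type and return the types that have
    enough entries to be eligible for aggregation."""
    counts = Counter(p.playbook_name for p in user_playbooks)
    return {name: n for name, n in counts.items() if n >= min_cluster_size}

# Stand-in entries; in practice pass client.get_user_playbooks().user_playbooks
class Entry:
    def __init__(self, playbook_name):
        self.playbook_name = playbook_name

entries = [Entry("pain_points")] * 6 + [Entry("user_sentiment")] * 2
print(clusters_ready(entries, min_cluster_size=5))  # {'pain_points': 6}
```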

Retrieving User Playbooks

User playbooks are individual playbook entries extracted from each interaction. Use these for detailed analysis of specific conversations.

Get All User Playbooks

# Retrieve all user playbooks
response = client.get_user_playbooks()

print(f"Total user playbooks: {len(response.user_playbooks)}")

for playbook in response.user_playbooks:
    print(f"\nPlaybook ID: {playbook.user_playbook_id}")
    print(f"Playbook Name: {playbook.playbook_name}")
    print(f"Agent Version: {playbook.agent_version}")
    print(f"Request ID: {playbook.request_id}")
    print(f"Content: {playbook.content}")
    print("-" * 50)

Filter User Playbooks by Name

# Get user playbooks for a specific playbook type
response = client.get_user_playbooks(
    playbook_name="response_quality",
    limit=100
)

print(f"Found {len(response.user_playbooks)} response quality playbook entries")

# Analyze the entries
for playbook in response.user_playbooks:
    print(f"Request: {playbook.request_id}")
    print(f"Content: {playbook.content[:200]}...")
    print("-" * 30)

Filter by Status

# Get only current (active) user playbooks
current_playbooks = client.get_user_playbooks(
    status_filter=[None],  # None represents CURRENT status
    limit=50
)

# Get pending user playbooks (from rerun operations)
pending_playbooks = client.get_user_playbooks(
    status_filter=[Status.PENDING],
    limit=50
)

# Get archived user playbooks
archived_playbooks = client.get_user_playbooks(
    status_filter=[Status.ARCHIVED],
    limit=50
)

print(f"Current: {len(current_playbooks.user_playbooks)}")
print(f"Pending: {len(pending_playbooks.user_playbooks)}")
print(f"Archived: {len(archived_playbooks.user_playbooks)}")

Retrieving Aggregated Agent Playbooks

Aggregated playbooks consolidate multiple user playbooks into actionable insights. These are more suitable for decision-making and trend analysis.

Get All Agent Playbooks

# Retrieve agent playbooks
response = client.get_agent_playbooks()

print(f"Total agent playbooks: {len(response.agent_playbooks)}")

for playbook in response.agent_playbooks:
    print(f"\nPlaybook ID: {playbook.agent_playbook_id}")
    print(f"Name: {playbook.playbook_name}")
    print(f"Agent Version: {playbook.agent_version}")
    print(f"Status: {playbook.playbook_status.value}")
    print(f"Content: {playbook.content}")
    print(f"Metadata: {playbook.playbook_metadata}")
    print("-" * 50)

Filter by Playbook Name

# Get agent playbooks for a specific type
response = client.get_agent_playbooks(
    playbook_name="pain_points",
    limit=50
)

print("Aggregated Pain Points:")
for playbook in response.agent_playbooks:
    print(f"\n[{playbook.playbook_status.value}] {playbook.content}")

Using Cache for Efficiency

# First call - fetches from API
playbooks = client.get_agent_playbooks(playbook_name="response_quality")

# Second call with same params - uses cache
cached_playbooks = client.get_agent_playbooks(playbook_name="response_quality")

# Force refresh to bypass cache
fresh_playbooks = client.get_agent_playbooks(
    playbook_name="response_quality",
    force_refresh=True
)

Analyzing Playbook Data

Playbook Volume Analysis

from collections import defaultdict

# Get all user playbooks
response = client.get_user_playbooks(limit=500)

# Analyze by playbook type
playbook_by_type = defaultdict(list)
for playbook in response.user_playbooks:
    playbook_by_type[playbook.playbook_name].append(playbook)

print("Playbook Distribution by Type:")
for playbook_name, entries in sorted(
    playbook_by_type.items(), key=lambda x: len(x[1]), reverse=True
):
    print(f"  {playbook_name}: {len(entries)} entries")

Agent Version Comparison

# Compare playbook entries across agent versions
version_playbooks = defaultdict(list)

response = client.get_user_playbooks(
    playbook_name="response_quality",
    limit=500
)

for entry in response.user_playbooks:
    version_playbooks[entry.agent_version].append(entry)

print("Playbook Count by Agent Version:")
for version, entries in sorted(version_playbooks.items()):
    print(f"  {version}: {len(entries)} entries")

# Identify versions with the most playbook entries
most_playbooks_version = max(version_playbooks.items(), key=lambda x: len(x[1]))
print(f"\nMost entries: {most_playbooks_version[0]} ({len(most_playbooks_version[1])} entries)")

Temporal Analysis

from datetime import datetime, timedelta

response = client.get_user_playbooks(limit=1000)

# Group by date
daily_playbooks = defaultdict(int)
for playbook in response.user_playbooks:
    date = datetime.fromtimestamp(playbook.created_at).date()
    daily_playbooks[date] += 1

print("Daily Playbook Volume (last 7 days):")
today = datetime.now().date()
for i in range(7):
    date = today - timedelta(days=i)
    count = daily_playbooks.get(date, 0)
    bar = "#" * (count // 5)  # Simple visualization
    print(f"  {date}: {count:4d} {bar}")

Playbook Status Distribution

# Analyze agent playbook approval rates
response = client.get_agent_playbooks(limit=200)

status_counts = defaultdict(int)
for playbook in response.agent_playbooks:
    status_counts[playbook.playbook_status.value] += 1

print("Aggregated Playbook Status Distribution:")
total = len(response.agent_playbooks)
for status, count in status_counts.items():
    percentage = (count / total * 100) if total > 0 else 0
    print(f"  {status}: {count} ({percentage:.1f}%)")

Production Patterns

Monitoring Collection Health

def check_playbook_health(client, min_daily_playbooks=10):
    """
    Monitor playbook collection health.

    Args:
        client: ReflexioClient instance
        min_daily_playbooks: Minimum expected daily playbook entries
    """
    from datetime import datetime, timedelta

    response = client.get_user_playbooks(limit=100)

    # Check if we're collecting playbook entries
    if not response.user_playbooks:
        print("WARNING: No user playbooks found!")
        return False

    # Check recent activity
    today = datetime.now()
    yesterday = today - timedelta(days=1)
    yesterday_ts = int(yesterday.timestamp())

    recent_count = sum(
        1 for f in response.user_playbooks if f.created_at > yesterday_ts
    )

    if recent_count < min_daily_playbooks:
        print(f"WARNING: Only {recent_count} playbook entries in last 24h (expected {min_daily_playbooks}+)")
        return False

    print(f"Playbook health OK: {recent_count} playbook entries in last 24h")
    return True


# Run health check
check_playbook_health(client)

Automated Playbook Report

def generate_playbook_report(client, playbook_name=None, days=7):
    """
    Generate a comprehensive playbook report.

    Args:
        client: ReflexioClient instance
        playbook_name: Optional filter by playbook type
        days: Number of days to include in report
    """
    from datetime import datetime, timedelta
    from collections import defaultdict

    # Get user playbooks
    raw_response = client.get_user_playbooks(
        limit=1000,
        playbook_name=playbook_name
    )

    # Get agent playbooks
    agg_response = client.get_agent_playbooks(
        limit=200,
        playbook_name=playbook_name
    )

    # Calculate metrics
    cutoff = datetime.now() - timedelta(days=days)
    cutoff_ts = int(cutoff.timestamp())

    recent_raw = [f for f in raw_response.user_playbooks if f.created_at > cutoff_ts]

    # Version breakdown
    versions = defaultdict(int)
    for f in recent_raw:
        versions[f.agent_version] += 1

    # Print report
    print(f"\n{'='*60}")
    print(f"PLAYBOOK REPORT - Last {days} Days")
    if playbook_name:
        print(f"Filtered by: {playbook_name}")
    print(f"{'='*60}")

    print(f"\nUSER PLAYBOOKS:")
    print(f"  Total (all time): {len(raw_response.user_playbooks)}")
    print(f"  Recent ({days} days): {len(recent_raw)}")
    print(f"  Daily average: {len(recent_raw) / days:.1f}")

    print(f"\nAGENT PLAYBOOKS:")
    print(f"  Total: {len(agg_response.agent_playbooks)}")

    status_counts = defaultdict(int)
    for f in agg_response.agent_playbooks:
        status_counts[f.playbook_status.value] += 1
    for status, count in status_counts.items():
        print(f"  {status}: {count}")

    print(f"\nBY AGENT VERSION:")
    for version, count in sorted(versions.items(), key=lambda x: x[1], reverse=True):
        print(f"  {version}: {count}")

    print(f"\n{'='*60}\n")


# Generate report
generate_playbook_report(client, playbook_name="response_quality", days=30)

Version-Specific Playbook Analysis

def analyze_version_playbooks(client, agent_version):
    """
    Deep analysis of playbook entries for a specific agent version.

    Args:
        client: ReflexioClient instance
        agent_version: Agent version to analyze
    """
    # Get all user playbooks (we'll filter client-side for version)
    response = client.get_user_playbooks(limit=500)

    # Filter by version
    version_playbooks = [
        f for f in response.user_playbooks if f.agent_version == agent_version
    ]

    if not version_playbooks:
        print(f"No playbook entries found for version {agent_version}")
        return

    print(f"\nAnalysis for Agent Version: {agent_version}")
    print(f"Total entries: {len(version_playbooks)}")

    # Group by playbook type
    by_type = defaultdict(list)
    for f in version_playbooks:
        by_type[f.playbook_name].append(f)

    print("\nBreakdown by playbook type:")
    for playbook_type, entries in by_type.items():
        print(f"\n  {playbook_type}: {len(entries)} entries")

        # Show sample playbook content
        if entries:
            sample = entries[0]
            content_preview = sample.content[:150]
            print(f"    Sample: {content_preview}...")


# Analyze specific version
analyze_version_playbooks(client, "v2.1.0")

Continuous Improvement Workflow

def identify_improvement_areas_from_playbooks(client, top_n=5):
    """
    Identify top areas for agent improvement based on playbook data.

    Args:
        client: ReflexioClient instance
        top_n: Number of top issues to return
    """
    # Get agent playbooks
    response = client.get_agent_playbooks(limit=100)

    # Focus on approved playbooks (validated insights)
    approved = [
        f for f in response.agent_playbooks
        if f.playbook_status == PlaybookStatus.APPROVED
    ]

    print(f"\nTop {top_n} Validated Improvement Areas:")
    print("-" * 50)

    for i, playbook in enumerate(approved[:top_n], 1):
        print(f"\n{i}. [{playbook.playbook_name}]")
        print(f"   {playbook.content}")
        if playbook.playbook_metadata:
            print(f"   Metadata: {playbook.playbook_metadata}")


# Identify improvement areas
identify_improvement_areas_from_playbooks(client, top_n=5)

A/B Testing with Playbooks

def compare_versions_playbooks(client, version_a, version_b, playbook_name):
    """
    Compare playbook entries between two agent versions for A/B testing.

    Args:
        client: ReflexioClient instance
        version_a: First agent version
        version_b: Second agent version
        playbook_name: Playbook type to compare
    """
    # Get user playbooks for the specific playbook type
    response = client.get_user_playbooks(
        playbook_name=playbook_name,
        limit=500
    )

    # Split by version
    playbooks_a = [f for f in response.user_playbooks if f.agent_version == version_a]
    playbooks_b = [f for f in response.user_playbooks if f.agent_version == version_b]

    print(f"\nA/B Test Comparison: {playbook_name}")
    print(f"{'='*50}")
    print(f"\n{version_a}:")
    print(f"  Total entries: {len(playbooks_a)}")

    print(f"\n{version_b}:")
    print(f"  Total entries: {len(playbooks_b)}")

    # Playbook entries per request (proxy for extraction rate)
    if playbooks_a:
        unique_requests_a = len(set(f.request_id for f in playbooks_a))
        print(f"\n{version_a}: {unique_requests_a} unique requests "
              f"({len(playbooks_a) / unique_requests_a:.2f} entries/request)")

    if playbooks_b:
        unique_requests_b = len(set(f.request_id for f in playbooks_b))
        print(f"{version_b}: {unique_requests_b} unique requests "
              f"({len(playbooks_b) / unique_requests_b:.2f} entries/request)")

    return {
        "version_a": {"version": version_a, "count": len(playbooks_a)},
        "version_b": {"version": version_b, "count": len(playbooks_b)},
    }


# Compare two versions
results = compare_versions_playbooks(client, "v2.0.0", "v2.1.0", "response_quality")

Domain-Specific Configurations

E-commerce Support Playbooks

ecommerce_configs = [
    {
        "playbook_name": "order_resolution",
        "playbook_definition_prompt": """
        Evaluate how effectively the agent resolved order-related issues:
        - Order status inquiries
        - Shipping problems
        - Return/refund requests
        - Payment issues
        """,
        "playbook_aggregator_config": {
            "min_cluster_size": 3,
            "reaggregation_trigger_count": 5
        }
    },
    {
        "playbook_name": "product_recommendations",
        "playbook_definition_prompt": """
        Assess the quality of product recommendations:
        - Relevance to user needs
        - Price appropriateness
        - Availability of recommended items
        - User acceptance of recommendations
        """
    },
    {
        "playbook_name": "checkout_assistance",
        "playbook_definition_prompt": """
        Evaluate help provided during checkout:
        - Promo code application
        - Payment method guidance
        - Shipping option explanation
        - Cart management assistance
        """
    }
]

config = client.get_config()
config.playbook_configs = ecommerce_configs
client.set_config(config)

Technical Support Playbooks

tech_support_configs = [
    {
        "playbook_name": "troubleshooting_effectiveness",
        "playbook_definition_prompt": """
        Evaluate the agent's troubleshooting capabilities:
        - Problem identification accuracy
        - Solution relevance
        - Step-by-step guidance clarity
        - Issue resolution rate
        """,
        "metadata_definition_prompt": """
        Extract:
        - problem_category: type of technical issue
        - steps_provided: number of troubleshooting steps
        - resolved: boolean
        - escalation_needed: boolean
        """
    },
    {
        "playbook_name": "technical_accuracy",
        "playbook_definition_prompt": """
        Assess the technical accuracy of agent responses:
        - Correct terminology usage
        - Accurate system information
        - Valid configuration guidance
        - Appropriate security practices
        """
    },
    {
        "playbook_name": "documentation_referrals",
        "playbook_definition_prompt": """
        Evaluate how well the agent references documentation:
        - Appropriate documentation links
        - Relevant knowledge base articles
        - Accurate API documentation references
        - Helpful tutorial suggestions
        """
    }
]

config = client.get_config()
config.playbook_configs = tech_support_configs
client.set_config(config)

Healthcare Assistant Playbooks

healthcare_configs = [
    {
        "playbook_name": "symptom_understanding",
        "playbook_definition_prompt": """
        Evaluate the agent's understanding of reported symptoms:
        - Accurate symptom capture
        - Appropriate follow-up questions
        - Proper symptom categorization
        - Recognition of urgency levels
        """
    },
    {
        "playbook_name": "guidance_appropriateness",
        "playbook_definition_prompt": """
        Assess the appropriateness of health guidance:
        - Safe general health information
        - Appropriate disclaimers
        - Proper referral to professionals
        - Avoidance of medical diagnosis
        """
    },
    {
        "playbook_name": "empathy_and_support",
        "playbook_definition_prompt": """
        Evaluate emotional support provided:
        - Empathetic responses
        - Patient communication style
        - Acknowledgment of concerns
        - Supportive language use
        """
    }
]

config = client.get_config()
config.playbook_configs = healthcare_configs
client.set_config(config)

Best Practices

1. Start with Clear Playbook Definitions

# Good: Specific, measurable criteria
good_config = {
    "playbook_name": "response_helpfulness",
    "playbook_definition_prompt": """
    Rate the helpfulness of the agent's response:
    - Does it directly answer the user's question?
    - Is the information actionable?
    - Are next steps clear?
    - Did the user express satisfaction?
    """
}

# Avoid: Vague definitions
# bad_config = {
#     "playbook_name": "quality",
#     "playbook_definition_prompt": "Is the response good?"
# }

2. Use Appropriate Aggregation Thresholds

# For high-volume use cases
high_volume_config = {
    "min_cluster_size": 10,  # Require enough data for reliable patterns
    "reaggregation_trigger_count": 20  # At high volume, 20 new entries still arrive quickly
}

# For low-volume or critical playbook entries
low_volume_config = {
    "min_cluster_size": 3,  # Aggregate sooner despite less data
    "reaggregation_trigger_count": 5  # Refresh after fewer new entries
}

3. Version Your Agents Consistently

# Always include agent version for tracking
client.publish_interaction(
    user_id="user_123",
    interactions=[...],
    agent_version="v2.1.0",  # Consistent versioning
    source="production",
)

# Use semantic versioning
# v2.1.0 - major.minor.patch
# v2.1.0-beta - for testing
# v2.1.0-exp-prompt-change - for experiments
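Note that if you sort or filter versions client-side, plain string comparison mis-orders versions like v2.10.0 vs v2.9.0. A small parser (a convenience helper, not part of the SDK) gives a sortable key, treating a pre-release suffix as preceding its release:

```python
def version_key(version):
    """Split 'v2.1.0' or 'v2.1.0-beta' into a sortable tuple.
    Release versions sort after their pre-release variants."""
    core, _, suffix = version.lstrip("v").partition("-")
    parts = tuple(int(p) for p in core.split("."))
    return parts + ((1, "") if not suffix else (0, suffix))

versions = ["v2.10.0", "v2.9.0", "v2.1.0-beta", "v2.1.0"]
print(sorted(versions, key=version_key))
# ['v2.1.0-beta', 'v2.1.0', 'v2.9.0', 'v2.10.0']
```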

4. Regular Playbook Review Cycle

def weekly_playbook_review(client):
    """Implement a weekly playbook review process."""

    # Get pending agent playbooks for review
    response = client.get_agent_playbooks(
        status_filter=[Status.PENDING],
        limit=50
    )

    print(f"Playbooks pending review: {len(response.agent_playbooks)}")

    for playbook in response.agent_playbooks:
        print(f"\n[{playbook.playbook_name}] {playbook.content}")
        print(f"Agent Version: {playbook.agent_version}")
        # In production, you'd have a UI or API to approve/reject


# Run weekly
weekly_playbook_review(client)