Playbook Optimization

Optional GEPA-backed refinement for pending playbook wording.

Playbook optimization is an optional refinement step for playbooks that already exist. It asks a narrow question: would this playbook steer the assistant better if the guidance were worded differently?

It does not replace extraction, aggregation, or human review. Reflexio still learns signals from interactions, aggregates repeated user-playbook patterns into agent playbooks, and requires approval before an agent playbook becomes production guidance. The optimizer sits between aggregation and approval, sharpening pending guidance before a human decides whether to approve it.

Where Optimization Fits

The normal playbook path is:

1. Publish interactions from your agent.
2. Extract user playbooks from correction, preference, expert, or outcome signals.
3. Aggregate repeated patterns into PENDING agent playbooks.
4. Optionally optimize the pending playbook's content.
5. Review and approve the final wording.
6. Retrieve approved playbooks to steer the agent for future users.

Optimization is intentionally late in the path. By the time it runs, Reflexio already knows which real interaction windows produced the playbook. That gives the optimizer scenarios it can replay when comparing the original wording against candidate wording.

What It Changes

The optimizer focuses on playbook content: the natural-language instruction injected into the assistant. It can make the guidance clearer, more specific, less ambiguous, or less likely to cause regressions.

It is not a discovery engine for brand-new lessons. If the underlying playbook is about respecting a user's stated budget, optimization should improve how that budget rule is phrased. It should not turn the row into an unrelated recommendation strategy.

For agent playbooks, a winning candidate replaces only a current PENDING row. The successor remains PENDING, so the normal approval workflow still decides whether it can affect production retrieval.

How a Run Works

Each optimizer run targets one playbook.

1. Load the incumbent: the current playbook content.
2. Resolve source windows: the interaction windows that produced the playbook.
3. Split training and validation: GEPA mutates candidates on training windows and commits only if validation windows pass.
4. Run paired assistant rollouts: Reflexio calls your assistant backend twice with the same user turns, once with the incumbent playbook and once with a candidate.
5. Judge the pair: a judge model compares the two assistant responses and records the winner, score, Likert rating, and feedback.
6. Commit only if thresholds pass: the candidate must beat the incumbent on enough held-out windows and clear the configured score thresholds.

Only the injected playbook content changes between the paired rollouts. The scenario, messages, and assistant backend stay the same, so the comparison is about whether the candidate guidance produced better behavior.
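The paired comparison and commit check described above can be sketched roughly as follows. This is an illustration, not Reflexio's implementation: `Verdict`, `should_commit`, and the `assistant` and `judge` callables are hypothetical names, and the sketch assumes the score and Likert thresholds are applied per validation window, which the actual optimizer may do differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Messages = List[Dict[str, str]]


@dataclass
class Verdict:
    """Hypothetical judge result for one paired rollout."""
    candidate_won: bool  # judge preferred the candidate response
    score: float         # judge score for the candidate (assumed 0..1)
    likert: int          # judge Likert rating (assumed 1..5)


def should_commit(
    validation_windows: List[Messages],
    incumbent: str,
    candidate: str,
    assistant: Callable[[Messages, str], str],
    judge: Callable[[Messages, str, str], Verdict],
    min_commit_windows: int = 2,
    min_commit_score: float = 0.75,
    min_commit_likert: int = 4,
) -> bool:
    """Return True if the candidate passes on enough held-out windows."""
    passing = 0
    for messages in validation_windows:
        # Same messages and same backend; only the playbook content differs.
        baseline = assistant(messages, incumbent)
        challenger = assistant(messages, candidate)
        verdict = judge(messages, baseline, challenger)
        # Assumption: thresholds are checked per window.
        if (
            verdict.candidate_won
            and verdict.score >= min_commit_score
            and verdict.likert >= min_commit_likert
        ):
            passing += 1
    return passing >= min_commit_windows
```

Because the two rollouts share everything except the playbook text, a win recorded by the judge can be attributed to the wording change rather than to a different scenario or backend.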

Source Windows

Source windows are the evidence behind optimization. For agent playbooks, Reflexio snapshots the contributing interaction windows at aggregation time. That makes later optimization replayable even if the original user playbooks are archived or deduplicated.

By default:

Setting	Default	Meaning
`max_validation_windows`	`2`	Hold out up to two windows for commit checks.
`min_commit_windows`	`2`	Require the candidate to win on at least two validation windows.
`reflection_minibatch_size`	`2`	Number of training windows GEPA samples while mutating candidates.
`max_metric_calls`	`20`	Maximum candidate/window evaluations per run.
`max_turns`	`4`	Maximum GEPA optimization turns.

With the defaults, an agent playbook needs at least three source windows to run: two validation windows and one training window. If a target has only one or two windows, Reflexio skips it because no candidate could satisfy the default commit threshold.

To allow one-window or two-window optimization, lower `min_commit_windows` to `1`. When only one source window is available, Reflexio uses that same window for training and validation. That is useful for small local workflows, but it is weaker evidence than a held-out validation set.

Reflexio caches evaluations during a run, so the same candidate/window pair is not scored twice.
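The window requirements above follow from simple arithmetic, which a small sketch can make concrete. The function name `split_windows` and the choice of which windows are held out (the last ones here) are assumptions made for illustration; only the counts are taken from the description above.

```python
from typing import List, Optional, Sequence, Tuple


def split_windows(
    windows: Sequence[str],
    max_validation_windows: int = 2,
    min_commit_windows: int = 2,
) -> Optional[Tuple[List[str], List[str]]]:
    """Return (training, validation), or None when the target would be skipped."""
    n = len(windows)
    if n == 0:
        return None
    if n == 1:
        # A single window is reused for both training and validation.
        training, validation = list(windows), list(windows)
    else:
        # Hold out up to max_validation_windows, keeping at least one for training.
        held_out = min(max_validation_windows, n - 1)
        training = list(windows[: n - held_out])
        validation = list(windows[n - held_out:])
    # A candidate cannot win on more windows than are held out.
    if len(validation) < min_commit_windows:
        return None
    return training, validation
```

Under these assumptions, one or two windows with the default `min_commit_windows=2` yield `None` (skipped), three windows yield one training and two validation windows, and lowering `min_commit_windows` to `1` lets the one-window and two-window cases run.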

Safety Model

Optimization is gated at several points:

Gate	Behavior
Master switch	`enabled` is `false` by default.
Target switch	`optimize_agent_playbooks` and `optimize_user_playbooks` are both `false` by default.
Assistant backend	If neither `webhook_url` nor `assistant_script_path` is configured, the run is skipped.
Adoption path	A winner is only written when the matching auto-update flag is enabled.
Approval	Optimized agent-playbook successors remain `PENDING`; humans still approve or reject them.

APPROVED agent playbooks are never modified, archived, or replaced by this optimizer. This keeps approved production guidance stable until a human intentionally changes it.

User-playbook optimization is separately gated with `optimize_user_playbooks` and `auto_update_user_playbooks`. With the default `auto_update_user_playbooks=false`, Reflexio skips user-playbook optimization because there is no configured adoption path.
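One way to read the gates is as an ordered series of checks made before a run starts. The sketch below is a hypothetical helper, not part of Reflexio; the flag names come from the gate table, while the function name `skip_reason`, the `target` argument, and the order of the checks are assumptions.

```python
from typing import Optional


def skip_reason(config: dict, target: str) -> Optional[str]:
    """Return why a run would be skipped, or None if it may proceed.

    `target` is "agent" or "user" (a hypothetical convention for this sketch).
    """
    if not config.get("enabled", False):
        return "master switch is off"

    target_flag = {
        "agent": "optimize_agent_playbooks",
        "user": "optimize_user_playbooks",
    }[target]
    if not config.get(target_flag, False):
        return f"{target_flag} is off"

    if not config.get("webhook_url") and not config.get("assistant_script_path"):
        return "no assistant backend configured"

    # Only the user-playbook case is described as skipped up front when its
    # auto-update flag is off; for agent playbooks the text says only that a
    # winner is not written, so this sketch does not skip the run there.
    if target == "user" and not config.get("auto_update_user_playbooks", False):
        return "no adoption path for user playbooks"

    return None
```

With an empty configuration every target is skipped at the first check, which matches the off-by-default behavior.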

Enterprise offline-tuner edits use a separate audit path. Rows created by that tuner use `source="offline_optimizer"` so operators can distinguish offline tuner mutations from extracted, reflected, manual, or GEPA optimizer rows.

Assistant Backend

Optimization needs a way to ask your assistant, "how would you respond with this playbook?" Configure exactly one backend:

- `webhook_url` for an HTTP assistant endpoint
- `assistant_script_path` for a local executable

Both receive the same JSON payload:

{
  "messages": [{ "role": "user", "content": "..." }],
  "playbooks": [{ "id": 123, "content": "...", "trigger": "..." }]
}

Both must return:

{ "content": "assistant response" }

For a local script, Reflexio invokes `[assistant_script_path, ...assistant_script_args]`, writes the JSON payload to stdin, and reads the JSON response from stdout.
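A minimal local backend that follows this stdin/stdout contract might look like the sketch below. The payload and response shapes come from the examples above; the function names and the placeholder reply logic are hypothetical, and a real script would call your actual assistant where the placeholder string is built.

```python
import json
import sys
from typing import IO


def build_reply(payload: dict) -> dict:
    """Turn the rollout payload into the {"content": ...} response shape."""
    messages = payload.get("messages", [])
    playbooks = payload.get("playbooks", [])

    # Collect the injected guidance; this is the only thing that differs
    # between the incumbent rollout and the candidate rollout.
    guidance = "\n".join(p["content"] for p in playbooks)
    last_user = next(
        (m["content"] for m in reversed(messages) if m.get("role") == "user"),
        "",
    )

    # Placeholder: replace with a call to your assistant, passing `guidance`
    # the same way production injects playbooks.
    content = f"[guidance: {guidance}] reply to: {last_user}"
    return {"content": content}


def main(stdin: IO[str] = sys.stdin, stdout: IO[str] = sys.stdout) -> None:
    """Read one JSON payload from stdin and write one JSON response to stdout."""
    payload = json.load(stdin)
    json.dump(build_reply(payload), stdout)
```

To use this as a script, call `main()` at the end of the file and point `assistant_script_path` at it. Because the path is invoked directly, the file presumably needs to be executable on your system, for example through a shebang line. Keep any logging on stderr so stdout contains only the JSON response.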

Minimum Configuration

The optimizer is off by default. To optimize pending agent playbooks, enable the master switch, enable the agent-playbook target, and provide one assistant backend.

{
  "playbook_optimizer_config": {
    "enabled": true,
    "optimize_agent_playbooks": true,
    "optimize_user_playbooks": false,
    "auto_update_pending_agent_playbooks": true,
    "auto_update_user_playbooks": false,
    "webhook_url": "https://assistant.example.com/reflexio-rollout",
    "assistant_script_path": null,
    "assistant_script_args": [],
    "max_metric_calls": 20,
    "max_turns": 4,
    "early_stop_score": 0.9,
    "max_validation_windows": 2,
    "min_commit_windows": 2,
    "min_commit_score": 0.75,
    "min_commit_likert": 4
  }
}

Use `assistant_script_path` instead of `webhook_url` for local evaluation. Do not set both; Reflexio rejects configurations with two assistant backends.
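If you generate this configuration programmatically, a pre-flight check for the one-backend rule can catch mistakes before Reflexio rejects the config. The helper below is a hypothetical sketch: `resolve_backend` is not a Reflexio function, and the script path and argument used in the usage lines are placeholders.

```python
from typing import List, Optional, Tuple


def resolve_backend(config: dict) -> Optional[Tuple[str, List[str]]]:
    """Return ("webhook", [url]) or ("script", argv), or None if no backend is set.

    Raises ValueError when both backends are configured.
    """
    webhook = config.get("webhook_url")
    script = config.get("assistant_script_path")

    if webhook and script:
        raise ValueError(
            "configure only one of webhook_url or assistant_script_path"
        )
    if webhook:
        return ("webhook", [webhook])
    if script:
        # Mirrors the documented invocation: [path, ...args].
        return ("script", [script, *config.get("assistant_script_args", [])])
    return None


# Usage with placeholder values:
local = {"assistant_script_path": "./assistant.py", "assistant_script_args": ["--local"]}
print(resolve_backend(local))
```

A `None` result corresponds to the case where the run would be skipped for lack of an assistant backend.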

When To Use It

Use playbook optimization when you already have pending agent playbooks and want stronger wording before review. It is especially useful when:

- a pending playbook is directionally right but too vague
- similar source windows suggest a more specific instruction
- you want replayable evidence that a rewrite improves assistant behavior
- you are tuning local or nightly workflows and can afford the extra evaluations

Skip it when you do not yet have enough source windows, when the assistant backend is not representative of production, or when the pending playbook needs human product judgment rather than wording refinement.