Skip to main content
After you run a prompt like /onyx Tune my PID controller gains, minimize error, the agent starts turning that goal into measured git history.

From Prompt to Branch

The agent first converts your prompt into branch metadata:
FieldExample
Nametune-pid-gains
Metrictracking_error
Uniterror or %
Directionminimize
DescriptionTune PID gains while keeping overshoot below 5%.
It then creates an append-only git branch named onyx/{name}.

Baseline Setup

Before changing product code, the agent should create and commit:
  • onyx/onyx.md: the research brief and steering file;
  • onyx/eval.sh: the measurement script;
  • optionally onyx/checks.sh: correctness checks.
Then it runs a baseline experiment so future attempts have something to compare against.

Experiment Loop

Each attempt follows the same shape:
1

Edit code

The agent changes files in scope based on the strategy in onyx.md.
2

Commit

It commits the attempt to the onyx/{name} branch.
3

Measure

It runs onyx/eval.sh through onyx exp run and parses METRIC lines.
4

Record

It logs the result, metric, status, output summary, and agent notes.
5

Sync

It pushes the branch and flushes records to the Onyx app.

What Gets Recorded

Every experiment points to an exact commit and includes:
  • status, such as succeeded, failed, or checks_failed;
  • primary metric value;
  • secondary metrics emitted by eval.sh;
  • output summary;
  • optional checks result;
  • agent notes about what was learned.
These records build the branch timeline and best-so-far view in the app.

How to Watch Progress

In the Onyx app:
  • Graph view shows the branch and current best metric.
  • Timeline shows every attempt and the best-so-far steps.
  • File tree shows changed files plus onyx/ context.
  • Diff view compares an experiment to the previous best when possible.
In the terminal:
onyx status
onyx exp list --limit 20
onyx listen

Manual Debugging

The agent runs the CLI for you, but these commands are useful when debugging a setup:
onyx exp run --no-log
onyx exp run
onyx exp log --description "manual baseline check"
onyx push
If the agent cannot measure progress, inspect onyx/eval.sh first. It must print a numeric line such as:
METRIC tracking_error=0.18