Understand onyx.md and eval.sh

When you start a session with `/onyx ...`, the agent creates the repo-side files it needs to run auto research. You can review and edit these files to steer the loop.

`onyx/onyx.md`: Research Brief and Steering File

onyx.md is the durable context for the agent. It should explain what the agent is optimizing, how to measure progress, what files are in scope, and what constraints matter. The agent creates the first version. You can edit it at any time between runs. Common sections:

objective;
primary metric and direction;
secondary metrics or tradeoffs;
how to run the eval;
files in scope;
off-limits files or APIs;
constraints;
what has already been tried.

Example:

# Onyx Research: Tune PID Gains

## Objective

Minimize tracking error for the arm controller without increasing overshoot.

## Metrics

- Primary: tracking_error, error, minimize
- Secondary: overshoot_percent, percent, minimize

## Files in Scope

- `src/control/pid.ts`: PID gains and gain scheduling.
- `scripts/evaluate_pid.sh`: existing controller evaluation script.

## Constraints

- Do not change the `MotorCommand` interface.
- Keep overshoot under 5%.
- Prefer simple gain changes before adding new control structure.

## What's Been Tried

- Higher `ki` reduced steady-state error but caused overshoot.

`onyx/eval.sh`: Measurement Script

eval.sh is the repeatable measurement entry point. The agent creates it so that `onyx exp run` can produce comparable results. It must print at least one metric line:

METRIC name=value

Example:

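#!/bin/bash
set -euo pipefail

# Capture the evaluation output in a private temp file so concurrent runs do not collide.
results=$(mktemp)
trap 'rm -f "$results"' EXIT
./scripts/evaluate_pid.sh > "$results"

# Assumes each result line looks like "<name> <value>".
tracking_error=$(awk '/tracking_error/ {print $2}' "$results")
overshoot=$(awk '/overshoot_percent/ {print $2}' "$results")

# Emit one METRIC line per metric.
echo "METRIC tracking_error=$tracking_error"
echo "METRIC overshoot_percent=$overshoot"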

The primary metric should match the branch metric the agent chose from your prompt.
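To make the `METRIC name=value` format concrete, here is a minimal sketch of reading one metric back out of eval output. The `metric_value` helper and the sample numbers are hypothetical, and this is not necessarily how onyx itself parses the lines.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not part of onyx): print the value of one metric
# from eval output read on stdin. If the name appears more than once,
# the last value wins. Exits non-zero when the metric is absent.
metric_value() {
  local name="$1"
  awk -v name="$name" '
    $1 == "METRIC" {
      # Split "name=value" at the first "=".
      eq = index($2, "=")
      if (eq > 0 && substr($2, 1, eq - 1) == name) {
        value = substr($2, eq + 1)
        found = 1
      }
    }
    END {
      if (!found) exit 1
      print value
    }
  '
}

# Sample output in the documented format; the numbers are made up.
sample_output='starting evaluation
METRIC tracking_error=0.042
METRIC overshoot_percent=3.1'

printf '%s\n' "$sample_output" | metric_value tracking_error   # prints 0.042
```

A check like this can help confirm that the name printed by eval.sh matches the primary metric named in onyx.md before you start a long run.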

Optional: `onyx/checks.sh`

The agent may create checks.sh when your constraints require correctness backpressure. Checks run after a passing eval and do not affect eval timing.

#!/bin/bash
set -euo pipefail

bun run typecheck
bun test

If checks fail, the experiment is recorded as checks_failed.
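The ordering described above can be sketched roughly as follows. This is an illustration of the sequence only, not onyx's actual runner: `run_experiment` is a hypothetical function, and `eval_failed` and `passed` are placeholder labels. Only `checks_failed` is a status named in this document.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical sketch: run the eval script, then the optional checks script,
# and print a status label as the last line of output.
run_experiment() {
  local eval_script="$1" checks_script="$2"
  local started=$SECONDS eval_seconds

  if ! bash "$eval_script"; then
    echo "eval_failed"   # placeholder label, not a documented status
    return 0
  fi

  # Stop the clock before checks start, so checks do not affect eval timing.
  eval_seconds=$((SECONDS - started))
  echo "eval_seconds=$eval_seconds"

  # Checks are optional and only run after a passing eval.
  if [ -f "$checks_script" ] && ! bash "$checks_script"; then
    echo "checks_failed"
    return 0
  fi

  echo "passed"          # placeholder label, not a documented status
}

# Usage (paths from this document): run_experiment onyx/eval.sh onyx/checks.sh
```

Keeping checks outside the timed region means a slow test suite cannot distort a timing-based metric.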

How to Steer the Agent

Prefer editing onyx.md when you want to change agent behavior. Prefer editing eval.sh when the measurement itself is wrong or missing useful metrics. Examples:

| You want to change | Edit |
| --- | --- |
| What strategy the agent should try next | `onyx/onyx.md` |
| Which files are safe to modify | `onyx/onyx.md` |
| The benchmark command | `onyx/eval.sh` |
| Metric parsing | `onyx/eval.sh` |
| Correctness tests after successful evals | `onyx/checks.sh` |

After editing, tell the agent:

/onyx Continue with the updated onyx.md and eval script

Protected During Measurement

During a measured run, agents should not modify:

onyx/eval.sh
onyx/checks.sh

The agent can improve these files between runs, but should commit those changes before using them to compare experiments.
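If you want to verify this yourself, one option is to fingerprint both files before a run and compare afterwards. The sketch below assumes it is run from the repository root. `fingerprint` and `run_protected` are hypothetical helpers, not part of onyx, and this is not a description of how onyx enforces the rule.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print one checksum line per file. Files that do not
# exist are recorded as "missing" so the optional checks.sh is not an error.
fingerprint() {
  local f
  for f in "$@"; do
    if [ -f "$f" ]; then
      cksum "$f"
    else
      printf 'missing %s\n' "$f"
    fi
  done
}

# Hypothetical helper: run a command and fail if either protected file
# changed while it ran. Otherwise return the command's own exit status.
run_protected() {
  local before after status=0
  before=$(fingerprint onyx/eval.sh onyx/checks.sh)
  "$@" || status=$?
  after=$(fingerprint onyx/eval.sh onyx/checks.sh)
  if [ "$before" != "$after" ]; then
    echo "protected files changed during the run" >&2
    return 1
  fi
  return "$status"
}

# Usage: run_protected ./onyx/eval.sh
```

In a git repository, `git diff --quiet HEAD -- onyx/eval.sh onyx/checks.sh` is a complementary check: it exits non-zero when either file has uncommitted changes.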
