Simulation errors are common. SimPilot handles them with an evaluator-optimizer loop: run a command, evaluate the outcome, gather targeted context, apply a narrow fix, and retry with full awareness of what has already been tried.
Command evaluation
Every recovery cycle starts from the actual command result, not a canned pattern:
- Explicit failure: the command returned a non-zero exitCode
- Silent solver failure: the command exited cleanly but solver-aware log analysis still detected a failed run
- Success: no failure markers are present, so recovery is skipped
This keeps recovery grounded in concrete evidence from the workspace instead of speculative rewrites.
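The three-way classification above can be sketched in a few lines of Python. This is a minimal illustration, not SimPilot's code. The evaluate_command function and the Outcome enum are hypothetical. The two marker strings are an assumed stand-in for real solver-aware log analysis, although they are headers OpenFOAM does print when a run aborts.

```python
from enum import Enum

# Assumed failure markers: OpenFOAM prints these headers when a run aborts.
# Real solver-aware log analysis would check far more than two strings.
FAILURE_MARKERS = ("FOAM FATAL ERROR", "FOAM FATAL IO ERROR")


class Outcome(Enum):
    EXPLICIT_FAILURE = "explicit failure"
    SILENT_SOLVER_FAILURE = "silent solver failure"
    SUCCESS = "success"


def evaluate_command(exit_code: int, log_text: str) -> Outcome:
    """Classify a finished command from its exit code and its log contents."""
    if exit_code != 0:
        return Outcome.EXPLICIT_FAILURE
    if any(marker in log_text for marker in FAILURE_MARKERS):
        # Clean exit, but the log shows the run actually failed.
        return Outcome.SILENT_SOLVER_FAILURE
    return Outcome.SUCCESS  # no failure markers, so recovery is skipped


print(evaluate_command(0, "--> FOAM FATAL ERROR:\n").value)  # silent solver failure
```

The exit code is checked first because it is the cheapest and least ambiguous signal. The log is only scanned when the process claims success.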
Recovery chain
When a command fails, the agent works through the following steps in a fixed order, gathering context and stopping as soon as it has enough evidence to act:
1. Skills and local references
Load the relevant skill, then inspect agent_resources/ for the exact tutorial, reference dictionary, or reusable script that matches the failed command.
2. Knowledge, memory, and docs
Search the internal knowledge base, query personal or organization memory when prior fixes matter, and consult external docs through searchDocs for authoritative syntax or solver behavior.
3. Targeted fix
Edit only the specific file or setting implicated by the evidence. Recovery explicitly avoids regenerating the entire case when a narrow correction is sufficient.
4. Retry and compare
Rerun the command and compare the new result against the previous failure. Retries are tracked as fixed, still_failing, or different_error.
5. Late escalation
If local sources are exhausted, the agent can escalate to webSearch / retrieveUrl or delegate focused troubleshooting to agent("error-diagnostician").
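The fixed order with an early stop can be sketched as a loop over context sources. This is a hypothetical Python sketch, not SimPilot's implementation. The source names are placeholders, and so is the gather_context helper. The rule that two evidence snippets count as "enough" is also an assumption. The point it illustrates is that later sources, such as webSearch, are only reached when earlier ones come up short.

```python
from dataclasses import dataclass, field
from typing import Callable

# A context source takes a failure summary and returns evidence snippets.
Source = Callable[[str], list[str]]


@dataclass
class Evidence:
    items: list[str] = field(default_factory=list)
    consulted: list[str] = field(default_factory=list)


def gather_context(failure: str, sources: list[tuple[str, Source]], enough: int = 2) -> Evidence:
    """Query sources in their fixed order and stop once enough evidence exists."""
    evidence = Evidence()
    for name, source in sources:
        evidence.consulted.append(name)
        evidence.items.extend(source(failure))
        if len(evidence.items) >= enough:
            break  # remaining, later-stage sources are never queried
    return evidence


# Hypothetical stand-ins for the real lookups, listed in recovery-chain order.
sources = [
    ("skills", lambda f: []),
    ("agent_resources", lambda f: ["tutorial fvSchemes with div(phi,k)"]),
    ("knowledge", lambda f: ["known fix: add a div(phi,k) entry"]),
    ("webSearch", lambda f: ["forum thread"]),
]
result = gather_context("keyword div(phi,k) is undefined", sources)
print(result.consulted)  # ['skills', 'agent_resources', 'knowledge']
```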
Retry history
Recovery is retry-aware. Every completed repair cycle records:
- The failure summary
- The fix that was attempted
- The retry outcome (fixed, still_failing, or different_error)
This prevents the agent from repeating the same losing edit and makes later retries more deliberate.
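A retry history with the three outcome labels might look like the following. This is an illustrative Python sketch under stated assumptions. RepairCycle is a hypothetical name, as are RetryHistory and compare_retry. Errors are compared as plain strings, which is a deliberate simplification of how a real agent would decide whether two failures are "the same".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RepairCycle:
    failure_summary: str
    fix_attempted: str
    outcome: str  # "fixed", "still_failing", or "different_error"


def compare_retry(previous_error: str, new_error: Optional[str]) -> str:
    """Classify a retry; new_error is None when the rerun succeeded."""
    if new_error is None:
        return "fixed"
    if new_error == previous_error:
        return "still_failing"
    return "different_error"


@dataclass
class RetryHistory:
    cycles: list[RepairCycle] = field(default_factory=list)

    def record(self, failure_summary: str, fix_attempted: str, outcome: str) -> None:
        self.cycles.append(RepairCycle(failure_summary, fix_attempted, outcome))

    def already_failed(self, failure_summary: str, fix: str) -> bool:
        """True if this exact fix was already tried for this failure without success."""
        return any(
            c.failure_summary == failure_summary and c.fix_attempted == fix and c.outcome != "fixed"
            for c in self.cycles
        )
```

Checking already_failed before applying an edit is what keeps the agent from replaying a losing fix.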
Error-diagnostician subagent
For repeated or ambiguous failures, SimPilot can delegate to a dedicated error-diagnostician subagent. That diagnostician can:
- Inspect logs and workspace files with read-only runCommand
- Search internal knowledge, organization memory, and personal memory
- Consult searchDocs, webSearch, and retrieveUrl
- Return a focused diagnosis with the next targeted fix to try
The diagnostician is prompt-guided, not mandatory. It is used when deeper investigation is warranted, not for every routine failure.
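Two pieces of this behavior can be sketched: a guard that keeps the diagnostician's commands read-only, and a rule for when to delegate at all. Both are hypothetical Python illustrations, not SimPilot's logic. The command allowlist is assumed, and so is the list of blocked shell tokens. The threshold of two unresolved attempts is also an assumption.

```python
import shlex

# Hypothetical allowlist of commands assumed to only read files.
READ_ONLY_COMMANDS = {"tail", "head", "cat", "grep", "ls"}

# Shell features that could write files or chain extra commands.
UNSAFE_TOKENS = (">", "|", ";", "&", "`", "$(")


def is_read_only(command: str) -> bool:
    """Accept a single allowlisted command with no redirection or chaining."""
    if any(token in command for token in UNSAFE_TOKENS):
        return False
    try:
        parts = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    return bool(parts) and parts[0] in READ_ONLY_COMMANDS


def should_delegate(outcomes: list[str], ambiguous: bool, max_attempts: int = 2) -> bool:
    """Delegate when the failure is ambiguous or repeated fixes have not resolved it."""
    unresolved = [outcome for outcome in outcomes if outcome != "fixed"]
    return ambiguous or len(unresolved) >= max_attempts
```

Rejecting any redirection or chaining token up front is deliberately conservative. A legitimate pipeline is refused rather than risking a command that writes to the case.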
Web search as late fallback
Web search is available during recovery, but it is intentionally late in the chain. SimPilot first exhausts local skills, workspace references, internal knowledge, memory, and official docs. Only then does it search the broader web for edge cases, version-specific behavior, or missing documentation.
When web search is used, the sources are surfaced in the chat so you can see exactly what informed the fix.
Debugging protocols
The error recovery system also follows disciplined debugging protocols that prevent guesswork:
Pre-simulation inspection
Before running a solver such as simpleFoam or pimpleFoam (that is, after meshing with utilities such as blockMesh or snappyHexMesh), the agent must complete a mandatory checklist:
- Mesh verification -- Run checkMesh, verify non-orthogonality < 70 degrees, max skewness < 4, aspect ratio < 100, and confirm all expected boundary patches exist
- Field file consistency -- Verify dimensions match the solver type, patch names in 0/ files match constant/polyMesh/boundary, all required turbulence fields exist, and initial values are physically plausible
- Scheme and solver consistency -- Confirm the fvSchemes time scheme matches the solver type, every div(phi,X) term has an explicit entry, fvSolution covers all solved fields, and the algorithm block name matches the solver
- controlDict validation -- Confirm the application keyword matches the intended solver, endTime is appropriate, and writeInterval/purgeWrite are set
The solver only runs after all checks pass.
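The mesh-quality thresholds and the patch-name check above can be sketched as follows. This Python sketch assumes the usual checkMesh wording, for example "Mesh non-orthogonality Max: 65.2 average: 12.4" and "Max skewness = 0.8 OK.". Other OpenFOAM versions may phrase these lines differently. The helper names are hypothetical. Plain set comparison for patch names is also a simplification, because 0/ files may use regular expressions or patch groups instead of literal names.

```python
import re

# Thresholds taken from the checklist above (a metric fails at or above its limit).
LIMITS = {"non_orthogonality": 70.0, "skewness": 4.0, "aspect_ratio": 100.0}

# Assumed checkMesh phrasing; failing checks may print ":" instead of "=".
PATTERNS = {
    "non_orthogonality": r"Mesh non-orthogonality Max:\s*([0-9.eE+-]+)",
    "skewness": r"Max skewness\s*[=:]\s*([0-9.eE+-]+)",
    "aspect_ratio": r"Max aspect ratio\s*[=:]\s*([0-9.eE+-]+)",
}


def check_mesh_quality(check_mesh_output: str) -> list[str]:
    """Return a list of problems; an empty list means every metric passed."""
    problems = []
    for metric, pattern in PATTERNS.items():
        match = re.search(pattern, check_mesh_output)
        if match is None:
            problems.append(f"{metric}: not found in checkMesh output")
        elif float(match.group(1)) >= LIMITS[metric]:
            problems.append(f"{metric}: {match.group(1)} exceeds limit {LIMITS[metric]}")
    return problems


def missing_patches(boundary_patches: set[str], field_patches: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each 0/ field to the mesh patches it does not define."""
    result = {}
    for name, patches in field_patches.items():
        missing = boundary_patches - patches
        if missing:
            result[name] = missing
    return result
```

A metric that cannot be found is reported as a problem rather than silently passed. This matches the checklist's rule that the solver runs only after every check succeeds.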
Investigation-before-edit
When a simulation error occurs, the agent must investigate before modifying any files:
- Read the actual error output (tail -n 50 log.<solver>)
- Check mesh state (checkMesh)
- Inspect residuals to identify divergence patterns
- Check boundary conditions against the mesh
- Only after diagnosing the root cause does the agent edit case files
This prevents the agent from rewriting entire cases when a targeted fix would suffice.
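Residual inspection can be sketched by parsing the linear-solver lines of an OpenFOAM log. An example of such a line is "smoothSolver:  Solving for Ux, Initial residual = 0.0123, Final residual = ...". This is a hypothetical Python sketch. The function names are invented. The 10x growth factor used as the divergence criterion is an assumption, not a documented SimPilot rule. Field names containing characters outside letters, digits, and underscores are not matched.

```python
import math
import re
from collections import defaultdict

# Matches lines such as "GAMG:  Solving for p, Initial residual = 0.5, Final residual = ..."
RESIDUAL_LINE = re.compile(r"Solving for (\w+), Initial residual = ([^,\s]+)")


def initial_residuals(log_text: str) -> dict[str, list[float]]:
    """Collect the initial residual history for each solved field."""
    history = defaultdict(list)
    for field_name, value in RESIDUAL_LINE.findall(log_text):
        history[field_name].append(float(value))  # float() also accepts "nan" and "inf"
    return dict(history)


def diverging_fields(history: dict[str, list[float]], growth: float = 10.0) -> list[str]:
    """Flag fields whose residual became NaN/inf or grew `growth`-fold above its minimum."""
    flagged = []
    for field_name, values in history.items():
        if any(not math.isfinite(v) for v in values):
            flagged.append(field_name)
            continue
        floor = min(values)
        if floor > 0 and values[-1] > growth * floor:
            flagged.append(field_name)
    return flagged
```

Knowing which field diverged first, and when, is the evidence that points the subsequent edit at a specific boundary condition, scheme, or solver setting.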
Forensic debugging
For persistent or complex failures, the agent follows a structured backward trace:
- Isolate the symptom -- Identify exactly what failed, which field diverged, and at what iteration
- Backward trace -- Follow the computation chain backward using actual numerical values
- Quantitative physicality test -- Compare actual values against physics expectations
- Classify the originating error -- Trace to mesh, boundary condition, numerical scheme, solver configuration, or physical setup
- Prove before fix -- State specific evidence, explain the causal chain, predict what the fix will change, then implement
The agent is prohibited from trying random edits without evidence.
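The "prove before fix" rule can be expressed as a record that must be complete before an edit is allowed. This Python sketch is illustrative only. FixProposal and ErrorOrigin are hypothetical names, and so is physically_plausible. The five origin categories mirror the classification step above. The range check is one simple form of a quantitative physicality test.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorOrigin(Enum):
    MESH = "mesh"
    BOUNDARY_CONDITION = "boundary condition"
    NUMERICAL_SCHEME = "numerical scheme"
    SOLVER_CONFIGURATION = "solver configuration"
    PHYSICAL_SETUP = "physical setup"


@dataclass
class FixProposal:
    symptom: str          # what failed, which field, at what iteration
    origin: ErrorOrigin   # classified originating error
    evidence: list[str]   # specific observations, with actual numbers
    causal_chain: str     # how the origin produces the symptom
    prediction: str       # what the fix is expected to change
    edit: str             # the single targeted change

    def ready_to_apply(self) -> bool:
        """Allow the edit only when every justification field is filled in."""
        return bool(self.evidence) and all(
            text.strip() for text in (self.symptom, self.causal_chain, self.prediction, self.edit)
        )


def physically_plausible(value: float, low: float, high: float) -> bool:
    """Quantitative physicality test: is a measured value inside its expected range?"""
    return low <= value <= high
```

Recording the prediction up front also gives the retry step something concrete to compare against.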
Post-success protocol
When a simulation converges with physically plausible results:
- Stop making changes -- The agent does not optimize or retune a working simulation unless explicitly asked
- Report clearly -- Final residuals, iteration count, key quantities, and any warnings
- State the validation level -- Level 1 (case setup complete), Level 2 (solver converged), or Level 3 (physics validated)
- Ask before proceeding -- No unsolicited parameter tuning, mesh refinement, or physics additions
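The three validation levels build on one another, so the reported level is the highest one whose prerequisites all hold. The following Python sketch is a hypothetical illustration. The level-0 value for an incomplete setup is an addition of this sketch, and so are the function names and the report layout.

```python
from enum import IntEnum


class ValidationLevel(IntEnum):
    NONE = 0  # assumption: setup not yet complete
    CASE_SETUP_COMPLETE = 1
    SOLVER_CONVERGED = 2
    PHYSICS_VALIDATED = 3


def validation_level(setup_complete: bool, converged: bool, physics_validated: bool) -> ValidationLevel:
    """Return the highest level whose prerequisites all hold."""
    if not setup_complete:
        return ValidationLevel.NONE
    if not converged:
        return ValidationLevel.CASE_SETUP_COMPLETE
    if not physics_validated:
        return ValidationLevel.SOLVER_CONVERGED
    return ValidationLevel.PHYSICS_VALIDATED


def success_report(level: ValidationLevel, final_residuals: dict[str, float],
                   iterations: int, warnings: list[str]) -> str:
    """Summarize a converged run and stop, without proposing further changes."""
    lines = [f"Validation level: {int(level)} ({level.name.replace('_', ' ').lower()})",
             f"Iterations: {iterations}"]
    lines += [f"Final residual {name}: {value:.2e}" for name, value in final_residuals.items()]
    lines += [f"Warning: {w}" for w in warnings] or ["Warnings: none"]
    lines.append("No further changes will be made unless you ask.")
    return "\n".join(lines)
```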