How-to guide

Diagnosing Failures

How to find the real root cause when materialization, meshing, or solving fails, without overfitting to the error message.

Find the failed layer first

Start by identifying where the failure happened: materialization, meshing, solving, post-processing, or reporting. Each layer has its own evidence (CaseSpec validation errors, mesh log, solver log, function-object output, report build log). Diagnose from the evidence of the layer that failed, not from the chat transcript.
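The layer-to-evidence pairing can be sketched as a small lookup. This is an illustrative sketch only: the `Layer` enum, the `EVIDENCE` table, the `first_failed_layer` helper, and the `"failed"` status string are hypothetical names, not part of SimPilot, and the pairing assumes the two lists in the paragraph are given in matching order.

```python
from enum import Enum


class Layer(Enum):
    # Pipeline layers in execution order (hypothetical names for illustration).
    MATERIALIZATION = "materialization"
    MESHING = "meshing"
    SOLVING = "solving"
    POST_PROCESSING = "post-processing"
    REPORTING = "reporting"


# Evidence to read for each layer, paired by order with the list in the text.
EVIDENCE = {
    Layer.MATERIALIZATION: "CaseSpec validation errors",
    Layer.MESHING: "mesh log",
    Layer.SOLVING: "solver log",
    Layer.POST_PROCESSING: "function-object output",
    Layer.REPORTING: "report build log",
}


def first_failed_layer(status):
    """Return the earliest layer marked 'failed', or None if none failed."""
    for layer in Layer:  # Enum iteration follows definition order
        if status.get(layer) == "failed":
            return layer
    return None
```

Later layers often fail only because an earlier one did, so the sketch returns the earliest failure and the evidence lookup then points at the log worth reading first.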

Use the hypothesis ledger

ADR 0016 requires a recorded hypothesis before any rerun. The agent writes a RecoveryDecisionRecord stating the suspected cause, the change it intends to make, and how it will know the fix worked. This prevents blind retries that mask the real bug.
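To make the three parts of the record concrete, here is a minimal sketch of what such a record could look like. The field names, the validation, and the `rerun_allowed` helper are assumptions for illustration; the actual RecoveryDecisionRecord schema is defined by SimPilot and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryDecisionRecord:
    # Hypothetical shape, not the real SimPilot schema.
    suspected_cause: str     # what the agent believes went wrong
    intended_change: str     # the single change it plans to make
    success_criterion: str   # how it will know the fix worked

    def __post_init__(self):
        # Reject empty fields so a record cannot be a placeholder.
        for name in ("suspected_cause", "intended_change", "success_criterion"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")


def rerun_allowed(record):
    """Allow a rerun only when a hypothesis has been recorded."""
    return isinstance(record, RecoveryDecisionRecord)
```

For example, `RecoveryDecisionRecord("Courant number too high", "halve the time step", "residuals fall over the next 100 iterations")` would pass the check, while calling `rerun_allowed(None)` would not.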

Recognize common patterns

  • Divergence: usually solver stiffness, an excessive Courant number, or poorly chosen under-relaxation. Check the residual history, the Courant number, and the physics regime.
  • Floating-point failure: usually a bad initial condition, a missing reference cell, or a malformed boundary condition (BC).
  • Mesh quality: the mesh fails the mesh-quality gate before the run even starts, and the solver pack refuses to launch.
  • Marker mismatch (SU2): the boundary marker tags in the config don't match those in the mesh.
  • Missing function object: a QoI extraction requested runtime sampling that was not declared when the run was launched.
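For the divergence pattern, a quick sanity check is to estimate the Courant number yourself rather than rely on the error message. The sketch below uses the standard one-dimensional definition Co = |u| Δt / Δx; the function names and the sample values are illustrative, and the acceptable limit depends on the scheme (explicit schemes commonly need Co below about 1, while implicit schemes often tolerate more).

```python
def courant_number(velocity, dt, dx):
    """Return Co = |u| * dt / dx for one cell (1D definition)."""
    if dt <= 0 or dx <= 0:
        raise ValueError("dt and dx must be positive")
    return abs(velocity) * dt / dx


def max_courant(velocities, dt, cell_sizes):
    """Return the largest per-cell Courant number for a shared time step."""
    return max(courant_number(u, dt, dx) for u, dx in zip(velocities, cell_sizes))
```

With a time step of 0.01, a cell of size 0.02 carrying a velocity of magnitude 4 gives a Courant number of about 2, which would point at the time step or the local mesh refinement as the first thing to examine.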

Fix the spec, not the file

If you change a generated dictionary by hand, the next materialization will wipe it out. Always trace the fix back to the CaseSpec, fix it there, and re-materialize.
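The reason hand edits do not survive can be shown with a toy model. This is not the SimPilot materializer: `materialize` is a hypothetical stand-in that renders a dictionary-style file from a plain Python dict, and the keys `deltaT` and `endTime` are only example entries.

```python
def materialize(spec):
    """Render a toy dictionary file; the output depends only on the spec."""
    return "".join(f"{key} {value};\n" for key, value in sorted(spec.items()))


spec = {"deltaT": 0.01, "endTime": 10}
generated = materialize(spec)

# A hand edit changes the generated text but leaves the spec untouched.
hand_edited = generated.replace("deltaT 0.01;", "deltaT 0.005;")

# The next materialization is rebuilt from the spec, so the hand edit is lost.
regenerated = materialize(spec)

# Fixing the spec instead makes the change survive every re-materialization.
spec_fixed = {**spec, "deltaT": 0.005}
fixed_output = materialize(spec_fixed)
```

Because the generated file is a pure function of the spec, the only durable place to make a change is the spec itself.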

Spawn a diagnose subagent

For thorny failures, the kernel can spawn a focused diagnose subagent with elevated read tools (solver log, parser output, knowledge search, web search) but no write tools. It returns a typed finding; the parent agent decides whether to act on it.
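The read-only contract can be pictured as a tool allowlist plus a typed return value. Everything in this sketch is hypothetical: the tool names, the `Finding` fields, and the `DiagnoseSubagent` class are invented for illustration and do not describe the kernel's actual interface.

```python
from dataclasses import dataclass

# Read tools named in the text; the identifiers themselves are assumptions.
READ_TOOLS = frozenset({"solver_log", "parser_output", "knowledge_search", "web_search"})


@dataclass(frozen=True)
class Finding:
    # Hypothetical typed result handed back to the parent agent.
    layer: str
    suspected_cause: str
    evidence: str


class DiagnoseSubagent:
    def __init__(self, tools):
        # Refuse any tool outside the read allowlist, e.g. a file writer.
        extra = set(tools) - READ_TOOLS
        if extra:
            raise PermissionError(f"tools not allowed: {sorted(extra)}")
        self._tools = dict(tools)

    def call(self, name, *args):
        """Invoke a granted read tool by name."""
        if name not in self._tools:
            raise PermissionError(f"tool not granted: {name}")
        return self._tools[name](*args)
```

Keeping the subagent unable to write means its finding is advice only, and the decision to change the case stays with the parent agent.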
