After a batch completes, SimPilot automatically extracts metrics from every completed simulation and builds a structured comparison. You can also trigger comparison manually for any existing batch.
How comparison works
Metric extraction
The compareResults tool iterates over all completed jobs in a batch. For each job, it calls the solver plugin's extractResults() function, which reads solver logs from the case directory and extracts:
- Convergence status: Whether residuals dropped below the convergence threshold (1e-3 for OpenFOAM)
- Iteration count: Total solver iterations completed
- Final residuals: Per-field residual values at the last iteration (e.g., finalResidual.Ux, finalResidual.p)
- Custom metrics: Solver-specific quantities like drag coefficient, heat flux, or mass flow rate
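As a rough illustration of the extraction step, the sketch below parses a standard OpenFOAM solver log for the iteration count and the last reported residual per field. It is not SimPilot's extractResults() implementation: the function name `extract_metrics`, the returned dictionary shape, and the rule "converged when every last-step final residual is below the threshold" are assumptions made for this example.

```python
import re

# Linear-solver lines in a standard OpenFOAM log look like:
#   smoothSolver:  Solving for Ux, Initial residual = 0.002, Final residual = 1e-05, No Iterations 3
SOLVE_RE = re.compile(
    r"Solving for (\w+), Initial residual = \S+, Final residual = ([0-9.eE+-]+),"
)
# Each solver step starts with a "Time = ..." line ("ExecutionTime = ..." is not matched).
TIME_RE = re.compile(r"^Time = ", re.MULTILINE)

SAMPLE_LOG = """\
Time = 1

smoothSolver:  Solving for Ux, Initial residual = 1, Final residual = 0.05, No Iterations 2
GAMG:  Solving for p, Initial residual = 1, Final residual = 0.08, No Iterations 5
ExecutionTime = 0.4 s  ClockTime = 1 s

Time = 2

smoothSolver:  Solving for Ux, Initial residual = 0.002, Final residual = 1e-05, No Iterations 3
GAMG:  Solving for p, Initial residual = 0.01, Final residual = 0.0004, No Iterations 4
ExecutionTime = 0.8 s  ClockTime = 1 s
"""


def extract_metrics(log_text, threshold=1e-3):
    """Hypothetical stand-in for a solver plugin's extractResults()."""
    metrics = {"iterations": len(TIME_RE.findall(log_text))}
    # Later matches overwrite earlier ones, so each field keeps its last value.
    for field, final_residual in SOLVE_RE.findall(log_text):
        metrics[f"finalResidual.{field}"] = float(final_residual)
    residuals = [v for k, v in metrics.items() if k.startswith("finalResidual.")]
    # Assumed rule: converged only if every field's last residual is below the threshold.
    converged = bool(residuals) and all(r < threshold for r in residuals)
    return {"converged": converged, "metrics": metrics}


result = extract_metrics(SAMPLE_LOG)
```

A log with no solver lines at all is reported as not converged here, which matches the case of a run that crashed before its first iteration.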
Comparison table
Extracted metrics are assembled into a markdown table with columns for case label, solver, convergence status, and every extracted metric key. This table is returned both as structured data and as rendered markdown in the chat.
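One way such a table could be assembled is sketched below. The row shape (`label`, `solver`, `converged`, `metrics`), the column titles, and the `n/a` placeholder for missing values are assumptions for illustration, not SimPilot's actual output format.

```python
def to_markdown_table(rows):
    """Render per-case metrics as a markdown table.

    rows: list of dicts with 'label', 'solver', 'converged', and 'metrics'
    (an assumed shape for this sketch).
    """
    # One column per metric key seen in any case, in a stable order.
    keys = sorted({key for row in rows for key in row["metrics"]})
    header = ["Case", "Solver", "Converged", *keys]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for row in rows:
        cells = [row["label"], row["solver"], "yes" if row["converged"] else "no"]
        for key in keys:
            # Cases that did not report a metric get a placeholder cell.
            cells.append(format(row["metrics"][key], ".4g") if key in row["metrics"] else "n/a")
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


rows = [
    {"label": "mesh-coarse", "solver": "simpleFoam", "converged": True,
     "metrics": {"iterations": 412, "Cd": 0.32}},
    {"label": "mesh-fine", "solver": "simpleFoam", "converged": False,
     "metrics": {"iterations": 1000}},
]
table = to_markdown_table(rows)
```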
Rankings
For each metric, the system sorts all simulations by metric value and identifies the best and worst performers, making it easy to see which configuration performed best.
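A minimal sketch of per-metric ranking follows. The source does not say how the "better" direction is chosen for each metric, so the `lower_is_better` flag is an assumption of this example, as is the function name `rank_metric`.

```python
def rank_metric(values, lower_is_better=True):
    """Rank case labels by one metric.

    values: mapping of case label -> metric value.
    lower_is_better: assumed direction flag (e.g. True for drag or residuals).
    """
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=not lower_is_better)
    return {
        "best": ordered[0],      # (label, value) of the top performer
        "worst": ordered[-1],    # (label, value) of the bottom performer
        "order": [label for label, _ in ordered],
    }


drag = {"kEpsilon": 0.335, "kOmegaSST": 0.318, "laminar": 0.402}
ranking = rank_metric(drag)
```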
Regression alerts
The system computes deltas for every metric against a baseline simulation and flags values that exceed configurable tolerance thresholds. If no baseline is specified, the first simulation in the batch is used.
Storage
The full comparison result -- including summary, regression alerts, and any extraction errors -- is stored in the batch record's comparisonResult field with a generation timestamp.
Regression alerts
Regression detection compares each simulation's metrics against a baseline and applies tolerance thresholds:
| Metric type | Default tolerance |
|---|---|
| Iterations | 20% |
| Residuals | 50% |
| All other metrics | 10% |
Alerts are classified by severity:
- Warning: Delta exceeds the tolerance threshold but not 2x the threshold
- Critical: Delta exceeds 2x the tolerance threshold
Pass a tolerances map when comparing results to set per-metric thresholds. For example, you might set a tight 5% tolerance on drag coefficient but allow 30% variation in iteration count.
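The threshold logic described above can be sketched as follows. This is an illustration under stated assumptions rather than SimPilot's implementation: deltas are taken as relative absolute differences, metric types are inferred from key names (`iterations`, keys starting with `finalResidual.`), and the names `regression_alerts` and `default_tolerance` are hypothetical.

```python
def default_tolerance(key):
    """Default tolerances from the table above, keyed by an assumed naming scheme."""
    if key == "iterations":
        return 0.20
    if key.startswith("finalResidual."):
        return 0.50
    return 0.10


def regression_alerts(baseline, candidate, tolerances=None):
    """Compare one simulation's metrics against the baseline metrics."""
    tolerances = tolerances or {}
    alerts = []
    for key, base in baseline.items():
        if key not in candidate or base == 0:
            continue  # no relative delta can be computed
        delta = abs(candidate[key] - base) / abs(base)  # assumed: relative, unsigned
        tolerance = tolerances.get(key, default_tolerance(key))
        if delta > 2 * tolerance:
            severity = "critical"
        elif delta > tolerance:
            severity = "warning"
        else:
            continue
        alerts.append({"metric": key, "delta": delta, "severity": severity})
    return alerts


baseline = {"iterations": 400, "Cd": 0.320, "finalResidual.p": 4e-4}
candidate = {"iterations": 500, "Cd": 0.340, "finalResidual.p": 5e-4}

with_defaults = regression_alerts(baseline, candidate)
with_overrides = regression_alerts(
    baseline, candidate, tolerances={"Cd": 0.05, "iterations": 0.30}
)
```

With the defaults, only the 25% rise in iteration count is flagged. With the overrides, the iteration change falls inside its 30% allowance and the 6.25% drag change is flagged against the tighter 5% tolerance.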
Comparison view UI
The batch comparison renders directly in the chat interface with:
- Side-by-side metric tables: Every simulation in a row, every metric in a column, with convergence status highlighted
- Best/worst rankings: For each metric, the best and worst performers are identified with their labels and values
- Regression alerts: Color-coded warnings and critical alerts showing which simulations deviated from the baseline and by how much
- CSV export: Download the comparison table as CSV for external analysis in spreadsheets or plotting tools
Handling extraction errors
Some simulations in a batch may not yield extractable metrics. Common reasons include:
- The simulation failed before producing any output
- Log files were not written due to an early crash
- Transient runtime errors (rate limits, timeouts) during metric extraction
The comparison system handles these gracefully:
- Extraction errors are collected and reported alongside the comparison results
- Failed extractions do not block the comparison -- metrics from successful jobs are still compared
- Transient errors are retried up to 4 times with exponential backoff before being marked as failures
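The behavior in this list can be sketched as a retry wrapper plus a collection loop. The `TransientError` class, the function names, and the 1-second base delay are assumptions for illustration. "Up to 4 retries" is read here as one initial attempt followed by at most four more.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as rate limits or timeouts."""


def extract_with_retry(extract, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call extract(), retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return extract()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the failure
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, 8s with the defaults


def collect_metrics(jobs, sleep=time.sleep):
    """Extract metrics for every job; failures are recorded, not raised.

    jobs: mapping of case label -> zero-argument extraction callable.
    """
    results, errors = {}, []
    for label, extract in jobs.items():
        try:
            results[label] = extract_with_retry(extract, sleep=sleep)
        except Exception as exc:
            errors.append({"label": label, "error": str(exc)})
    return results, errors
```

Passing `sleep` as a parameter keeps the sketch testable without real delays. Successful jobs still land in `results` even when other jobs end up in `errors`.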
Golden baselines
Golden baselines are version-controlled, validated baseline configurations that serve as reference points for regression testing. You will be able to pin a simulation result as a golden baseline, and all future runs of similar cases will be automatically compared against it. This feature is under active development.
Case history search
SimPilot stores metadata for every simulation run, enabling search across your simulation history. You can search past simulations by:
- Physics type: CFD, heat transfer, multiphase, reactive flow
- Geometry: Similar shapes or uploaded geometry files
- Solver: Which OpenFOAM application was used
- Turbulence model: k-epsilon, k-omega SST, laminar, LES variants
- Reynolds number: Approximate flow regime
- Convergence status: Filter for converged or failed runs
This makes it easy to find previous runs that are similar to your current problem, compare approaches, and build on past work rather than starting from scratch.
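To make the filtering idea concrete, here is a small sketch of searching stored run metadata. The record fields (`physics`, `solver`, `turbulence`, `reynolds`, `converged`) and the "within a factor of the target" reading of approximate Reynolds number matching are assumptions for this example, not SimPilot's storage schema or query API.

```python
def search_history(runs, *, reynolds=None, reynolds_factor=2.0, **criteria):
    """Filter run metadata by exact field matches and an approximate Reynolds number.

    runs: list of metadata dicts (hypothetical shape).
    criteria: field=value pairs that must match exactly.
    reynolds: optional target; runs within reynolds_factor of it are kept.
    """
    matches = []
    for run in runs:
        if any(run.get(field) != value for field, value in criteria.items()):
            continue
        if reynolds is not None:
            run_re = run.get("reynolds")
            if run_re is None:
                continue
            if not (reynolds / reynolds_factor <= run_re <= reynolds * reynolds_factor):
                continue
        matches.append(run)
    return matches


history = [
    {"id": "run-1", "physics": "CFD", "solver": "simpleFoam",
     "turbulence": "k-omega SST", "reynolds": 1.2e5, "converged": True},
    {"id": "run-2", "physics": "CFD", "solver": "simpleFoam",
     "turbulence": "k-epsilon", "reynolds": 9.0e4, "converged": False},
    {"id": "run-3", "physics": "heat transfer", "solver": "buoyantSimpleFoam",
     "turbulence": "laminar", "reynolds": 2.0e3, "converged": True},
]

similar = search_history(history, physics="CFD", converged=True, reynolds=1.0e5)
```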