Results Comparison

Compare results across simulations with automated metrics and regression alerts.

After a batch completes, SimPilot automatically extracts metrics from every completed simulation and builds a structured comparison. You can also trigger comparison manually for any existing batch.

How comparison works

Metric extraction

The compareResults tool iterates over all completed jobs in a batch. For each job, it calls the solver plugin's extractResults() function, which reads solver logs from the case directory and extracts:

Convergence status: Whether residuals dropped below the convergence threshold (1e-3 for OpenFOAM)
Iteration count: Total solver iterations completed
Final residuals: Per-field residual values at the last iteration (e.g., finalResidual.Ux, finalResidual.p)
Custom metrics: Solver-specific quantities like drag coefficient, heat flux, or mass flow rate

Comparison table

Extracted metrics are assembled into a markdown table with columns for case label, solver, convergence status, and every extracted metric key. This table is returned both as structured data and as rendered markdown in the chat.
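The table assembly can be sketched as follows. The row shape (`label`, `solver`, `converged`, `metrics`) and the function name `build_comparison_table` are hypothetical and chosen only for illustration; the actual structured data format is not specified here.

```python
def build_comparison_table(rows: list[dict]) -> str:
    """Render per-job metrics as a markdown table (illustrative sketch)."""
    # Collect every metric key across all rows, preserving first-seen order.
    keys: list[str] = []
    for row in rows:
        for key in row["metrics"]:
            if key not in keys:
                keys.append(key)

    header = ["Case", "Solver", "Converged", *keys]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        cells = [row["label"], row["solver"], "yes" if row["converged"] else "no"]
        # Jobs that lack a metric get a placeholder instead of breaking the table.
        cells += [
            format(row["metrics"][k], ".4g") if k in row["metrics"] else "n/a"
            for k in keys
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Taking the union of metric keys keeps the columns stable even when one job reports a custom metric that the others do not.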

Rankings

For each metric, the system computes best and worst rankings across all simulations. Rankings are sorted by metric value, making it easy to identify which configuration performed best.
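A minimal sketch of a per-metric ranking is shown below. Whether a lower or higher value counts as "best" depends on the metric, and how SimPilot decides this is not stated here, so the example exposes it as a hypothetical `lower_is_better` parameter.

```python
def rank_metric(values: dict[str, float], lower_is_better: bool = True) -> dict:
    """Rank case labels by one metric and report best and worst (sketch)."""
    if not values:
        raise ValueError("no values to rank")
    # Sort (label, value) pairs so that the best performer comes first.
    ordered = sorted(
        values.items(), key=lambda item: item[1], reverse=not lower_is_better
    )
    return {"best": ordered[0], "worst": ordered[-1], "ordered": ordered}
```

For example, ranking drag coefficients with `lower_is_better=True` puts the lowest-drag configuration first, while a metric such as mass flow rate might be ranked with `lower_is_better=False`.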

Regression alerts

The system computes deltas for every metric against a baseline simulation and flags values whose delta exceeds a configurable tolerance threshold. If no baseline is specified, the first simulation in the batch is used.

Storage

The full comparison result, including the summary, regression alerts, and any extraction errors, is stored in the batch record's comparisonResult field with a generation timestamp.

Regression alerts

Regression detection compares each simulation's metrics against a baseline and applies tolerance thresholds:

Metric type	Default tolerance
Iterations	20%
Residuals	50%
All other metrics	10%

You can override tolerances per metric when invoking the comparison. Alerts are classified by severity:

Warning: Delta exceeds the tolerance threshold
Critical: Delta exceeds twice the tolerance threshold
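The severity rule above can be sketched as a small function. This example assumes that the delta is the absolute relative change against the baseline, which matches the percentage tolerances but is not confirmed by this page. The name `classify_delta` and the handling of a zero baseline are also assumptions.

```python
from typing import Optional


def classify_delta(value: float, baseline: float, tolerance: float) -> Optional[str]:
    """Return "warning", "critical", or None for one metric (sketch).

    tolerance is a fraction, e.g. 0.10 for 10%.
    """
    if baseline == 0:
        # Assumption: a relative delta is undefined here, so no alert is raised.
        return None
    delta = abs(value - baseline) / abs(baseline)
    if delta > 2 * tolerance:
        return "critical"
    if delta > tolerance:
        return "warning"
    return None
```

With the default 10% tolerance, a metric that moves from 100 to 115 produces a warning, and one that moves from 100 to 125 produces a critical alert.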

Custom tolerances

Pass a tolerances map when comparing results to set per-metric thresholds. For example, you might set a tight 5% tolerance on drag coefficient but allow 30% variation in iteration count.
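One way to picture how per-metric overrides might combine with the defaults is the lookup below. The metric key names (`iterations`, the `finalResidual.` prefix, `dragCoefficient`) and the function `resolve_tolerance` are illustrative assumptions; only the default percentages come from the table above.

```python
from typing import Optional

# Default tolerances from the table above, expressed as fractions.
DEFAULT_TOLERANCES = {"iterations": 0.20, "residuals": 0.50, "other": 0.10}


def resolve_tolerance(metric_key: str, overrides: Optional[dict] = None) -> float:
    """Pick the tolerance for one metric; explicit overrides win (sketch)."""
    if overrides and metric_key in overrides:
        return overrides[metric_key]
    if metric_key == "iterations":
        return DEFAULT_TOLERANCES["iterations"]
    # Assumption: residual metrics are keyed as "finalResidual.<field>".
    if metric_key.startswith("finalResidual."):
        return DEFAULT_TOLERANCES["residuals"]
    return DEFAULT_TOLERANCES["other"]


# The example from the text: tight on drag, loose on iteration count.
custom = {"dragCoefficient": 0.05, "iterations": 0.30}
```

Metrics that are not named in the map fall back to the defaults for their type.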

Comparison view UI

The batch comparison renders directly in the chat interface with:

Side-by-side metric tables: Every simulation in a row, every metric in a column, with convergence status highlighted
Best/worst rankings: For each metric, the best and worst performers are identified with their labels and values
Regression alerts: Color-coded warnings and critical alerts showing which simulations deviated from the baseline and by how much
CSV export: Download the comparison table as CSV for external analysis in spreadsheets or plotting tools
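For readers who want to reproduce the CSV export from the structured comparison data, a minimal sketch using Python's standard `csv` module follows. The row shape and column names are hypothetical and may differ from the file that SimPilot actually produces.

```python
import csv
import io


def comparison_to_csv(rows: list[dict]) -> str:
    """Serialize comparison rows to CSV text (illustrative sketch)."""
    # Union of metric keys across rows, in first-seen order.
    keys: list[str] = []
    for row in rows:
        for key in row["metrics"]:
            if key not in keys:
                keys.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["label", "solver", "converged", *keys],
        restval="",  # leave the cell empty when a job lacks a metric
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {
                "label": row["label"],
                "solver": row["solver"],
                "converged": row["converged"],
                **row["metrics"],
            }
        )
    return buffer.getvalue()
```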

Handling extraction errors

Some simulations in a batch may not yield extractable metrics. Common reasons include:

The simulation failed before producing any output
Log files were not written due to an early crash
Transient runtime errors (rate limits, timeouts) during metric extraction

The comparison system handles these gracefully:

Extraction errors are collected and reported alongside the comparison results
Failed extractions do not block the comparison; metrics from successful jobs are still compared
Transient errors are retried up to 4 times with exponential backoff before being marked as failures
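The retry behavior can be sketched as below. The `TransientError` class, the one-second base delay, and the doubling schedule are assumptions for illustration; only the limit of 4 retries and the use of exponential backoff come from the text above.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limits, timeouts)."""


def extract_with_retry(extract, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call extract(), retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):  # one initial try plus the retries
        try:
            return extract()
        except TransientError:
            if attempt == max_retries:
                # Retries exhausted: let the caller record an extraction error.
                raise
            # Wait 1x, 2x, 4x, 8x the base delay before successive retries.
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Non-transient exceptions propagate immediately, since retrying a run that crashed before writing logs would not help.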

Golden baselines

Coming soon

Golden baselines are version-controlled, validated baseline configurations that serve as reference points for regression testing. You will be able to pin a simulation result as a golden baseline, and all future runs of similar cases will be automatically compared against it. This feature is under active development.

Case history search

SimPilot stores metadata for every simulation run, enabling search across your simulation history. You can search past simulations by:

Physics type: CFD, heat transfer, multiphase, reactive flow
Geometry: Similar shapes or uploaded geometry files
Solver: Which OpenFOAM application was used
Turbulence model: k-epsilon, k-omega SST, laminar, LES variants
Reynolds number: Approximate flow regime
Convergence status: Filter for converged or failed runs

This makes it easy to find previous runs that are similar to your current problem, compare approaches, and build on past work rather than starting from scratch.
