VIO Survey Benchmark: GitHub Pages reproducible benchmark

Lesson 10 / Advanced

Advanced fair comparison

Rules for reviewers who need to judge whether a public row is comparable, reproducible, and eligible for aggregation.

Learning outcomes

  • Audit denominator and missing-data rules
  • Check association, alignment, and path-length policies
  • Connect Docker reproducibility to public table eligibility

A row is a claim

A numeric row claims that a specific system, configuration, dataset sequence, trajectory, timing file, EPA output, and resource context belong together.

If any part is missing, the public value should stay non-numeric: TBD, N/A, Partial, or Unresolved.
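The rule above can be sketched as a small completeness check. This is a minimal illustration, not the benchmark's actual tooling: the `RowClaim` class, its field names, and the helper functions are hypothetical, and the choice of "TBD" as the placeholder is an assumption (the page also allows N/A, Partial, and Unresolved without saying which applies when).

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RowClaim:
    """Hypothetical record of the parts a numeric row claims belong together."""
    system: Optional[str] = None
    configuration: Optional[str] = None
    dataset_sequence: Optional[str] = None
    trajectory: Optional[str] = None
    timing_file: Optional[str] = None
    epa_output: Optional[str] = None
    resource_context: Optional[str] = None


def missing_parts(row: RowClaim) -> list:
    """Return the names of all parts that are absent or empty."""
    return [f.name for f in fields(row) if not getattr(row, f.name)]


def public_value(row: RowClaim, metric: Optional[float]) -> str:
    """Publish a number only when every part of the claim is present."""
    if missing_parts(row) or metric is None:
        return "TBD"  # assumed placeholder; the policy may call for another label
    return f"{metric:.3f}"
```

Calling `missing_parts` on a row gives the reviewer a concrete list of what still has to be supplied before the row can become numeric.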

Aggregation is conditional

Averages require a visible denominator and a rule for which rows are included.

Do not average together rows that differ in output scope, dataset conversion, timing policy, or resource platform.
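One way to make these conditions explicit is to group rows by their comparability attributes before averaging, and to report the denominator next to each mean. The sketch below assumes rows are plain dictionaries; the key names (`output_scope`, `dataset_conversion`, `timing_policy`, `resource_platform`, `value`) are hypothetical and chosen only to mirror the rule above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical attribute names; rows that differ in any of these are never averaged together.
COMPARABILITY_KEYS = (
    "output_scope",
    "dataset_conversion",
    "timing_policy",
    "resource_platform",
)


def aggregate(rows):
    """Average numeric values per comparable group and expose the denominator."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in COMPARABILITY_KEYS)
        groups[key].append(row)

    summary = {}
    for key, members in groups.items():
        # Placeholder values such as "TBD" are excluded, not treated as zero.
        values = [r["value"] for r in members if isinstance(r["value"], (int, float))]
        summary[key] = {
            "mean": mean(values) if values else None,
            "included": len(values),  # visible denominator
            "total": len(members),    # rows in the group, including non-numeric ones
        }
    return summary
```

Reporting `included` alongside `total` lets a reader see at a glance how many rows were left out of each average.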

Checklist

  • Denominator is visible
  • Missing rows are not silently averaged
  • EPA policy is attached
  • GitHub/Docker reproduction path exists

Advanced notes

OpenVINS compatibility outputs are useful for migration and sanity checks. EPA remains the active public evaluation reference.

A reproducibility gate is stricter than a successful local run: the command, container, adapter, and artifacts must survive repository checkout.
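A small part of such a gate can be sketched as a file check run against a fresh checkout, not the developer's working directory. The function names and the example paths are hypothetical. The check confirms only that the required files are present; it does not confirm that the command or container actually runs.

```python
from pathlib import Path


def missing_artifacts(checkout_root, required_paths):
    """Return the required relative paths that are not files in the checkout."""
    root = Path(checkout_root)
    return [p for p in required_paths if not (root / p).is_file()]


def passes_gate(checkout_root, required_paths):
    """True only if every required artifact exists in the checkout."""
    return not missing_artifacts(checkout_root, required_paths)
```

For example, `missing_artifacts("/tmp/fresh-clone", ["Dockerfile", "run.sh"])` (both paths are placeholders) lists whichever of the two files did not survive the checkout.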

Practice task

Audit one TBD row and list exactly which artifacts must exist before it can become numeric.