Visual reasoning benchmark for reading analog gauges

GaugeBench evaluates whether multimodal models can accurately interpret real-world gauges under challenging visual conditions. We provide curated analog gauge images, scoring scripts, and a transparent leaderboard to track frontier model progress over time.

Leaderboard

See how leading multimodal models perform on GaugeBench. Scores reflect exact-match accuracy on gauge-reading tasks.
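As a rough illustration of exact-match accuracy, the sketch below counts a prediction as correct only when both the numeric value and the units agree with the reference. It is not the official GaugeBench scoring script: the `(value, units)` pair format, the case-insensitive unit comparison, and the function names are assumptions made for this example.

```python
from decimal import Decimal, InvalidOperation


def normalize(reading):
    """Convert a (value, units) pair into a comparable form.

    Returns None when the value is not numeric. The pair layout is an
    assumption for this sketch, not the official GaugeBench format.
    """
    value, units = reading
    try:
        number = Decimal(str(value))  # "2.50" and 2.5 compare equal as Decimals
    except InvalidOperation:
        return None
    return (number, str(units).strip().lower())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose value and units both match the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = 0
    for pred, ref in zip(predictions, references):
        norm_pred = normalize(pred)
        if norm_pred is not None and norm_pred == normalize(ref):
            hits += 1
    return hits / len(references)


# Illustrative data only: two of the four predictions match exactly.
predictions = [(2.5, "bar"), ("40", "PSI"), (101, "kPa"), ("n/a", "bar")]
references = [("2.50", "bar"), (40, "psi"), (100, "kPa"), (1, "bar")]
print(exact_match_accuracy(predictions, references))
```

Exact match is deliberately strict: the reading of 101 kPa against a reference of 100 kPa earns no partial credit, so a tolerance-based metric would rank models differently.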

Powered by Onvo AI

Dataset

GaugeBench covers consumer, industrial, and scientific gauges spanning psi, bar, kPa, and custom scales. We emphasise lighting variety, occlusions, and reflections to stress-test visual reasoning.

Analog gauge from GaugeBench dataset

Pressure Gauges

Classic dial-style gauges sourced from real devices, with varying needle positions, glare, and bezels.

Digital gauge from GaugeBench dataset

Industrial Panels

High-range indicators with dual units and multi-needle layouts that probe how well models parse dense readouts.

Gauge with low-light conditions

Low-Light Scenarios

Dimly lit or partially obscured gauges that challenge models to reason under visual noise.

Gauge with colorful dial

Custom Scales

Non-linear and color-coded scales that require reasoning beyond uniform tick spacing.
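To see why uniform tick spacing is not enough, consider a minimal sketch that converts a needle angle into a reading by interpolating between the labelled ticks on either side of it. The tick table and the `value_from_angle` helper are hypothetical and not part of GaugeBench. Linear interpolation inside a segment is itself an approximation for scales such as logarithmic ones.

```python
from bisect import bisect_right


def value_from_angle(angle_deg, ticks):
    """Interpolate a reading from a needle angle.

    ticks: list of (angle_deg, value) pairs with strictly increasing angles,
    one pair per labelled tick on the dial (hypothetical input format).
    """
    if len(ticks) < 2:
        raise ValueError("at least two labelled ticks are required")
    angles = [angle for angle, _ in ticks]
    if not angles[0] <= angle_deg <= angles[-1]:
        raise ValueError("needle angle is outside the labelled scale")
    # Find the segment containing the needle; clamp so the top tick
    # falls in the last segment instead of running off the end.
    hi = min(bisect_right(angles, angle_deg), len(ticks) - 1)
    lo = hi - 1
    (a0, v0), (a1, v1) = ticks[lo], ticks[hi]
    return v0 + (v1 - v0) * (angle_deg - a0) / (a1 - a0)


# Hypothetical compressed scale: each 90-degree sector spans a wider value range.
ticks = [(0, 0), (90, 10), (180, 100), (270, 1000)]
print(value_from_angle(135, ticks))  # 55.0, halfway between the 10 and 100 ticks
```

On this hypothetical dial, assuming one uniform scale across the full 270-degree sweep would put a needle at 135 degrees at 500 rather than 55, which is the kind of error a non-linear scale can provoke.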

Questions we ask

  • Exact readout: Does the model return the precise gauge value and units?
  • Range awareness: Are minimum and maximum dial ranges identified correctly?
  • Consistency: Can the model produce structured JSON suitable for automatic scoring?
  • Robustness: Do predictions remain stable under reflections, blur, and cropping?
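The questions above suggest a response that a scoring script can check mechanically. The sketch below validates a model's raw output against a hypothetical JSON schema with `value`, `units`, `min`, and `max` keys. These key names and the `parse_prediction` helper are assumptions for illustration, so the actual GaugeBench format may differ.

```python
import json

# Hypothetical schema: the reading, its units, and the dial range.
REQUIRED_KEYS = ("value", "units", "min", "max")


def parse_prediction(raw_text):
    """Return the validated prediction as a dict, or None if it cannot be scored."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None  # free-form prose or malformed JSON
    if not isinstance(data, dict):
        return None
    if any(key not in data for key in REQUIRED_KEYS):
        return None
    numeric_fields = (data["value"], data["min"], data["max"])
    for number in numeric_fields:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(number, bool) or not isinstance(number, (int, float)):
            return None
    if not isinstance(data["units"], str):
        return None
    return {key: data[key] for key in REQUIRED_KEYS}


good = parse_prediction('{"value": 42.5, "units": "psi", "min": 0, "max": 100}')
bad = parse_prediction("The gauge reads about 42.5 psi.")
print(good, bad)
```

Rejecting unparseable output outright, rather than trying to extract a number from prose, keeps scoring deterministic. The trade-off is that a model with a correct reading in the wrong format scores zero.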

Try it yourself

Download the dataset and evaluation script to benchmark your own multimodal model.

Partner with GaugeBench

We welcome research collaborations and eval contributions. Reach out if you are integrating GaugeBench into internal eval suites or would like to feature your model results on the public leaderboard.