Benchmarking in Span

Last updated: May 14, 2026

What Is Benchmarking?

Benchmarking in Span allows you to compare your engineering metrics against reference baselines — either within your own organization or against the broader industry. Instead of looking at metrics in isolation, benchmarking gives them context: Is a 2-day PR cycle time fast or slow? How does our deployment frequency compare to teams like ours?

Span surfaces benchmark comparisons at three percentile levels:

| Percentile | What it means |
| --- | --- |
| P50 | Median performance: half of the comparison population performs better, half worse |
| P75 | Better than 75% of the comparison population |
| P90 | Better than 90% of the comparison population; top-tier performance |
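To make the three levels concrete, here is a minimal sketch of how P50, P75, and P90 reference values could be derived from a comparison population. It is not Span's implementation: the function name `benchmark_levels`, the `higher_is_better` flag, the interpolation method, and the sample data are all assumptions made for illustration.

```python
from statistics import quantiles

def benchmark_levels(population, higher_is_better=True):
    """Return the metric values that correspond to P50, P75, and P90."""
    # 99 cut points: cuts[k - 1] is the k-th percentile of the raw values.
    cuts = quantiles(population, n=100, method="inclusive")
    levels = {}
    for label, pct in (("P50", 50), ("P75", 75), ("P90", 90)):
        # For lower-is-better metrics, beating 75% of the population means
        # sitting at the 25th percentile of the raw values.
        raw_pct = pct if higher_is_better else 100 - pct
        levels[label] = cuts[raw_pct - 1]
    return levels

# Hypothetical PR cycle times (hours) for 11 teams; lower is better.
cycle_times = [10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]
print(benchmark_levels(cycle_times, higher_is_better=False))
```

In this sketch, a team needs a cycle time of 20 hours or less to perform better than 75% of the sample population.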

Where to Access Benchmarks

Benchmarks are available throughout the Span platform wherever supported metrics are displayed. Key locations include:

  • Report pages — Metric data tables on report pages will show a benchmark selector in the filter controls when the metric supports benchmarking

  • Team & individual pages — Time-based metrics for specific people or teams can be compared against benchmarks

  • AI Tools / AI Transformation reports — Benchmark filtering is available on AI analytics pages

  • Metric Digest emails — Scheduled metric email reports can be configured to include benchmark comparisons

  • Survey analytics — DX survey question and theme analysis pages include benchmark comparisons (e.g., against Q4 2024 industry benchmarks)

How to enable benchmarks on a page:

  1. Navigate to a report or metrics page

  2. Look for the "Organization" filter in the metrics controls area

  3. Click it to open the benchmark selector and choose your preferred benchmark mode

  4. If no benchmark selector appears, that metric does not currently support benchmarking

Understanding Benchmarking Colors in Span

The benchmarking section on the Organization page uses colored dots to give you an at-a-glance signal of how your metrics are performing relative to a reference population. Here's how it all works.

The Color Scale

Each dot maps to one of five performance tiers based on where your metric ranks within the selected benchmark population:

| Dot | Level | Percentile range |
| --- | --- | --- |
| 🔴 Red | Worst | 0 – 20th percentile |
| 🟠 Orange | Bad | 20 – 40th percentile |
| 🟡 Yellow | Average | 40 – 60th percentile |
| 🟢 Light green | Good | 60 – 80th percentile |
| 🟢 Dark green | Best | 80 – 100th percentile |
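The tier table can be read as a simple lookup. The sketch below is illustrative only: the function name `tier_for` is hypothetical, and because the table lists each boundary (20, 40, 60, 80) in two ranges, the code assumes a boundary value belongs to the higher tier. Span's actual boundary handling may differ.

```python
LEVELS = ["Worst", "Bad", "Average", "Good", "Best"]
COLORS = ["red", "orange", "yellow", "light green", "dark green"]

def tier_for(percentile):
    """Map a percentile rank in [0, 100] to a (level, color) pair."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    # Each tier spans 20 percentile points. Assumption: a boundary value
    # such as 20 falls in the higher tier, and 100 stays in the top tier.
    index = min(int(percentile // 20), 4)
    return LEVELS[index], COLORS[index]

print(tier_for(55))   # a metric ranked at the 55th percentile
print(tier_for(100))  # the top of the distribution
```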

Metric Direction Matters

Not all metrics are equal — for some, a higher value is better; for others, lower is better. Span accounts for this automatically:

  • Positive connotation (higher = better, e.g. PRs merged per week): a higher percentile rank → greener dot.

  • Negative connotation (lower = better, e.g. PR cycle time): the color scale is reversed — a high percentile rank means you're slower than most of your peers, so the dot correctly shows red.

  • Neutral connotation: colors are applied without any reversal.

This ensures the dots always read intuitively, regardless of the metric's raw direction:

  • 🟢 Green means you're performing well

  • 🔴 Red means there's room for improvement
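One way to picture the reversal is as the same five-color scale read in the opposite order. The following sketch assumes a hypothetical `Connotation` enum and `dot_color` function; neither name comes from Span, and the input is the raw percentile rank of the metric value (a high rank means a high raw value).

```python
from enum import Enum

class Connotation(Enum):
    POSITIVE = "positive"  # higher value is better, e.g. PRs merged per week
    NEGATIVE = "negative"  # lower value is better, e.g. PR cycle time
    NEUTRAL = "neutral"    # no reversal applied

SCALE = ["red", "orange", "yellow", "light green", "dark green"]

def dot_color(raw_percentile, connotation):
    """Pick a dot color from a raw percentile rank in [0, 100]."""
    # Reverse the scale only for lower-is-better metrics.
    scale = SCALE[::-1] if connotation is Connotation.NEGATIVE else SCALE
    index = min(int(raw_percentile // 20), 4)
    return scale[index]

# The same raw rank reads differently depending on the metric's direction.
print(dot_color(90, Connotation.POSITIVE))
print(dot_color(90, Connotation.NEGATIVE))
```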

How Benchmark Selection Affects the Colors

The color of a dot depends on your percentile rank within a chosen population. Switching the benchmark selection changes that population, which changes your rank, which changes the dot color. The same raw metric value (e.g. a 3-day PR cycle time) can produce different colored dots depending on what you're comparing against.

Available Benchmark Populations

| Option | What you're compared against |
| --- | --- |
| Organization | Your org's own internal distribution |
| Industry | Aggregated data across Span customers (requires enablement) |
| Dimension-based (e.g. team or group) | A specific slice of your organization |
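The effect of switching populations can be illustrated with a small ranking sketch. Everything here is hypothetical: the `percentile_rank` helper, its definition of rank (the share of the population with a strictly lower value), and both sample populations.

```python
from bisect import bisect_left

def percentile_rank(value, population):
    """Percentage of the population with a value strictly below `value`."""
    ordered = sorted(population)
    return 100 * bisect_left(ordered, value) / len(ordered)

# Hypothetical PR cycle times in days.
org_teams = [1.0, 2.0, 4.0, 5.0]
industry = [0.5, 1.0, 1.5, 2.0, 2.2, 2.5, 2.6, 2.8, 2.9, 6.0]

my_cycle_time = 3.0
print(percentile_rank(my_cycle_time, org_teams))  # rank within the org sample
print(percentile_rank(my_cycle_time, industry))   # rank within the industry sample
```

With these made-up numbers, a 3-day cycle time sits in the middle of the internal sample but is slower than most of the industry sample, so the two selections would land in different color tiers.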

What the P50 / P75 / P90 Selector Does

This selector controls which reference line is drawn on benchmark charts — it does not affect the dot color. The dot always reflects your full percentile rank within the chosen population, independent of that selector.

Putting It All Together

When you view a metric on the Organization page:

  1. Span retrieves the percentile distribution for your selected benchmark population.

  2. Your metric value is ranked within that distribution, producing a percentile rank between 0 and 1 (0 corresponds to the 0th percentile, 1 to the 100th).

  3. That percentile is mapped to one of the five color tiers.

  4. If the metric has a negative connotation (lower is better), the color scale is reversed so green still means good.

Switching benchmark populations reruns this process against the new distribution — so the colors reflect a meaningful comparison no matter which reference you choose.
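The four steps above can be sketched end to end as follows. This is an assumption-laden illustration rather than Span's code: the function name `benchmark_dot`, the strict-rank definition, the five equal-width tiers, and the sample data are all hypothetical.

```python
from bisect import bisect_left

COLORS = ["red", "orange", "yellow", "light green", "dark green"]

def benchmark_dot(value, population, lower_is_better=False):
    """Return the dot color for `value` within the chosen population."""
    # Steps 1-2: rank the value within the distribution (0 to 1).
    ordered = sorted(population)
    percentile = bisect_left(ordered, value) / len(ordered)
    # Step 3: map the percentile onto five equal-width tiers.
    tier = min(int(percentile * 5), 4)
    # Step 4: flip the tier for lower-is-better metrics so green stays good.
    if lower_is_better:
        tier = 4 - tier
    return COLORS[tier]

# Hypothetical PRs merged per week across ten teams (higher is better).
prs_per_week = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(benchmark_dot(9.5, prs_per_week))

# Hypothetical PR cycle times in days across four teams (lower is better).
cycle_days = [1.0, 2.0, 4.0, 5.0]
print(benchmark_dot(3.0, cycle_days, lower_is_better=True))
```

Calling the function again with a different `population` argument mirrors what happens when the benchmark selection changes.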

Why Benchmarking Matters

Without context, engineering metrics can be misleading. Benchmarking solves this by answering the question: "Is this number actually good?"

Key benefits:

  • Set realistic goals — Use P50, P75, and P90 benchmarks to define what "good" and "great" look like for your team, grounded in real-world data

  • Identify improvement areas — Spot where your org is consistently below the median and prioritize those workflows for investment

  • Celebrate wins — Recognize when teams are performing in the top quartile or top decile of the industry

  • Inform executive conversations — Benchmark data gives engineering leaders credible, data-backed narratives when discussing performance with leadership

  • Track progress over time — As you improve processes, track how your percentile rank shifts relative to the industry

Common Use Cases

1. Evaluating PR review health: Compare your PR cycle time and review turnaround against industry P50/P75 to identify if slow reviews are a systemic issue or within normal range.

2. DORA metrics context: Understand where your deployment frequency, change failure rate, and lead time sit relative to other engineering orgs, and set targets to reach the next percentile tier.

3. Developer experience surveys: After running a DX survey, benchmark your scores against industry norms to understand whether reported friction is common across the industry or unique to your organization.

4. AI tool adoption: On the AI Transformation report, benchmark AI code adoption metrics to understand if your AI utilization is ahead of or behind peer organizations.

5. Team-level internal comparisons: Use the Organization benchmark mode to compare teams within your company. This is useful for identifying internal best practices and replicating success patterns across teams.

Tips for Using Benchmarks Effectively

  • P50 as a baseline, P75 as a target — A common approach is to treat the 50th percentile as "acceptable" and the 75th as an aspirational short-term goal

  • Don't over-index on P90 — Top-decile performance often reflects very specific organizational contexts; P75 is a more sustainable target for most teams

  • Pair benchmarks with qualitative data — A metric below the P50 isn't necessarily a problem; always combine benchmark data with context from retrospectives, surveys, and team conversations

  • Use org benchmarks before industry benchmarks — Start with internal comparisons to identify relative strengths and weaknesses before zooming out to industry data

Note: Industry benchmarks are drawn from aggregated data across Span customers. Availability of the Industry benchmark mode depends on your organization's plan and settings. Contact your Span administrator if you don't see the Industry option in your benchmark selector.