AI Code Insights helps your organization understand how AI-assisted coding is impacting productivity and code review dynamics. It provides data-driven metrics based on Span’s ML-powered AI Code Detector, which identifies AI-generated code at the chunk level.
Read our launch announcement for details about the model.
What you can use this report for
1. Track AI adoption
Monitor how much of your organization’s code is AI-generated and how usage trends change over time.
2. Improve adoption and proficiency
Compare AI usage across teams, individual contributors, or repositories to identify where enablement or training could increase effectiveness.
3. Identify the optimal AI “sweet spot”
Analyze productivity metrics across different AI usage levels to understand where AI adds the most value.
4. Measure productivity impact
Correlate AI use with development metrics such as PR speed, throughput, lifecycle time, and review cycles to assess efficiency gains or friction.
5. Inform policy and enablement strategy
Use data to shape guidelines, measure ROI, and prioritize AI-related initiatives across the engineering organization.
How It Works
The insights in this report are powered by Span’s AI Code Detector, a machine learning model that classifies code chunks as AI-generated or human-authored.
Chunk-level classification: Each contiguous block of added or modified code (≥700 characters) in supported languages is analyzed by the model.
Estimation-based analysis: Because AI classification is probabilistic, all metrics should be viewed as estimates, not exact counts.
Supported languages: Python, JavaScript, TypeScript, Java and Ruby.
Support for other languages is planned for future releases.
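To make the eligibility rules concrete, here is a minimal sketch of how chunks might be filtered before classification. The data shape and function name are illustrative assumptions, not the detector’s actual API.

```python
# Hypothetical sketch of the chunk eligibility rules described above;
# the detector's real API and data shapes may differ.

SUPPORTED_LANGUAGES = {"python", "javascript", "typescript", "java", "ruby"}
MIN_CHUNK_CHARS = 700  # chunks below this size are not classified directly


def eligible_chunks(chunks):
    """Keep only contiguous added/modified chunks that can be classified."""
    return [
        chunk
        for chunk in chunks
        if chunk["language"] in SUPPORTED_LANGUAGES
        and len(chunk["text"]) >= MIN_CHUNK_CHARS
    ]


# Example: one large Python chunk is eligible, a small Ruby chunk is not.
chunks = [
    {"language": "python", "text": "x = 1\n" * 200},  # ~1,200 characters
    {"language": "ruby", "text": "y = 2\n"},          # below the 700-char threshold
]
print(len(eligible_chunks(chunks)))  # 1
```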
AI Code Ratio
The AI Code Ratio measures the share of new or modified code lines that are AI-generated within merged pull requests (PRs).
How it’s calculated
Scope
Only includes modified or added lines in supported languages.
Each PR’s lines are grouped into contiguous chunks.
Only chunks ≥700 characters are eligible for classification.
Inclusion rule
A PR is counted toward the AI Code Ratio if it contains at least one classified chunk.
Computation logic
Based on the Telltale AI Code Ratio Calculation document:
Numerator: the expected number of AI-classified lines merged.
Combines two parts:
Classified Portion: sum of all lines classified as AI, weighted by model confidence.
Too-Small Portion: chunks below the size threshold are not classified directly; instead, an upper-bound contribution is imputed from the ratio observed on the classified chunks.
Denominator: all supported-language lines merged in PRs that had at least one classified chunk.
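A minimal sketch of this calculation is shown below, assuming per-chunk line counts, AI labels, and confidence scores are available. The field names and the exact imputation rule for too-small chunks are illustrative, not the documented formula.

```python
# Illustrative sketch of the AI Code Ratio described above.
# Field names and the imputation rule for too-small chunks are assumptions.

def ai_code_ratio(prs):
    numerator = 0.0
    denominator = 0.0
    for pr in prs:
        classified = [c for c in pr["chunks"] if c["classified"]]
        if not classified:
            continue  # inclusion rule: the PR needs at least one classified chunk

        # Classified Portion: AI-classified lines weighted by model confidence.
        ai_lines = sum(c["lines"] * c["ai_confidence"] for c in classified if c["is_ai"])

        # Too-Small Portion: impute from the ratio observed on classified chunks.
        classified_lines = sum(c["lines"] for c in classified)
        observed_ratio = ai_lines / classified_lines if classified_lines else 0.0
        too_small_lines = sum(c["lines"] for c in pr["chunks"] if not c["classified"])
        imputed = too_small_lines * observed_ratio

        numerator += ai_lines + imputed
        denominator += classified_lines + too_small_lines

    return numerator / denominator if denominator else 0.0
```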
Sensitivity Analysis: How AI Impacts Productivity
AI Code Insights doesn’t just measure AI usage — it examines how different levels of AI use ("dose") relate to performance metrics.
Each metric is analyzed per-PR and compared across AI “dose” levels:
No AI: <5% AI code ratio
Low AI: 5–25%
Medium AI: 25–50%
High AI: 50–100%
Inclusion rule: A PR is qualified for sensitivity analysis if it contains at least one classified chunk and at least 30% of its total line changes (additions and modifications) are classifiable.
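As a rough sketch, the thresholds above could be expressed as follows; the PR fields used here are assumptions about the underlying data, not the product’s internals.

```python
# Sketch of the dose buckets and the sensitivity-analysis inclusion rule above.
# The PR fields are illustrative assumptions.

def dose_bucket(ai_code_ratio):
    """Map a PR's AI Code Ratio (0.0-1.0) to a dose level."""
    if ai_code_ratio < 0.05:
        return "No AI"
    if ai_code_ratio < 0.25:
        return "Low AI"
    if ai_code_ratio < 0.50:
        return "Medium AI"
    return "High AI"


def qualifies_for_sensitivity(pr):
    """At least one classified chunk and >=30% of changed lines classifiable."""
    if pr["classified_chunks"] == 0:
        return False
    return pr["classifiable_lines"] / pr["total_changed_lines"] >= 0.30


print(dose_bucket(0.12))  # "Low AI"
```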
This dose-response framing helps leaders understand whether higher levels of AI assistance correlate with productivity gains or code review friction.
Velocity Metrics
Weighted PRs / dev week
This metric measures development throughput by grouping PRs into dev weeks and averaging their weighted PRs.
Each dev week is a 7-day period identified by its start date and a code contributor ID.
Interpretation: A higher value means that on average developers are merging more “work-weight” per week.
Example: A 10% increase in weighted PRs / dev week suggests faster throughput for AI-assisted code.
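A simplified sketch of this grouping, assuming each PR carries an author ID, a merge date, and a precomputed weight (field names are illustrative):

```python
# Sketch of averaging PR weight across dev weeks; field names are illustrative.
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Start date of the 7-day period containing d (Monday-based by assumption)."""
    return d - timedelta(days=d.weekday())


def weighted_prs_per_dev_week(prs):
    totals = defaultdict(float)  # (contributor ID, week start) -> summed PR weight
    for pr in prs:
        key = (pr["author_id"], week_start(pr["merged_at"]))
        totals[key] += pr["weight"]
    return sum(totals.values()) / len(totals) if totals else 0.0


prs = [
    {"author_id": "dev-1", "merged_at": date(2024, 5, 6), "weight": 1.2},
    {"author_id": "dev-1", "merged_at": date(2024, 5, 8), "weight": 0.8},  # same week
    {"author_id": "dev-2", "merged_at": date(2024, 5, 7), "weight": 1.0},
]
print(weighted_prs_per_dev_week(prs))  # 1.5 (two dev weeks, total weight 3.0)
```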
Weighted PRs / cycle time day
This metric measures development speed, similar to “distance ÷ time”.
Distance → a PR’s weight, between 0.1 and 2.0, normalized by global PR complexity and line count.
Time → duration from first commit to merge.
Interpretation: A higher value means teams are merging more “work-weight” per day.
Example: A 10% increase in weighted PRs / cycle time day suggests faster throughput for AI-assisted code.
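A minimal sketch of the same idea for a single PR, with assumed field names:

```python
# Sketch of "distance ÷ time": PR weight divided by days from first commit
# to merge. Field names are illustrative assumptions.
from datetime import datetime


def weighted_prs_per_cycle_time_day(pr):
    cycle_time_days = (pr["merged_at"] - pr["first_commit_at"]).total_seconds() / 86400
    return pr["weight"] / cycle_time_days if cycle_time_days > 0 else None


pr = {
    "weight": 1.2,  # normalized by global PR complexity and line count
    "first_commit_at": datetime(2024, 5, 1, 9, 0),
    "merged_at": datetime(2024, 5, 3, 9, 0),  # merged two days later
}
print(weighted_prs_per_cycle_time_day(pr))  # 0.6 work-weight per day
```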
PR life cycle
Shows the average time PRs spend in each lifecycle stage (e.g., development, review, rework, merge).
Use this to identify bottlenecks — for example, if High-AI PRs spend longer in the “review” stage.
Code Review Dynamics
PR review cycles
Measures the back-and-forth between author and reviewer:
1 Cycle = open → approve (no revisions)
Multiple Cycles = open → comment → commit → approve (each full loop adds 1)
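For illustration, the counting rule above could be sketched as follows; the event stream format is an assumption, not the product’s data model.

```python
# Sketch of counting review cycles: open -> approve is one cycle, and each
# comment -> commit loop before approval adds one more. Event names are assumed.

def review_cycles(events):
    cycles = 1
    awaiting_revision = False
    for event in events:
        if event == "comment":
            awaiting_revision = True
        elif event == "commit" and awaiting_revision:
            cycles += 1  # the author revised in response to review feedback
            awaiting_revision = False
        elif event == "approve":
            break
    return cycles


print(review_cycles(["approve"]))                                            # 1
print(review_cycles(["comment", "commit", "comment", "commit", "approve"]))  # 3
```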
If High-AI PRs have higher review cycles, it may indicate increased reviewer pushback or uncertainty about AI-generated changes.
A corresponding rise in the rework stage of the lifecycle often supports this interpretation.
Investment Areas
Measures how AI is being utilized in your team. Shows the difference in weighted PRs between High AI classified PRs and No AI PRs in each investment category.
Use this to understand which areas benefit from introducing AI usage, such as developing new features, fixing bugs, or improving performance and stability. Compare the results against your goals to determine whether more training or configuration is needed to bring AI into the areas you’d like to focus on.
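One simple reading of this comparison, sketched below with illustrative category names and fields (the product’s actual aggregation may differ):

```python
# Sketch of comparing summed PR weight between High AI and No AI PRs per
# investment category. Category names and fields are illustrative.
from collections import defaultdict


def investment_area_deltas(prs):
    by_category = defaultdict(lambda: {"High AI": 0.0, "No AI": 0.0})
    for pr in prs:
        if pr["dose"] in ("High AI", "No AI"):
            by_category[pr["category"]][pr["dose"]] += pr["weight"]
    return {
        category: buckets["High AI"] - buckets["No AI"]
        for category, buckets in by_category.items()
    }


prs = [
    {"category": "New features", "dose": "High AI", "weight": 1.5},
    {"category": "New features", "dose": "No AI", "weight": 1.0},
    {"category": "Bug fixes", "dose": "No AI", "weight": 1.1},
]
print(investment_area_deltas(prs))  # {'New features': 0.5, 'Bug fixes': -1.1}
```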
Breakdown Table
The Breakdown view allows you to explore:
AI Code Ratio and productivity metrics
Grouped by team, repository, or individual contributor
Use this to identify:
Teams adopting AI effectively
Individuals or groups with unusually high or low AI usage patterns
Opportunities for knowledge sharing or training
Individual PR AI Analysis
This view is located in our catalog and can be opened by selecting a specific AI usage dose bucket from any chart. It shows each PR’s AI analysis broken down by language or file chunk, providing a higher level of auditability and letting you dig deeper to see the actual code that developers are generating.
🔍 Important Notes
AI detection is probabilistic — treat metrics as directional estimates.
Currently supports Python, JavaScript, TypeScript, Java and Ruby.
Confidence intervals are shown to account for model uncertainty.
Smaller code chunks (<700 chars) are excluded from classification to reduce false positives.
📚 Related Articles
Introducing Span Detect – overview of the ML model used for AI detection