Keeping Strava’s Segment Leaderboards Fair: An Engineer’s Perspective
Multi-Sport, by James Wang

During the pandemic, when gyms closed and routines fell apart, running became one of the most accessible ways for many athletes to keep moving. That surge in participation didn’t fade as the world reopened. Through 2021, Strava saw a 38% year-over-year increase in activity, even on top of the previous year’s spike—totaling 1.8 billion uploads over the past 12 months.
But alongside genuine effort by athletes using Strava, inconsistencies became harder to ignore. Times at the top of some leaderboards didn’t align with what athletes were experiencing on the ground; the numbers told the wrong story.
That disconnect is familiar to many Strava athletes. It’s also what James Wang, an engineer at Strava, came to understand more comprehensively after joining the company. With billions of activities uploaded across tens of millions of segments, Strava operates at a scale unlike any other. Because Strava introduced segments to the world, maintaining leaderboard integrity is a challenge uniquely our own, and one that requires constant technical investment.
In the letter below, James explains how Strava approaches these integrity challenges, the systems behind detecting anomalous activity, and the ongoing work required to ensure leaderboards reflect real effort.
...
Hello, my name is James, and I’m a runner and engineer at Strava (not necessarily in that order). Our recent Reddit post received a lot of interest from the community, so the powers that be have asked me to come back and write a longer-form essay detailing our data integrity efforts at Strava.
To understand how Strava protects leaderboard integrity today, it helps to start with where we began, and why that system wasn’t enough.
From Rules to Machine Learning
For years, Strava relied on a rules-based system that flagged activities breaking known world records and superhuman climbing feats, specifically for rides. While effective in some cases, many anomalous activities were not being consistently caught because they did not break records.
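As a rough illustration of what a rules-based check of this kind can look like, here is a minimal sketch. The `Effort` type, the rule names, and every threshold are hypothetical and chosen only for the example; they are not Strava's actual rules or limits.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not Strava's actual limits.
MAX_SPEED_MPS = {"run": 12.5, "ride": 40.0}
MAX_CLIMB_RATE_M_PER_H = 2500.0  # applied to rides only


@dataclass
class Effort:
    sport: str                    # "run" or "ride"
    distance_m: float
    elapsed_s: float
    elevation_gain_m: float = 0.0


def rule_flags(effort: Effort) -> list[str]:
    """Return the names of the rules this effort breaks (empty if none)."""
    flags = []
    avg_speed = effort.distance_m / effort.elapsed_s
    if avg_speed > MAX_SPEED_MPS[effort.sport]:
        flags.append("speed_exceeds_record")
    if effort.sport == "ride":
        climb_rate = effort.elevation_gain_m / (effort.elapsed_s / 3600)
        if climb_rate > MAX_CLIMB_RATE_M_PER_H:
            flags.append("superhuman_climb")
    return flags


# A car driven along a run segment at 20 m/s breaks the speed rule...
car = Effort("run", distance_m=1000, elapsed_s=50)
# ...but a bike ride uploaded as a run at 8 m/s breaks no rule at all.
bike_as_run = Effort("run", distance_m=1000, elapsed_s=125)

print(rule_flags(car))
print(rule_flags(bike_as_run))
```

The second effort shows the gap described above: it is clearly not a run, yet it passes because no record is broken.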
To address this, Strava introduced a new approach using machine learning (ML): specifically, supervised learning models that rely on a labeled dataset of examples (like an answer key to an exam) to identify patterns associated with anomalous activities. And thanks to our community, whose members have been manually flagging activities for years, we have a large labeled dataset we can use to train our models.
These ML models focus on detecting activities recorded in vehicles, bike rides uploaded as runs, and e-bike rides uploaded as regular rides.
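To make the idea of learning from labeled examples concrete, here is a toy sketch using a nearest-centroid classifier. This is a deliberately simple stand-in, not the model Strava uses; the two features, the scaling factors, and all data values are made up for illustration.

```python
from statistics import mean

# Toy labeled examples: (avg_speed_mps, cadence_spm) -> label.
# Labels stand in for community flags; all values are invented.
LABELED = [
    ((3.0, 170.0), "ok"),
    ((4.5, 180.0), "ok"),
    ((5.5, 185.0), "ok"),
    ((8.0, 85.0), "anomalous"),   # e.g. a bike ride uploaded as a run
    ((9.5, 90.0), "anomalous"),
    ((20.0, 0.0), "anomalous"),   # e.g. recorded in a vehicle
]


def fit_centroids(examples):
    """Compute the mean feature vector for each label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(col) for col in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features, scale=(1.0, 10.0)):
    """Return the label whose centroid is closest after per-feature scaling."""
    def dist2(a, b):
        return sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scale))
    return min(centroids, key=lambda label: dist2(features, centroids[label]))


centroids = fit_centroids(LABELED)
print(predict(centroids, (7.0, 88.0)))   # fast "run" with a cycling-like cadence
print(predict(centroids, (4.0, 175.0)))  # ordinary run
```

Unlike a fixed record threshold, the decision here comes from patterns in the labeled data, so an effort can be flagged even though its speed alone is humanly possible.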
This marked a shift in how leaderboards are protected. But preventing new anomalies from appearing was only part of the challenge.
Cleaning Up Years of Data
To truly move forward with fairer leaderboards, Strava also needed to address the past.
That meant reprocessing the top 100 activities on every run and ride segment leaderboard—running billions of historical activities through the new ML systems. It was a large-scale engineering effort that required careful coordination and attention to detail.
The results:
- In May 2025, Strava reprocessed run segment leaderboards, removing 4.45 million anomalous activities
- In January 2026, ride segment leaderboards were reprocessed, removing 3.9 million anomalous activities
- Prior to these ML efforts, an update to our rules-based system in December 2024 removed 6.5 million anomalous activities
For the teams involved, these numbers represent a successful cleanup effort that took over a year and a half to complete. And as a user myself, I can appreciate all the effort put into cleaning up segment leaderboards, though challenges remain.
Note: Some leaderboards have lots of anomalous activities, and cleaning the top 100 will not be enough. We plan on doing more backfills in the future to clean up the remaining anomalous activities on leaderboards, particularly as we release bigger step changes in our anomaly detection system.
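The top-100 pass, and why a single pass may not be enough, can be sketched as follows. This is a simplified model under stated assumptions: a leaderboard is a plain sorted list, `is_anomalous` stands in for the detection system, and the repeat-until-stable loop is only one hypothetical way a follow-up backfill could work.

```python
def clean_top_n(leaderboard, is_anomalous, n=100):
    """Remove flagged efforts from the first n entries of a sorted leaderboard.

    Returns (cleaned leaderboard, number removed). Entries beyond position n
    are not inspected, mirroring a top-N-only pass.
    """
    head, tail = leaderboard[:n], leaderboard[n:]
    kept = [e for e in head if not is_anomalous(e)]
    return kept + tail, len(head) - len(kept)


def clean_until_stable(leaderboard, is_anomalous, n=100):
    """Repeat top-n passes until a pass removes nothing."""
    total = 0
    while True:
        leaderboard, removed = clean_top_n(leaderboard, is_anomalous, n)
        total += removed
        if removed == 0:
            return leaderboard, total


# Toy leaderboard of elapsed times in seconds; anything under 30 s is "anomalous".
times = [10, 11, 12, 13, 14, 60, 61, 62]
too_fast = lambda t: t < 30

one_pass, removed_once = clean_top_n(times, too_fast, n=3)
stable, removed_total = clean_until_stable(times, too_fast, n=3)
print(one_pass, removed_once)
print(stable, removed_total)
```

After one pass with `n=3`, two anomalous efforts that previously sat just outside the window have moved into the top three, which is the situation the note above describes.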
Remaining Challenges
Detecting anomalous activity on segments isn’t straightforward.
Strava captures everything from elite performances to everyday training, making it difficult to distinguish a fast cyclist from a vehicle or a slow cyclist from a fast runner. From the start, our priority has been to minimize false positives so athletes aren’t penalized for legitimate (and extraordinary) performances.
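One common way to prioritize a low false positive rate is to choose the flagging threshold from a labeled validation set. The sketch below assumes a model that outputs a score per activity; the function name, the scores, and the target rates are all hypothetical and are not taken from Strava's systems.

```python
def pick_threshold(scored, max_false_positive_rate=0.01):
    """Pick the lowest score threshold whose false positive rate on a
    labeled validation set stays within the target.

    scored: list of (model_score, is_truly_anomalous) pairs.
    An activity is flagged when its score >= threshold.
    """
    legit_scores = [s for s, anomalous in scored if not anomalous]
    if not legit_scores:
        raise ValueError("need at least one legitimate example")
    for t in sorted({s for s, _ in scored}):
        false_pos = sum(1 for s in legit_scores if s >= t)
        if false_pos / len(legit_scores) <= max_false_positive_rate:
            return t
    return float("inf")  # no candidate meets the target; flag nothing


# Invented validation data. The 0.62 entry is a legitimate but
# extraordinary effort that the model finds suspicious.
validation = [
    (0.05, False), (0.10, False), (0.30, False), (0.62, False),
    (0.70, True), (0.85, True), (0.95, True),
]

strict = pick_threshold(validation, max_false_positive_rate=0.0)
lenient = pick_threshold(validation, max_false_positive_rate=0.25)
print(strict, lenient)
```

The stricter target pushes the threshold above the extraordinary-but-legitimate effort, at the cost of letting borderline anomalies through.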
Context also matters. Our systems can’t always tell whether an effort benefited from drafting, tailwinds, or peloton dynamics. They can’t reliably detect velodrome riding. These are known gaps the team continues to work on.
Data availability varies too. Some activities include heart rate, power, or cadence; others don’t. Strava designs its systems so athletes without access to expensive equipment can still participate, accepting added complexity in exchange for broader inclusion.
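One simple way to let a model work with or without sensor data is to encode each optional reading together with a "present" indicator. The sketch below is an assumption about how this could be done, not a description of Strava's feature pipeline; `ActivitySummary` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActivitySummary:
    avg_speed_mps: float                   # assumed always available from GPS
    avg_heart_rate: Optional[float] = None
    avg_power_w: Optional[float] = None
    avg_cadence: Optional[float] = None


def to_features(a: ActivitySummary) -> list[float]:
    """Encode optional sensors as (value, present) pairs so a model can
    tell "no sensor" apart from a genuine reading of zero."""
    features = [a.avg_speed_mps]
    for value in (a.avg_heart_rate, a.avg_power_w, a.avg_cadence):
        features.append(0.0 if value is None else value)
        features.append(0.0 if value is None else 1.0)
    return features


phone_only = ActivitySummary(avg_speed_mps=4.2)
with_sensors = ActivitySummary(avg_speed_mps=4.2, avg_heart_rate=150.0, avg_cadence=176.0)

print(to_features(phone_only))
print(to_features(with_sensors))
```

Both activities produce a feature vector of the same length, so an athlete recording with only a phone is handled by the same model as one with a full sensor setup.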
And then there’s data quality. Corrupted device data and loss of GPS signal during recording can produce anomalies that show up on segment leaderboards. Short segments (especially those under 500 meters for rides and 250 meters for runs) are particularly susceptible to corrupted data, which is why Strava no longer allows new segments below these lengths. Our ability to account for corrupted data has improved over the years, so newer activities with corrupted data are less likely to affect leaderboards. However, short segments and older activities are more likely to still contain these data quality issues.
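Two of the ideas above can be sketched in a few lines: the minimum segment lengths (500 m for rides, 250 m for runs, as stated in the text) and a basic check for GPS jumps. The jump check, its sample format, and the speed cap are assumptions made for illustration; they are not Strava's actual data-quality logic.

```python
# Minimum lengths for new segments, taken from the text above.
MIN_SEGMENT_LENGTH_M = {"ride": 500, "run": 250}


def can_create_segment(sport: str, length_m: float) -> bool:
    """New segments shorter than the minimum for their sport are rejected."""
    return length_m >= MIN_SEGMENT_LENGTH_M[sport]


def gps_jump_indices(samples, max_speed_mps):
    """Find samples whose implied speed from the previous sample is implausible.

    samples: list of (time_s, cumulative_distance_m) pairs (hypothetical format).
    Returns the indices i where moving from sample i-1 to i implies a speed
    above max_speed_mps, or where time did not advance.
    """
    jumps = []
    for i in range(1, len(samples)):
        (t0, d0), (t1, d1) = samples[i - 1], samples[i]
        dt = t1 - t0
        if dt <= 0 or (d1 - d0) / dt > max_speed_mps:
            jumps.append(i)
    return jumps


# A run at 3 m/s with one 144 m jump in a single second (e.g. signal reacquired).
track = [(0, 0), (1, 3), (2, 6), (3, 150), (4, 153)]

print(can_create_segment("run", 200), can_create_segment("ride", 500))
print(gps_jump_indices(track, max_speed_mps=12.5))
```

On a short segment, a single jump like the one at index 3 can cover most of the segment's length, which is one reason short segments are so sensitive to corrupted data.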
Beyond Leaderboards
Beyond honest mistakes and GPS anomalies, we also encounter fake activities created by bad actors on Strava. These spam accounts masquerade as legitimate athletes to gain attention and potentially scam others, and the activities they create can influence segment leaderboards. Data integrity therefore extends beyond segments: Strava invests significantly in automated systems that detect and remove abusive content, inauthentic accounts, and the misleading activities those accounts generate. We keep evolving these systems, adding new behavioral signals and improving our models as deceptive patterns change, so that the platform remains safe, fair, and equitable for everyone in the community.
Why It Matters
Segments are competitive, but they’re also communal. They’re a way athletes measure progress, challenge friends, and return to the same stretch of road with purpose. Maintaining trust in this system requires constant effort, technical innovation, and close collaboration with the community.
When a segment leaderboard represents authentic performances, it tells the right story. And that’s the standard we continue to work toward.

