"What makes an open source project sustainable? Finding meaningful indicators," presented by Max Grzanna (Bosch Digital) and Sven Erik Jeroschewski (Robert Bosch GmbH), explores how data-driven indicators can be used to assess the long-term sustainability and risk of open source projects. This session was recorded at Open Community Experience 2026 (OCX26) in Brussels, Belgium, as part of the Open Community for Research.
The session applies these data-driven approaches to large-scale software ecosystems, starting from the problem of dependency risk in open source supply chains. Modern software relies on deep dependency trees, where issues such as security vulnerabilities, licensing changes, or abandoned maintenance can propagate across multiple layers and impact downstream systems.
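To make the propagation idea concrete, here is a minimal sketch of how one problematic package reaches everything above it in a dependency tree. The graph, the package names, and the `affected_by` helper are hypothetical illustrations, not material from the talk.

```python
from collections import deque

# Hypothetical dependency graph: package -> its direct dependencies.
DEPENDENCIES = {
    "app": ["web-framework", "logger"],
    "web-framework": ["http-parser", "template-engine"],
    "logger": [],
    "http-parser": ["string-utils"],
    "template-engine": [],
    "string-utils": [],
}


def affected_by(graph, risky):
    """Return every package that depends on `risky`, directly or transitively."""
    # Invert the edges so we can walk from a dependency up to its dependents.
    dependents = {name: [] for name in graph}
    for package, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(package)

    # Breadth-first search upwards through the inverted graph.
    seen = set()
    queue = deque([risky])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(affected_by(DEPENDENCIES, "string-utils")))
# ['app', 'http-parser', 'web-framework']
```

A problem in a leaf package three levels down still reaches the top-level application, which is why direct dependencies alone give an incomplete picture of exposure.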
The talk introduces the idea of treating the open source ecosystem as a system that can be analysed using observable signals. Instead of relying on opinion-based metrics, the approach focuses on extracting indicators from historical data to detect early warning signs of instability.
The methodology combines Software Bills of Materials (SBOMs) with project metadata collected from multiple sources, including GitHub activity, package registries, and vulnerability databases. A large dataset of repositories is analysed to identify patterns across development activity, community engagement, licensing, and risk factors.
The session outlines the construction of a data pipeline that aggregates time-series data over a defined period and processes it to extract features such as contributor trends, release frequency, and interaction patterns. This data is then used to build models that classify projects based on their characteristics.
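As a rough illustration of the feature-extraction step, the sketch below turns a short monthly time series into two features: a contributor trend (least-squares slope) and a release frequency. The record layout, the sample values, and the feature names are assumptions made for this example; the description does not specify the pipeline's actual schema.

```python
from statistics import mean

# Hypothetical monthly snapshots: (month index, active contributors, releases).
MONTHLY = [(0, 12, 1), (1, 11, 0), (2, 9, 1), (3, 8, 0), (4, 6, 0), (5, 5, 0)]


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (0.0 if xs are constant)."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator if denominator else 0.0


def extract_features(rows):
    """Summarise a project's time series as a small feature dictionary."""
    months = [row[0] for row in rows]
    contributors = [row[1] for row in rows]
    releases = [row[2] for row in rows]
    return {
        # Negative values mean the active contributor base is shrinking.
        "contributor_trend": slope(months, contributors),
        # Average number of releases per observed month.
        "releases_per_month": sum(releases) / len(rows),
    }


features = extract_features(MONTHLY)
print(features)
```

Here the contributor trend comes out negative (roughly 1.5 contributors lost per month), the kind of early signal a raw snapshot of current activity would not show.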
Clustering techniques are applied to group projects into distinct archetypes. These include categories such as flagship projects with strong communities and low risk, resource-constrained projects with limited maintainers, and critical infrastructure components with high dependency exposure but concentrated ownership.
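The description does not name the clustering algorithm used, so the following sketch uses plain k-means (Lloyd's algorithm) as a stand-in to show how projects described by numeric features fall into groups. The two features, the sample points, and the starting centroids are all hypothetical.

```python
import math

# Hypothetical normalised features per project:
# (maintainer breadth, dependency exposure), each in the range 0..1.
PROJECTS = {
    "proj-a": (0.90, 0.80),
    "proj-b": (0.85, 0.90),
    "proj-c": (0.10, 0.20),
    "proj-d": (0.15, 0.10),
    "proj-e": (0.10, 0.90),
    "proj-f": (0.20, 0.95),
}


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels, centroids


points = list(PROJECTS.values())
# Fixed starting centroids keep the example deterministic.
labels, centroids = kmeans(points, [points[0], points[2], points[4]])
for name, label in zip(PROJECTS, labels):
    print(name, "-> cluster", label)
```

The algorithm only produces numbered groups. Attaching names such as "flagship" or "resource-constrained" to a cluster is a separate interpretation step based on what its members have in common.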
The analysis also highlights which indicators have the strongest predictive value. Factors such as licensing clarity and the presence of community documentation are shown to have a higher impact than raw activity metrics like commit counts.
Finally, the session demonstrates how these models can be used in practice. By analysing SBOMs, organisations can identify high-risk dependencies, prioritise mitigation efforts, monitor risk over time, and make more informed decisions when selecting or supporting open source components.
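A minimal sketch of that last step, assuming a CycloneDX-style JSON SBOM (a top-level `components` list whose entries carry `name`, `version`, and `purl`) and a precomputed risk label per package URL. The component names, the `RISK_BY_PURL` table, and the ranking order are hypothetical; in practice the labels would come from a trained model rather than a hand-written dictionary.

```python
import json

# Hypothetical SBOM fragment in CycloneDX JSON layout.
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "alpha-lib", "version": "2.1.0",
     "purl": "pkg:npm/alpha-lib@2.1.0"},
    {"type": "library", "name": "beta-core", "version": "0.9.3",
     "purl": "pkg:pypi/beta-core@0.9.3"},
    {"type": "library", "name": "gamma-utils", "version": "1.4.2",
     "purl": "pkg:maven/org.example/gamma-utils@1.4.2"},
    {"type": "library", "name": "vendored-blob", "version": "unknown"}
  ]
}
"""

# Hypothetical model output: risk label per package URL.
RISK_BY_PURL = {
    "pkg:npm/alpha-lib@2.1.0": "low",
    "pkg:pypi/beta-core@0.9.3": "high",
    "pkg:maven/org.example/gamma-utils@1.4.2": "medium",
}

# Review order: unresolved components are surfaced right after high-risk ones.
RANK = {"high": 0, "unknown": 1, "medium": 2, "low": 3}


def rank_components(sbom, risk_by_purl):
    """Return (name, risk label) pairs, most urgent first."""
    rows = []
    for component in sbom.get("components", []):
        # Components without a purl cannot be matched and are reported as unknown.
        label = risk_by_purl.get(component.get("purl"), "unknown")
        rows.append((component.get("name", "?"), label))
    return sorted(rows, key=lambda row: (RANK[row[1]], row[0]))


for name, label in rank_components(json.loads(SBOM_JSON), RISK_BY_PURL):
    print(f"{label:8} {name}")
```

Running the same scan against each new SBOM and comparing the results over time is one simple way to approximate the monitoring use case described above.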
Key topics covered
open source project sustainability
software supply chain risk
SBOM (Software Bill of Materials) analysis
dependency risk propagation
data-driven project health indicators
GitHub activity and metadata analysis
clustering and project archetypes
community and governance signals
vulnerability data (OSV) integration
risk classification and monitoring
Why this matters
Most teams only realise a dependency is a problem when it breaks. This approach shifts that mindset. Instead of reacting to failures, it gives you a way to detect risk early and decide where to invest time, support, or funding before it becomes a production issue.
About OCX26
Open Community Experience 2026 is the Eclipse Foundation’s flagship event, held in Brussels, Belgium. It brings together developers, architects, and industry leaders to explore open source technologies across domains including IoT, AI, automotive, and security, with a focus on practical implementation and collaboration. Learn more at https://www.ocxconf.org/
Chapters
00:00 introduction and sustainability problem
00:29 dependency risk in open source supply chains
01:27 real-world supply chain incidents
02:28 ecosystem perspective and early detection
03:09 existing approaches (CHAOSS, OpenSSF)
04:22 data-driven indicator approach
05:31 dataset construction and project selection
07:25 defining sustainability indicators
08:44 data sources and aggregation
10:04 data pipeline and preprocessing
11:36 challenges in data collection (PURLs, licensing)
14:25 clustering and project archetypes
17:48 defining risk categories
21:20 key predictive indicators
23:06 classification models and outputs
24:30 applying the model to real projects
26:21 risk monitoring and decision-making
29:25 funding and sustainability discussion