CNCF Blog Projects Category


Tekton Becomes a CNCF Incubating Project

Tue, 03/24/2026 - 04:00

The CNCF Technical Oversight Committee (TOC) has voted to accept Tekton as a CNCF incubating project. 

What is Tekton?

Tekton is a powerful and flexible open source framework for creating continuous integration and delivery (CI/CD) systems. It allows developers to build, test, and deploy across multiple cloud providers and on-premises systems by abstracting away the underlying implementation details.

While widely adopted for CI/CD, Tekton serves as a general-purpose, security-minded, Kubernetes-native workflow engine. Its composable primitives (Steps, Tasks and Pipelines) allow developers to orchestrate any type of sequential or parallel workload on Kubernetes. Tekton provides a standard, Kubernetes-native interface for defining these workflows, making them portable and reusable.
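
To make the shape of these primitives concrete, here is a minimal, hypothetical Task and Pipeline; the resource names, image, and script are illustrative placeholders, not resources from the Tekton catalog:

```yaml
# A Task runs a sequence of Steps, each in its own container.
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: echo-hello          # illustrative name
spec:
  steps:
    - name: greet
      image: alpine         # any container image
      script: |
        echo "Hello from Tekton"
---
# A Pipeline orchestrates Tasks, sequentially or in parallel.
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: hello-pipeline      # illustrative name
spec:
  tasks:
    - name: say-hello
      taskRef:
        name: echo-hello    # references the Task above
```

Running the Pipeline produces a PipelineRun, which in turn creates TaskRuns, so the same definition is portable across any cluster where Tekton is installed.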

Tekton’s Key Milestones

The project has matured into a leading framework for Kubernetes-native CI/CD, reaching its stable v1.0 release for the core Pipelines component.

By joining the CNCF, Tekton aligns itself more closely with the ecosystem it powers. It integrates deeply with other CNCF projects such as Argo CD (for GitOps) and SPIFFE/SPIRE (for identity), as well as with the OpenSSF project Sigstore (for signing and verification), creating a robust supply chain security story.

Tekton is widely adopted in the industry and used by companies such as Puppet and Ford Motor Company. It also powers major commercial CI/CD offerings, including Red Hat OpenShift Pipelines and IBM Cloud Continuous Delivery.

A Message from the Tekton Team

“One of the accomplishments I’m most proud of is the broad adoption of Tekton across open source projects, commercial products, and in-house platforms. Seeing teams rely on it in production and build on it within their own ecosystems has been especially rewarding. As a Kubernetes-native project that integrates naturally with other CNCF technologies, Tekton has benefited from close collaboration within the Cloud Native Computing Foundation community. I’m looking forward to deepening those partnerships, learning from our peers across CNCF projects, and meeting more Tekton users who are shaping what cloud native delivery looks like in practice.”

— Andrea Frittoli, Tekton Governing Board Member

“What I’m most proud of is how Tekton has shown that CI/CD can be a true Kubernetes-native primitive, not just another layer on top. Seeing projects like Shipwright—itself a CNCF project—and Konflux build on Tekton as their foundation validates that vision. Building all of this alongside a diverse, multi-vendor community with Red Hat, Google, IBM, and many individual contributors has been one of the most rewarding open source experiences of my career. I’m looking forward to what comes next. The future of Tekton is Trusted Artifacts changing how tasks share data, a simpler developer experience through Pipelines as Code, and deeper collaboration with CNCF projects like Sigstore and Argo CD. Tekton is fundamentally a Kubernetes project, and CNCF is its natural home.”

— Vincent Demeester, Tekton Governing Board Member

Support from TOC Sponsors

The CNCF Technical Oversight Committee (TOC) provides technical leadership to the cloud native community. It defines and maintains the foundation’s technical vision, approves new projects, and stewards them across maturity levels. The TOC also aligns projects within the overall ecosystem, sets cross-cutting standards and best practices and works with end users to ensure long-term sustainability. As part of its charter, the TOC evaluates and supports projects as they meet the requirements for incubation and continue progressing toward graduation.

“Tekton has proven itself as core infrastructure for Kubernetes-native delivery. Its move to incubation reflects strong multi-vendor governance and deep alignment with CNCF projects focused on GitOps, identity and software supply chain security.”

— Chad Beaudin, TOC Sponsor, Cloud Native Computing Foundation

“Tekton’s composable design and broad adoption make it an important part of the cloud native workflow landscape. The TOC’s vote recognizes a healthy contributor community and a clear roadmap.”
— Jeremy Rickard, TOC Sponsor, Cloud Native Computing Foundation

The Main Components of Tekton

  • Pipelines: The core building blocks (Tasks, Pipelines, Workspaces) for defining CI/CD workflows.
  • Triggers: Allows pipelines to be instantiated based on events (like Git pushes or pull requests).
  • CLI: A command-line interface for interacting with Tekton resources.
  • Dashboard: A web-based UI for visualizing and managing pipelines.
  • Chains: A supply chain security tool that automatically signs and attests artifacts built by Tekton.

Community Highlights

These community metrics signal strong momentum and healthy open source governance. For a CNCF project, this level of engagement builds trust with adopters, ensures long-term sustainability and reflects the collaborative innovation that defines the cloud native ecosystem. Tekton’s notable milestones include:

  • 11,000+ GitHub Stars (across all repositories)
  • 5,000+ Pull Requests
  • 2,500+ Issues
  • 600+ Contributors
  • 1.0 Stable Release of Pipelines

The Future of Tekton

The Tekton roadmap focuses on stability, security and scalability. Key initiatives from the project board and enhancement proposals (TEPs) include:

  • Supply Chain Security: Enhancing Tekton Chains to meet SLSA Level 3 requirements by default, including better provenance for build artifacts.
  • Trusted Artifacts: Introducing a secure and efficient way to pass data between tasks without relying on shared storage (PVCs), significantly improving performance and isolation (TEP-0139). 
  • Concise Syntax: Exploring less verbose syntax for referencing remote tasks and pipelines to improve developer experience (TEP-0154).
  • Advanced Scheduling: Integrating with Kueue for better job queuing and priority management of PipelineRuns.
  • Tekton Results: Moving the Results API to stable to provide long-term history and query capabilities for PipelineRuns and TaskRuns.
  • Catalog Evolution: Transitioning reusable tasks to Artifact Hub for better discoverability and standardized distribution.
  • Pipelines as Code: Continued investment in Git-based workflows, improving the “as code” experience for defining and managing pipelines.

For more details, see the Tekton Project Board and approved TEPs (Tekton Enhancement Proposals).

As a CNCF-hosted project, Tekton is committed to the principles of open source, neutrality and collaboration. We invite global developers and ecosystem partners to join us in shaping the future of Kubernetes-native CI/CD. For more information on maturity requirements for each level, please visit the CNCF Graduation Criteria.

Categories: CNCF Projects

Fluid Becomes a CNCF Incubating Project

Tue, 03/24/2026 - 04:00

The CNCF Technical Oversight Committee (TOC) has voted to accept Fluid as a CNCF incubating project. 

What is Fluid?

Kubernetes provides a data access layer through the Container Storage Interface (CSI), enabling workloads to connect to storage systems. However, certain use cases often require additional capabilities such as dataset versioning, access controls, preprocessing, dynamic mounting, and data acceleration.

To help address these needs, Nanjing University, Alibaba Cloud, and the Alluxio community introduced Fluid, a cloud native data orchestration and acceleration system that treats “elastic datasets” as a first-class resource. By adding a data abstraction layer within Kubernetes environments, Fluid enhances data flow and management for data-intensive workloads.

Fluid’s vision is Data Anyway, Anywhere, Anytime:

  • Anyway: Fluid focuses on data accessibility. Storage vendors can flexibly and simply integrate various storage clients without needing deep or extensive knowledge of Kubernetes CSI or Golang programming.
  • Anywhere: Fluid facilitates efficient data access across diverse infrastructure by supporting heterogeneous computing environments (cloud, edge, and serverless). It accelerates access to various storage systems like HDFS, S3, GCS, and CubeFS by utilizing caching engines such as Alluxio, JuiceFS, and Vineyard.
  • Anytime: Runtime dynamic adjustment of data sources allows data scientists to add and remove storage data sources on-demand in Kubernetes environments without service interruption.
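
As a rough sketch of how this looks in practice, a dataset and its caching runtime are declared as paired Kubernetes resources; the names, bucket path, and sizing below are illustrative assumptions, not taken from Fluid's documentation:

```yaml
# A Dataset describes where the data lives (illustrative names/paths).
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo-dataset
spec:
  mounts:
    - mountPoint: s3://example-bucket/training-data   # hypothetical bucket
      name: training-data
---
# A runtime (here Alluxio) provides the distributed cache for the Dataset.
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: demo-dataset        # matches the Dataset name to bind them
spec:
  replicas: 2               # number of cache worker replicas
  tieredstore:
    levels:
      - mediumtype: MEM     # cache hot data in memory
        path: /dev/shm
        quota: 2Gi
```

Application pods then consume the dataset as an ordinary volume, while Fluid manages cache placement and scaling behind the scenes.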

Fluid’s Key Milestones and Ecosystem Development

Fluid originated as a joint project from Nanjing University, Alibaba Cloud, and the Alluxio community in September 2020. The project aims to provide efficient, elastic, and transparent data access capabilities for data-intensive AI applications in cloud native environments. In May 2021, Fluid was officially accepted as a CNCF sandbox project.

Since joining the CNCF, Fluid has grown rapidly, continuously shipping important updates. The project has achieved significant breakthroughs in key capabilities such as elastic data cache scaling, unified access to heterogeneous data sources, and application-transparent scheduling, while also improving the operational efficiency of AI and big data workloads on cloud native platforms.

Fluid’s core design concepts and technological innovations have received high-level academic recognition, with related results published in top conferences and journals in the database and computer systems fields, such as IEEE TPDS 2023.

In December 2024, at KubeCon + CloudNativeCon North America, CNCF released the 2024 Technology Landscape Radar Report, in which Fluid, along with projects such as Kubeflow, was listed as “Adopt,” becoming one of the de facto standards in the cloud native AI and big data field.

Image of Batch/AI/ML Radar 2024. Fluid was listed as "Adopt"

Now, Fluid has been widely adopted across multiple industries and regions worldwide, with users covering major cloud service providers, internet companies, and vertical technology companies. Some Fluid users include Xiaomi, Alibaba Group, NetEase, China Telecom, Horizon, Weibo, Bilibili, 360, Zuoyebang, Inceptio Technology, Huya, OPPO, Unisound, DP Technology, JoinQuant, among others. Use cases cover a wide range of application scenarios, including, but not limited to, Artificial Intelligence Generated Content (AIGC), large models, big data, hybrid cloud, cloud-based development machine management, and autonomous driving data simulation.

A Word from the Maintainers

“We are deeply honored to see Fluid promoted to an incubating project. Our original intention in initiating Fluid was to fill the gap between compute and storage in cloud native architectures, allowing data to flow freely in the cloud like ‘fluid.’ The vibrant community development and widespread user adoption validate our vision. We will continue to drive the evolution of cloud native data orchestration technology, especially when it comes to exploring intelligent scheduling and orchestration of KVCache for large model inference scenarios and dedicating ourselves to making data serve various applications more efficiently and intelligently.”

— Gu Rong (Nanjing University), Chair and Co-Founder of the Fluid Community

“From sandbox to incubation, the concept of ‘caches also needing elasticity’ has gained widespread recognition. In the future, we will continue to drive Fluid toward becoming the standard for cloud native data orchestration, allowing data scientists to focus on model innovation.”

— Che Yang (Alibaba Cloud), Fluid Community Maintainer and Co-Founder

“Fluid is a key bridge connecting AI computing frameworks and distributed storage systems. Seeing Fluid grow from a sandbox to an incubating project makes us extremely proud. This milestone proves that building a standardized data abstraction layer on Kubernetes aligns with industry trends.”

— Fan Bin (Alluxio Inc.), Alluxio Open Source Community Maintainer

Support from TOC Sponsors


“Fluid’s progression to incubation reflects both its technical maturity and the clear demand we’re seeing for stronger data orchestration in cloud native environments. As AI and data-intensive workloads continue to grow on Kubernetes, projects like Fluid help bridge compute and storage in a way that is practical, scalable, and community-driven. The TOC looks forward to supporting the project’s continued evolution within the CNCF ecosystem.”

— Alex Chircop, CNCF TOC Member

“Fluid has demonstrated a strong level of maturity that aligns well with CNCF Incubation expectations. Adopter interviews showcase that Fluid has been deployed successfully in large-scale production environments for several years and provides standardized APIs that enable multiple applications to efficiently access and cache diverse datasets. Additionally, Fluid benefits from a healthy, engaged community, with a roadmap clearly shaped by adopter feedback.”

— Katie Gamanji, CNCF TOC Member

Main Components in Fluid

  • Dataset Controller: Responsible for dataset abstraction and management, maintaining the binding relationship and status between data and underlying storage.
  • Application Scheduler: Schedules application pods to the most suitable nodes based on data cache location information.
  • Runtime Plugins: Pluggable runtime interface responsible for deployment, configuration, scaling, and failure recovery of specific caching engines (such as Alluxio, JuiceFS, Vineyard, etc.), with excellent extensibility.
  • Webhook: Uses the Mutating Admission Webhook mechanism to automatically inject sidecar or volume mount information into application pods, requiring no changes to the applications themselves.
  • CSI Plugin: Provides lightweight, transparent dataset mounting so application pods can access cached or remote data via local file system paths.

Image showing the components of Fluid

Community Highlights

These community metrics signal strong momentum and healthy open source governance. Fluid’s notable milestones include:

  • 1.9k GitHub Stars
  • 116 Pull Requests
  • 250 Issues
  • 979 Contributors
  • 28 Releases

The Journey Continues

Becoming a CNCF incubating project is a turning point in Fluid’s journey. Fluid will continue to deepen its data orchestration capabilities for generative AI and big data scenarios. To meet the exponential growth demands of GenAI applications, Fluid’s next goal is to evolve into an intelligent elastic data platform, allowing users to focus on model innovation and extracting value from data, while Fluid handles the underlying data distribution, cache acceleration, resource management, and elastic scaling.

As a CNCF incubating project, Fluid will continue to uphold the principles of open source, neutrality, and collaboration, working together with global developers and ecosystem partners to enable data to flow and be efficiently used freely anywhere, anytime.

Hear from Users

“Fluid’s Anytime capability allows our data scientists to self-service data switching without restarting Pods, truly achieving data agility. This is the core reason we chose Fluid over a self-built solution.”

— Liu Bin, Technical Lead at DP Technology

“Fluid’s vendor neutrality and cross-namespace cache sharing capabilities help us avoid cloud vendor lock-in and save approximately 40% in cross-cloud bandwidth costs. It has been deeply integrated into all of our data workflows.”

— Zhao Ming, Head of Horizon AI Platform

“In LLM model inference, remote Safetensors file reading often leads to low I/O utilization. Fluid’s intelligent prefetching and local caching technology allows us to fully saturate bandwidth without modifying code, fully unleashing GPU computing power.”

— Zhang Xiang, Head of NetEase MaaS

As a CNCF-hosted project, Fluid is committed to the principles of open source, neutrality and collaboration. We invite global developers and ecosystem partners to join us in enabling data to flow and be efficiently used freely anywhere, anytime. For more information on maturity requirements for each level, please visit the CNCF Graduation Criteria.

Categories: CNCF Projects

Cloud Native Computing Foundation Announces Kyverno’s Graduation

Tue, 03/24/2026 - 04:00

Kyverno reaches graduation after demonstrating broad enterprise adoption as platform teams adopt declarative governance

Key Highlights:

  • Kyverno graduates from the Cloud Native Computing Foundation after demonstrating production readiness and strong adoption.
  • Kyverno’s declarative policy-as-code solution makes it easier for platform and security teams to define and enforce guardrails across Kubernetes and cloud native environments.
  • Since joining CNCF in 2020, the Kyverno community has grown significantly, expanding from 574 GitHub stars to more than 9,000 and attracting contributors and end users worldwide.

KUBECON + CLOUDNATIVECON EUROPE, AMSTERDAM, The Netherlands – March 24, 2026 – The Cloud Native Computing Foundation® (CNCF®), which builds sustainable ecosystems for cloud native software, today announced the graduation of Kyverno, a Kubernetes-native policy engine that enables organizations to define, manage and enforce policy-as-code across cloud native environments.


Originally created by Nirmata and contributed to the CNCF in 2020, Kyverno (which means “to govern” in Greek) has achieved the highest maturity level after demonstrating widespread production adoption and significant community growth. The project’s declarative policy-as-code solution makes it easier for platform and security teams to define and enforce guardrails across Kubernetes and cloud native environments.

“Kyverno’s graduation highlights how important policy-as-code has become for organizations running cloud native in production at scale,” said Chris Aniszczyk, CTO of CNCF. “The project makes it easier for platform teams to enforce governance and security practices using familiar Kubernetes constructs, and the strong community behind Kyverno shows how critical this capability is across the ecosystem.”

Since joining the CNCF, Kyverno has experienced exponential growth and adoption across the Kubernetes ecosystem. The project has grown from 574 to more than 9,000 GitHub stars, and Kyverno continues to attract a growing number of contributors and end users worldwide. Today, Kyverno helps platform and security teams enforce policy, security and operational guardrails across some of the world’s largest Kubernetes environments. Organizations such as Bloomberg, Coinbase, Deutsche Telekom, Groww, LinkedIn, Spotify, Vodafone and Wayfair publicly rely on Kyverno to help secure and manage their Kubernetes platforms.

The project offers multiple ways for organizations to integrate policy management into their workflows, including running as a Kubernetes admission controller, command-line interface (CLI), container image or software development kit (SDK). While Kyverno began as a Kubernetes-native admission controller, it has evolved into a broader policy engine used across the cloud native stack. Declarative policies can now be applied to a wide range of payloads and enforcement points. It integrates deeply with the broader CNCF ecosystem and is commonly used alongside projects such as Argo CD, Backstage, Flux and Kubernetes to help platform teams implement policy-driven governance as part of modern GitOps and platform engineering practices.
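
As a hedged illustration of that declarative style, a minimal validating ClusterPolicy might look like the sketch below; the policy name and label key are hypothetical, not a policy shipped with Kyverno:

```yaml
# Requires every Pod to carry a "team" label (illustrative guardrail).
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label      # hypothetical policy name
spec:
  validationFailureAction: Enforce   # reject non-compliant resources
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Every Pod must have a 'team' label."
        pattern:
          metadata:
            labels:
              team: "?*"        # any non-empty value
```

Because the policy is itself a Kubernetes resource, it can be versioned and delivered through the same GitOps pipelines as the workloads it governs.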

To achieve graduation, Kyverno successfully completed a third-party security audit and a comprehensive security assessment led by CNCF TAG Security & Compliance. The project also passed a formal governance review, demonstrating mature open source practices. Further, the community introduced contributor guidelines addressing the responsible use of AI-assisted development tools.

The CNCF Technical Oversight Committee (TOC) provides technical leadership to the cloud native community, defining its vision and stewarding projects through maturity levels up to graduation. Kyverno’s graduation was supported by TOC sponsor Karena Angell, who conducted a thorough technical due diligence.

“Graduation is reserved for projects that demonstrate strong governance, sustained community growth and widespread production use,” said Karena Angell, chair of the Technical Oversight Committee, CNCF. “Kyverno met that bar through its technical maturity, security posture and the growing number of organizations relying on it to manage policy across Kubernetes environments.”

With its latest release, Kyverno has fully adopted Common Expression Language (CEL), aligning with the future direction of Kubernetes admission controls for improved performance and enhanced expressiveness. Upcoming releases will focus on extending policy enforcement to additional control points across the cloud native stack, including support for artificial intelligence and Model Context Protocol (MCP) gateways. These innovations will help organizations apply policy-as-code consistently across infrastructure, applications and emerging AI-driven workloads.
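
As an illustrative sketch of that CEL-based style (the policy name and rule below are hypothetical examples, not official Kyverno samples), a validation rule can be written as a CEL expression over the admitted object:

```yaml
# Rejects Pods whose containers use the mutable ":latest" image tag.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag     # hypothetical policy name
spec:
  validationFailureAction: Enforce
  rules:
    - name: no-latest
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        cel:
          expressions:
            - expression: >-
                object.spec.containers.all(c, !c.image.endsWith(':latest'))
              message: "Container images must not use the ':latest' tag."
```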

“As AI adoption accelerates, policy-as-code provides the essential guardrails for autonomous governance at scale without stifling innovation,” said Jim Bugwadia, Kyverno co-creator and CEO of Nirmata. “We built Kyverno to champion developer agility and self-service, and we are honored by its massive growth and success within the CNCF ecosystem.”

Learn more about Kyverno and join the community: https://kyverno.io 

Supporting Quotes

“Kyverno has become a core part of how I help platform teams take control of their Kubernetes environments. What used to require manual intervention and custom scripts is now policy-as-code that teams can own without learning a separate language. For organisations running Kubernetes at scale, Kyverno’s graduation reflects what I’ve seen firsthand – it’s production-ready, battle-tested and it makes platform teams faster.” 

– Steve Wade, Founder at Platform Fix and Ex-Technical Advisory Board Member at Cisco

“At Deutsche Telekom, Kyverno has played an important role in helping our platform teams implement Kubernetes-native policy management in a scalable and developer-friendly way. Its declarative approach to policy enforcement allows us to embed security, compliance and operational best practices directly into our Kubernetes environments without adding unnecessary complexity for application teams. The project’s strong community, rapid innovation and focus on usability have made Kyverno a valuable tool for organizations operating Kubernetes at scale. We’re excited to see the project reach this stage and look forward to its continued growth in the cloud native ecosystem.” 

– Mamta Bharti, VP of Engineering at Deutsche Telekom

“Kyverno has become a critical component of LinkedIn’s Kubernetes admission control pipeline, enforcing consistent security and configuration policies across 230+ clusters with 500K+ nodes. Its YAML-native approach means our platform teams can author and maintain policies without learning a new language. Kyverno has proven its reliability at enterprise scale, handling over 20K admission requests per minute under stress without degradation.”

– Shan Velleru, Senior Software Engineer at LinkedIn

About Cloud Native Computing Foundation

Cloud native computing empowers organizations to build and run scalable applications with an open source software stack in public, private, and hybrid clouds. The Cloud Native Computing Foundation (CNCF) hosts critical components of the global technology infrastructure, including Kubernetes, Prometheus, and Envoy. CNCF brings together the industry’s top developers, end users, and vendors and runs the largest open source developer conferences in the world. Supported by nearly 800 members, including the world’s largest cloud computing and software companies, as well as over 200 innovative startups, CNCF is part of the nonprofit Linux Foundation. For more information, please visit www.cncf.io.

###

The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see our trademark usage page. Linux is a registered trademark of Linus Torvalds.

Media Contact

Haley White

The Linux Foundation

[email protected] 

Categories: CNCF Projects

CNCF and SlashData Report Finds Platform Engineering Tools Maturing as Organizations Prepare for AI-Driven Infrastructure

Tue, 03/24/2026 - 04:00

New CNCF Technology Radar survey shows which cloud native tools developers view as mature and ready for broad adoption

Key Highlights:

  • CNCF and SlashData release findings from the Q1 2026 CNCF Technology Radar survey based on responses from more than 400 professional developers.
  • CNCF and SlashData’s new report highlights which cloud native platform engineering tools developers who were surveyed view as mature, useful and ready for broad adoption.
  • Helm, Backstage and kro are the three technologies placed in the ‘Adopt’ position of the application delivery technology radar, based on survey responses.
  • Hybrid platform approaches are emerging as the dominant model for AI workflows, reflecting how organizations are adapting existing developer platforms to support AI workloads.

AMSTERDAM, KUBECON + CLOUDNATIVECON EUROPE – March 24, 2026 – The Cloud Native Computing Foundation® (CNCF®), which builds sustainable ecosystems for cloud native software, released new findings from the Q1 2026 CNCF Technology Radar report with SlashData, uncovering how developers are evaluating platform engineering technologies for workflow automation, application delivery, and security and compliance management.

The survey findings provide an overview of how cloud native teams select internal platform tooling as organizations scale application delivery and prepare infrastructure for artificial intelligence (AI) workloads and increasingly automated development environments.

“Cloud native platforms have reached a point where developers are not just experimenting but standardizing on CNCF projects that make software delivery reliable at scale,” said Chris Aniszczyk, CTO, CNCF. “What’s especially notable about this research is how organizations are extending those same platforms to support AI workloads, showing how cloud native is the base layer powering the next era of applications.”

Platform Engineering Shapes AI Workflow Strategies


The report explores how organizations structure internal developer platforms (IDPs) and how these decisions influence their approach to AI workflows.

  • 28% of organizations report having a dedicated platform engineering team responsible for internal platforms. 
  • The most common IDP model, reported by 41% of organizations, is multi-team collaboration for managing platform capabilities. 
  • 35% of organizations report using a hybrid platform to integrate AI workloads, combining existing developer platforms with specialized AI tooling. 

These survey findings suggest that many organizations are integrating AI capabilities directly into their cloud native platforms, rather than creating entirely new infrastructure stacks.

Workflow Automation Tools Show Strong Developer Confidence

In the workflow automation category, developers identify several technologies as reliable options for production environments, placing ArgoCD, Armada, Buildpacks, GitHub Actions, and Jenkins in the ‘Adopt’ category.

  • GitHub Actions received high recommendations across maturity and usefulness metrics, with 91% of developers claiming that they would recommend it to peers. 
  • Jenkins demonstrated strong maturity scores, reflecting its long-standing role in CI/CD.
  • Newer tools also scored well: Karmada achieved the highest usefulness rating among workflow automation tools.

The report also highlights that emerging tools are attracting developer interest, even as they continue to mature, suggesting strong developer enthusiasm for multicluster management solutions despite the perception that the technology is still evolving.

Security and Compliance Tooling Becomes Core Platform Infrastructure

According to the survey findings, security and compliance technologies are emerging as core components of modern developer platforms. Developers placed cert-manager, Keycloak, Open Policy Agent (OPA) in the ‘Adopt’ category.

  • cert-manager received the highest maturity ratings, with 87% of developers rating it four to five stars for stability and reliability.
  • Tools addressing emerging areas such as software supply chain security are gaining attention but remain early in their maturity cycle. For example, in-toto and Sigstore showed lower maturity ratings with little negative sentiment.

These findings suggest that developers are still evaluating how these solutions fit into their development pipelines.

Application Delivery Platforms Continue to Standardize

In the application delivery category, Backstage, Helm, and kro were placed in the ‘Adopt’ position, reflecting strong developer confidence in these projects.

  • Helm received the highest maturity ratings among application delivery tools, with 94% of developers giving it four- or five-star ratings for reliability and stability.
  • Helm’s widespread usage across the ecosystem reinforces its role as a foundational component of Kubernetes application deployment.
  • Backstage and kro performed strongly in usefulness ratings. 

These findings indicate continued developer demand for tools that simplify Kubernetes complexity and improve developer experience across internal platforms.

“Developers are increasingly evaluating tools based on how well they fit into their internal platform architectures,” said Liam Bollmann-Dodd, principal market research consultant at SlashData. “What we see in this data is that the technologies gaining traction are the ones reducing operational friction while enabling teams to standardize application delivery and management.”

Methodology

In Q4 2025, more than 400 professional developers using cloud native technologies were surveyed about their experiences with workflow automation, application delivery, and security and compliance management tools. Respondents evaluated technologies they were familiar with based on their maturity, usefulness and the likelihood of recommending them.


About SlashData

SlashData is an analyst firm with more than 20 years of experience in the software industry, working with top tech brands. SlashData helps platform and engineering leaders make better product, marketing and strategy decisions through best-in-class research, benchmarks, and foresight into how developers, tools, and software are changing.

###

The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see our trademark usage page. Linux is a registered trademark of Linus Torvalds.

Media Contact

Haley White

The Linux Foundation

[email protected] 

Categories: CNCF Projects

Welcome llm-d to the CNCF: Evolving Kubernetes into SOTA AI infrastructure

Tue, 03/24/2026 - 03:45

We are thrilled to announce that llm-d has officially been accepted as a Cloud Native Computing Foundation (CNCF) Sandbox project!

As generative AI transitions from research labs to production environments, platform engineering teams are facing a new frontier of infrastructure challenges. llm-d is joining the CNCF to lead the evolution of Kubernetes and the broader CNCF landscape into State of the Art (SOTA) AI infrastructure, treating distributed inference as a first-class cloud native workload. By joining the CNCF, llm-d secures the trusted stewardship and open governance of the Linux Foundation, giving organizations the confidence to build upon a truly neutral standard.

Launched in May 2025 as a collaborative effort between Red Hat, Google Cloud, IBM Research, CoreWeave, and NVIDIA, llm-d was founded with a clear vision: any model, any accelerator, any cloud. The project was joined by industry leaders AMD, Cisco, Hugging Face, Intel, Lambda and Mistral AI and university supporters at the University of California, Berkeley, and the University of Chicago. 

“At Mistral AI, we believe that optimizing inference goes beyond just the engine, and requires solving challenges like KV cache management and disaggregated serving to support next-generation models such as Mixture of Experts (MoE). Open collaboration on these issues is essential to building flexible, future-proof infrastructure. We’re supporting this effort by contributing to the llm-d ecosystem, including the development of a DisaggregatedSet operator for LeaderWorkerSet (LWS), to help advance open standards for AI serving.” – Mathis Felardos, Inference Software Engineer, Mistral AI

What llm-d brings to the CNCF landscape

The CNCF is the natural home for solving complex workload orchestration challenges. AI serving is highly stateful and latency-sensitive, with request costs varying dramatically based on prompt length, cache locality, and model phase. Traditional service routing and autoscaling mechanisms are unaware of this inference state, leading to inefficient placement, cache fragmentation, and unpredictable latency under load. llm-d solves this by providing a pre-integrated, Kubernetes-native distributed inference framework that bridges the gap between high-level control planes (like KServe) and low-level inference engines (like vLLM). llm-d plans to work with the CNCF AI Conformance program to ensure critical capabilities like disaggregated serving are interoperable across the ecosystem.

By building on open APIs and extensible gateway primitives, llm-d introduces several critical capabilities to the CNCF ecosystem:

  • Inference-Aware Traffic Management: Acting as a primary implementation of the Kubernetes Gateway API Inference Extension (GAIE), llm-d utilizes the Endpoint Picker (EPP) for programmable, prefix-cache-aware routing.
  • Native Kubernetes Orchestration: Leveraging primitives like LeaderWorkerSet (LWS), llm-d orchestrates complex multi-node replicas and wide expert parallelism, transforming bespoke AI infrastructure into manageable cloud native microservices.
  • Prefill/Decode Disaggregation: llm-d addresses the resource-utilization asymmetry between prompt processing and token generation by disaggregating these phases into independently scalable pods.

  • Advanced State Management: The project introduces hierarchical KV cache offloading across GPU, TPU, CPU, and storage tiers.
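To make the first capability concrete, here is a minimal sketch of an inference pool fronted by the Endpoint Picker. The API group, version, and field names follow the upstream Gateway API Inference Extension as we understand it and may differ from what a given llm-d release expects; names like endpoint-picker and qwen3-pool are illustrative:

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: qwen3-pool
spec:
  targetPortNumber: 8000        # port the vLLM pods serve on
  selector:
    app: vllm-qwen3             # selects the model-serving pods
  extensionRef:
    name: endpoint-picker       # EPP performing prefix-cache-aware routing

A Gateway API HTTPRoute can then reference this pool as a backend, letting the EPP choose the best replica per request.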

Kubernetes flow chart showcasing the inference gateway route to selected pods, body-based routing, inference scheduled and variant autoscaler (featuring Nvidia, Google, AMD and Intel nodes).

SOTA inference performance on any accelerator

A core tenet of the cloud native philosophy is preventing vendor lock-in. For AI infrastructure, this means serving capabilities must be hardware-agnostic. 

We believe that democratizing SOTA inference with an accelerator-neutral mindset is the most important enabler for broad LLM adoption. The primary mission of llm-d is to achieve SOTA inference performance on any accelerator. By introducing model- and state-aware routing policies that align request placement with specific hardware characteristics, llm-d maximizes utilization and delivers measurable gains in critical inference metrics like Time to First Token (TTFT), Time Per Output Token (TPOT), token throughput, and KV cache utilization. Whether you are running workloads on accelerators from NVIDIA, AMD, or Google, llm-d ensures that high-performance AI serving remains a core, composable capability of your stack.

Crucially, clear benchmarks that prove the value of these optimizations are core to the project. The AI industry often lacks standard, reproducible ways to measure inference performance, relying instead on marketing claims or commercial analysts. llm-d aims to be the neutral, de facto standard for defining and running inference benchmarks through rigorous, open methodology. For example, in a ‘multi-tenant SaaS’ use case, shared customer contexts enable significant computational savings through prefix caching. As demonstrated in the most recent v0.5 release, llm-d’s inference scheduling maintains near-zero latency and massive throughput compared to a baseline Kubernetes service:


Figure 1: TTFT and throughput vs QPS on Qwen3-32B (8×vLLM pods, 16×NVIDIA H100).
llm-d inference scheduling maintains near-zero TTFT and scales to ~120k tok/s,
while baseline Kubernetes service degrades rapidly under load.

Bridging cloud native and AI native ecosystems

To build the ultimate AI infrastructure, we must bridge the gap between Kubernetes orchestration and frontier AI research. llm-d is actively building deep relationships with AI/ML leaders at large foundation model builders and AI natives, along with traditional enterprises that are rapidly integrating AI throughout their organizations. Furthermore, we are committed to increasing collaboration with the PyTorch Foundation to ensure a seamless, end-to-end open ecosystem that connects model development and training directly to distributed cloud native serving.

Get involved: Follow the “well-lit paths”

At its core, llm-d follows a “well-lit paths” philosophy. Instead of leaving platform teams to piece together fragile black boxes, llm-d provides validated, production-ready deployment patterns—benchmarked recipes tested end-to-end under realistic load.

We invite developers, platform engineers, and AI researchers to join us in shaping the future of open AI infrastructure:

  • Explore the Well-Lit Paths: Visit the llm-d guides to start deploying SOTA inference stacks on your infrastructure today.
  • Learn More: Check out the official website at llm-d.ai.
  • Contribute: Join the community on Slack and get involved in our GitHub repositories at https://github.com/llm-d/.

Welcome to the CNCF, llm-d! We look forward to building the future of AI infrastructure together.

Categories: CNCF Projects

Beyond Batch: Volcano Evolves into the AI-Native Unified Scheduling Platform

Mon, 03/23/2026 - 04:00

The world of AI workloads is changing fast. A few years ago, “AI on Kubernetes” mostly meant running long training jobs. Today, with the rise of Large Language Models (LLMs), the focus has shifted to include complex inference services and Autonomous Agents. The industry consensus, backed by CNCF’s latest Annual Cloud Native Survey, is clear: Kubernetes has evolved to become the essential platform for intelligent systems. This shift from traditional training jobs to real-time inference and agents is transforming cloud native infrastructure.

This shift creates new challenges:

  • Complex Inference Demands: Serving LLMs requires high-performance GPU resources and sophisticated management to control costs and latency.
  • Distinct Agent Requirements: AI Agents introduce “bursty” traffic patterns, requiring instant startup times and state preservation—capabilities not natively optimized in Kubernetes.

The Volcano community is responding to these needs. With the release of Volcano v1.14, Kthena v0.3.0, and the new AgentCube, Volcano is transforming from a batch computing tool into a Full-Scenario, AI-Native Unified Scheduling Platform.

1. Volcano v1.14: Breaking Limits on Scale and Speed

As clusters expand and workloads diversify, scheduler bottlenecks can degrade performance. Volcano v1.14 introduces a major architectural evolution to address this.

Scalable Multi-Scheduler Architecture

Traditional setups often rely on static resource division, leading to wasted capacity. Volcano v1.14 introduces a Sharding Controller that dynamically calculates resource pools for different schedulers (Batch, Agent, etc.) in real-time.

  • Key Benefit: Enables running latency-sensitive Agent tasks alongside massive training jobs on the same cluster without resource contention, ensuring high cluster utilization and cost efficiency.
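For context on the batch side of that mix, workloads reach Volcano through its Job API. A minimal gang-scheduled training job, sketched here with illustrative image and resource values, looks roughly like this:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: dist-training
spec:
  schedulerName: volcano
  minAvailable: 4                 # gang scheduling: start only when all 4 replicas fit
  tasks:
    - replicas: 4
      name: worker
      template:
        spec:
          containers:
            - name: trainer
              image: example.com/trainer:latest   # hypothetical image
              resources:
                limits:
                  nvidia.com/gpu: 1
          restartPolicy: Never

With the Sharding Controller, jobs like this and latency-sensitive Agent tasks draw from dynamically calculated resource pools rather than static partitions.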

High-Throughput Agent Scheduling

Standard Kubernetes scheduling often struggles with the high churn rate of AI Agents. The new Agent Scheduler (Alpha) in v1.14 provides a high-performance fast path designed specifically for short-lived, high-concurrency tasks.

Enhanced Resource Efficiency

To optimize infrastructure costs, v1.14 adds support for generic Linux distributions (Ubuntu, CentOS) and democratizes enterprise features like CPU Throttling and Memory QoS. Additionally, native support for Ascend vNPU maximizes the utilization of diverse AI hardware.

2. Kthena v0.3.0: Efficient and Scalable LLM Serving

The CNCF survey has identified AI inference as the next major cloud native workload, representing the bulk of long-term cost, value, and complexity. Kthena v0.3.0 directly addresses this challenge, introducing a specialized Data Plane and Control Plane architecture to solve the speed and cost balance for serving large models.

Optimized Prefill-Decode Disaggregation

Separating “Prefill” and “Decode” phases improves efficiency but introduces heavy cross-node traffic.

  • Key Benefit: Kthena leverages Network Topology Awareness to co-locate interdependent tasks (e.g., on the same switch). Combined with a Smart Router that recognizes KV-Cache and LoRA adapters, it ensures requests are routed with minimal latency and maximum throughput.

Simplified Deployment with ModelBooster

Deploying large models typically involves managing fragmented Kubernetes resources.

  • Key Benefit: The new ModelBooster feature offers a declarative, one-stop deployment experience. Users define the model intent once, and Kthena automates the provisioning and lifecycle management of all underlying resources, significantly reducing operational complexity.

Cost-Efficient Heterogeneous Autoscaling

Running LLMs exclusively on top-tier GPUs can be cost-prohibitive.

  • Key Benefit: Kthena’s autoscaler supports Heterogeneous Scaling, allowing the mixing of different hardware types (e.g., high-end vs. cost-effective GPUs) within strict budget constraints, optimizing the balance between performance and expenditure.

3. AgentCube: Serverless Infrastructure for AI Agents

While Kubernetes provides a solid infrastructure foundation, it lacks specific primitives for AI Agents. AgentCube bridges this gap with specialized capabilities.

Instant Startup via Warm Pools

Agents require immediate responsiveness that standard container startup times cannot match.

  • Key Benefit: AgentCube utilizes a Warm Pool of lightweight MicroVM sandboxes. This mechanism reduces startup latency from seconds to milliseconds, delivering the snappy experience users expect.

Native Session Management

AI Agents require state persistence across multi-turn interactions, unlike typical stateless microservices.

  • Key Benefit: Built-in Session Management automatically routes conversations to the correct context, seamlessly enabling stateful interactions within a stateless Kubernetes environment.

Serverless Abstraction

Developers need to focus on agent logic rather than server management.

  • Key Benefit: AgentCube provides a streamlined API for requesting secure environments (like Code Interpreters). It handles the entire lifecycle—secure creation, execution, and automated recycling—offering a true serverless experience.

Conclusion

Volcano has evolved beyond batch jobs. With v1.14, Kthena, and AgentCube, we now provide a comprehensive platform for the entire AI lifecycle—from training foundation models to serving them at scale to powering the next generation of intelligent agents.

By embracing cloud native principles to deliver scalable, reliable infrastructure for the AI lifecycle, Volcano is contributing to the community’s goal of ensuring AI workloads behave predictably at scale. As organizations seek consistent and portable AI infrastructure (a concept championed by initiatives like the Kubernetes AI Conformance Program), Volcano is positioning itself as a core component of that solution.

We invite you to explore these new features and join us in building the future of AI infrastructure.

If you are attending KubeCon + CloudNativeCon Europe, we encourage you to stop by our booth, P-14A, in the Project Pavilion to say hi and learn more about the latest updates.

Categories: CNCF Projects

Metal3 at KubeCon + CloudNativeCon Europe 2026: Meet the CNCF’s Freshly Incubated Bare Metal Project

Mon, 03/23/2026 - 04:00

Metal3 (pronounced “metal cubed”) entered 2026 as one of the newest incubating projects in the CNCF. As the foundational layer for infrastructure management in self-hosted Kubernetes clouds, Metal3 and its ‘stack’ offer essential solutions for cloud service providers, AI-focused distributed systems, edge cloud deployments, and telecom infrastructure. Given the increasing investment in compute infrastructure worldwide, Metal3 addresses a growing number of issues faced by the modern IT industry.

From the start, Metal3 set the ambitious goal of becoming the primary tool for Kubernetes bare metal cluster management across the broader cloud native ecosystem. Real-world feedback is necessary to achieve this, and the community remains committed to increasing the project’s visibility and adoption. Metal3 is at the forefront of automated bare metal lifecycle management and the community is aiming to assist others in achieving the same level of success.

If you’re attending, KubeCon + CloudNativeCon Europe is the perfect opportunity to get better acquainted with Metal3, ask questions, and connect with maintainers and community members. This year’s conference will be one of the most active events yet for Metal3, with a record number of talks and touchpoints for anyone interested in learning about the project.

A packed Metal3 presence at KubeCon + CloudNativeCon Europe

Metal3 has organized a packed presence at the conference, offering a variety of opportunities for attendees to engage with the project. For a quick overview, a concise project status update will be delivered during the lightning talk. For those interested in deeper engagement, there are two in-depth sessions: one on the project’s governance and path to CNCF Incubation, and one on a real-world adoption use case from the Sylva Project. Additionally, you can meet maintainers and community members for questions and hallway-track conversations at the Metal3 kiosk on the Solutions Showcase floor.

Lightning talk

The first event of the week, a lightning talk, will take place on Monday, 23 March. In classic Metal3 fashion, the community will share a quick status report of the Metal3 project, focusing on future plans toward graduation and beyond, along with highlights of major developments on the roadmap.

If you’re new to Metal3, this session is a great entry point; it’s short, focused, and gives you the “what’s happening” overview you need before you take a deeper dive.

Two in-depth sessions: governance and adoption

In addition to the lightning talk, community members will be presenting two more in-depth sessions around Metal3 governance and adoption.

1) Metal3.io’s Path to CNCF Incubation: Governance, Processes, and Community

Presented by Metal3 maintainers, this session focuses on Metal3’s journey from CNCF Sandbox to Incubation through the lens of governance, processes, and community building.

Be sure to attend if you’re interested in:

  • How Metal3 is run as an open-source project
  • What changed (or matured) during incubation readiness
  • How decisions are made and contributions flow

2) Beyond the Cloud: Managing Bare Metal the Kubernetes Way Using Metal3.io: Sylva Project as a Use Case

This talk approaches Metal3 from the viewpoint of an adopter. The hosts will explain the operational reality and practical use cases of a telco project and Metal3’s role.

Don’t miss this session if you care about:

  • What adopting Metal3 looks like in practice
  • The value proposition of Kubernetes-native bare metal lifecycle management
  • Lessons learned and patterns from real usage in a telco project

Visit the Metal3 kiosk

You can also meet maintainers and community members at the Metal3 kiosk P-21B on the Solutions Showcase floor, from Tuesday, 24 March, to the morning of Thursday, 26 March. This is a great opportunity to connect directly with the people building and operating the project. Whether you have technical queries about implementation, operational questions about running Metal3 in production, governance-related inquiries about its CNCF journey, or if you are simply curious about the project’s future, the kiosk is one of the easiest ways to get answers and context quickly.

Join the conversation

Whether you’re attending KubeCon + CloudNativeCon Europe to learn, evaluate, contribute to, or compare approaches for managing the lifecycle of bare metal Kubernetes, this event is shaping up to be a key moment for Metal3. 

Stop by the kiosk, catch the lightning talk, and join one (or both!) of the longer sessions.

The community is eager to meet users and contributors and to discuss the future of bare metal Kubernetes. We welcome new contributors and adopters to our continuously growing community, inviting everyone working with bare metal Kubernetes to share their use cases and feedback. Whether you are already running Metal3 in production or just starting to explore, the community welcomes everyone’s input as an adopter, operator, or contributor. Learn more about how you can get active by visiting: https://metal3.io/contribute.html 

See you at the conference!

Categories: CNCF Projects

Crossplane and AI: The case for API-first infrastructure

Fri, 03/20/2026 - 07:00

AI-assisted development has changed the way engineers create and commit code. But writing code is no longer the bottleneck. The bottleneck is everything that happens after git push.

Infrastructure provisioning, policy enforcement, day-two operations, drift, compliance, cross-team coordination: all of that still requires multiple steps, and no new tool will fix it. This is an architecture problem. AI needs APIs, not UIs, and most platforms still aren’t built that way.

Current platforms

Talk to almost any organization, and you’ll hear that the desired state lives in Git, while the actual state lives in cloud providers. Policies are buried in pipeline configs. Organizational knowledge exists in wikis no one reads and in engineers who eventually leave.

This has worked up to now because humans worked with humans to navigate the context switching and informal coordination required to get the job done. People fill in the gaps, ask the questions, and translate intent across systems.

But in a world where AI agents are embedded into our organizations, this workflow breaks down. The agent hits a wall, not because it lacks capability, but because the platform wasn’t built for programmatic access. It was built for humans who can compensate for inconsistency.

Agents require a unified, structured, machine-readable interface. They need explicit governance rules, readable historical patterns, and discoverable dependencies. Without that structure, autonomy stalls.

Screenshot: “One API, everything the agent needs,” contrasting human engineer activities (e.g., checks Slack) with AI agent activities (e.g., calls platform API).

Platforms built on declarative control

Kubernetes introduced a simple but powerful control pattern that changes this entirely. Every resource follows a consistent schema:

apiVersion: example.crossplane.io/v1
kind: Database
metadata:
  name: user-db
spec:
  engine: postgres
  storage: 100Gi

Desired state lives in spec, actual state is reflected in status, and controllers observe the difference and reconcile continuously. That reconciliation is consistent and automatic; no human is required to coordinate convergence.
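For illustration, here is how the same hypothetical Database might look once its controller has acted. Crossplane resources report actual state through standard Ready and Synced conditions, though the exact condition set varies by provider:

apiVersion: example.crossplane.io/v1
kind: Database
metadata:
  name: user-db
spec:
  engine: postgres
  storage: 100Gi
status:
  conditions:
    - type: Synced     # controller successfully submitted desired state
      status: "True"
    - type: Ready      # external database exists and matches spec
      status: "True"

Anything, human or agent, that can read this object can see both the intent and the reality in one place.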

Crossplane extends this model beyond containers to all infrastructure and applications: cloud databases, object storage, networking, SaaS systems, clusters, and custom platform APIs. The result isn’t just infrastructure-as-code. It’s your entire platform, infrastructure, and applications as a single API. That difference matters.

The three core elements that make this work in practice:

  • Desired state: the declarative specification of what we think the world should be. (Example: The frontend service should have 3 replicas with 2 GB of memory each.)
  • Actual state: the operational reality of what exists in the infrastructure. (Example: The frontend service has 2 healthy replicas, 1 pending.)
  • Policy: the rules and governance that constrain operations. (Example: Production changes require approval between 9 AM and 5 PM PST.)

Controllers continuously reconcile desired state with actual state, and policy is enforced at execution rather than left to manual review. Context becomes part of the system, not something external to it.

Why this model works for agents

An AI agent interacting with a Crossplane-managed platform doesn’t need to orchestrate workflows across multiple systems. It interacts with a single API surface.

It can discover resource types via the Kubernetes API, inspect status fields for real-time operational state, watch resources for change events, and submit declarative intent. Since reconciliation handles mechanical execution, agents don’t need to coordinate step-by-step logic; they just declare intent and let controllers handle convergence.

This separation of concerns is critical. Controllers handle mechanics, while agents focus on higher-level reasoning. Without a control plane, agents become fragile orchestrators. With one, they become declarative participants.

When the entire platform is accessible through a single, consistent API, the agent has everything it needs. No Slack messages and no tribal knowledge required.

Policy at the point of execution

In fragmented platforms, governance follows lots of procedures: reviews, tickets, Slack threads. In a Kubernetes-native control plane, governance is architectural.

RBAC controls who can act. Admission controllers validate changes before they’re persisted. Policy engines such as OPA and Kyverno enforce constraints at runtime. Crossplane compositions encode organizational patterns directly into APIs. Every change flows through the same enforcement path, no hidden approval steps, no undocumented exception paths.
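As a sketch of what enforcement at admission looks like, a Kyverno-style validation policy could require every Database claim to carry an owner label. The resource kind reuses the earlier hypothetical example, and the label requirement is illustrative:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-owner-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-owner
      match:
        any:
          - resources:
              kinds:
                - Database           # the composite API from the earlier example
      validate:
        message: "Every Database must carry an owner label."
        pattern:
          metadata:
            labels:
              owner: "?*"            # any non-empty value

A request that omits the label is rejected at admission, whether it came from a developer or an agent.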

This removes ambiguity for agents entirely. The system defines what is allowed. Agents operate within clearly defined boundaries, and the platform enforces them automatically.

Crossplane 2.0: Full-stack control

With Crossplane 2.0, compositions can include any Kubernetes resource, not just managed infrastructure. That means a single composite API can provision infrastructure, deploy applications, configure networking, set up observability, and define operational workflows, all in one place.

apiVersion: platform.acme.io/v1
kind: Microservice
metadata:
  namespace: team-api
  name: user-service
spec:
  image: acme/user-service:v1.2.3
  database:
    engine: postgres
    size: medium
  ingress:
    subdomain: users

Behind that abstraction may live RDS instances, security groups, deployments, services, ingress rules, and monitoring resources. To a human developer or an AI agent, it’s a single API. That consistency is what enables automation to scale safely.

Day-two operations follow the same pattern. Crossplane’s Operation types bring declarative control to scheduled upgrades, backups, maintenance, and event-driven automation:

apiVersion: ops.crossplane.io/v1alpha1
kind: CronOperation
metadata:
  name: weekly-db-maintenance
spec:
  schedule: "0 2 * * 0"
  operationTemplate:
    spec:
      pipeline:
        - step: upgrade
          functionRef:
            name: function-database-upgrade

Operational workflows are now first-class API objects. Agents can inspect them, trigger them, observe their status, and propose modifications. No need for hidden runbooks.
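Because the CronOperation above stamps out runs from its operationTemplate, an agent could plausibly submit an equivalent one-off run directly. This sketch assumes a standalone Operation kind in the same API group accepts the same pipeline spec:

apiVersion: ops.crossplane.io/v1alpha1
kind: Operation
metadata:
  name: adhoc-db-upgrade
spec:
  pipeline:
    - step: upgrade
      functionRef:
        name: function-database-upgrade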

Where to start

This doesn’t require a start-from-scratch migration. Bring core infrastructure under declarative control first. Your existing resources don’t need to be replaced; they just need to be unified behind a consistent API.

For teams using AI-assisted development, engineers express intent and iterate quickly as tools accelerate implementation. As deployment decouples from release, with changes shipping behind feature flags and systems reconciling toward the desired state, the platform must be deterministic and self-correcting, not reliant on someone catching drift or running the right command at the right time.

That is what a declarative control plane provides. Crossplane ensures that intent has somewhere safe, structured, and deterministic to land. Without it, AI will always be bolted onto human-centric workflows. With it, agents become first-class participants in infrastructure operations.

And that starts with a consistent API. Get started by checking out the Crossplane Docs, attending a community meeting, or watching CNCF’s Cloud Native Live on Crossplane 2.0 – AI-Driven Control Loops for Platform Engineering.

Categories: CNCF Projects

Sustaining open source in the age of generative AI

Tue, 03/10/2026 - 07:00

Open source has always evolved alongside shifts in technology.

From distributed version control to CI/CD, from containers to Kubernetes, each wave of tooling has reshaped how we build, collaborate, and contribute. Generative AI is the newest wave, and it introduces a tension that open source communities can no longer afford to ignore.

AI has made it simple to generate contributions. It has not, however, made the necessary review process any simpler.

Recently, the Kyverno project introduced an AI Usage Policy. This decision was not driven by resistance to AI. It was driven by something far more practical: the scaling limits of human attention.

Where this conversation began

Like many governance changes in open source, this one didn’t begin with theory. It began with a Slack message.

“20 PRs opened in 15 minutes ?”

What followed was a mixture of humor, curiosity, and a familiar undertone many maintainers recognize immediately as discomfort.

“Were they good PRs?”
“Maybe they were generated by bots?”
“Are any of them helpful, or are they mostly noise?”

One maintainer captured the sentiment perfectly:

“Just seeing this number is discouraging enough.”

Another jokingly suggested we might need a:

“Respect the maintainers’ life policy.”

Behind the jokes was something deeply real. Our maintainers, and our project at large, were feeling the weight of something very new, very real, and clearly on the verge of changing how open source projects like ours are maintained.

The maintainer reality few people see

Modern AI tools are extraordinary productivity amplifiers.

They generate code, documentation, tests, refactors, and design suggestions in seconds. But while output scales infinitely, review does not. The bottleneck in open source has never been code generation.

It has always been human cognition.

Every pull request, regardless of how it was produced, must still be:

  • Read
  • Understood
  • Evaluated for correctness
  • Assessed for security implications
  • Considered for long-term maintainability
  • More often than not, commented on, questioned, or simply clarified
  • Viewed by more than one set of eyes
  • Merged

In open source, there is always a human in the loop. That human is typically a maintainer, a reviewer, or a combination of both.

When low-effort or poorly understood AI-generated PRs flood a project, the burden of validation shifts entirely onto the humans who bear the majority of the weight in this loop. Even the most well-intentioned contributions become costly when they lack clarity, context, demonstrated understanding, and ownership.

Low-effort AI contributions don’t just exhaust maintainers, they quietly tax every thoughtful contributor waiting in the queue.

AI boomers, AI rizz, and the reality of change

We’re currently living through a fascinating cultural split in the developer ecosystem.

On one side, we see what might playfully be called “AI boomers”: those deeply skeptical of AI, hesitant to adopt it, or resistant to its growing presence in development workflows. Hard as it may be to believe, many of these people work in and contribute to open source software development.

On the other side, we see contributors with undeniable “AI rizz.” These are enthusiastic adopters of AI eager to automate, generate, accelerate, and experiment with AI and AI tooling in the open source space and everywhere else possible.

Both reactions are understandable.

Both are human.

But history has taught us something consistent about technological change:

Projects, like businesses, that refuse to adapt rarely remain relevant.

It’s become clear that AI is not a passing trend. It is a structural shift in how software is created. Resisting it entirely is unlikely to be sustainable and blindly embracing it without guardrails is equally risky.

AI as acceleration vs. AI as substitution

Open source contributions have traditionally served as one of the most powerful learning engines in our industry. Developers deepen expertise, explore systems, build portfolios, and give back to the communities they rely on.

But it seems that the arrival of AI has changed how many contributors produce work. Unfortunately, this hasn’t happened in a globally productive way; rather, it has happened in a way that undermines the one thing that a meaningful contribution requires:

Understanding.

Using AI to bypass understanding is not acceleration. It’s debt for both the contributor and the project.

Superficially correct code that cannot be explained, reasoned about, or defended introduces risk. It also deprives contributors of the very growth that open source participation has historically enabled.

Across open source communities, we’re hearing the same message shared with AI-touting contributors: AI can amplify learning, but it cannot replace learning.

Ownership still matters — perhaps more than ever

During an internal discussion about AI-generated contributions, Jim Bugwadia, Nirmata CEO and Kyverno founder, made a deceptively simple observation about what needs to happen with AI-generated and AI-assisted contributions:

“Own your commit.”

In a world of AI-assisted development, that idea expands naturally.

If AI helped generate your contribution, you must also own your prompt and whatever is generated by it.

Ownership means:

  • Understanding intent
  • Verifying correctness
  • Taking responsibility for outcomes
  • Standing behind the change

AI can generate output, but it can’t and shouldn’t assume accountability. The idea of having a human in the loop isn’t something that can or should ever be only maintainer-facing. To be fair, this concept must be contributor-facing too.

Disclosure as trust infrastructure

Transparency has always been foundational to open source collaboration.

AI introduces new complexities around licensing, copyright, provenance, and tool terms of service. Legal frameworks are still evolving, and uncertainty remains a defining characteristic of this space.

Disclosure is not about tools or bureaucracy.

Disclosure is about accountability. It is trust infrastructure.

Requiring contributors to disclose meaningful AI usage helps preserve:

  • Transparency
  • Reviewer trust
  • Licensing integrity
  • Contribution clarity
  • Responsible authorship

This approach aligns with guidance from the Linux Foundation and discussions across the broader CNCF community, both of which acknowledge that AI-generated content can be contributed provided contributors ensure compliance with licensing, attribution, and intellectual property obligations.
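In practice, disclosure can be as lightweight as a commit-message trailer or a checkbox in a pull request template. The format below is purely illustrative, not a mandated standard; only Signed-off-by is an established DCO convention:

```text
feat(policy): tighten image verification rules

Assisted-by: <AI tool name and version>
Signed-off-by: Jane Developer <jane@example.com>
```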

When AI meets open source: Kyverno’s approach

Kyverno is not a hobby project. Our project is used globally, in production, across organizations ranging from startups to enterprise-scale companies. Adoption continues to grow, and the project is actively moving toward CNCF Graduation.

Kyverno itself exists to create:

  • Clarity
  • Safety
  • Consistency
  • Sustainable workflows 

All through policy as code.

In this case, we are applying the same philosophy to something new: AI usage.

If policy as code provides guardrails and golden paths in platform engineering, then we should be considering how to provide similar guidance in the AI-assisted development space.

Developers can’t sustainably leverage AI within open source ecosystems if projects fail to define clear expectations to guide them as they develop.

AI-friendly does not mean AI-unbounded

There is an important distinction emerging across open source communities: Being AI-friendly does not mean accepting unreviewed AI output.

Maintainers themselves are often enthusiastic adopters of AI tools and rightly so. Across projects, maintainers are using AI to:

  • Accelerate repetitive tasks
  • Improve documentation
  • Generate scaffolding
  • Explore design alternatives

One emerging pattern is the use of AGENT.md-style configurations, designed to guide how AI tools interact with repositories and project conventions.

Kyverno is actively exploring similar approaches. The goal is not simply to manage AI-assisted contributions, but to improve their quality at the source.
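To make this concrete, here is a minimal AGENT.md-style sketch. This is a hypothetical illustration, not Kyverno’s actual configuration; the conventions listed are placeholders.

```markdown
# AGENT.md (hypothetical sketch)

## Repository conventions for AI tools
- Follow the project's existing formatting and linting rules before proposing changes.
- Do not hand-edit generated files; regenerate them with the project's tooling.

## Contribution expectations
- A human author must review, understand, and be able to defend every AI-assisted change.
- Disclose meaningful AI usage in the pull request description.
- Own your commit, and own your prompt.
```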

Discomfort, growth, and privilege

AI is forcing open source communities to confront unfamiliar challenges:

  • Scaling review processes
  • Defining authorship norms
  • Navigating licensing uncertainty
  • Re-thinking contributor workflows

Discomfort is inevitable. But as Jim often reminds our team:

“Discomfort in newness is typically a sign of growth.”

The pressure to navigate these new challenges and answer these pressing questions is not a burden. Rising to this challenge is a privilege. It means:

  • Our project matters
  • The ecosystem is evolving
  • We’re participating in shaping the future

A shared challenge across open source

Kyverno’s AI policy work was informed by thoughtful discussions and examples across the ecosystem. We dove into a variety of projects, each reflecting different constraints and priorities for us to keep in mind as we embark on our own journey.

Moving forward, what matters most is that communities and community members from different projects and industries around the globe engage deliberately with these questions rather than simply responding reactively to the tooling.

Open source sustainability increasingly depends on shared governance patterns, not isolated experimentation.

An invitation to the ecosystem

AI is not going away, nor should it.

The question is not whether AI belongs in open source. The question is how we integrate it responsibly.

Sustainable open source in the AI era requires:

  • Human ownership
  • Transparent authorship
  • Respect for reviewer time
  • Context-aware contributions
  • Community-driven guardrails

AI is a powerful tool. But open source remains, at its core, a human system.

While AI changes the tools and accelerates output, it does not change the responsibility.

Acknowledgements and influences

Kyverno’s AI Usage Policy was shaped by the openness and thoughtfulness of many communities and leaders, including:

Open source benefits enormously when governance knowledge is shared. Thanks to everyone who has already shared and to those who will help us continue to adapt our AI policies as we grow our project.

Categories: CNCF Projects

Scaling organizational structure with Meshery’s expanding ecosystem

Wed, 03/04/2026 - 07:00

An image of the green Meshery logo and 'Meshery Extensions' title, with a CNCF logo in the bottom right hand corner of the image.

As a high-velocity project and one of the fastest-growing in the CNCF ecosystem, Meshery has reached a scale of complexity and community contribution that calls for a revised governance and organizational structure. To best serve its expansive ecosystem, Meshery maintainers have opted to partition the numerous GitHub repositories into two distinct organizations: github.com/meshery for the core platform and github.com/meshery-extensions for extensions and integrations.

This post explains the rationale behind the shift, outlines the proposed governance structure, sets expectations around support, and describes project mechanics, drawing inspiration from other successful CNCF projects.

Rationale for Repository Partitioning

The decision to partition repositories aims to improve project structure, manageability, scalability, and community engagement. 

Project architecture

Meshery is a highly extensible, self-service management platform. Every feature is developed with extensibility in mind, as is evident from the ubiquity of extension points throughout Meshery’s architecture.

Modularity and focus

Separating the core platform from extensions allows the Meshery core team to concentrate on maintaining and enhancing the primary platform, which includes critical components like Meshery Operator and MeshSync. Extensions, such as adapters for specific cloud native technologies, can be developed and maintained independently by community contributors or specialized teams. This modularity ensures that the core platform remains robust and focused.

Project scalability

With support for over 300 integrations and counting, managing everything under one GitHub organization has become impractical. A separate organization for extensions simplifies permission management, contribution processes, and release cycles, making the ecosystem more scalable.

  • Community ownership and maintenance: Projects within meshery-extensions are generally initiated, developed, and maintained by members of the community, rather than the core maintainers. This allows the ecosystem to scale beyond what the core team can directly support.
  • Clearer support expectations: Distinguishing between the core and extensions makes it clear that projects in meshery-extensions have different maintenance levels, release cadences, and support guarantees compared to the core components. This clarifies that users are relying on community support for these specific integrations.

Community engagement

By providing a dedicated space for extensions, Meshery encourages community contributions, as developers can create and maintain extensions without needing deep involvement in the core platform’s development.  With this approach, meshery-extensions fosters a vibrant ecosystem around Meshery by providing a designated, community-centric space for extensions, integrations, and tooling, keeping the core project focused and manageable while enabling broad community participation.

  • Incubation and experimentation: The separate organization acts as an incubator for new ideas, providers, or tooling related to Meshery. Projects can start here and, if they gain significant traction and stability, will be considered for migration or closer integration with the core project.
  • Ecosystem growth: Part of Meshery’s power lies in its ability to manage any infrastructure via Providers, Models, Adapters, and its other extension points. Since there are countless APIs and services, meshery-extensions serves as the place where the community can build and share Providers for less common cloud services, specific SaaS platforms, or even internal company APIs, without needing official endorsement or maintenance from the core maintainers.

Governance Structure

The new structure allows for different governance models and maintainer structures for community projects compared to the core project. Meshery can adopt a governance model that balances control over the core platform with flexibility for extensions, drawing from its existing governance and the Kubernetes SIG model.

Core Platform (github.com/meshery)

  • Governance: Governed by the core Meshery maintainers, as outlined in the project’s governance document. Roles include contributors, organization members, and maintainers, with clear processes for becoming a maintainer (e.g., nomination, voting by existing maintainers).
  • Responsibilities: Maintainers review, approve, and merge pull requests, manage releases, and ensure the platform’s stability and alignment with CNCF standards.
  • Decision-making: Decisions are made through consensus among maintainers, with regular meetings and transparent communication via Slack and community forums.

Extensions (github.com/meshery-extensions)

  • Governance: Each extension may have its own maintainers and a lighter governance structure to encourage innovation. A review process by the core team ensures extensions meet quality and compatibility standards.
  • Maintainer selection: Extension maintainers can be nominated by community members or self-nominated, with approval from the core team based on contribution history and technical expertise.
  • Autonomy: Extension teams have autonomy over their development processes, provided they adhere to Meshery’s code of conduct and integration guidelines.

Oversight and Coordination

  • Steering committee: A steering committee, composed of core maintainers and representatives from active extension teams, oversees cross-organization alignment, resolves conflicts, and approves new extensions.
  • Transparency: Both organizations maintain open communication with public meeting minutes, discussion forums, and regular updates to the community.
| Aspect | Core Platform | Extensions |
|---|---|---|
| Governance | Structured, led by core maintainers | Flexible, per-extension maintainers |
| Maintainer selection | Nomination, 2/3rds majority vote | Nomination, core team approval |
| Decision-making | Consensus among maintainers | Extension team consensus, core oversight |
| Communication | Public meetings, Slack, forums | Public issues, Slack, optional meetings |

Delineated support expectations

Support expectations differ between the core platform and extensions to reflect their distinct roles and maintenance models.

Core platform

  • Full support: The core team provides regular updates, bug fixes, and feature enhancements, ensuring stability for critical components like Meshery Operator and MeshSync.
  • Documentation: Comprehensive guides, such as installation instructions and CLI usage, are maintained (Meshery Documentation).
  • Community support: Active engagement through Slack, forums, and weekly newcomer meetings to support users and contributors.

Extensions

  • Variable support: Core team-maintained extensions receive robust support, while community-maintained ones may have limited support.
  • Clear labeling: Documentation should indicate the support level (e.g., “Official” vs. “Community”) for each extension.
  • Integration support: The core platform provides stable APIs and extension points, ensuring compatibility, with guidelines for developers (Meshery Extensions).

Project mechanics

Managing two organizations involves distinct development, testing, and integration processes to ensure a cohesive ecosystem.

Development process

  • Platform: Follows a structured release cycle with stable and edge channels. Changes undergo rigorous review to sustain stability, and platform extenders and system integrators are notified of upcoming changes to the underlying framework so they have time to maintain compatibility.
  • Extensions: Operate on independent release cycles, allowing rapid iteration. Developers use Meshery’s extension points to integrate with the core platform, following contribution guidelines.

Integration testing

  • Compatibility testing: Extensions are tested against multiple core platform versions, following guidance for verifying compatibility between the core platform and extensions.
  • Automated pipelines: GitHub Actions automate testing and snapshot generation, as seen in extensions like Helm Kanvas Snapshot.
  • Performance testing: Meshery’s performance management features can be used to benchmark extensions, ensuring they meet efficiency standards.

Documentation and resources

  • Comprehensive guides: Documentation covers core platform usage, extension development, and integration (Meshery Docs). The Newcomers’ Guide and MeshMates program aid onboarding (Meshery Community).
  • Catalog and templates: Meshery’s catalog of design templates includes extension configurations and promotes best practices (Meshery Catalog).
  • Community resources: Weekly meetings, Slack channels, and the community handbook provide ongoing support.

Reflections on other projects

Meshery’s expansion strategy draws inspiration from successful models within the Cloud Native Computing Foundation (CNCF), like Argo, Crossplane, and Kubernetes. These projects demonstrate effective approaches to decentralized governance and focused development through the separation of core and community-contributed components.

Meshery aims to emulate Crossplane’s model of maintaining a clear distinction between its core platform (github.com/crossplane) and community contributions (github.com/crossplane-contrib). This separation allows third-party developers to extend Crossplane’s capabilities without affecting the core’s stability, a model that supports Meshery’s approach to fostering innovation while maintaining a reliable core.

Similarly, Meshery Extension teams operate with autonomy over their development processes, provided they adhere to Meshery’s core component frameworks and integration guidelines. This mirrors Argo’s model (github.com/argoproj-labs), where projects function independently but align with broader project goals.

Kubernetes provides a robust model for decentralized governance through its use of github.com/kubernetes for core components and github.com/kubernetes-sigs for Special Interest Groups (SIGs). Each SIG acts as a mini-community with its own charter, leadership, and processes, all while aligning with overarching project goals, as outlined in the Kubernetes governance documentation. Meshery’s extension organization can adopt a similar structure, enabling extension teams to operate autonomously within defined guidelines.

Meshery umbrella expands

See the current list of repositories under each organization: meshery org repos and meshery-extensions org repos.

By partitioning repositories into github.com/meshery and github.com/meshery-extensions, Meshery is taking a strategic step toward improved modularity, scalability, and community engagement.

By adopting a governance structure that balances control and flexibility, delineating clear support expectations, and implementing robust project mechanics, Meshery can effectively manage its growing ecosystem. Drawing inspiration from graduated projects, Meshery is poised to remain a leading CNCF project—empowering collaborative cloud native management.

Categories: CNCF Projects

Kubernetes WG Serving concludes following successful advancement of AI inference support

Thu, 02/26/2026 - 08:30

The Kubernetes Working Group (WG) Serving was created to support development of the AI inference stack on Kubernetes. The goal of this working group was to ensure that Kubernetes is an orchestration platform of choice for inference workloads. This goal has been accomplished, and the working group is now being disbanded.

WG Serving formed workstreams to collect requirements from various model servers, hardware providers, and inference vendors. This work resulted in a common understanding of inference workload specifics and trends and laid the foundation for improvements across many SIGs in Kubernetes.

The working group oversaw several key evolutions related to load balancing and workloads. The inference gateway was adopted as a request scheduler. Multiple groups have worked to standardize AI gateway functionality, and early inference gateway participants went on to seed agent networking work in SIG Network.

The use cases and problem statements gathered by the working group informed the design of AIBrix.

Many of the unresolved problems in distributed inference — especially benchmarking and recommended best practices — have been picked up by the llm-d project, which hybridizes the infrastructure and ML ecosystems and is better able to steer model server co-evolution.

In particular, llm-d and AIBrix represent more appropriate forums for driving requirements to Kubernetes SIGs than this working group. llm-d’s goal is to provide well-lit paths for achieving state-of-the-art inference and aims to provide recommendations that can compose into existing inference user platforms. AIBrix provides a complete platform solution for cost-efficient LLM inference.

WG Serving helped with Kubernetes AI Conformance requirements. The llm-d project is leveraging multiple components from the profile and making recommendations to end users consistent with Kubernetes direction (including Kueue, inference gateway, LWS, DRA, and related efforts). Widely adopted patterns and solutions are expected to go into the conformance program.

All efforts currently running inside WG Serving can be migrated to other working groups or directly to SIGs. Requirements will be discussed in SIGs and in the llm-d community. Specifically:

  • Autoscaling-related questions — mostly related to fast bootstrap — will be discussed in SIG Node or SIG Scheduling.
  • Multi-host, multi-node work can continue as part of SIG Apps (for example, for the LWS project), and DRA requirements will be discussed in WG Device Management.
  • Orchestration topics will be covered by SIG Scheduling and SIG Node.
  • Requirements for DRA will be discussed in WG Device Management.

The Gateway API Inference Extension project is already sponsored by SIG Network and will remain there. The Serving Catalog work can be moved to the Inference Perf project. Originally it was designed for a larger scope, but it has been used mostly for inference performance.

The Inference Perf project is sponsored by SIG Scalability, and no change of ownership is needed.

CNCF thanks all contributors who participated in WG Serving and helped advance Kubernetes as a platform for AI inference workloads.

Categories: CNCF Projects

Exposing Spin apps on SpinKube with GatewayAPI

Thu, 02/26/2026 - 07:00

The Gateway API isn’t just an “Ingress v2”; it’s an entirely revamped approach to exposing services from within Kubernetes that eliminates the need to encode routing capabilities in vendor-specific, unstructured annotations. In this post, we will explore how to expose WebAssembly applications built using the CNCF Spin framework and served by SpinKube using the Gateway API.

What is SpinKube

SpinKube, a CNCF sandbox project, is an open-source stack for running serverless WebAssembly applications (Spin apps) on top of Kubernetes. Although SpinKube leverages Kubernetes primitives like Deployments, Services, and Pods, there are no containers involved in running your serverless Spin apps at all. Instead, it leverages a containerd-shim implementation and spawns processes on the underlying Kubernetes worker nodes to run Spin apps.

You can learn more about SpinKube and find detailed instructions on how to deploy SpinKube to your Kubernetes cluster at https://spinkube.dev.

What is Gateway API

The Gateway API is the modern, role-oriented successor to the legacy Ingress resource, designed to provide a more expressive and extensible networking interface for Kubernetes. Unlike Ingress, which often relies on a messy sprawl of vendor-specific annotations to handle complex logic, the Gateway API breaks traffic management into atomic resources: GatewayClass, Gateway, and routes (like HTTPRoute or GRPCRoute).

This separation allows infrastructure admins to manage the entry points while giving developers control over how their specific services are exposed, enabling native support for advanced traffic patterns like canary rollouts, header-based routing, and traffic mirroring without the need for bespoke configurations.

To dive deeper into the technical specifications and resource hierarchy, head over to the official Gateway API documentation.

Provisioning a Kubernetes cluster, installing SpinKube, and implementing Spin apps are considered beyond the scope of this article. However, you can head over to https://github.com/akamai-developers/exposing-spin-apps-with-gatway-api, a repository containing all source code along with the necessary instructions for setting up an LKE cluster with SpinKube.

To follow the article’s demo, you’ll deploy the required artifacts to your Kubernetes cluster. Make sure you have the following tools installed: the spin CLI (with its kube plugin), kubectl, helm, and curl.

Build and deploy the Spin apps to Kubernetes

Let’s start by compiling the source code of our sample Spin apps down to WebAssembly. Doing so is as easy as executing the spin build command from within each application folder:

# Build the greeter application
pushd apps/greeter
spin build

 Building component greeter with `cargo build --target wasm32-wasip1 --release`
     Finished `release` profile [optimized] target(s) in 0.21s
 Finished building all Spin components

popd

# Build the prime_numbers application
pushd apps/prime-numbers
spin build

  Building component prime-numbers with `cargo build --target wasm32-wasip1 --release`
    Finished `release` profile [optimized] target(s) in 0.18s
  Finished building all Spin components
popd

Once the applications have been compiled, we use the spin registry push command to distribute them as OCI artifacts. (If your OCI-compliant registry requires authentication, you must log in first; use the spin registry login command before trying to push.)

Tip: For testing purposes, we’ll use ttl.sh, an anonymous and ephemeral OCI-compliant registry, which allows us to store our applications for 24 hours by simply specifying the TTL as a tag.

# specify variables
greeter_app_artifact=ttl.sh/spin-greeter:24h
primenumbers_app_artifact=ttl.sh/spin-prime-numbers:24h

# optional: Authenticate against registry
oci_reg_server=
oci_reg_user=
oci_reg_password=
spin registry login $oci_reg_server -u $oci_reg_user -p $oci_reg_password

# distribute the Spin applications
pushd apps/greeter
spin registry push $greeter_app_artifact --build
popd

pushd apps/prime-numbers
spin registry push $primenumbers_app_artifact --build
popd

Finally, we use the spin kube scaffold command for generating the necessary Kubernetes manifests.

Tip: Spin does not have any opinions on how you deploy resources to your Kubernetes cluster. You can either use kubectl, create a Helm chart and deploy it using the helm CLI, or describe the desired state and deploy it with GitOps.

For the sake of this article, we’ll simply pipe the generated manifest to kubectl apply. The actual manifests are shown here for illustration purposes:

# Deploy the Spin applications to Kubernetes
spin kube scaffold --from $greeter_app_artifact | kubectl apply -f -
spin kube scaffold --from $primenumbers_app_artifact | kubectl apply -f -
apiVersion: core.spinkube.dev/v1alpha1
kind: SpinApp
metadata:
  name: spin-greeter
spec:
  image: "ttl.sh/spin-greeter:24h"
  executor: containerd-shim-spin
  replicas: 2
---
apiVersion: core.spinkube.dev/v1alpha1
kind: SpinApp
metadata:
  name: spin-prime-numbers
spec:
  image: "ttl.sh/spin-prime-numbers:24h"
  executor: containerd-shim-spin
  replicas: 2

Obviously, there are additional knobs you can turn when executing spin kube scaffold. I highly encourage you to check out the documentation for the command by providing the --help flag.

Testing the Spin app

We use traditional port-forwarding provided by kubectl to verify that both Spin applications run as expected:

kubectl port-forward svc/spin-greeter 8080:80

Send a GET request to the application using curl:

curl -i localhost:8080/hello/Akamai%20Developers

HTTP/1.1 200 OK
content-type: text/plain
transfer-encoding: chunked
date: Mon, 19 Jan 2026 13:55:34 GMT

Hello, Akamai Developers!

Next, let’s test the second Spin application:

kubectl port-forward svc/spin-prime-numbers 8080:80



Again, use curl to invoke one of the endpoints exposed by the Spin app:

curl -i localhost:8080/above/42

HTTP/1.1 200 OK
transfer-encoding: chunked
date: Mon, 19 Jan 2026 17:05:02 GMT

Next prime number above 42 is 43


Now that both apps are working, you can terminate the port-forwarding (CTRL+C) and dive into exposing both Spin apps.

Installing Gateway API CRDs and a Controller

To use the Gateway API, we must install the corresponding Gateway API resources (CRDs) on our cluster along with a Gateway API Controller.

There are several controllers available that implement the Gateway API. You can find a list of available Gateway API controllers at https://gateway-api.sigs.k8s.io/implementations/. We’ll use NGINX Gateway Fabric for now.

To install Gateway API resources run:

kubectl kustomize "https://github.com/nginx/nginx-gateway-fabric/config/crd/gateway-api/standard?ref=v2.3.0" | kubectl apply -f -

To install NGINX Gateway Fabric run:

helm install ngf oci://ghcr.io/nginx/charts/nginx-gateway-fabric --create-namespace -n nginx-gateway


Creating cluster-specific Gateway API resources

With the Gateway API controller installed, we will first deploy a Gateway to our cluster. Think of the Gateway as an entry point into your Kubernetes cluster, which could be shared across multiple applications. We’ll now create the spinkube Gateway, which will front our two Spin applications that are already running in the default namespace.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: spinkube
  namespace: default
spec:
  gatewayClassName: nginx
  listeners:
  - protocol: HTTP
    port: 8080
    name: http
    allowedRoutes:
      namespaces:
        from: Same

Once you’ve deployed the Gateway, you should find a new Service of type LoadBalancer named spinkube-nginx provisioned in the default namespace. Once the cloud controller has acquired a public IP address, it will appear in the output as well.

kubectl get services

NAME            TYPE         EXTERNAL-IP
spinkube-nginx  LoadBalancer 172.238.61.25


Note down the external IP address of the spinkube-nginx service; we’ll use it in a few minutes to send requests to our Spin applications from outside of the cluster!

Creating application-specific Gateway API Resources

As we have deployed two different Spin applications to our Kubernetes cluster, we’ll also create two instances of HTTPRoute and link them to the Gateway we created in the previous section.

Tip: As managing external DNS is beyond the scope of this article, we’ll use simple PathPrefix based routing in combination with a Rewrite filter to route inbound requests to the desired Spin applications.

Create the following HTTPRoute resources in the default namespace:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: greeter
  namespace: default
spec:
  parentRefs:
  - name: spinkube
  rules:
  - backendRefs:
    - name: spin-greeter
      port: 80
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          replacePrefixMatch: /
          type: ReplacePrefixMatch
    matches:
    - path:
        type: PathPrefix
        value: /greeter
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: prime-numbers
  namespace: default
spec:
  parentRefs:
  - name: spinkube
  rules:
  - backendRefs:
    - name: spin-prime-numbers
      port: 80
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          replacePrefixMatch: /
          type: ReplacePrefixMatch
    matches:
    - path:
        type: PathPrefix
        value: /prime-numbers
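To make the ReplacePrefixMatch behavior concrete, here is a small local sketch in plain shell (no cluster required) of what the URLRewrite filter effectively does to an inbound path before the request reaches the Spin app:

```shell
# Illustration only: mimic what URLRewrite with ReplacePrefixMatch: /
# does to an inbound request path.
inbound="/greeter/hello/Akamai%20Developers"   # path as it arrives at the Gateway
prefix="/greeter"                              # the matched PathPrefix

# Strip the matched prefix and keep a leading slash.
rewritten="/${inbound#$prefix/}"

echo "$rewritten"   # /hello/Akamai%20Developers is what spin-greeter actually sees
```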

Accessing the Spin apps

With all Kubernetes resources in place, it’s time for a final test. We discovered the public IP address associated with our Gateway earlier in this post. Let’s use curl again to send requests to both Spin applications:

# Send request to the greeter app
curl -i http://<your_gateway_ip>:8080/greeter/hello/Akamai%20Developers

HTTP/1.1 200 OK
Server: nginx
Date: Mon, 19 Jan 2026 16:37:22 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: keep-alive

Hello, Akamai Developers!


# Send request to the prime-numbers app
curl -i http://<your_gateway_ip>:8080/prime-numbers/above/999

HTTP/1.1 200 OK
Server: nginx
Date: Mon, 19 Jan 2026 16:37:50 GMT
Transfer-Encoding: chunked
Connection: keep-alive

Next prime number above 999 is 1009


As you can see, our requests get routed to the desired Spin application because of the path prefix (either greeter or prime-numbers).

Conclusion

The Kubernetes Gateway API streamlines how we expose services from within a Kubernetes cluster and allows precise separation of concerns. Cloud infrastructure and cluster operators create and manage resources that could be shared across multiple applications like the Gateway, while application developers provide application (or service) specific resources such as an HTTPRoute.

Especially when running tens or hundreds of different serverless applications on top of SpinKube, it’s crucial to have robust and reliable routing in place to ensure applications are accessible from outside of the cluster. The Gateway API for Kubernetes makes managing these routes a breeze.

Contributors from Akamai collaborate on SpinKube development to deliver this runtime across its global cloud and edge. Additional information is available at akamai.com.







Categories: CNCF Projects

Making Harbor production-ready: Essential considerations for deployment

Tue, 02/24/2026 - 07:00

Harbor is an open-source container registry that secures artifacts with policies and role-based access control, ensuring images are scanned for vulnerabilities and signed as trusted. To learn more about Harbor and how to deploy it on a Virtual Machine (VM) and in Kubernetes (K8s), refer to parts 1 and 2 of the series.

Flow chart showcasing the Harbor container registry from development team through to K8s Cluster

While deploying Harbor is straightforward, making it production-ready requires careful consideration of several key aspects. This blog outlines critical factors to ensure your Harbor instance is robust, secure, and scalable for production environments.

For this blog, we will focus on Harbor deployed on Kubernetes via Helm as our base and provide suggestions for this specific deployment.

1. High Availability (HA) and scalability

For a production environment, single points of failure are unacceptable, especially for an image registry that will act as a central repository for storing and pulling images and artifacts for development and production applications. Thus, implementing high availability for Harbor is crucial and involves several key considerations:

  • Deploy with an Ingress: Deploy a Kubernetes Ingress controller (e.g., Traefik) in front of your Harbor instances to distribute incoming traffic efficiently and provide a unified entry point, along with cert-manager for certificate management. You can specify this in your values.yaml file under:
expose:
  type: ingress
  tls:
    enabled: true
    certSource: secret 
  ingress:
    hosts:
      core: harbor.yourdomain.com
    annotations:
      # Specify your ingress class
      kubernetes.io/ingress.class: traefik
      # Reference your ClusterIssuer (e.g., self-signed or internal CA)
      cert-manager.io/cluster-issuer: "harbor-cluster-issuer"

To locate your values.yaml file, refer to the previous blog.

  • Utilize multiple Harbor instances: Increase the replica count for critical Harbor components (e.g., core, jobservice, portal, registry, trivy) in your values.yaml to ensure redundancy.
core:
  replicas: 3
jobservice:
  replicas: 3
portal:
  replicas: 3
registry:
  replicas: 3
trivy:
  replicas: 3

# While not strictly for the HA of the registry itself, consider increasing exporter replicas for robust monitoring availability
exporter:
  replicas: 3

# Optionally, if using Ingress, consider increasing the Nginx replicas for improving Ingress availability
nginx:
  replicas: 3

  • Configure shared storage: For persistent data, configure Kubernetes StorageClasses and PersistentVolumes to use shared storage solutions like vSAN or a distributed file system. Specify these in your values.yaml under:

persistence:
  enabled: true
  resourcePolicy: "keep"
  persistentVolumeClaim:
    registry:
      # If left empty, the Kubernetes cluster's default storage class will be used
      storageClass: "your-storage-class"
    jobservice:
      storageClass: "your-storage-class"
    database:
      storageClass: "your-storage-class"
    redis:
      storageClass: "your-storage-class"
    trivy:
      storageClass: "your-storage-class"
  • Enable database HA (PostgreSQL): While Harbor ships with a built-in PostgreSQL database, it is not recommended for production use for three reasons:
  1. Lack of high availability (HA): The default internal PostgreSQL setup in the Harbor Helm chart is typically a single instance, which creates a single point of failure. If that database pod goes down, your entire Harbor instance becomes unavailable.
  2. Limited scalability: An embedded database is not designed for independent scaling. As your Harbor usage grows, you may hit database performance bottlenecks that are difficult to address without disrupting Harbor itself.
  3. Complex lifecycle management: Managing backups, point-in-time recovery, patching, and upgrades for a stateful database directly within an application’s Helm chart is significantly more complex and error-prone than with dedicated database solutions.

Thus, it is recommended to deploy a highly available PostgreSQL cluster within Kubernetes (e.g., using a Helm chart for Patroni or CloudNativePG) or leverage a managed database service outside the cluster. Configure Harbor to connect to this HA database by updating the values.yaml:

database:
  type: "external"
  external:
    host: "192.168.0.1"
    port: "5432"
    username: "user"
    password: "password"
    coreDatabase: "registry"
    # If using an existing secret, the key must be "password"
    existingSecret: ""
    # "disable" - No SSL
    # "require" - Always SSL (skip verification)
    # "verify-ca" - Always SSL (verify that the certificate presented by the
    # server was signed by a trusted CA)
    # "verify-full" - Always SSL (verify that the certification presented by the
    # server was signed by a trusted CA and the server host name matches the one
    # in the certificate)
    sslmode: "verify-full"

  • Implement Redis HA: Deploy a highly available Redis cluster in Kubernetes (e.g., using a Helm chart for Redis Sentinel or Redis Cluster) or use a managed Redis service. Configure Harbor to connect to this HA Redis instance by updating redis.type and the connection details in values.yaml.

redis:
  type: external
  external:
    addr: "192.168.0.2:6379"
    sentinelMasterSet: ""
    tlsOptions:
      enable: true
    username: ""
    password: ""

2. Security best practices

Security is paramount for any production system, especially a container registry.

  • Enable TLS/SSL: Always enable TLS/SSL for all Harbor components.

expose:
  tls:
    enabled: true
    certSource: auto # change to secret if using cert-manager
    auto:
      commonName: ""
internalTLS:
  enabled: true
  strong_ssl_ciphers: true
  certSource: "auto"
  core:
    secretName: ""
  jobService:
    secretName: ""
  registry:
    secretName: ""
  portal:
    secretName: ""
  trivy:
    secretName: ""


  • Configure authentication and authorization: Leverage Harbor’s supported authentication and authorization mechanisms for managing access to Harbor resources. After deployment, integrate Harbor with enterprise identity providers like LDAP or OIDC by following the Harbor configuration guides: Configure LDAP/Active Directory Authentication or Configure OIDC Provider Authentication.

A screenshot of the Harbor website showcasing the configuration panel.

  • Implement vulnerability scanning: Ensure vulnerability scanning is enabled in values.yaml. Harbor uses Trivy by default. Verify its activation and configuration within the Helm chart.

trivy:
  enabled: true
A screenshot of the Harbor website showcasing the Library panel from the system admin view

  • Activate content trust: Harbor supports multiple content trust mechanisms to ensure the integrity of your artifacts. For modern OCI artifact signing, Cosign and Notation are recommended. Enforce deployment security at the project level within the Harbor UI or via the Harbor API so that only trusted, cryptographically signed images can be deployed.

A screenshot of the library panel on the Harbor website from the system admin view. This image shows the project registry information, proxy cache and deployment security panel.
  • Maintain regular updates: Regularly update your Harbor Helm chart and underlying Kubernetes components to benefit from the latest security patches and bug fixes. Use helm upgrade for this purpose.
  • Use robot accounts for automation: Use robot accounts (service accounts) in automation such as CI/CD pipelines to avoid using user credentials. This ensures the robot account with the least required privileges is used to perform the specific task it has been created for, ensuring limited scope.
  • Fine-grained audit log: As of Harbor v2.13.0, specific events in the audit log can be redirected. For example, an “authentication failure” event can be captured in the audit log and forwarded to a third-party syslog endpoint.

3. Storage considerations

Efficient and reliable storage is critical for Harbor’s performance and stability.

  • Choose appropriate storage type: Define Kubernetes StorageClasses that align with your underlying infrastructure (e.g., nfs-client, aws-ebs, azure-disk, gcp-pd). Specify these settings in your values.yaml: 
persistence:
  enabled: true
  resourcePolicy: "keep"
  imageChartStorage:
    #Specify storage type: "filesystem", "azure", "gcs", "s3", "swift", "oss"
    type: ""
    #Configure specific storage type section based on the selected option
  • Estimate storage sizing: Carefully calculate your storage needs based on the anticipated number and size of container images, as well as your defined retention policies. Configure the size for your PersistentVolumeClaims in values.yaml.
  • Implement robust backup and recovery: Establish a comprehensive backup strategy for all Harbor data. For Kubernetes-native backups, consider using tools like Velero to back up PersistentVolumes and Kubernetes resources. For object storage, leverage the cloud provider’s backup mechanisms or external backup solutions. Regularly test your recovery procedures.
  • Configure and run garbage collection: Set up and routinely execute Harbor’s garbage collection. This can be configured through the Harbor UI by defining a schedule for automated runs to remove unused blobs and efficiently reclaim storage space.
A screenshot of the Harbor website showcasing the 'Clean Up' panel including garbage collection and log rotation.

4. Monitoring and alerting

Proactive monitoring and alerting are essential for identifying and addressing issues before they impact users.
  • Collect comprehensive metrics: Deploy Prometheus and configure it to scrape metrics from Harbor components. The Harbor Helm chart exposes Prometheus-compatible endpoints configured in the values.yaml file. Visualize these metrics using Grafana.

metrics:
  enabled: true
  core:
    path: /metrics
    port: 8001
  registry:
    path: /metrics
    port: 8001
  jobservice:
    path: /metrics
    port: 8001
  exporter:
    path: /metrics
    port: 8001
  serviceMonitor:
    enabled: true
    # This label ensures the prometheus operator picks up these monitors
    additionalLabels:
      release: kube-prometheus-stack

# Example Service Monitor objects:

# Harbor Core (API and Auth Performance)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: harbor-core
  labels:
    app: harbor
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: harbor
      component: core
  endpoints:
  - port: metrics # Defaults to 8001
    path: /metrics
    interval: 30s

# Harbor Exporter (Business Metrics)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: harbor-exporter
  labels:
    app: harbor
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: harbor
      component: exporter
  endpoints:
  - port: metrics
    path: /metrics
    interval: 60s # Scraped less frequently as these are high-level stats
  • Centralized logging: Implement a centralized logging solution within Kubernetes, such as the ELK stack (Elasticsearch, Logstash, Kibana) or Grafana Loki with Fluentd/Fluent Bit.
  • Configure critical alerts: Set up alerting rules in Prometheus (Alertmanager) or Grafana for critical events, such as component failures, high resource utilization (CPU/memory limits), storage nearing capacity, failed vulnerability scans, or unauthorized access attempts. Define these thresholds based on your production requirements.
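As a starting point, such an alert can be expressed as a PrometheusRule for the Prometheus Operator. This is a sketch: the rule name, namespace, labels, and the kube_pod_status_ready metric (from kube-state-metrics) are assumptions to adapt to your environment.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: harbor-alerts            # illustrative name
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: harbor
      rules:
        - alert: HarborComponentDown
          # Fires when any Harbor pod has reported not-ready for 5 minutes
          expr: kube_pod_status_ready{condition="true", namespace="harbor", pod=~"harbor-.*"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Harbor pod {{ $labels.pod }} is not ready"
```

Similar rules can cover storage capacity, resource saturation, and failed scans once the corresponding metrics are scraped.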

5. Network configuration

Proper network configuration ensures smooth communication between Harbor components and external clients.

  • Configure ingress or load balancer and DNS resolution: As already mentioned, deploy a Kubernetes Ingress controller or Load Balancer to expose Harbor externally. Ensure proper DNS records are configured to point to your Load Balancer’s IP address.
  • Set up proxy settings (if applicable): If Harbor components need to access external resources through a corporate proxy, configure proxy settings within values.yaml. Note that the proxy.components field explicitly defines which Harbor components (e.g., core, jobservice, trivy) will use these proxy settings for their external communications.
proxy:
  httpProxy:
  httpsProxy:
  noProxy: 127.0.0.1,localhost,.local,.internal
  components:
    - core
    - jobservice
    - trivy
  • Allocate sufficient bandwidth: Ensure your Kubernetes cluster’s underlying network infrastructure and nodes have sufficient bandwidth to handle peak image pushes and pulls. Monitor network I/O on nodes running Harbor pods.

Conclusion

By diligently addressing these considerations, you can transform your basic Harbor deployment into a robust, secure, and highly available production-ready container registry. This approach ensures that Harbor serves as a cornerstone of your cloud-native infrastructure, capable of supporting demanding development and production workflows. From implementing High Availability and stringent security measures to optimizing storage and establishing proactive monitoring, each step contributes to a resilient and efficient artifact management system. 

Continue reading the Harbor Blog Series on cncf.io:

Blog 1 – Harbor: Enterprise-grade container registry for modern private cloud

Blog 2 – Deploying Harbor on Kubernetes using Helm

Categories: CNCF Projects

Announcing Kyverno 1.17!

Wed, 02/18/2026 - 07:11

Kyverno 1.17 is a landmark release that marks the stabilization of our next-generation Common Expression Language (CEL) policy engine.

While 1.16 introduced the “CEL-first” vision in beta, 1.17 promotes these capabilities to v1, offering a high-performance, future-proof path for policy as code.

This release focuses on “completing the circle” for CEL policies by introducing namespaced mutation and generation, expanding the available function libraries for complex logic, and enhancing supply chain security with upcoming Cosign v3 support.

A new look for kyverno.io

The first thing you’ll notice with 1.17 is our completely redesigned website. We’ve moved beyond a simple documentation site to create a modern, high-performance portal for platform engineers. Let’s be honest: the Kyverno website redesign was long overdue. As the project evolved into the industry standard for unified policy as code, our documentation needed to reflect that maturity. We are proud to finally unveil the new experience at https://kyverno.io.

An “A new era, a new website” graphic showcasing the changes from the old Kyverno website to the new one, launched with Kyverno 1.17.

  • Modern redesign
    Built on the Starlight framework, the new site is faster, fully responsive, and features a clean, professional aesthetic that makes long-form reading much easier on the eyes.
  • Enhanced documentation structure
    We’ve reorganized our docs from the ground up. Information is now tiered by “User Journey”—from a simplified Quick Start for beginners to deep-dive Reference material for advanced policy authors.
  • Fully redesigned policy catalog
    Our library of 300+ sample policies has a new interface. It features improved filtering and a dedicated search that allows you to find policies by Category (Best Practices, Security, etc.) or Type (CEL vs. JMESPath) instantly.
  • Enhanced search capabilities
    We’ve integrated a more intelligent search engine that indexes both documentation and policy code, ensuring you get the right answer on the first try.
  • Brand new blog
    The Kyverno blog has been refreshed to better showcase technical deep dives, community case studies, and release announcements like this one!

Namespaced mutating and generating policies

In 1.16, we introduced namespaced variants for validation, cleanup, and image verification.

Kyverno 1.17 completes this by adding:

  • NamespacedMutatingPolicy
  • NamespacedGeneratingPolicy

This enables true multi-tenancy. Namespace owners can now define their own mutation and generation logic (e.g., automatically injecting sidecars or creating default ConfigMaps) without requiring cluster-wide permissions or affecting other tenants.

CEL policy types reach v1 (GA)

The headline for 1.17 is the promotion of CEL-based policy types to v1. This signifies that the API is now stable and production-ready.

The promotion includes:

  • ValidatingPolicy and NamespacedValidatingPolicy
  • MutatingPolicy and NamespacedMutatingPolicy
  • GeneratingPolicy and NamespacedGeneratingPolicy
  • ImageValidatingPolicy and NamespacedImageValidatingPolicy
  • DeletingPolicy and NamespacedDeletingPolicy
  • PolicyException

With this graduation, platform teams can confidently migrate from JMESPath-based policies to CEL to take advantage of significantly improved evaluation performance and better alignment with upstream Kubernetes ValidatingAdmissionPolicies / MutatingAdmissionPolicies.

New CEL capabilities and functions

To ensure CEL policies are as powerful as the original Kyverno engine, 1.17 introduces several new function libraries:

  • Hash Functions
    Built-in support for md5(value), sha1(value), and sha256(value) hashing.
  • Math Functions
    Use math.round(value, precision) to round numbers to a specific decimal or integer precision.
  • X509 Decoding
    Policies can now inspect and validate the contents of x509 certificates directly within a CEL expression using x509.decode(pem).
  • Random String Generation
    Generate random strings with random() (default pattern) or random(pattern) for custom regex-based patterns.
  • Transform Utilities
    Use listObjToMap(list1, list2, keyField, valueField) to merge two object lists into a map.
  • JSON Parsing
    Parse JSON strings into structured data with json.unmarshal(jsonString).
  • YAML Parsing
    Parse YAML strings into structured data with yaml.parse(yamlString).
  • Time-based Logic
    New time.now(), time.truncate(timestamp, duration), and time.toCron(timestamp) functions allow for time-since or “maintenance window” style policies.
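For illustration, a minimal CEL-based policy might look like the following sketch. The policy name is hypothetical, and the schema is assumed to mirror the Kubernetes ValidatingAdmissionPolicy style described above; the new function libraries can be called inside the same expression field.

```yaml
apiVersion: policies.kyverno.io/v1
kind: ValidatingPolicy
metadata:
  name: require-team-label     # hypothetical policy name
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    - expression: "has(object.metadata.labels) && 'team' in object.metadata.labels"
      message: "All Pods must carry a 'team' label."
```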

The deprecation of legacy APIs

As Kyverno matures and aligns more closely with upstream Kubernetes standards, we are making the strategic shift to a CEL-first architecture. This means that the legacy Policy and ClusterPolicy types (which served the community for years using JMESPath) are now entering their sunset phase.

The deprecation schedule

Kyverno 1.17 officially marks ClusterPolicy and CleanupPolicy as Deprecated. While they remain functional in this release, the clock has started on their removal to make way for the more performant, standardized CEL-based engines.

Release   Date (estimated)   Status
v1.17     Jan 2026           Marked for deprecation
v1.18     Apr 2026           Critical fixes only
v1.19     Jul 2026           Critical fixes only
v1.20     Oct 2026           Planned for removal

Why the change?

By standardizing on the Common Expression Language (CEL), Kyverno significantly improves its performance and aligns with the native validation logic used by the Kubernetes API server itself.

For platform teams, this means one less language to learn and a more predictable and scalable policy-as-code experience.

Note for authors

From this point forward, we strongly recommend that every new policy you write be based on the new CEL APIs. Choosing the legacy APIs for new work today simply adds to your migration workload later this year.

Migration tips

We understand that many of you have hundreds of existing policies. To ensure a smooth transition, we have provided comprehensive resources:

  • The Migration Guide
    Our new Migration to CEL Guide provides a side-by-side mapping of legacy ClusterPolicy fields to their new equivalents (e.g., mapping validate.pattern to ValidatingPolicy expressions).
  • New Policy Types
    You can now begin moving your rules into specialized types like ValidatingPolicy, MutatingPolicy, and GeneratingPolicy. You can see the full breakdown of these new v1 APIs in the Policy Types Overview.

Enhanced supply chain security

Supply chain security remains a core pillar of Kyverno.

  • Cosign v3 Support
    1.17 adds support for the latest Cosign features, ensuring your image verification remains compatible with the evolving Sigstore ecosystem.
  • Expanded Attestation Parsing
    New capabilities to deserialize YAML and JSON strings within CEL policies make it easier to verify complex metadata and SBOMs.

Observability and reporting upgrades

We have refined how Kyverno communicates policy results:

  • Granular Reporting Control
    A new --allowedResults flag allows you to filter which results (e.g., only “Fail”) are stored in reports, significantly reducing etcd pressure in large clusters.
  • Enhanced Metrics
    More detailed latency and execution metrics for CEL policies are now included by default to help you monitor the “hidden” cost of policy enforcement.

For developers and integrators

To support the broader ecosystem and make it easier to build integrations, we have decoupled our core components:

  • New API Repository
    Our CEL-based APIs now live in a dedicated repository: kyverno/api. This makes it significantly lighter to import Kyverno types into your own Go projects.
  • Kyverno SDK
    For developers building custom controllers or tools that interact with Kyverno, the SDK project is now housed at kyverno/sdk.

Getting started and backward compatibility

Upgrading from 1.16 is straightforward. However, since the CEL policy types have moved to v1, we recommend updating your manifests to the new API version. Kyverno will continue to support v1beta1 for a transition period.

helm repo update
helm upgrade --install kyverno kyverno/kyverno -n kyverno --version 3.7.0

Looking ahead: The Kyverno roadmap

As we move past the 1.17 milestone, our focus shifts toward long-term sustainability and the “Kyverno Platform” experience. Our goal is to ensure that Kyverno remains the most user-friendly and performant governance tool in the cloud-native ecosystem.

  • Growing the community
    We are doubling down on our commitment to the community. Expect more frequent office hours, improved contributor onboarding, and a renewed focus on making the Kyverno community the most welcoming space in CNCF.
  • A unified tooling experience
    Over the years, we’ve built several powerful sub-projects (like the CLI, Policy Reporter, and Kyverno-Authz). A major goal on our roadmap is to unify these tools into a cohesive experience, reducing fragmentation and making it easier to manage the entire policy lifecycle from a single vantage point.
  • Performance and scalability guardrails
    As clusters grow, performance becomes paramount. We are shifting our focus toward rigorous automated performance testing and will be providing more granular metrics regarding throughput and latency. We want to give platform engineers the data they need to understand exactly what Kyverno can handle in high-scale production environments.
  • Continuous UX improvement
    The website redesign was just the first step. We will continue to iterate on our user interfaces, documentation, and error messaging to ensure that Kyverno remains “Simplified” by design, not just in name.

Conclusion

Kyverno 1.17 is the most robust version yet, blending the flexibility of our original engine with the performance and standardization of CEL.

But this release is about more than just code—it’s about the total user experience. Whether you’re browsing the new policy catalog or scaling thousands of CEL-based rules, we hope this release makes your Kubernetes journey smoother.

A massive thank you to our contributors for making this release (and the new website!) a reality.

Categories: CNCF Projects

Dragonfly v2.4.0 is released

Thu, 02/05/2026 - 19:00

Dragonfly v2.4.0 is released! Thanks to all of the contributors who made this Dragonfly release happen.

New features and enhancements

load-aware scheduling algorithm

A two-stage scheduling algorithm combining central scheduling with node-level secondary scheduling to optimize P2P download performance, based on real-time load awareness.

A diagram of the two-stage scheduling algorithm: a peer selects among candidate parents (e.g., Parent A at 40% load, Parent B at 35%, Parent N at n%) by combining central scheduling with node-level secondary scheduling based on real-time load awareness.

For more information, please refer to the Scheduling documentation.

Vortex protocol support for P2P file transfer

Dragonfly provides the new Vortex transfer protocol, based on the lightweight TLV (Tag-Length-Value) format, to improve download performance on internal networks. Vortex replaces gRPC for data transfer between peers. Compared to gRPC, TCP-based Vortex reduces large-file download time by 50% and QUIC-based Vortex by 40%, and both effectively reduce peak memory usage.

For more information, please refer to the TCP Protocol Support for P2P File Transfer and QUIC Protocol Support for P2P File Transfer.

Request SDK

An SDK for routing user requests to Seed Peers using consistent hashing, replacing the previous Kubernetes Service load-balancing approach.

A flow chart of the Request SDK: a user request passes through the SDK, which routes chunks 1, 2, and 3 to seed peer 2, which in turn pulls layer 1 from the OCI registry.
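The consistent-hashing idea behind this routing can be sketched as follows. This is an illustrative toy ring, not the actual Request SDK; peer names, the replica count, and the hash choice are assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: each peer gets many virtual nodes."""

    def __init__(self, peers, replicas=100):
        self._ring = []  # sorted list of (hash, peer)
        for peer in peers:
            for i in range(replicas):
                h = int(hashlib.md5(f"{peer}-{i}".encode()).hexdigest(), 16)
                self._ring.append((h, peer))
        self._ring.sort()

    def route(self, key: str) -> str:
        # Walk clockwise from the key's hash to the next virtual node
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect.bisect(self._ring, (h,))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["seed-peer-1", "seed-peer-2", "seed-peer-3"])
# The same chunk key always maps to the same seed peer, keeping its cache warm
assert ring.route("blob:sha256:abc/chunk-1") == ring.route("blob:sha256:abc/chunk-1")
```

Unlike plain Service load balancing, adding or removing a seed peer only remaps the keys adjacent to its virtual nodes, so most cached content stays where it is.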

Simple multi‑cluster Kubernetes deployment with scheduler cluster ID

Dragonfly supports a simplified feature for deploying and managing multiple Kubernetes clusters by explicitly assigning a schedulerClusterID to each cluster. This approach allows users to directly control cluster affinity without relying on location‑based scheduling metadata such as IDC, hostname, or IP.

Using this feature, each Peer, Seed Peer, and Scheduler determines its target scheduler cluster through a clearly defined scheduler cluster ID. This ensures precise separation between clusters and predictable cross‑cluster behavior.

A screenshot of a host’s scheduler cluster ID configuration, showing five lines of code.

For more information, please refer to the Create Dragonfly Cluster Simple.

Performance and resource optimization for Manager and Scheduler components

Enhanced service performance and resource utilization across Manager and Scheduler components while significantly reducing CPU and memory overhead, delivering improved system efficiency and better resource management.

Enhanced preheating

  • Support for IP-based peer selection in preheating jobs with priority-based selection logic where IP specification takes highest priority, followed by count-based and percentage-based selection.
  • Support for preheating multiple URLs in a single request.
  • Support for preheating file and image via Scheduler gRPC interface.

A screenshot of the Dragonfly console. It shows the “Create Preheat” form, including fields for information, clusters, URL, and args.

Calculate task ID based on image blob SHA256 to avoid redundant downloads

The Client now supports calculating task IDs directly from the SHA256 hash of image blobs, instead of using the download URL. This enhancement prevents redundant downloads and data duplication when the same blob is accessed from different registry domains.
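The effect can be illustrated with a short sketch. The task-ID derivation here is a simplification for illustration, not Dragonfly’s actual implementation.

```python
import hashlib

def task_id_from_url(url: str) -> str:
    # URL-based IDs differ when the same blob is served from different registries
    return hashlib.sha256(url.encode()).hexdigest()

def task_id_from_blob(blob: bytes) -> str:
    # Content-based IDs are identical regardless of which registry served the blob
    return hashlib.sha256(blob).hexdigest()

blob = b"example layer bytes"
url_a = "https://registry-a.example.com/v2/app/blobs/sha256:abc"
url_b = "https://registry-b.example.com/v2/app/blobs/sha256:abc"

# Same content, different URLs: URL-keyed tasks would download twice,
# content-keyed tasks deduplicate to a single download.
assert task_id_from_url(url_a) != task_id_from_url(url_b)
assert task_id_from_blob(blob) == task_id_from_blob(bytes(blob))
```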

Cache HTTP 307 redirects for split downloads

Support for caching HTTP 307 (Temporary Redirect) responses to optimize Dragonfly’s multi-piece download performance. When a download URL is split into multiple pieces, the redirect target is now cached, eliminating redundant redirect requests and reducing latency.

Go Client deprecated and replaced by Rust client

The Go client has been deprecated and replaced by the Rust Client. All future development and maintenance will focus exclusively on the Rust client, which offers improved performance, stability, and reliability.

For more information, please refer to the dragonflyoss/client.

Additional enhancements

  • Enable 64K page size support for ARM64 in the Dragonfly Rust client.
  • Fix missing git commit metadata in dfget version output.
  • Support for config_path of io.containerd.cri.v1.images plugin for containerd V3 configuration.
  • Replaces glibc DNS resolver with hickory-dns in reqwest to implement DNS caching and prevent excessive DNS lookups during piece downloads.
  • Support for the --include-files flag to selectively download files from a directory.
  • Add the --no-progress flag to disable the download progress bar output.
  • Support for custom request headers in backend operations, enabling flexible header configuration for HTTP requests.
  • Refactored log output to reduce redundant logging and improve overall logging efficiency.

Significant bug fixes

  • Modified the database field type from text to longtext to support storing preheating job information.
  • Fixed panic on repeated seed peer service stops during Scheduler shutdown.
  • Fixed broker authentication failure when specifying the Redis password without setting a username.

Nydus

New features and enhancements

  • Nydusd: Add CRC32 validation support for both RAFS V5 and V6 formats, enhancing data integrity verification.
  • Nydusd: Support resending FUSE requests during nydusd restoration, improving daemon recovery reliability.
  • Nydusd: Enhance VFS state saving mechanism for daemon hot upgrade and failover.
  • Nydusify: Introduce Nydus-to-OCI reverse conversion capability, enabling seamless migration back to OCI format.
  • Nydusify: Implement zero-disk transfer for image copy, significantly reducing local disk usage during copy operations.
  • Snapshotter: Build blob.meta into the bootstrap to improve blob fetch reliability for RAFS v6 images.

Significant bug fixes

  • Nydusd: Fix auth token fetching for access_token field in registry authentication.
  • Nydusd: Add recursive inode/dentry invalidation for umount API.
  • Nydus Image: Fix multiple issues in optimize subcommand and add backend configuration support.
  • Snapshotter: Implement lazy parent recovery for proxy mode to handle missing parent snapshots.

We encourage you to visit the d7y.io website to find out more.

Others

You can see CHANGELOG for more details.

Links

Dragonfly Github

The QR code to access Dragonfly's GitHub project.
Categories: CNCF Projects

OpenCost: Reflecting on 2025 and looking ahead to 2026

Mon, 01/12/2026 - 06:28

The OpenCost project has had a fruitful year in terms of releases, our wonderful mentees and contributors, and fun gatherings at KubeCons.

Three photos: one of a group of technologists at the OpenCost desk in the Project Pavilion at KubeCon; the second of a crowded auditorium; the third of three people talking, one of whom is wearing a green OpenCost sweater.

If you’re new to OpenCost, it is an open-source cost and resources management tool that is an Incubating project in the Cloud Native Computing Foundation (CNCF). It was created by IBM Kubecost and continues to be maintained and supported by IBM Kubecost, Randoli, and a wider community of partners, including the major cloud providers.

OpenCost releases

The OpenCost project had 11 releases in 2025. These include new features and capabilities that improve the experience for both users and contributors. Here are a few highlights:

  • Promless: OpenCost can be configured to run without Prometheus, using environment variables that can be set via Helm. Users can run OpenCost with the Collector Datasource (beta), which does not require Prometheus.
  • OpenCost MCP server: AI agents can now query cost data in real-time using natural language. They can analyze spending patterns across namespaces, pods, and nodes, generate cost reports and recommendations automatically, and provide other insights from OpenCost data.
  • Export system: The project now has a generic export framework to make it possible to export cost data in a type-safe way.
  • Diagnostics system: OpenCost has a complete diagnostic framework with an interface, runners, and export capabilities.
  • Heartbeat system: You can do system health tracking with timestamped heartbeat events for export and more.
  • Cloud providers: There are continued improvements for users to track cloud and multi-cloud metrics. We appreciate contributions from Oracle (including providing hosting for our demo) and DigitalOcean (for recent cloud services provider work).

Thanks to our maintainers and contributors who make these releases possible and successful, including our mentees and community contributors as well.

Mentorship and community management

Our project has long been committed to mentorship through the Linux Foundation, and we continue to have fantastic mentees who bring innovation and support to the community. Manas Sivakumar was a summer 2025 mentee and worked on writing integration tests for OpenCost’s enterprise readiness. Manas’ work is now part of the OpenCost integration testing pipeline for all future contributions.

  • Adesh Pal, a mentee, made a big splash with the OpenCost MCP server. The MCP server now ships by default and requires no configuration. It outputs readable Markdown on metrics as well as step-by-step suggestions for improvements.
  • Sparsh Raj has been in our community for a while and is our most recent mentee. Sparsh has written a blog post on KubeModel, the foundation of OpenCost’s Data Model 2.0. Sparsh’s work lays the groundwork for a robust and scalable data model that can handle Kubernetes complexity and constantly shifting resources.
  • On the community side, Tamao Nakahara was brought into the IBM Kubecost team for a few months of open source and developer experience expertise. Tamao helped organize the regular OpenCost community meetings, leading actions around events, the website, and docs. On the website, Tamao improved the UX for new and returning users, and brought in Ginger Walker to help clean up the docs.

Events and talks

As a CNCF incubating project, OpenCost participated in the key KubeCon events. Most recently, the team was at KubeCon + CloudNativeCon Atlanta 2025, where maintainer Matt Bolt from IBM Kubecost kicked off the week with a Project Lightning talk. During a co-located event that day, Rajith Attapattu, CTO of contributing company Randoli, also gave a talk on OpenCost. Dee Zeis, Rajith, and Tamao also answered questions at the OpenCost kiosk in the Project Pavilion.

Earlier in the year, the team was also at both KubeCon + CloudNativeCon in London and Japan, giving talks and running the OpenCost kiosks.

2026!

What’s in store for OpenCost in the coming year? Aside from meeting all of you at future KubeCon + CloudNativeCon events, we’re also excited about a few roadmap highlights. As mentioned, our LFX mentee Sparsh is working on KubeModel, which will be important for improvements to OpenCost’s data model. As AI adoption continues to grow, the team is also building out costing features to track AI usage. Finally, supply chain security improvements are a priority.

We’re looking forward to seeing more of you in the community in the next year!

Categories: CNCF Projects

HolmesGPT: Agentic troubleshooting built for the cloud native era

Wed, 01/07/2026 - 07:00

If you’ve ever debugged a production incident, you know that the hardest part often isn’t the fix, it’s finding where to begin. Most on-call engineers end up spending hours piecing together clues, fighting time pressure, and trying to make sense of scattered data. You’ve probably run into one or more of these challenges: 

  • Unwritten knowledge and missing context:
    You’re pulled into an outage for a service you barely know. The original owners have changed teams, the documentation is half-written, and the “runbook” is either stale or missing altogether. You spend the first 30 minutes trying to find someone who’s seen this issue before — and if you’re unlucky, this incident is a new one. 
  • Tool overload and context switching:
    Your screen looks like an air traffic control dashboard. You’re running monitoring queries, flipping between Grafana and Application Insights, checking container logs, and scrolling through traces — all while someone’s asking for an ETA in the incident channel. Correlating data across tools is manual, slow, and mentally exhausting. 
  • Overwhelming complexity and knowledge gaps:
    Modern cloud-native systems like Kubernetes are powerful, but they’ve made troubleshooting far more complex. Every layer — nodes, pods, controllers, APIs, networking, autoscalers — introduces its own failure modes. To diagnose effectively, you need deep expertise across multiple domains, something even seasoned engineers can’t always keep up with. 

The challenges require a solution that can look across signals, recall patterns from past incidents, and guide you toward the most likely cause. 

This is where HolmesGPT, a CNCF Sandbox project, could help. 

 
HolmesGPT was accepted as a CNCF Sandbox project in October 2025. It’s built to simplify the chaos of production debugging – bringing together logs, metrics, and traces from different sources, reasoning over them, and surfacing clear, data-backed insights in plain language. 

What is HolmesGPT?

HolmesGPT is an open-source AI troubleshooting agent built for Kubernetes and cloud-native environments. It combines observability telemetry, LLM reasoning, and structured runbooks to accelerate root cause analysis and suggest next actions. 

Unlike static dashboards or chatbots, HolmesGPT is agentic: it actively decides what data to fetch, runs targeted queries, and iteratively refines its hypotheses – all while staying within your environment. 

Key benefits:

  • AI-native control loop: HolmesGPT plans its investigation as an agentic task list and executes it step by step
  • Open architecture: Every integration and toolset is open and extensible, and works with existing runbooks and MCP servers
  • Data privacy: Models can run locally, inside your cluster, or in the cloud
  • Community-driven: Designed around CNCF principles of openness, interoperability, and transparency

How it works 

When you run:

holmes ask "Why is my pod in a CrashLoopBackOff state?"

HolmesGPT: 

  1. Understands intent → recognizes you want to diagnose a pod restart issue
  2. Creates a task list → breaks the problem into smaller chunks and executes each separately
  3. Queries data sources → runs Prometheus queries, collects Kubernetes events and logs, inspects the pod spec
  4. Correlates context → detects that a recent deployment updated the image
  5. Explains and suggests fixes → returns a natural-language diagnosis and remediation steps
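The loop above can be sketched in a few lines of Python. This is an illustrative toy, not HolmesGPT’s actual implementation; the tool names and the diagnose() heuristic are hypothetical:

```python
from dataclasses import dataclass

# Toy sketch of an agentic troubleshooting loop: plan tasks, run tools,
# correlate findings, explain. NOT HolmesGPT's real code.

@dataclass
class Finding:
    task: str
    evidence: str

def run_tool(task: str) -> str:
    # Stand-in for real data sources (Prometheus, kubectl, log stores).
    fake_results = {
        "fetch pod events": "Back-off restarting failed container",
        "inspect pod spec": "image updated in latest deployment",
        "query restart metrics": "restarts spiked after rollout",
    }
    return fake_results.get(task, "no data")

def agent_ask(question: str) -> list[Finding]:
    # 1. Understand intent and break the problem into a task list.
    tasks = ["fetch pod events", "inspect pod spec", "query restart metrics"]
    # 2-3. Execute each task, collecting evidence as it goes.
    return [Finding(task, run_tool(task)) for task in tasks]

def diagnose(findings: list[Finding]) -> str:
    # 4-5. Correlate evidence and produce a plain-language explanation.
    if any("image updated" in f.evidence for f in findings):
        return "Likely cause: a recent deployment changed the image; roll back or fix it."
    return "No obvious cause found; gather more data."

findings = agent_ask("Why is my pod in a CrashLoopBackOff state?")
print(diagnose(findings))
```

The real agent differs in the important way: the LLM chooses and refines the task list dynamically instead of following a fixed script, but the plan → fetch → correlate → explain shape is the same.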

Here’s a simplified overview of the architecture:

HolmesGPT architecture

Extensible by design 

HolmesGPT’s architecture allows contributors to add new components: 

  • Toolsets: Build custom commands for internal observability pipelines or expose existing tools through a Model Context Protocol (MCP) server.
  • Evals: Add custom evals to benchmark the performance, cost, and latency of models.
  • Runbooks: Codify best practices (e.g., “diagnose DNS failures” or “debug PVC provisioning”). 

Example of a simple custom tool: 

holmes:
  toolsets:
    kubernetes/pod_status:
      description: "Check the status of a Kubernetes pod."
      tools:
        - name: "get_pod"
          description: "Fetch pod details from a namespace."
          command: "kubectl get pod {{ pod }} -n {{ namespace }}"
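The {{ pod }} and {{ namespace }} placeholders are filled in at call time. Conceptually, the rendering step looks something like this (a hypothetical sketch, not HolmesGPT’s actual template engine):

```python
# Hypothetical sketch of rendering a templated tool command;
# not HolmesGPT's actual implementation.
def render_command(template: str, params: dict[str, str]) -> str:
    out = template
    for key, value in params.items():
        out = out.replace("{{ " + key + " }}", value)
    return out

cmd = render_command(
    "kubectl get pod {{ pod }} -n {{ namespace }}",
    {"pod": "user-profile-import", "namespace": "default"},
)
print(cmd)  # kubectl get pod user-profile-import -n default
```

Because the agent only fills parameters into commands you declared, you control exactly what it is allowed to run.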

Getting started

  1. Install HolmesGPT

There are several ways to install HolmesGPT; one of the easiest is via Homebrew:

brew tap robusta-dev/homebrew-holmesgpt
brew install holmesgpt

The detailed installation guide has instructions for Helm, the CLI, and the UI.

  2. Set up the LLM (any OpenAI-compatible LLM) by setting the API key

In most cases, this means setting the provider’s environment variable, for example OPENAI_API_KEY for OpenAI or ANTHROPIC_API_KEY for Anthropic.

  3. Run it locally
holmes ask "what is wrong with the user-profile-import pod?" --model="anthropic/claude-sonnet-4-5"

  4. Explore other features

How to get involved 

HolmesGPT is entirely community-driven and welcomes all forms of contribution: 

  • Integrations: Add new toolsets for your observability tools or CI/CD pipelines.
  • Runbooks: Encode operational expertise for others to reuse.
  • Evaluation: Help build benchmarks for AI reasoning accuracy and observability insights.
  • Docs and tutorials: Improve onboarding, create demos, or contribute walkthroughs.
  • Community: Join discussions around governance and CNCF Sandbox progression.

All contributions follow the CNCF Code of Conduct.

Further Resources 

Categories: CNCF Projects

Cilium releases 2025 annual report: A decade of cloud native networking 

Thu, 12/18/2025 - 11:00

Cilium 2025 Report

A decade on from its first commit in 2015, 2025 marks a significant milestone for the Cilium project. The community has published the 2025 Cilium Annual Report: A Decade of Cloud Native Networking, which reflects on the project’s evolution, key milestones, and notable developments over the past year.

What began as an experimental container networking effort has grown into a mature, widely adopted platform, bringing together cloud native networking, observability, and security through an eBPF-based architecture. As Cilium enters its second decade, the community continues to grow in both size and momentum, with sustained high-volume development, widespread production adoption, and expanding use cases, including virtual machines and large-scale AI infrastructure.

We invite you to explore the 2025 Annual Report and celebrate a decade of cloud native networking with the community.

For any questions or feedback, please reach out to [email protected].

Categories: CNCF Projects