CNCF Projects
Helm Turns 10
Ten years ago, in a hackathon shortly after the release of Kubernetes 1.1.0, Helm was born.
commit ecad6e2ef9523a0218864ec552bbfc724f0b9d3d
Author: Matt Butcher <[email protected]>
Date: Mon Oct 19 17:43:26 2015 -0600
initial add
The first commit can be found on the helm-classic Git repository where the codebase for Helm v1 is located. This is the original Helm, before it merged with Deployment Manager and was folded into the Kubernetes project.
This commit was just the beginning. Helm would be shown off at the first KubeCon, just a few weeks later. From there Helm development would take off and a community of developers and charts would follow.
Happy 10th Birthday, Helm!

Spotlight on Policy Working Group
(Note: The Policy Working Group has completed its mission and is no longer active. This article reflects its work, accomplishments, and insights into how a working group operates.)
In the complex world of Kubernetes, policies play a crucial role in managing and securing clusters. But have you ever wondered how these policies are developed, implemented, and standardized across the Kubernetes ecosystem? To answer that, let's take a look back at the work of the Policy Working Group.
The Policy Working Group was dedicated to a critical mission: providing an overall architecture that encompasses both current policy-related implementations and future policy proposals in Kubernetes. Their goal was both ambitious and essential: to develop a universal policy architecture that benefits developers and end-users alike.
Through collaborative methods, this working group strove to bring clarity and consistency to the often complex world of Kubernetes policies. By focusing on both existing implementations and future proposals, they ensured that the policy landscape in Kubernetes remains coherent and accessible as the technology evolves.
This blog post dives deeper into the work of the Policy Working Group, guided by insights from its former co-chairs:
Interviewed by Arujjwal Negi.
These co-chairs explained what the Policy Working Group was all about.
Introduction
Hello, thank you for the time! Let’s start with some introductions, could you tell us a bit about yourself, your role, and how you got involved in Kubernetes?
Jim Bugwadia: My name is Jim Bugwadia, and I am a co-founder and the CEO at Nirmata, which provides solutions that automate security and compliance for cloud-native workloads. At Nirmata, we have been working with Kubernetes since it started in 2014. We initially built a Kubernetes policy engine in our commercial platform and later donated it to CNCF as the Kyverno project. I joined the CNCF Kubernetes Policy Working Group to help build and standardize various aspects of policy management for Kubernetes and later became a co-chair.
Andy Suderman: My name is Andy Suderman and I am the CTO of Fairwinds, a managed Kubernetes-as-a-Service provider. I began working with Kubernetes in 2016 building a web conferencing platform. I am an author and/or maintainer of several Kubernetes-related open-source projects such as Goldilocks, Pluto, and Polaris. Polaris is a JSON-schema-based policy engine, which started Fairwinds' journey into the policy space and my involvement in the Policy Working Group.
Poonam Lamba: My name is Poonam Lamba, and I currently work as a Product Manager for Google Kubernetes Engine (GKE) at Google. My journey with Kubernetes began back in 2017 when I was building an SRE platform for a large enterprise, using a private cloud built on Kubernetes. Intrigued by its potential to revolutionize the way we deployed and managed applications at the time, I dove headfirst into learning everything I could about it. Since then, I've had the opportunity to build the policy and compliance products for GKE. I lead and contribute to GKE CIS benchmarks. I am involved with the Gatekeeper project, and I have contributed to the Policy WG for over two years and served as a co-chair for the group.
Responses to the following questions represent an amalgamation of insights from the former co-chairs.
About Working Groups
One thing even I am not aware of is the difference between a working group and a SIG. Can you help us understand what a working group is and how it is different from a SIG?
Unlike SIGs, working groups are temporary and focused on tackling specific, cross-cutting issues or projects that may involve multiple SIGs. Their lifespan is defined, and they disband once they've achieved their objective. Generally, working groups don't own code or have long-term responsibility for managing a particular area of the Kubernetes project.
(To know more about SIGs, visit the list of Special Interest Groups)
You mentioned that Working Groups involve multiple SIGS. What SIGS was the Policy WG closely involved with, and how did you coordinate with them?
The group collaborated closely with Kubernetes SIG Auth throughout its existence and, more recently, with SIG Security since its formation. Our collaboration occurred in a few ways. We provided periodic updates during the SIG meetings to keep them informed of our progress and activities. Additionally, we utilized other community forums to maintain open lines of communication and ensure our work aligned with the broader Kubernetes ecosystem. This collaborative approach helped the group stay coordinated with related efforts across the Kubernetes community.
Policy WG
Why was the Policy Working Group created?
To enable a broad set of use cases, we recognize that Kubernetes is powered by a highly declarative, fine-grained, and extensible configuration management system. We've observed that a Kubernetes configuration manifest may have different portions that are important to various stakeholders. For example, some parts may be crucial for developers, while others might be of particular interest to security teams or address operational concerns. Given this complexity, we believe that policies governing the usage of these intricate configurations are essential for success with Kubernetes.
Our Policy Working Group was created specifically to research the standardization of policy definitions and related artifacts. We saw a need to bring consistency and clarity to how policies are defined and implemented across the Kubernetes ecosystem, given the diverse requirements and stakeholders involved in Kubernetes deployments.
Can you give me an idea of the work you did in the group?
We worked on several Kubernetes policy-related projects. Our initiatives included:
- We worked on a Kubernetes Enhancement Proposal (KEP) for the Kubernetes Policy Reports API. This aims to standardize how policy reports are generated and consumed within the Kubernetes ecosystem.
- We conducted a CNCF survey to better understand policy usage in the Kubernetes space. This helped gauge the practices and needs across the community at the time.
- We wrote a paper that will guide users in achieving PCI-DSS compliance for containers. This is intended to help organizations meet important security standards in their Kubernetes environments.
- We also worked on a paper highlighting how shifting security down can benefit organizations. This focuses on the advantages of implementing security measures earlier in the development and deployment process.
Can you tell us what were the main objectives of the Policy Working Group and some of your key accomplishments?
The charter of the Policy WG was to help standardize policy management for Kubernetes and educate the community on best practices.
To accomplish this, we updated the Kubernetes documentation (Policies | Kubernetes), produced several whitepapers (Kubernetes Policy Management, Kubernetes GRC), and created the Policy Reports API (API reference), which standardizes reporting across various tools. Several popular tools such as Falco, Trivy, Kyverno, kube-bench, and others support the Policy Report API. A major milestone for the Policy WG was to promote the Policy Reports API to a SIG-level API or to find it a stable home.
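To make this concrete, here is a rough sketch of the kind of namespaced PolicyReport such tools emit under the wgpolicyk8s.io/v1alpha2 version of the API; the policy, rule, and resource names are illustrative, and the exact fields vary by tool.

```yaml
apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: polr-ns-default          # hypothetical report name
  namespace: default
summary:
  pass: 1
  fail: 1
  warn: 0
  error: 0
  skip: 0
results:
  - source: kyverno              # the tool that produced this result
    policy: require-team-label   # hypothetical policy name
    rule: check-label            # hypothetical rule name
    result: fail
    severity: medium
    message: "Deployment is missing the required 'team' label."
    resources:
      - apiVersion: apps/v1
        kind: Deployment
        name: web
        namespace: default
```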
Beyond that, as ValidatingAdmissionPolicy and MutatingAdmissionPolicy approached GA in Kubernetes, a key goal of the WG was to guide and educate the community on the tradeoffs and appropriate usage patterns for these built-in API objects and other CNCF policy management solutions like OPA/Gatekeeper and Kyverno.
Challenges
What were some of the major challenges that the Policy Working Group worked on?
During our work in the Policy Working Group, we encountered several challenges:
- One of the main issues we faced was finding time to consistently contribute. Given that many of us have other professional commitments, it can be difficult to dedicate regular time to the working group's initiatives.
- Another challenge we experienced was related to our consensus-driven model. While this approach ensures that all voices are heard, it can sometimes lead to slower decision-making processes. We valued thorough discussion and agreement, but this can occasionally delay progress on our projects.
- We've also encountered occasional differences of opinion among group members. These situations require careful navigation to ensure that we maintain a collaborative and productive environment while addressing diverse viewpoints.
- Lastly, we've noticed that newcomers to the group may find it difficult to contribute effectively without consistent attendance at our meetings. The complex nature of our work often requires ongoing context, which can be challenging for those who aren't able to participate regularly.
Can you tell me more about those challenges? How did you discover each one? What has the impact been? What were some strategies you used to address them?
There are no easy answers, but having more contributors and maintainers greatly helps! Overall the CNCF community is great to work with and is very welcoming to beginners. So, if folks out there are hesitating to get involved, I highly encourage them to attend a WG or SIG meeting and just listen in.
It often takes a few meetings to fully understand the discussions, so don't feel discouraged if you don't grasp everything right away. We made a point to emphasize this and encouraged new members to review documentation as a starting point for getting involved.
Additionally, differences of opinion were valued and encouraged within the Policy-WG. We adhered to the CNCF core values and resolved disagreements by maintaining respect for one another. We also strove to timebox our decisions and assign clear responsibilities to keep things moving forward.
This is where our discussion about the Policy Working Group ends. The working group, and especially the people who took part in this article, hope this gave you some insights into the group's aims and workings. You can get more info about Working Groups here.
Kyverno vs Kubernetes policies: How Kyverno complements and completes Kubernetes policy types
Originally posted on Nirmata.com on October 1, 2025
How Kyverno extends and integrates with Kubernetes policies
With the addition of ValidatingAdmissionPolicy and MutatingAdmissionPolicy in Kubernetes, do you still need Kyverno? This post answers the question by providing ten reasons why Kyverno is essential even when you are using Kubernetes policy types.
Introduction
Prior to Kyverno, policy management in Kubernetes was complex and cumbersome. While the need for Policy as Code was clear, initial implementations required learning complex languages and did not implement the full policy as code lifecycle.
Kyverno was created by Nirmata and donated to the CNCF in November 2020. It rapidly gained popularity due to its embrace of Kubernetes resources for policy declarations, its easy-to-use syntax, and the breadth of features that addressed all aspects of policy as code.
Recently, Kubernetes has also introduced native policy types which can be executed directly in the Kubernetes API server. This move validates that policies are a must have for Kubernetes, and now allows critical policies to be executed directly in the API server.
The Kubernetes API server is a critical resource that needs to be extremely efficient. To safely execute policies in the API server, the Kubernetes authors chose CEL (Common Expression Language) to embed logic in policy YAML declarations. In addition to a familiar syntax, CEL programs can be pre-compiled and execution costs can be pre-calculated.
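As a minimal illustration of what that looks like in practice, the following ValidatingAdmissionPolicy embeds a single CEL expression; the policy name and label are made up for this example, and a ValidatingAdmissionPolicyBinding is still required to put the policy into effect.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-team-label        # hypothetical policy name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
  validations:
    # CEL expression evaluated against the incoming object
    - expression: "'team' in object.metadata.labels"
      message: "All Deployments must carry a 'team' label."
```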
With these changes in Kubernetes, Kyverno has also evolved to stay true to its mission of providing the best policy engine and tools for Kubernetes native policy as code.
Kyverno now supports five new policy types, two of which, ValidatingPolicy and MutatingPolicy, are extensions of Kubernetes policy types ValidatingAdmissionPolicy and MutatingAdmissionPolicy, respectively.
NOTE: I will use the term “Kubernetes Policies” to refer to ValidatingAdmissionPolicies and MutatingAdmissionPolicies.
Here is a summary of the Kyverno policy types:
- ValidatingPolicy: This policy type checks if a resource’s configuration adheres to predefined rules and can either enforce or audit compliance. This policy type is an extension of the Kubernetes ValidatingAdmissionPolicy.
- ImageValidatingPolicy: A specialized validating policy that verifies a container image’s signatures and attestations to ensure its integrity and trustworthiness.
- MutatingPolicy: This policy type modifies a resource’s configuration as it’s being created or updated, applying changes like adding labels, annotations, or sidecar containers. This policy is an extension of the Kubernetes MutatingAdmissionPolicy.
- GeneratingPolicy: This policy creates or clones new resources in response to a trigger event, such as automatically generating a NetworkPolicy when a new Namespace is created.
- DeletingPolicy: This policy automatically deletes existing resources that match specific criteria on a predefined schedule, often used for garbage collection or enforcing retention policies.
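As a rough sketch of how one of these types lines up with its Kubernetes counterpart, here is approximately what a Kyverno ValidatingPolicy can look like. Because ValidatingPolicy extends ValidatingAdmissionPolicy, the match and validation fields below mirror that schema; the apiVersion and the validationActions field are assumptions from memory, so check the Kyverno documentation for the exact schema.

```yaml
# Approximate sketch only: apiVersion and some field names are assumptions.
apiVersion: policies.kyverno.io/v1alpha1
kind: ValidatingPolicy
metadata:
  name: require-team-label
spec:
  validationActions: [Deny]       # enforce rather than audit (assumed field)
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
  validations:
    - expression: "'team' in object.metadata.labels"
      message: "All Deployments must carry a 'team' label."
```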
So, when should you choose to use Kyverno policies vs the Kubernetes policy types? The right answer is that if you believe that declarative Policy as Code is the right way to manage Kubernetes configuration complexity, you will need both!
As you will see below, Kyverno provides critical features that are missing in Kubernetes policies and also helps with policy management at scale.
1. Applying policies on existing resources
When new policies are created, they need to be applied to existing resources. Kubernetes Policies apply only on resource changes, and hence policy violations in existing resources are not reported.
Kyverno applies policies, including Kubernetes policy types, to all resources.
2. Reapplying policies on changes
Like code, policies change over time. This can be to adapt to updated or new features, or to fix issues in the policy. When a policy changes, it must be re-applied to all resources. Kubernetes Policies are embedded in the API server and are not re-applied to existing resources when the policy changes. Kyverno, by contrast, re-applies both Kyverno and Kubernetes policy types through its background scans, so results stay current as policies evolve.
3. Applying policies off-cluster (shift-left)
Providing feedback to developers as early as possible in a deployment pipeline is highly desirable and has tangible benefits of time and cost savings. The Kyverno CLI can apply Kyverno and Kubernetes Policy types in CI/CD and IaC pipelines.
4. Testing policy as code
Like all software, policies must be thoroughly tested prior to deployment. Kyverno provides tools for testing Kyverno and Kubernetes policy types. You can use the Kyverno CLI for unit tests, and Kyverno Chainsaw for e2e behavioral tests.
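For illustration, a Kyverno CLI unit test is itself a small YAML manifest that lists policies, sample resources, and the expected results; the sketch below shows the approximate shape, with the file and rule names invented for this example, so verify the exact schema against the Kyverno CLI documentation.

```yaml
# Approximate shape of a Kyverno CLI test manifest; verify field names against the docs.
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: require-team-label-test
policies:
  - require-team-label.yaml        # hypothetical policy file
resources:
  - deployment-without-label.yaml  # hypothetical sample resource
results:
  - policy: require-team-label
    rule: check-label              # hypothetical rule name
    resources:
      - web                        # name of the sample Deployment
    result: fail                   # the expected outcome
```

Running `kyverno test .` in the directory containing this file evaluates the policy against the sample resources and compares the outcomes with the declared results.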
5. Reporting policy results
Kyverno provides integrated reporting, where reports are namespaced Kubernetes resources and are hence available to application owners via the Kubernetes API and other tools. Kyverno reports are generated for both Kyverno and Kubernetes policy types.
6. Managing fine-grained policy exceptions
Kyverno allows configuring policy exceptions to exclude some resources from policies. Kyverno exceptions are Kubernetes resources, making it possible to view and manage them via the Kubernetes API using standard tools.
Kyverno exceptions can specify an image, so you can exclude certain containers in a pod while still applying the policy to other containers. Exceptions can also declare specific values that are allowed. Exceptions can also be time-bound, by adding a TTL (time to live).
These powerful capabilities allow you to enforce policies and then use exceptions to exclude certain resources, or even parts of a resource declaration.
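A hedged sketch of what such an exception can look like is below; the apiVersion and exact fields vary with the Kyverno version, and the policy, namespace, and resource names are placeholders.

```yaml
apiVersion: kyverno.io/v2
kind: PolicyException
metadata:
  name: allow-legacy-app           # hypothetical exception name
  namespace: legacy                # hypothetical namespace
spec:
  exceptions:
    - policyName: require-team-label   # hypothetical policy to exempt
      ruleNames:
        - check-label
  match:
    any:
      - resources:
          kinds:
            - Deployment
          namespaces:
            - legacy
          names:
            - legacy-app*          # only these resources are exempted
```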
7. Complex policy logic
Kubernetes policies are designed for simple checks and can only apply to the admission payload. This is often insufficient, as policies may need to look up other resources or even reference external data. These types of checks are not possible with Kubernetes policies. Additionally, Kubernetes MutatingAdmissionPolicies cannot match a sub-resource and apply changes to its parent resource.
Kyverno supports features for complex policies, including API lookups and external data management. Kyverno also offers an extended CEL library with useful functions necessary for complex policies.
8. Image verification
Kyverno offers built-in verification of OCI (Open Container Initiative) image and artifact signatures, using Sigstore’s Cosign or CNCF’s Notary projects. This allows implementing software supply chain security use cases and achieving high levels of SLSA (Supply-chain Levels for Software Artifacts).
9. Policy-based automation
Besides validating and mutating resources, policies are an essential tool for automating several complex platform engineering tasks. For example, policies can be used to automatically generate secure defaults, or resources like network policies, on flexible triggers such as a namespace creation or when a label is added. This allows a tight control loop, and can be used to replace custom controllers with declarative and scalable policy as code.
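To illustrate the pattern, here is a sketch using Kyverno's long-standing ClusterPolicy generate rule (shown instead of the newer GeneratingPolicy type) that creates a default-deny NetworkPolicy whenever a Namespace is created; the names are illustrative.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-networkpolicy   # hypothetical policy name
spec:
  rules:
    - name: create-default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace          # trigger: a new Namespace
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny
        namespace: "{{request.object.metadata.name}}"   # place it in the new Namespace
        synchronize: true            # keep the generated resource in sync
        data:
          spec:
            podSelector: {}          # select all Pods in the Namespace
            policyTypes:
              - Ingress
              - Egress
```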
10. Kyverno everywhere
While Kubernetes Policy types can only be applied to Kubernetes resources, Kyverno policies can be applied to any JSON or YAML payload including Terraform or OpenTofu manifests, other IaC manifests such as CDK, and build artifacts such as Dockerfiles.
Kyverno enables a unified policy as code approach, which is essential for platform engineering teams that manage both Kubernetes clusters, and pipelines for CI/CD and IaC.
Conclusion
Kyverno is fully compatible with Kubernetes policies, and is designed to seamlessly support and extend Kubernetes policy types. It applies Kubernetes policies to existing resources and can also provide policy reporting and exception management for Kubernetes policies.
Like Kubernetes policies, Kyverno policies also use the Common Expression Language (CEL) and extend the Kubernetes policy declarations with additional fields and extended CEL libraries required for complex policies and advanced policy as code use cases.
This allows having a mix of Kubernetes and Kyverno policies managed by Kyverno. You can get started with Kubernetes policies and then upgrade to Kyverno policies for advanced use cases.
If you have existing Kubernetes policies, you can use Kyverno to apply them to existing resources, produce reports, apply the policies off-cluster, and perform unit and behavioral tests.
If you are starting out, you can use Kyverno policy types. Wherever possible Kyverno will automatically generate and manage Kubernetes policies for optimal performance. For complex policies, which cannot be handled in the API server, Kyverno will execute these during admission controls and periodically as background scans.
Regardless of where you start, with Kyverno you get a powerful and complete policy as code solution for Kubernetes and all your policy-based authorization needs!
Karmada v1.15 Released! Enhanced Resource Awareness for Multi-Template Workloads
Karmada is an open multi-cloud and multi-cluster container orchestration engine designed to help users deploy and operate business applications in a multi-cloud environment. With its compatibility with the native Kubernetes API, Karmada can smoothly migrate single-cluster workloads while still maintaining coordination with the surrounding Kubernetes ecosystem tools.
Karmada v1.15 has been released. This version includes the following new features:
- Precise resource awareness for multi-template workloads
- Enhanced cluster-level failover functionality
- Structured logging
- Significant performance improvements for Karmada controllers and schedulers
Overview of New Features
Precise Resource Awareness for Multi-Template Workloads
Karmada utilizes a resource interpreter to retrieve the replica count and resource requests of workloads. Based on this data, it calculates the total resource requirements of the workloads, thereby enabling advanced capabilities such as resource-aware scheduling and federated quota management. This mechanism works well for traditional single-template workloads. However, many AI and big data application workloads (e.g., FlinkDeployments, PyTorchJobs, and RayJobs) consist of multiple Pod templates or components, each with unique resource demands. Since the resource interpreter can only process resource requests from a single template and fails to accurately reflect differences between multiple templates, the resource calculation for multi-template workloads is not precise enough.
In this version, Karmada has strengthened its resource awareness for multi-template workloads. By extending the resource interpreter, Karmada can now obtain the replica count and resource requests of different templates within the same workload, ensuring data accuracy. This improvement also provides more reliable and granular data support for federated quota management of multi-template workloads.
Suppose you deploy a FlinkDeployment with the following resource-related configuration:
spec:
  jobManager:
    replicas: 1
    resource:
      cpu: 1
      memory: 1024m
  taskManager:
    replicas: 1
    resource:
      cpu: 2
      memory: 2048m

Through ResourceBinding, you can view the replica count and resource requests of each template in the FlinkDeployment parsed by the resource interpreter.

spec:
  components:
    - name: jobmanager
      replicaRequirements:
        resourceRequest:
          cpu: "1"
          memory: "1.024"
      replicas: 1
    - name: taskmanager
      replicaRequirements:
        resourceRequest:
          cpu: "2"
          memory: "2.048"
      replicas: 1

At this point, the resource usage of the FlinkDeployment calculated by FederatedResourceQuota is as follows:

status:
  overallUsed:
    cpu: "3"
    memory: 3072m

Note: This feature is currently in the Alpha stage and requires enabling the MultiplePodTemplatesScheduling feature gate to use.
As multi-template workloads are widely adopted in cloud-native environments, Karmada is committed to providing stronger support for them. In upcoming versions, we will further enhance scheduling support for multi-template workloads based on this feature and offer more granular resource-aware scheduling—stay tuned for more updates!
For more information about this feature, please refer to: Multi-Pod Template Support.
Enhanced Cluster-Level Failover Functionality
In previous versions, Karmada provided basic cluster-level failover capabilities, allowing cluster-level application migration to be triggered through custom failure conditions. To meet the requirement of preserving the running state of stateful applications during cluster failover, Karmada v1.15 supports an application state preservation policy for cluster failover. For big data processing applications (e.g., Flink), this capability enables restarting from the pre-failure checkpoint and seamlessly resuming data processing to the state before the restart, thus avoiding duplicate data processing.
The community has introduced a new StatePreservation field under .spec.failover.cluster in the PropagationPolicy/ClusterPropagationPolicy API. This field is used to define policies for preserving and restoring state data of stateful applications during failover. Combined with this policy, when an application is migrated from a failed cluster to another cluster, key data can be extracted from the original resource configuration.
The state preservation policy StatePreservation includes a series of StatePreservationRule configurations. It uses JSONPath to specify the segments of state data that need to be preserved and leverages the associated AliasLabelName to pass the data to the migrated cluster.
Taking a Flink application as an example: in a Flink application, jobID is a unique identifier used to distinguish and manage different Flink jobs. When a cluster fails, the Flink application can use jobID to restore the state of the job before the failure and continue execution from the failure point. The specific configuration and steps are as follows:
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
spec:
  #…
  failover:
    cluster:
      purgeMode: Directly
      statePreservation:
        rules:
          - aliasLabelName: application.karmada.io/cluster-failover-jobid
            jsonPath: "{ .jobStatus.jobID }"
- Before migration, the Karmada controller extracts the job ID according to the path configured by the user.
- During migration, the Karmada controller injects the extracted job ID into the Flink application configuration in the form of a label, such as application.karmada.io/cluster-failover-jobid: <jobID>.
- Kyverno running in the member cluster intercepts the Flink application creation request, obtains the checkpoint data storage path of the job based on the jobID (e.g., /<shared-path>/<job-namespace>/<jobId>/checkpoints/xxx), and then configures initialSavepointPath to indicate starting from the savepoint.
- The Flink application starts based on the checkpoint data under initialSavepointPath, thereby inheriting the final state saved before migration.
This capability is widely applicable to stateful applications that can start from a specific savepoint. These applications can follow the above process to implement state persistence and restoration for cluster-level failover.
Note: This feature is currently in the Alpha stage and requires enabling the StatefulFailoverInjection feature gate to use.
Function Constraints:
- The application must be restricted to run in a single cluster.
- The migration cleanup policy (PurgeMode) is limited to Directly—this means ensuring that the failed application is deleted from the old cluster before being restored in the new cluster to guarantee data consistency.
Structured Logging
Logs are critical tools for recording events, states, and behaviors during system operation, and are widely used for troubleshooting, performance monitoring, and security auditing. Karmada components provide rich runtime logs to help users quickly locate issues and trace execution scenarios. In previous versions, Karmada only supported unstructured text logs, which were difficult to parse and query efficiently, limiting its integration capabilities in modern observability systems.
Karmada v1.15 introduces support for structured logging, which can be configured to output in JSON format using the --logging-format=json startup flag. An example of structured logging is as follows:
{"ts": "<log timestamp>", "logger": "cluster_status_controller", "level": "info", "msg": "Syncing cluster status", "clusterName": "member1"}

The introduction of structured logging significantly improves the usability and observability of logs:
- Efficient Integration: Integration with mainstream logging systems such as Elastic, Loki, and Splunk, without relying on complex regular expressions or log parsers.
- Efficient Query: Structured fields support fast retrieval and analysis, significantly improving troubleshooting efficiency.
- Enhanced Observability: Key context information (e.g., cluster name, log level) is presented as structured fields, facilitating cross-component and cross-time event correlation for accurate issue localization.
- Maintainability: Structured logging makes it easier for developers and operators to maintain, parse, and evolve log formats as the system changes.
Significant Performance Improvements for Karmada Controllers and Schedulers
In this version, the Karmada performance optimization team has continued to focus on improving the performance of Karmada’s key components, achieving significant progress in both controllers and schedulers.
In terms of the controller, by introducing the controller-runtime priority queue, the controller can give priority to responding to user-triggered resource changes after a restart or leader transition, thereby significantly reducing the downtime during service restart and failover processes.
The test environment included 5,000 Deployments, 2,500 Policies, and 5,000 ResourceBindings. Deployments and Policies were updated while the controller restarted with a large number of pending events still in its work queue. Test results showed that the controller could immediately respond to and prioritize processing these update events, verifying the effectiveness of this optimization.
Note: This feature is currently in the Alpha stage and requires enabling the ControllerPriorityQueue feature gate to use.
In terms of the scheduler, by reducing redundant computations in the scheduling process and decreasing the number of remote call requests, the scheduling efficiency of the Karmada scheduler has been significantly improved.
Tests were conducted to record the time taken to schedule 5,000 ResourceBindings with the precise scheduling component karmada-scheduler-estimator enabled. The results are as follows:
- The scheduler throughput QPS increased from approximately 15 to about 22, representing a 46% performance improvement.
- The number of gRPC requests decreased from approximately 10,000 to around 5,000, a reduction of 50%.
These tests confirm that the performance of Karmada controllers and schedulers has been greatly improved in version 1.15. In the future, we will continue to conduct systematic performance optimizations for controllers and schedulers.
For the detailed test report, please refer to [Performance] Overview of performance improvements for v1.15.
Acknowledging Our Contributors
The Karmada v1.15 release includes 269 code commits from 39 contributors. We would like to extend our sincere gratitude to all the contributors:
@abhi0324, @abhinav-1305, @Arhell, @Bhaumik10, @CaesarTY, @cbaenziger, @deefreak, @dekaihu, @devarsh10, @greenmoon55, @iawia002, @jabellard, @jennryaz, @liaolecheng, @linyao22, @LivingCcj, @liwang0513, @mohamedawnallah, @mohit-nagaraj, @mszacillo, @RainbowMango, @ritzdevp, @ryanwuer, @samzong, @seanlaii, @SunsetB612, @tessapham, @wangbowen1401, @warjiang, @wenhuwang, @whitewindmills, @whosefriendA, @XiShanYongYe-Chang, @zach593, @zclyne, @zhangsquared, @zhuyulicfc49, @zhzhuang-zju, @zzklachlan
References:
[1] Karmada: https://karmada.io/
[2] Karmada v1.15: https://github.com/karmada-io/karmada/releases/tag/v1.15.0
[3] Multi-Pod Template Support: https://github.com/karmada-io/karmada/tree/master/docs/proposals/scheduling/multi-podtemplate-support
[4] [Performance] Overview of performance improvements for v1.15: https://github.com/karmada-io/karmada/issues/6516
[5] Karmada GitHub: https://github.com/karmada-io/karmada
CoreDNS-1.13.1 Release
Introducing Headlamp Plugin for Karpenter - Scaling and Visibility
Headlamp is an open‑source, extensible Kubernetes SIG UI project designed to let you explore, manage, and debug cluster resources.
Karpenter is a Kubernetes Autoscaling SIG node provisioning project that helps clusters scale quickly and efficiently. It launches new nodes in seconds, selects appropriate instance types for workloads, and manages the full node lifecycle, including scale-down.
The new Headlamp Karpenter Plugin adds real-time visibility into Karpenter’s activity directly from the Headlamp UI. It shows how Karpenter resources relate to Kubernetes objects, displays live metrics, and surfaces scaling events as they happen. You can inspect pending pods during provisioning, review scaling decisions, and edit Karpenter-managed resources with built-in validation. The Karpenter plugin was built as part of an LFX mentorship project.
The Karpenter plugin for Headlamp aims to make it easier for Kubernetes users and operators to understand, debug, and fine-tune autoscaling behavior in their clusters. Now we will give a brief tour of the Headlamp plugin.
Map view of Karpenter Resources and how they relate to Kubernetes resources
Easily see how Karpenter resources like NodeClasses, NodePools, and NodeClaims connect with core Kubernetes resources like Pods, Nodes, etc.

Visualization of Karpenter Metrics
Get instant insights into Resource Usage vs. Limits, Allowed Disruptions, Pending Pods, Provisioning Latency, and more.


Scaling decisions
See which instances are being provisioned for your workloads and understand why Karpenter made those choices. This is helpful while debugging.


Config editor with validation support
Make live edits to Karpenter configurations. The editor includes diff previews and resource validation for safer adjustments.

Real time view of Karpenter resources
View and track Karpenter-specific resources such as NodeClaims in real time as your cluster scales up and down.



Dashboard for Pending Pods
View all pending pods with unmet scheduling requirements or failed scheduling, with highlights explaining why they couldn't be scheduled.

Karpenter Providers
This plugin should work with most Karpenter providers, but it has so far only been tested with the ones listed in the table below. Additionally, each provider exposes some extra provider-specific information; the table indicates which providers' extra information the plugin displays.
| Provider Name | Tested | Extra provider specific info supported |
|---|---|---|
| AWS | ✅ | ✅ |
| Azure | ✅ | ✅ |
| AlibabaCloud | ❌ | ❌ |
| Bizfly Cloud | ❌ | ❌ |
| Cluster API | ❌ | ❌ |
| GCP | ❌ | ❌ |
| Proxmox | ❌ | ❌ |
| Oracle Cloud Infrastructure (OCI) | ❌ | ❌ |

Please submit an issue if you test one of the untested providers or if you want support for this provider (PRs also gladly accepted).
How to use
Please see the plugins/karpenter/README.md for instructions on how to use.
Feedback and Questions
Please submit an issue if you use Karpenter and have any other ideas or feedback. Or come to the #headlamp channel on the Kubernetes Slack for a chat.
CoreDNS-1.13.0 Release
Fluentd to Fluent Bit: A Migration Guide
Fluentd was created over 14 years ago and still continues to be one of the most widely deployed technologies for log collection in the enterprise. Fluentd’s distributed plugin architecture and highly permissive licensing made it an ideal fit for the Cloud Native Computing Foundation (CNCF), where it is now a graduated project.
However, enterprises drowning in telemetry data are now requiring solutions that have higher performance, more native support for evolving schemas and formats, and increased flexibility in processing. Enter Fluent Bit.
When and why to migrate?
Fluent Bit, while initially growing as a sub-project within the Fluent ecosystem, expanded beyond Fluentd to support all telemetry types: logs, metrics, and traces. Fluent Bit is now the more popular of the two, with over 15 billion deployments, and is used by Amazon, Google, Oracle, and Microsoft, to name a few.
Fluent Bit also is fully aligned with OpenTelemetry signals, format and protocol, which ensures that users will be able to continue handling telemetry data as it grows and evolves.
Among the most frequent questions we get as the maintainers of the projects are:
- How do we migrate?
- What should we watch out for?
- And what business value do we get for migrating?
This article aims to answer these questions with examples. We want to help make it an easy decision to migrate from Fluentd to Fluent Bit.
Why Migrate?
Here is a quick list of the reasons users switch from Fluentd to Fluent Bit:
- Higher performance for the same resources you are already using
- Full OpenTelemetry support for logs, metrics, and traces as well as Prometheus support for metrics
- Simpler configuration and the ability to route to multiple locations
- Higher velocity for adding custom processing rules
- Integrated monitoring to better understand performance and dataflows
Fluentd vs. Fluent Bit: What are the Differences
Background
To understand all the differences between the projects, it is important to understand the background of each project and the era it was built for. With Fluentd, the main language is Ruby, and it was initially designed to help users push data to big data platforms such as Hadoop. The project follows a distributed architecture, where plugins are installed after the main binary is installed and deployed.
Fluent Bit, on the other hand, is written in C, with a focus on hyper performance in smaller systems (containers, embedded Linux). The project learned from Fluentd’s plugins and instead opts for fully embedded plugins that are part of the core binary.
Performance
The obvious difference and main value of switching from Fluentd to Fluent Bit is the performance. With Fluent Bit, the amount of logs you can process with the same resources could be anywhere from 10 to 40 times greater depending on the plugin you are using.
Fluent Bit was written from the ground up to be hyper performant, with a focus on shipping data as fast as possible for data analysis. Later on, performance was found to be efficient enough that more edge processing could be added without compromising on the mission to make the agent as fast as possible.
Routing
Other parts of Fluent Bit evolved from challenges encountered with Fluentd, such as buffering and routing. With Fluentd, multirouting was an afterthought, and users needed to “copy” the data streams to route data to multiple points.
This made configuration management a nightmare, in addition to essentially duplicating the resource requirements for routing that data.
In Fluent Bit, the buffers are stored once, which allows multiple plugins to “subscribe” to a stream of data. This ensures that data is stored once and subscribed to many times, allowing for multirouting without the performance trade-offs and configuration fatigue.
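As a minimal sketch of what this looks like in Fluent Bit's YAML configuration format, a single tail input below is matched by two outputs, so the same buffered stream is delivered to both destinations; the log path and the OpenSearch endpoint are placeholders.

```yaml
service:
  flush: 1
pipeline:
  inputs:
    - name: tail
      path: /var/log/app/*.log          # placeholder path
      tag: app.*
  outputs:
    - name: stdout                      # first subscriber to the app.* stream
      match: "app.*"
    - name: opensearch                  # second subscriber to the same stream
      match: "app.*"
      host: opensearch.example.internal # placeholder endpoint
      port: 9200
```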
Telemetry signal focus
While Fluentd was initially a data shipper, it grew into a logging agent used within projects such as Kubernetes and companies like Splunk. Fluent Bit, on the other hand, started as an embedded metrics collector, with log file support coming later. As Fluent Bit adoption started to outweigh Fluentd’s, capabilities such as OpenTelemetry logs/metrics/traces, Prometheus Scrape and Remote Write support, eBPF, and profiling support were all added.
Today Fluent Bit is aligned with OpenTelemetry schema, formats and protocols and meant to be a lightweight implementation that is highly performant.
Custom processing
Fluentd and Fluent Bit have many of the same processor names, but when it comes to custom processing the options are quite different.
With Fluentd, the option is `enable_ruby`, which allows custom Ruby scripts within a configuration to perform actions. This can work effectively for small tasks; however, it incurs a large penalty as logic gets more complicated, adding performance bottlenecks.
With Fluent Bit, custom processing is done in Lua, which gives tremendous flexibility. Unlike Fluentd's Ruby scripting, Fluent Bit’s Lua processor is quite performant and can be used at scale (100+ TB/day).
Custom plugins
Both projects allow custom plugins to help you connect with your source or destination. With Fluentd, these custom plugins are “Ruby Gems” that you can download and install into existing or new installations or deployments. With Fluent Bit, custom plugins are written and compiled in Go. There are also new initiatives for writing custom plugins in any language you want and compiling them into WebAssembly.
One lesson we learned from Fluentd’s distributed plugin architecture was that the number of plugins can increase exponentially. However, the quality and maintenance required generally left many of the plugins abandoned and unsupported. With Fluent Bit, plugins are all incorporated into the source code itself, which ensures compatibility with every release.
Custom plugins still remain independent of the main repository. However, we are looking at ways to allow these to also share the same benefit of native C plugins within the main GitHub repository.
Monitoring
Understanding how data is traversing your environment is generally a top request from users who deploy Fluentd or Fluent Bit. With Fluentd, enabling these settings could require complicated configuration via “monitor_agent” or a third-party Prometheus exporter plugin. These monitoring plugins also add maintenance overhead for Fluentd, which can affect performance.
Fluent Bit has monitoring as part of its core functionality and is retrievable via a native plugin (`fluentbit_metrics`) or scrapeable on an HTTP port. Fluent Bit’s metrics also incorporate more information than Fluentd’s, which allows you to understand bytes, records, storage and connection information.
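A minimal sketch of enabling this, assuming Fluent Bit's YAML configuration format: the built-in HTTP server exposes metrics for scraping, and the fluentbit_metrics input paired with the prometheus_exporter output publishes them on a separate port; the ports are placeholders.

```yaml
service:
  http_server: on                  # expose the built-in HTTP endpoint
  http_listen: 0.0.0.0
  http_port: 2020                  # placeholder port
pipeline:
  inputs:
    - name: fluentbit_metrics      # Fluent Bit's own internal metrics
      tag: internal_metrics
      scrape_interval: 2
  outputs:
    - name: prometheus_exporter    # expose the metrics in Prometheus format
      match: internal_metrics
      host: 0.0.0.0
      port: 2021                   # placeholder port
```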
How to get started with a Fluentd to Fluent Bit migration
The next question we’re answering is: How do you get started?
The first important step is to understand how Fluentd is deployed, what processing happens in the environment and where data is flowing.
What you don’t need to worry about:
- Architecture support: Both applications support x86 and ARM.
- Platform support: Fluent Bit supports the same platforms as Fluentd does today, and more. Legacy systems may differ; however, it is important to note those are not maintained in either OSS project.
- Regular expressions: If you built a large library of regular expressions using the Onigmo parser library, you can rest comfortably knowing that Fluent Bit supports it.
Deployment
Deployed as an Agent (Linux or Windows Package)
When Fluentd is deployed as an agent on Linux or Windows, its primary function is to collect local log files or Windows event logs and route them to a particular destination. Thankfully, Fluent Bit’s local collection capabilities are equal to Fluentd’s, including the ability to resume on failure, store the last log lines collected, and buffer locally.
Deployed in Kubernetes as a DaemonSet
If Fluentd is running as a DaemonSet in your Kubernetes cluster, you should first check the image that is running. As Fluentd has distributed plugins, the DaemonSet image may have specific plugins included, which ensures you can go directly from reading Kubernetes logs to the end destination.
For example, a Fluentd DaemonSet image may have OpenSearch and Kafka included as plugins, so you should validate that Fluent Bit supports the same plugins as the image you are using. Fluent Bit also supports Kubernetes enrichment on all logs, adding data such as namespace, pod, labels, and more.
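A rough sketch of the equivalent Fluent Bit pipeline for this DaemonSet case, assuming the YAML configuration format; the log path follows the usual container-log convention and the OpenSearch endpoint is a placeholder.

```yaml
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log     # container logs on the node
      tag: kube.*
      multiline.parser: docker, cri
  filters:
    - name: kubernetes                    # enrich records with namespace, pod, labels, ...
      match: "kube.*"
      merge_log: on
  outputs:
    - name: opensearch
      match: "kube.*"
      host: opensearch.example.internal   # placeholder endpoint
      port: 9200
```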
Deployed as an Aggregator / Collector
If your Fluentd is deployed collecting logs from syslog, network devices or HTTP requests, you can first verify that Fluent Bit has the same capability. For example, Fluent Bit has syslog, TCP, HTTP and UDP plugins that can cover a majority of these use cases.
In addition, Fluent Bit also can receive OpenTelemetry HTTP1/gRPC, Prometheus Remote Write, HTTP gzip and Splunk HTTP Event Collector (HEC) as additional inbound signals.
Adding a Telemetry Pipeline
When migrating from Fluentd to Fluent Bit, we would also recommend looking at adding a Telemetry Pipeline between the agents and the destinations. This allows you to move larger pieces of processing logic that currently live in the Fluentd agents downstream into the pipeline.
Configuration
The configuration syntax between Fluentd and Fluent Bit is vastly different. While both have started to support YAML more recently, most legacy Fluentd configurations will still be written in the domain-specific configuration language that is XML-esque.
Some general notes:
- Look at validating a single plugin at a time, and then at expanding to a single route (such as system logs to OpenSearch).
- Buffering and thread settings are not as important within Fluent Bit.
- Security settings should be similar.
When in doubt, reaching out to the Fluent community is useful in helping with some of the more granular settings.
Custom Plugins
When migrating, it’s important to ensure that Fluent Bit supports all plugins (sources and destinations). You should also check that it supports particular settings around authentication, authorization or access. This will be a manual process that can take some time. However, this will also allow you a chance to revisit decisions on specific data formats or plugin settings that you made in the past.
Custom Processing Logic
If you have labels, filters or other processing logic within Fluentd, it is important to note the functionality you are trying to achieve. While it may seem like just swapping those filters over might be easiest, you should also look at ways to migrate those directly into Fluent Bit processors. If you have a significant amount of custom Ruby, you can use large language models (LLMs) to help convert it into suitable Lua.
Migrating Portions at a Time
You don’t need to migrate all your functionality at once. Because Fluent Bit is lightweight and performant, you can look at ways to have each agent handle different portions of the workload. Over time you can follow the logic above to continue migrating without having to worry about log collection disruptions.
Conclusion
While migrating from Fluentd to Fluent Bit might seem like an enormous task, you have many options for how to approach it and where to focus to achieve the highest impact. Of course, migrations are also a great time to re-evaluate certain logic for improvement and even introduce new architecture patterns such as a telemetry pipeline.
If you are looking for guided or assisted help, let me know. I have helped many folks migrate from Fluentd to Fluent Bit and even assisted with modernizing certain portions to a telemetry pipeline.
Frequently Asked Questions
Why migrate from Fluentd to Fluent Bit?
With Fluent Bit you will get higher performance for the same resources you are already using; full OpenTelemetry support for logs, metrics, and traces as well as Prometheus support for metrics; simpler configuration and the ability to route to multiple locations; higher velocity for adding custom processing rules; and integrated monitoring to better understand performance and dataflows.
What are some differences between Fluentd and Fluent Bit?
With Fluentd, the main language is Ruby and initially designed to help users push data to big data platforms such as Hadoop. Meanwhile, Fluent Bit is written in C, with a focus on hyper performance in smaller systems (containers, embedded Linux).
Can Fluentd and Fluent Bit work together?
Yes, Fluent Bit and Fluentd can work together, which means it’s possible to capture from more sources by using Fluentd and introduce the data into a Fluent Bit deployment. The Forward plugin has a defined standard that Fluent Bit and Fluentd both use. Some external products have also adopted this protocol so they can be connected directly to Fluent Bit.
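As a small sketch of that interoperability, a Fluent Bit instance can accept Forward traffic from existing Fluentd agents with a configuration like the following (YAML format; the port shown is the conventional Forward default, and the output is a placeholder).

```yaml
pipeline:
  inputs:
    - name: forward                # accept Forward-protocol traffic from Fluentd
      listen: 0.0.0.0
      port: 24224
  outputs:
    - name: stdout                 # replace with your real destination
      match: "*"
```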
Autonomous Testing of etcd’s Robustness
As a critical component of many production systems, including Kubernetes, the etcd project’s first priority is reliability. Ensuring consistency and data safety requires our project contributors to continuously improve testing methodologies. In this article, we describe how we use advanced simulation testing to uncover subtle bugs, validate the robustness of our releases, and increase our confidence in etcd’s stability. We’ll share our key findings and how they have improved etcd.
Enhancing etcd’s Robustness Testing
Many critical software systems depend on etcd to be correct and consistent, most notably as the primary datastore for Kubernetes. After some issues with the v3.5 release, the etcd maintainers developed a new robustness testing framework to better test for correctness under various failure scenarios. To further enhance our testing capabilities, we integrated a deterministic simulation testing platform from Antithesis into our workflow.
The platform works by running the entire etcd cluster inside a deterministic hypervisor. This specialized environment gives the testing software complete control over every source of non-determinism, such as network behavior, thread scheduling, and system clocks. This means any bug it discovers can be perfectly and reliably reproduced.
Within this simulated environment, the testing methodology shifts away from traditional, scenario-based tests. Instead of writing tests imperatively with strict assertions for one specific outcome, this approach uses declarative, property-based assertions about system behavior. These properties are high-level invariants about the system that must always hold true. For example, “data consistency is never violated” or “a watch event is never dropped.”
The platform then treats these properties not as passive checks, but as targets to break. It combines automated exploration with targeted fault injection, actively searching for the precise sequence of events and failures that will cause a property to be violated. This active search for violations is what allows the platform to uncover subtle bugs that result from complex combinations of factors. Antithesis refers to this approach as Autonomous Testing.
This builds upon etcd’s existing robustness tests, which also use a property-based approach. However, without a deterministic environment or automated exploration, the original framework resembled throwing darts while blindfolded and hoping to hit the bullseye. A bug might be found, but the process relies heavily on random chance and is difficult to reproduce. Antithesis’s deterministic simulation and active exploration remove the blindfold, enabling a systematic and reproducible search for bugs.
How We Tested
Our goals for this testing effort were to:
- Validate the robustness of etcd v3.6.
- Improve etcd’s software quality by finding and fixing bugs.
- Enhance our existing testing framework with autonomous testing.
We ran our existing robustness tests on the Antithesis simulation platform, testing a 3-node and a 1-node etcd cluster against a variety of faults, including:
- Network faults: latency, congestion, and partitions.
- Container-level faults: thread pauses, process kills, clock jitter, and CPU throttling.
We tested older versions of etcd with known bugs to validate the testing methodology, as well as our stable releases (3.4, 3.5, 3.6) and the main development branch. In total, we ran 830 wall-clock hours of testing, which simulated 4.5 years of usage.
What We Found
The results were impressive. The simulation testing not only found all the known bugs we tested for but also uncovered several new issues in our main development branch.
Here are some of the key findings:
- A critical watch bug was discovered that our existing tests had missed. This bug was present in all stable releases of etcd.
- All known bugs were found, giving us confidence in the ability of the combined testing approach to find regressions.
- Our own testing was improved by revealing a flaw in our linearization checker model.
Issues in the Main Development Branch
| Description | Report Link | Status | Impact | Details |
|---|---|---|---|---|
| Watch on future revision might receive old events | Triage Report | Fixed in 3.6.2 (#20281) | Medium | New bug discovered by Antithesis |
| Watch on future revision might receive old notifications | Triage Report | Fixed in 3.6.2 (#20221) | Medium | New bug discovered by both Antithesis and robustness tests |
| Panic when two snapshots are received in a short period | Triage Report | Open | Low | Previously discovered by robustness |
| Panic from db page expected to be 5 | Triage Report | Open | Low | New bug discovered by Antithesis |
| Operation time based on watch response is incorrect | Triage Report | Fixed test on main branch (#19998) | Low | Bug in robustness tests discovered by Antithesis |

Known Issues
Antithesis also successfully found and reproduced these known issues in older releases – the “Brown M&Ms” set by the etcd maintainers.
| Description | Report Link |
|---|---|
| Watch dropping an event when compacting on delete | Triage Report |
| Revision decreasing caused by crash during compaction | Triage Report |
| Watch progress notification not synced with stream | Triage Report |
| Inconsistent revision caused by crash during defrag | Triage Report |
| Watchable runlock bug | Triage Report |

Conclusion
The integration of this advanced simulation testing into our development workflow has been a success. It has allowed us to find and fix critical bugs, improve our existing testing framework, and increase our confidence in the reliability of etcd. We will continue to leverage this technology to ensure that etcd remains a stable and trusted distributed key-value store for the community.
Announcing Changed Block Tracking API support (alpha)
We're excited to announce the alpha support for a changed block tracking mechanism. This enhances the Kubernetes storage ecosystem by providing an efficient way for CSI storage drivers to identify changed blocks in PersistentVolume snapshots. With a driver that can use the feature, you could benefit from faster and more resource-efficient backup operations.
If you're eager to try this feature, you can skip to the Getting Started section.
What is changed block tracking?
Changed block tracking enables storage systems to identify and track modifications at the block level between snapshots, eliminating the need to scan entire volumes during backup operations. The improvement is a change to the Container Storage Interface (CSI), and also to the storage support in Kubernetes itself. With the alpha feature enabled, your cluster can:
- Identify allocated blocks within a CSI volume snapshot
- Determine changed blocks between two snapshots of the same volume
- Streamline backup operations by focusing only on changed data blocks
For Kubernetes users managing large datasets, this API enables significantly more efficient backup processes. Backup applications can now focus only on the blocks that have changed, rather than processing entire volumes.
Note:
As of now, the Changed Block Tracking API is supported only for block volumes and not for file volumes. CSI drivers that manage file-based storage systems will not be able to implement this capability.

Benefits of changed block tracking support in Kubernetes
As Kubernetes adoption grows for stateful workloads managing critical data, the need for efficient backup solutions becomes increasingly important. Traditional full backup approaches face challenges with:
- Long backup windows: Full volume backups can take hours for large datasets, making it difficult to complete within maintenance windows.
- High resource utilization: Backup operations consume substantial network bandwidth and I/O resources, especially for large data volumes and data-intensive applications.
- Increased storage costs: Repetitive full backups store redundant data, causing storage requirements to grow linearly even when only a small percentage of data actually changes between backups.
The Changed Block Tracking API addresses these challenges by providing native Kubernetes support for incremental backup capabilities through the CSI interface.
Key components
The implementation consists of three primary components:
- CSI SnapshotMetadata Service API: An API, offered over gRPC, that provides volume snapshot and changed block data.
- SnapshotMetadataService API: A Kubernetes CustomResourceDefinition (CRD) that advertises CSI driver metadata service availability and connection details to cluster clients.
- External Snapshot Metadata Sidecar: An intermediary component that connects CSI drivers to backup applications via a standardized gRPC interface.
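To give a sense of the second component, here is a hedged sketch of a SnapshotMetadataService custom resource; the group/version and field names follow the enhancement proposal as I recall it, and the driver name, address, and audience are placeholders, so check the external-snapshot-metadata repository for the authoritative schema.

```yaml
apiVersion: cbt.storage.k8s.io/v1alpha1   # assumed group/version
kind: SnapshotMetadataService
metadata:
  name: hostpath.csi.k8s.io               # conventionally matches the CSI driver name
spec:
  address: snapshot-metadata.csi-driver.svc:6443   # gRPC endpoint of the sidecar service (placeholder)
  audience: snapshot-metadata-clients              # audience expected in ServiceAccount tokens (placeholder)
  caCert: <base64-encoded CA certificate>          # used by clients to verify the TLS connection
```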
Implementation requirements
Storage provider responsibilities
If you're an author of a storage integration with Kubernetes and want to support the changed block tracking feature, you must implement specific requirements:
- Implement CSI RPCs: Storage providers need to implement the SnapshotMetadata service as defined in the CSI specification protobuf. This service requires server-side streaming implementations for the following RPCs:
  - GetMetadataAllocated: For identifying allocated blocks in a snapshot
  - GetMetadataDelta: For determining changed blocks between two snapshots
- Storage backend capabilities: Ensure the storage backend has the capability to track and report block-level changes.
- Deploy external components: Integrate with the external-snapshot-metadata sidecar to expose the snapshot metadata service.
- Register custom resource: Register the SnapshotMetadataService resource using a CustomResourceDefinition and create a SnapshotMetadataService custom resource that advertises the availability of the metadata service and provides connection details.
- Support error handling: Implement proper error handling for these RPCs according to the CSI specification requirements.
Backup solution responsibilities
A backup solution looking to leverage this feature must:
- Set up authentication: The backup application must provide a Kubernetes ServiceAccount token when using the Kubernetes SnapshotMetadataService API. Appropriate access grants, such as RBAC RoleBindings, must be established to authorize the backup application's ServiceAccount to obtain such tokens (see the illustrative sketch after this list).
- Implement streaming client-side code: Develop clients that implement the streaming gRPC APIs defined in the schema.proto file. Specifically:
  - Implement streaming client code for the GetMetadataAllocated and GetMetadataDelta methods
  - Handle server-side streaming responses efficiently, as the metadata arrives in chunks
  - Process the SnapshotMetadataResponse message format with proper error handling
  The external-snapshot-metadata GitHub repository provides a convenient iterator support package to simplify client implementation.
- Handle large dataset streaming: Design clients to efficiently handle the large streams of block metadata that could be returned for volumes with significant changes.
- Optimize backup processes: Modify backup workflows to use the changed block metadata to identify and transfer only changed blocks, reducing both backup duration and resource consumption.
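As a purely illustrative sketch of the authentication setup described in the first item above, the manifests below create a ServiceAccount for a hypothetical backup application and grant it read access to snapshot-related resources. The exact resources and verbs that a driver's external-snapshot-metadata sidecar authorizes against are deployment-specific, so treat every name and rule here as a placeholder rather than a required configuration.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: backup-app            # hypothetical backup application identity
  namespace: backup-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backup-app-snapshot-metadata   # illustrative role name
rules:
# Read access to snapshots and to the metadata service advertisement;
# the rules your driver's sidecar actually checks may differ.
- apiGroups: ["snapshot.storage.k8s.io"]
  resources: ["volumesnapshots", "volumesnapshotcontents"]
  verbs: ["get", "list"]
- apiGroups: ["cbt.storage.k8s.io"]
  resources: ["snapshotmetadataservices"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backup-app-snapshot-metadata
subjects:
- kind: ServiceAccount
  name: backup-app
  namespace: backup-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: backup-app-snapshot-metadata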
Getting started
To use changed block tracking in your cluster:
- Ensure your CSI driver supports volume snapshots and implements the snapshot metadata capabilities, with the required external-snapshot-metadata sidecar deployed
- Make sure the SnapshotMetadataService custom resource is registered using the CRD
- Verify the presence of a SnapshotMetadataService custom resource for your CSI driver
- Create clients that can access the API using appropriate authentication (via Kubernetes ServiceAccount tokens)
The API provides two main functions:
- GetMetadataAllocated: lists blocks allocated in a single snapshot
- GetMetadataDelta: lists blocks changed between two snapshots
What’s next?
Depending on feedback and adoption, the Kubernetes developers hope to promote the CSI Snapshot Metadata implementation to Beta in a future release.
Where can I learn more?
For those interested in trying out this new feature:
- Official Kubernetes CSI Developer Documentation
- The enhancement proposal for the snapshot metadata feature.
- GitHub repository for implementation and release status of external-snapshot-metadata
- Complete gRPC protocol definitions for snapshot metadata API: schema.proto
- Example snapshot metadata client implementation: snapshot-metadata-lister
- End-to-end example with csi-hostpath-driver: example documentation
How do I get involved?
This project, like all of Kubernetes, is the result of hard work by many contributors from diverse backgrounds working together. On behalf of SIG Storage, I would like to offer a huge thank you to the contributors who helped review the design and implementation of the project, including but not limited to the following:
- Ben Swartzlander (bswartz)
- Carl Braganza (carlbraganza)
- Daniil Fedotov (hairyhum)
- Ivan Sim (ihcsim)
- Nikhil Ladha (Nikhil-Ladha)
- Prasad Ghangal (PrasadG193)
- Praveen M (iPraveenParihar)
- Rakshith R (Rakshith-R)
- Xing Yang (xing-yang)
Thanks also to everyone who has contributed to the project, including others who helped review the KEP and the CSI spec PR.
For those interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system, join the Kubernetes Storage Special Interest Group (SIG). We always welcome new contributors.
The SIG also holds regular Data Protection Working Group meetings. New attendees are welcome to join our discussions.
CNCF’s Helm Project Remains Fully Open Source and Unaffected by Recent Vendor Deprecations
Recently, users may have seen the news about Broadcom (Bitnami) regarding upcoming deprecations of their publicly available container images and Helm Charts. These changes, which will take effect by September 29, 2025, mark a shift to a paid subscription model for Bitnami Secure Images and the removal of many free-to-use artifacts from public registries.
We want to be clear: these changes do not impact the Helm project itself.
Helm is a graduated project that will remain under the CNCF. It continues to be fully open source, Apache 2.0 licensed, and governed by a neutral community. The CNCF community retains ownership of all project intellectual property per our IP policy, ensuring no single vendor can alter its open governance model.
While “Helm charts” refer broadly to a packaging format that anyone can use to deploy applications on Kubernetes, Bitnami Helm Charts are a specific vendor-maintained implementation. Developed and maintained by the Bitnami team (now part of Broadcom), these charts are known for their ease of use, security features, and reliability. Bitnami’s decision to deprecate its public chart and image repositories is entirely separate from the Helm project itself.
Users currently depending on Bitnami Helm Charts should begin exploring migration or mirroring strategies to avoid potential disruption.
The Helm community is actively working to support users during this transition, including guidance on:
- Updating chart dependencies
- Exploring alternative chart sources
- Migrating to maintained open image repositories
We encourage users to follow the Helm blog and Helm GitHub for updates and support resources.
CNCF remains committed to maintaining the integrity of our open source projects and supporting communities through transitions like this. This event also reinforces the importance of vendor neutrality and resilient infrastructure design—a principle at the heart of our mission.
For any media inquiries, please contact: [email protected]
Introducing Headlamp Plugin for Karpenter - Scaling and Visibility
Headlamp is an open‑source, extensible Kubernetes SIG UI project designed to let you explore, manage, and debug cluster resources.
Karpenter is a Kubernetes Autoscaling SIG node provisioning project that helps clusters scale quickly and efficiently. It launches new nodes in seconds, selects appropriate instance types for workloads, and manages the full node lifecycle, including scale-down.
The new Headlamp Karpenter Plugin adds real-time visibility into Karpenter’s activity directly from the Headlamp UI. It shows how Karpenter resources relate to Kubernetes objects, displays live metrics, and surfaces scaling events as they happen. You can inspect pending pods during provisioning, review scaling decisions, and edit Karpenter-managed resources with built-in validation. The Karpenter plugin was made as part of an LFX mentorship project.
The Karpenter plugin for Headlamp aims to make it easier for Kubernetes users and operators to understand, debug, and fine-tune autoscaling behavior in their clusters. Now we will give a brief tour of the Headlamp plugin.
Map view of Karpenter Resources and how they relate to Kubernetes resources
Easily see how Karpenter resources like NodeClasses, NodePools, and NodeClaims connect with core Kubernetes resources like Pods and Nodes.

Visualization of Karpenter Metrics
Get instant insights into resource usage versus limits, allowed disruptions, pending Pods, provisioning latency, and more.


Scaling decisions
See which instances are being provisioned for your workloads and understand why Karpenter made those choices. Helpful while debugging.


Config editor with validation support
Make live edits to Karpenter configurations. The editor includes diff previews and resource validation for safer adjustments.

Real time view of Karpenter resources
View and track Karpenter-specific resources, such as NodeClaims, in real time as your cluster scales up and down.



Dashboard for Pending Pods
View all pending Pods with unmet scheduling requirements or failed scheduling, with highlights explaining why they couldn't be scheduled.

Karpenter Providers
This plugin should work with most Karpenter providers, but has so far only been tested on the ones listed in the table below. Additionally, each provider exposes some extra information; the providers for which the plugin displays that extra information are also marked in the table.

| Provider Name | Tested | Extra provider-specific info supported |
|---------------|--------|----------------------------------------|
| AWS | ✅ | ✅ |
| Azure | ✅ | ✅ |
| AlibabaCloud | ❌ | ❌ |
| Bizfly Cloud | ❌ | ❌ |
| Cluster API | ❌ | ❌ |
| GCP | ❌ | ❌ |
| Proxmox | ❌ | ❌ |
| Oracle Cloud Infrastructure (OCI) | ❌ | ❌ |

Please submit an issue if you test one of the untested providers or if you want support for a provider (PRs also gladly accepted).
How to use
Please see the plugins/karpenter/README.md for instructions on how to use the plugin.
Feedback and Questions
Please submit an issue if you use Karpenter and have any other ideas or feedback. Or come chat with us in the #headlamp channel on Kubernetes Slack.
Kubernetes v1.34: Pod Level Resources Graduated to Beta
On behalf of the Kubernetes community, I am thrilled to announce that the Pod Level Resources feature has graduated to Beta in the Kubernetes v1.34 release and is enabled by default! This significant milestone introduces a new layer of flexibility for defining and managing resource allocation for your Pods. This flexibility stems from the ability to specify CPU and memory resources for the Pod as a whole. Pod level resources can be combined with the container-level specifications to express the exact resource requirements and limits your application needs.
Pod-level specification for resources
Until recently, resource specifications that applied to Pods were primarily defined at the individual container level. While effective, this approach sometimes required duplicating or meticulously calculating resource needs across multiple containers within a single Pod. As a beta feature, Kubernetes allows you to specify the CPU, memory and hugepages resources at the Pod-level. This means you can now define resource requests and limits for an entire Pod, enabling easier resource sharing without requiring granular, per-container management of these resources where it's not needed.
Why does Pod-level specification matter?
This feature enhances Kubernetes by offering flexible resource management at both the Pod and container levels.
- It provides a consolidated approach to resource declaration, reducing the need for meticulous, per-container management, especially for Pods with multiple containers.
- Pod-level resources enable containers within a Pod to share unused resources amongst themselves, promoting efficient utilization within the Pod. For example, it prevents sidecar containers from becoming performance bottlenecks. Previously, a sidecar (e.g., a logging agent or service mesh proxy) hitting its individual CPU limit could be throttled and slow down the entire Pod, even if the main application container had plenty of spare CPU. With pod-level resources, the sidecar and the main container can share the Pod's resource budget, ensuring smooth operation during traffic spikes: either the whole Pod is throttled, or all containers keep working.
- When both pod-level and container-level resources are specified, pod-level requests and limits take precedence. This gives you, and cluster administrators, a powerful way to enforce overall resource boundaries for your Pods.
  For scheduling, if a pod-level request is explicitly defined, the scheduler uses that specific value to find a suitable node, instead of the aggregated requests of the individual containers. At runtime, the pod-level limit acts as a hard ceiling for the combined resource usage of all containers. Crucially, this pod-level limit is the absolute enforcer; even if the sum of the individual container limits is higher, the total resource consumption can never exceed the pod-level limit.
- Pod-level resources are prioritized in influencing the Quality of Service (QoS) class of the Pod.
- For Pods running on Linux nodes, the Out-Of-Memory (OOM) score adjustment calculation considers both pod-level and container-level resource requests.
- Pod-level resources are designed to be compatible with existing Kubernetes functionalities, ensuring a smooth integration into your workflows.
How to specify resources for an entire Pod
Using the PodLevelResources feature gate requires Kubernetes v1.34 or newer for all cluster components, including the control plane and every node. This feature gate is in beta and enabled by default in v1.34.
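If the gate has been turned off in your cluster, you can switch it back on with the usual feature-gate flag on the control plane components and every kubelet, for example:
--feature-gates=...,PodLevelResources=true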
Example manifest
You can specify CPU, memory and hugepages resources directly in the Pod spec manifest at the resources field for the entire Pod.
Here’s an example demonstrating a Pod with both CPU and memory requests and limits defined at the Pod level:
apiVersion: v1
kind: Pod
metadata:
name: pod-resources-demo
namespace: pod-resources-example
spec:
# The 'resources' field at the Pod specification level defines the overall
# resource budget for all containers within this Pod combined.
resources: # Pod-level resources
# 'limits' specifies the maximum amount of resources the Pod is allowed to use.
# The sum of the limits of all containers in the Pod cannot exceed these values.
limits:
cpu: "1" # The entire Pod cannot use more than 1 CPU core.
memory: "200Mi" # The entire Pod cannot use more than 200 MiB of memory.
# 'requests' specifies the minimum amount of resources guaranteed to the Pod.
# This value is used by the Kubernetes scheduler to find a node with enough capacity.
requests:
cpu: "1" # The Pod is guaranteed 1 CPU core when scheduled.
memory: "100Mi" # The Pod is guaranteed 100 MiB of memory when scheduled.
containers:
- name: main-app-container
image: nginx
...
# This container has no resource requests or limits specified.
- name: auxiliary-container
image: fedora
command: ["sleep", "inf"]
...
# This container has no resource requests or limits specified.
In this example, the pod-resources-demo Pod as a whole requests 1 CPU and 100 MiB of memory, and is limited to 1 CPU and 200 MiB of memory. The containers within will operate under these overall Pod-level constraints, as explained in the next section.
Interaction with container-level resource requests or limits
When both pod-level and container-level resources are specified, pod-level requests and limits take precedence. This means the node allocates resources based on the pod-level specifications.
Consider a Pod with two containers where pod-level CPU and memory requests and limits are defined, and only one container has its own explicit resource definitions:
apiVersion: v1
kind: Pod
metadata:
name: pod-resources-demo
namespace: pod-resources-example
spec:
resources:
limits:
cpu: "1"
memory: "200Mi"
requests:
cpu: "1"
memory: "100Mi"
containers:
- name: main-app-container
image: nginx
resources:
requests:
cpu: "0.5"
memory: "50Mi"
- name: auxiliary-container
image: fedora
command: [ "sleep", "inf"]
# This container has no resource requests or limits specified.
- Pod-Level Limits: The pod-level limits (cpu: "1", memory: "200Mi") establish an absolute boundary for the entire Pod. The sum of resources consumed by all its containers is enforced at this ceiling and cannot be surpassed.
- Resource Sharing and Bursting: Containers can dynamically borrow any unused capacity, allowing them to burst as needed, so long as the Pod's aggregate usage stays within the overall limit.
- Pod-Level Requests: The pod-level requests (cpu: "1", memory: "100Mi") serve as the foundational resource guarantee for the entire Pod. This value informs the scheduler's placement decision and represents the minimum resources the Pod can rely on during node-level contention.
- Container-Level Requests: Container-level requests create a priority system within the Pod's guaranteed budget. Because main-app-container has an explicit request (cpu: "0.5", memory: "50Mi"), it is given precedence for its share of resources under resource pressure over the auxiliary-container, which has no such explicit claim.
Limitations
- First of all, in-place resize of pod-level resources is not supported for Kubernetes v1.34 (or earlier). Attempting to modify the pod-level resource limits or requests on a running Pod results in an error: the resize is rejected. The v1.34 implementation of pod-level resources focuses on allowing the initial declaration of an overall resource envelope that applies to the entire Pod. That is distinct from in-place pod resize, which (despite what the name might suggest) allows you to make dynamic adjustments to container resource requests and limits within a running Pod, potentially without a container restart. In-place resizing is also not yet a stable feature; it graduated to Beta in the v1.33 release.
- Only CPU, memory, and hugepages resources can be specified at the pod level.
- Pod-level resources are not supported for Windows Pods. If the Pod specification explicitly targets Windows (e.g., by setting spec.os.name: "windows"), the API server will reject the Pod during the validation step (see the sketch after this list). If the Pod is not explicitly marked for Windows but is scheduled to a Windows node (e.g., via a nodeSelector), the kubelet on that Windows node will reject the Pod during its admission process.
- The Topology Manager, Memory Manager, and CPU Manager do not align Pods and containers based on pod-level resources, as these resource managers don't currently support pod-level resources.
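To illustrate the Windows limitation, a manifest along the following lines (all names hypothetical) would be rejected by the API server during validation, because it combines spec.os.name: "windows" with pod-level resources:

apiVersion: v1
kind: Pod
metadata:
  name: windows-pod-level-resources   # hypothetical; rejected at validation time
spec:
  os:
    name: windows                     # explicitly targets Windows
  resources:                          # pod-level resources are not supported for Windows Pods
    requests:
      cpu: "1"
      memory: 100Mi
  containers:
  - name: app
    image: mcr.microsoft.com/windows/nanoserver:ltsc2022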
Getting started and providing feedback
Ready to explore Pod Level Resources feature? You'll need a Kubernetes cluster running version 1.34 or later. Remember to enable the PodLevelResources feature gate across your control plane and all nodes.
As this feature moves through Beta, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels:
PromCon is Only One Month Away; See You in Person or via Live Stream!
It's that time of the year again! With just a few weeks to go until PromCon EU 2025, the Prometheus community is buzzing with excitement, preparations, and passionate conference-driven development. This tenth iteration of the conference dedicated to the CNCF Prometheus monitoring ecosystem continues the tradition of a cozy, community-led, single-track, 2-day intense Prometheus user and developer learning time!
The conference is taking place on October 21–22 at the Google Office in Munich, with an incredible lineup of talks and discussions you won't want to miss!
Content Highlights
The schedule is packed with sessions for everyone, from beginners to seasoned veterans. Topics range from:
- Exciting developments in Prometheus features and protocols, e.g. OpenMetrics 2.0, downsampling, delta metrics, and Parquet-based storage.
- Insightful user community case studies and integrations, e.g. agentic AI, OpenTelemetry Schema and Resource Attributes adoption, auto aggregations, operator improvements, Perses novelties and Rust ecosystem updates.
- General best practices and guidelines for navigating complex observability ecosystems, e.g. Prometheus and OpenTelemetry instrumentation, new alerting methodologies, monitoring CPU hardware, and new PromQL aspects.
...and more!
Last Tickets and Live Recordings
A few last tickets are still available, but they will likely sell out soon. You can register for the conference here: https://promcon.io/2025-munich/register/.
However, if you can't make it to Munich, fear not! All talks will be live-streamed and available as recordings after the event.
Feel free to bookmark livestream links below:
Lightning Talks Form is Now Open
Following our tradition, we're offering space for any PromCon in-person participant (capacity limits apply) to deliver a 5-minute talk related to the Prometheus ecosystem.
If you have a last-minute learning, case study or idea to share with the community, feel free to prepare a few slides and sign up here: https://forms.gle/2JF13tSBGzrcuZD7A
The form will be open until the start of the PromCon conference.
Credits
This conference wouldn't be possible without the hard work of the Prometheus team's core organisers (Goutham, Richi and Basti), and the amazing CNCF team!
We wouldn't be able to do it without our sponsors, too:
- Diamond: Grafana Labs and VictoriaMetrics
- Platinum: Red Hat
- Gold: HRT and Amadeus
- Venue: Google Cloud
We can't wait to see you there, either in person or online!
Kubernetes v1.34: Recovery From Volume Expansion Failure (GA)
Have you ever made a typo when expanding your persistent volumes in Kubernetes? Meant to specify 2TB
but specified 20TiB? This seemingly innocuous problem was surprisingly hard to address, and it took the project almost 5 years to fix.
Automated recovery from storage expansion has been around for a while in beta; however, with the v1.34 release, we have graduated this to
general availability.
While it was always possible to recover from failed volume expansions manually, it usually required cluster-admin access and was tedious to do (see the aforementioned link for more information).
What if you make a mistake and then realize it immediately? With Kubernetes v1.34, as long as the expansion to the previously requested size hasn't finished, you can reduce the requested size of the PersistentVolumeClaim (PVC) and amend your request. Kubernetes will automatically work to correct it: any quota consumed by the failed expansion will be returned to you, and the associated PersistentVolume should be resized to the latest size you specified.
I'll walk through an example of how all of this works.
Reducing PVC size to recover from failed expansion
Imagine that you are running out of disk space for one of your database servers, and you want to expand the PVC from the previously
specified 10TB to 100TB - but you make a typo and specify 1000TB.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: myclaim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1000TB # newly specified size - but incorrect!
Now, you may be out of disk space on your disk array, or you may simply have run out of allocated quota with your cloud provider. Either way, assume that the expansion to 1000TB is never going to succeed.
In Kubernetes v1.34, you can simply correct your mistake and request a new PVC size that is smaller than the mistaken one, provided it is still larger than the original size of the actual PersistentVolume.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: myclaim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100TB # Corrected size; has to be greater than 10TB.
# You cannot shrink the volume below its actual size.
This requires no admin intervention. Even better, any surplus Kubernetes quota that you temporarily consumed will be automatically returned.
This fault recovery mechanism does have a caveat: whatever new size you specify for the PVC, it must still be higher than the original size recorded in .status.capacity.
Since Kubernetes doesn't support shrinking PV objects, you can never go below the size that was originally allocated for your PVC request.
Improved error handling and observability of volume expansion
Implementing what might look like a relatively minor change also required us to almost fully redo how volume expansion works under the hood in Kubernetes. There are new API fields available on PVC objects which you can monitor to observe the progress of volume expansion.
Improved observability of in-progress expansion
You can query .status.allocatedResourceStatus['storage'] of a PVC to monitor progress of a volume expansion operation.
For a typical block volume, this should transition through ControllerResizeInProgress, NodeResizePending, and NodeResizeInProgress, and become nil/empty when volume expansion has finished.
If, for some reason, volume expansion to the requested size is not feasible, the status should instead report states like ControllerResizeInfeasible or NodeResizeInfeasible.
You can also observe the size towards which Kubernetes is working by watching pvc.status.allocatedResources.
Improved error handling and reporting
Kubernetes now retries failed volume expansions at a slower rate, making fewer requests to both the storage system and the Kubernetes API server.
Errors observed during volume expansion are now reported as conditions on PVC objects and persist, unlike events. Kubernetes will populate pvc.status.conditions with the error keys ControllerResizeError or NodeResizeError when volume expansion fails.
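As a hypothetical illustration of these fields, a PVC whose node-side expansion keeps failing might report a status along these lines (names, sizes, and the error message are invented):

status:
  capacity:
    storage: 10Ti                    # size the volume actually has today
  allocatedResources:
    storage: 100Ti                   # size Kubernetes is currently working towards
  allocatedResourceStatus:
    storage: NodeResizeInProgress    # current stage of the expansion
  conditions:
  - type: NodeResizeError            # persists, unlike an Event
    status: "True"
    message: "resize failed: ..."    # driver-specific error details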
Fixes long-standing bugs in resizing workflows
This feature also allowed us to fix long-standing bugs in the resizing workflow, such as Kubernetes issue #115294. If you observe anything broken, please report your bugs to https://github.com/kubernetes/kubernetes/issues, along with details about how to reproduce the problem.
Working on this feature through its lifecycle was challenging and it wouldn't have been possible to reach GA without feedback from @msau42, @jsafrane and @xing-yang.
All of the contributors who worked on this also appreciate the input provided by @thockin and @liggitt at various Kubernetes contributor summits.
Kubernetes v1.34: DRA Consumable Capacity
Dynamic Resource Allocation (DRA) is a Kubernetes API for managing scarce resources across Pods and containers. It enables flexible resource requests, going beyond simply allocating N number of devices to support more granular usage scenarios. With DRA, users can request specific types of devices based on their attributes, define custom configurations tailored to their workloads, and even share the same resource among multiple containers or Pods.
In this blog, we focus on the device sharing feature and dive into a new capability introduced in Kubernetes 1.34: DRA consumable capacity, which extends DRA to support finer-grained device sharing.
Background: device sharing via ResourceClaims
From the beginning, DRA introduced the ability for multiple Pods to share a device by referencing the same ResourceClaim. This design decouples resource allocation from specific hardware, allowing for more dynamic and reusable provisioning of devices.
In Kubernetes 1.33, the new support for partitionable devices allowed resource drivers to advertise slices of a device that are available, rather than exposing the entire device as an all-or-nothing resource. This enabled Kubernetes to model shareable hardware more accurately.
But there was still a missing piece: it didn't yet support scenarios where the device driver manages fine-grained, dynamic portions of a device resource (like network bandwidth) based on user demand, or where those resources are shared independently of ResourceClaims, which are restricted by their spec and namespace.
That’s where consumable capacity for DRA comes in.
Benefits of DRA consumable capacity support
Here's a taste of what you get in a cluster with the DRAConsumableCapacity
feature gate enabled.
Device sharing across multiple ResourceClaims or DeviceRequests
Resource drivers can now support sharing the same device — or even a slice of a device — across multiple ResourceClaims or across multiple DeviceRequests.
This means that Pods from different namespaces can simultaneously share the same device, if permitted and supported by the specific DRA driver.
Device resource allocation
Kubernetes extends the allocation algorithm in the scheduler to support allocating a portion of a device's resources, as defined in the capacity field.
The scheduler ensures that the total allocated capacity across all consumers never exceeds the device’s total capacity, even when shared across multiple ResourceClaims or DeviceRequests.
This is very similar to the way the scheduler allows Pods and containers to share allocatable resources on Nodes;
in this case, it allows them to share allocatable (consumable) resources on Devices.
This feature expands support for scenarios where the device driver is able to manage resources within a device and on a per-process basis — for example, allocating a specific amount of memory (e.g., 8 GiB) from a virtual GPU, or setting bandwidth limits on virtual network interfaces allocated to specific Pods. This aims to provide safe and efficient resource sharing.
DistinctAttribute constraint
This feature also introduces a new constraint: DistinctAttribute, which is the complement of the existing MatchAttribute constraint.
The primary goal of DistinctAttribute is to prevent the same underlying device from being allocated multiple times within a single ResourceClaim, which could happen since we are allocating shares (or subsets) of devices.
This constraint ensures that each allocation refers to a distinct resource, even if they belong to the same device class.
It is useful for use cases such as allocating network devices connecting to different subnets to expand coverage or provide redundancy across failure domains.
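Here is a sketch of how such a constraint might look in a ResourceClaim, assuming the new field is spelled distinctAttribute, mirroring the existing matchAttribute constraint; the device class and attribute names are placeholders:

apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: two-distinct-nics        # hypothetical claim for two different physical NICs
spec:
  devices:
    requests:
    - name: nic-a
      exactly:
        deviceClassName: net.example.com
    - name: nic-b
      exactly:
        deviceClassName: net.example.com
    constraints:
    # Require the two requests to resolve to devices with different values of this
    # attribute, i.e. two distinct underlying devices rather than two shares of one.
    - requests: ["nic-a", "nic-b"]
      distinctAttribute: "net.example.com/parentDevice"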
How to use consumable capacity?
DRAConsumableCapacity is introduced as an alpha feature in Kubernetes 1.34. The feature gate DRAConsumableCapacity must be enabled in kubelet, kube-apiserver, kube-scheduler and kube-controller-manager.
--feature-gates=...,DRAConsumableCapacity=true
As a DRA driver developer
As a DRA driver developer writing in Golang, you can make a device within a ResourceSlice allocatable to multiple ResourceClaims (or devices.requests) by setting AllowMultipleAllocations to true.
Device {
...
AllowMultipleAllocations: ptr.To(true),
...
}
Additionally, you can define a policy to restrict how each device's Capacity should be consumed by each DeviceRequest, by defining the RequestPolicy field in the DeviceCapacity.
The example below shows how to define a policy that requires a GPU with 40 GiB of memory to allocate at least 5 GiB per request, with each allocation in multiples of 5 GiB.
DeviceCapacity{
Value: resource.MustParse("40Gi"),
RequestPolicy: &CapacityRequestPolicy{
Default: ptr.To(resource.MustParse("5Gi")),
ValidRange: &CapacityRequestPolicyRange {
Min: ptr.To(resource.MustParse("5Gi")),
Step: ptr.To(resource.MustParse("5Gi")),
}
}
}
This will be published to the ResourceSlice, as partially shown below:
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
...
spec:
devices:
- name: gpu0
allowMultipleAllocations: true
capacity:
memory:
value: 40Gi
requestPolicy:
default: 5Gi
validRange:
min: 5Gi
step: 5Gi
An allocated device with a specified portion of consumed capacity will have a ShareID field set in the allocation status.
claim.Status.Allocation.Devices.Results[i].ShareID
This ShareID allows the driver to distinguish between different allocations that refer to the same device or same statically-partitioned slice but come from different ResourceClaim requests.
It acts as a unique identifier for each shared slice, enabling the driver to manage and enforce resource limits independently across multiple consumers.
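In the ResourceClaim status, that could surface roughly as follows; this is only a sketch, with the driver, pool, device, and ShareID values invented:

status:
  allocation:
    devices:
      results:
      - request: req0
        driver: resource.example.com
        pool: node-1
        device: gpu0
        # Distinguishes this share of gpu0 from other shares of the same device
        # allocated to other ResourceClaims or DeviceRequests.
        shareID: "1a2b3c4d"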
As a consumer
As a consumer (or user), the device resource can be requested with a ResourceClaim like this:
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
...
spec:
devices:
requests: # for devices
- name: req0
      exactly:
        deviceClassName: resource.example.com
capacity:
requests: # for resources which must be provided by those devices
memory: 10Gi
This configuration ensures that the requested device can provide at least 10GiB of memory.
Note that any resource.example.com device that has at least 10GiB of memory can be allocated.
If a device that does not support multiple allocations is chosen, the allocation would consume the entire device.
To filter only devices that support multiple allocations, you can define a selector like this:
selectors:
- cel:
expression: |-
device.allowMultipleAllocations == true
Integration with DRA device status
In device sharing, general device information is provided through the resource slice.
However, some details are set dynamically after allocation.
These can be conveyed using the .status.devices field of a ResourceClaim.
That field is only published in clusters where the DRAResourceClaimDeviceStatus
feature gate is enabled.
If you do have device status support available, a driver can expose additional device-specific information beyond the ShareID.
One particularly useful use case is for virtual networks, where a driver can include the assigned IP address(es) in the status.
This is valuable for both network service operations and troubleshooting.
You can find more information by watching our recording at: KubeCon Japan 2025 - Reimagining Cloud Native Networks: The Critical Role of DRA.
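For instance, with both feature gates enabled, a network DRA driver might publish something like the following in a claim's status; this is a sketch only, with the driver name, interface, and address invented, while the networkData fields come from the ResourceClaim device status API:

status:
  devices:
  - driver: cni.dra.example.com      # hypothetical network DRA driver
    pool: node-1
    device: eth-attachment-0
    networkData:
      interfaceName: net1
      ips:
      - 10.1.2.3/24                  # address assigned after allocation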
What can you do next?
- Check out the CNI DRA Driver project for an example of DRA integration in Kubernetes networking. Try integrating with network resources like macvlan, ipvlan, or smart NICs.
- Start enabling the DRAConsumableCapacity feature gate and experimenting with virtualized or partitionable devices. Specify your workloads with consumable capacity (for example: fractional bandwidth or memory).
- Let us know your feedback:
- ✅ What worked well?
- ⚠️ What didn’t?
If you encountered issues to fix or opportunities to enhance, please file a new issue and reference KEP-5075 there, or reach out via Slack (#wg-device-management).
Conclusion
Consumable capacity support enhances the device sharing capability of DRA by allowing effective device sharing across namespaces, across claims, and tailored to each Pod’s actual needs. It also empowers drivers to enforce capacity limits, improves scheduling accuracy, and unlocks new use cases like bandwidth-aware networking and multi-tenant device sharing.
Try it out, experiment with consumable resources, and help shape the future of dynamic resource allocation in Kubernetes!
Further Reading
Kubernetes v1.34: Pods Report DRA Resource Health
The rise of AI/ML and other high-performance workloads has made specialized hardware like GPUs, TPUs, and FPGAs a critical component of many Kubernetes clusters. However, as discussed in a previous blog post about navigating failures in Pods with devices, when this hardware fails, it can be difficult to diagnose, leading to significant downtime. With the release of Kubernetes v1.34, we are excited to announce a new alpha feature that brings much-needed visibility into the health of these devices.
This work extends the functionality of KEP-4680, which first introduced a mechanism for reporting the health of devices managed by Device Plugins. Now, this capability is being extended to Dynamic Resource Allocation (DRA). Controlled by the ResourceHealthStatus feature gate, this enhancement allows DRA drivers to report device health directly into a Pod's .status field, providing crucial insights for operators and developers.
Why expose device health in Pod status?
For stateful applications or long-running jobs, a device failure can be disruptive and costly. By exposing device health in the .status field for a Pod, Kubernetes provides a standardized way for users and automation tools to quickly diagnose issues. If a Pod is failing, you can now check its status to see if an unhealthy device is the root cause, saving valuable time that might otherwise be spent debugging application code.
How it works
This feature introduces a new, optional communication channel between the Kubelet and DRA drivers, built on three core components.
A new gRPC health service
A new gRPC service, DRAResourceHealth, is defined in the dra-health/v1alpha1 API group. DRA drivers can implement this service to stream device health updates to the Kubelet. The service includes a NodeWatchResources server-streaming RPC that sends the health status (Healthy, Unhealthy, or Unknown) for the devices it manages.
Kubelet integration
The Kubelet’s DRAPluginManager discovers which drivers implement the health service. For each compatible driver, it starts a long-lived NodeWatchResources stream to receive health updates. The DRA Manager then consumes these updates and stores them in a persistent healthInfoCache that can survive Kubelet restarts.
Populating the Pod status
When a device's health changes, the DRA manager identifies all Pods affected by the change and triggers a Pod status update. A new field, allocatedResourcesStatus, is now part of the v1.ContainerStatus API object. The Kubelet populates this field with the current health of each device allocated to the container.
A practical example
If a Pod is in a CrashLoopBackOff state, you can use kubectl describe pod <pod-name> to inspect its status. If an allocated device has failed, the output will now include the allocatedResourcesStatus field, clearly indicating the problem:
status:
containerStatuses:
- name: my-gpu-intensive-container
# ... other container statuses
allocatedResourcesStatus:
- name: "claim:my-gpu-claim"
resources:
- resourceID: "example.com/gpu-a1b2-c3d4"
health: "Unhealthy"
This explicit status makes it clear that the issue is with the underlying hardware, not the application.
You can now improve failure detection logic to react to unhealthy devices associated with a Pod, for example by de-scheduling the Pod.
How to use this feature
As this is an alpha feature in Kubernetes v1.34, you must take the following steps to use it:
- Enable the ResourceHealthStatus feature gate on your kube-apiserver and kubelets.
- Ensure you are using a DRA driver that implements the v1alpha1 DRAResourceHealth gRPC service.
DRA drivers
If you are developing a DRA driver, make sure to think about your device failure detection strategy and ensure that your driver integrates with this feature. This way, your driver will improve the user experience and simplify debugging of hardware issues.
What's next?
This is the first step in a broader effort to improve how Kubernetes handles device failures. As we gather feedback on this alpha feature, the community is planning several key enhancements before graduating to Beta:
- Detailed health messages: To improve the troubleshooting experience, we plan to add a human-readable message field to the gRPC API. This will allow DRA drivers to provide specific context for a health status, such as "GPU temperature exceeds threshold" or "NVLink connection lost".
- Configurable health timeouts: The timeout for marking a device's health as "Unknown" is currently hardcoded. We plan to make this configurable, likely on a per-driver basis, to better accommodate the different health-reporting characteristics of various hardware.
- Improved post-mortem troubleshooting: We will address a known limitation where health updates may not be applied to pods that have already terminated. This fix will ensure that the health status of a device at the time of failure is preserved, which is crucial for troubleshooting batch jobs and other "run-to-completion" workloads.
This feature was developed as part of KEP-4680, and community feedback is crucial as we work toward graduating it to Beta. We have more improvements to device failure handling planned in Kubernetes, and we encourage you to try this feature out and share your experiences with the SIG Node community!
Kubernetes v1.34: Moving Volume Group Snapshots to v1beta2
Volume group snapshots were introduced as an Alpha feature with the Kubernetes 1.27 release and moved to Beta in the Kubernetes 1.32 release. The recent release of Kubernetes v1.34 moved that support to a second beta. The support for volume group snapshots relies on a set of extension APIs for group snapshots. These APIs allow users to take crash consistent snapshots for a set of volumes. Behind the scenes, Kubernetes uses a label selector to group multiple PersistentVolumeClaims for snapshotting. A key aim is to allow you to restore that set of snapshots to new volumes and recover your workload from a crash consistent recovery point.
This new feature is only supported for CSI volume drivers.
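For example, taking a crash consistent group snapshot of every PVC labeled app: my-database in a namespace could look roughly like this; the class name and labels are placeholders, and the manifest assumes the v1beta2 API keeps the same spec shape as v1beta1:

apiVersion: groupsnapshot.storage.k8s.io/v1beta2
kind: VolumeGroupSnapshot
metadata:
  name: my-database-group-snapshot
  namespace: prod
spec:
  volumeGroupSnapshotClassName: csi-group-snapclass   # placeholder class name
  source:
    selector:
      matchLabels:
        app: my-database   # every PVC in this namespace with this label is snapshotted together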
What's new in Beta 2?
While testing the beta version, we encountered an issue where the restoreSize field was not set for individual VolumeSnapshotContents and VolumeSnapshots if the CSI driver does not implement the ListSnapshots RPC call.
We evaluated various options and decided to address this by releasing a new beta version of the API.
Specifically, a VolumeSnapshotInfo struct is added in v1beta2; it contains information for an individual volume snapshot that is a member of a volume group snapshot. VolumeSnapshotInfoList, a list of VolumeSnapshotInfo, is added to VolumeGroupSnapshotContentStatus, replacing VolumeSnapshotHandlePairList. VolumeSnapshotInfoList is a list of snapshot information returned by the CSI driver to identify snapshots on the storage system. It is populated by the csi-snapshotter sidecar based on the CSI CreateVolumeGroupSnapshotResponse returned by the CSI driver's CreateVolumeGroupSnapshot call.
The existing v1beta1 API objects will be converted to the new v1beta2 API objects by a conversion webhook.
What’s next?
Depending on feedback and adoption, the Kubernetes project plans to push the volume group snapshot implementation to general availability (GA) in a future release.
How can I learn more?
- The design spec for the volume group snapshot feature.
- The code repository for volume group snapshot APIs and controller.
- CSI documentation on the group snapshot feature.
How do I get involved?
This project, like all of Kubernetes, is the result of hard work by many contributors from diverse backgrounds working together. On behalf of SIG Storage, I would like to offer a huge thank you to the contributors who stepped up these last few quarters to help the project reach beta:
- Ben Swartzlander (bswartz)
- Hemant Kumar (gnufied)
- Jan Šafránek (jsafrane)
- Madhu Rajanna (Madhu-1)
- Michelle Au (msau42)
- Niels de Vos (nixpanic)
- Leonardo Cecchi (leonardoce)
- Saad Ali (saad-ali)
- Xing Yang (xing-yang)
- Yati Padia (yati1998)
For those interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system, join the Kubernetes Storage Special Interest Group (SIG). We always welcome new contributors.
We also hold regular Data Protection Working Group meetings. New attendees are welcome to join our discussions.