
Dragonfly v2.4.0 is released

CNCF Blog Projects Category - Thu, 02/05/2026 - 19:00

Dragonfly v2.4.0 is released! Thanks to all of the contributors who made this Dragonfly release happen.

New features and enhancements

Load-aware scheduling algorithm

A two-stage scheduling algorithm combining central scheduling with node-level secondary scheduling to optimize P2P download performance, based on real-time load awareness.

Figure: two-stage scheduling with candidate parents weighted by real-time load, e.g. Parent A (40%), Parent B (35%), …, Parent N (n%).

For more information, please refer to the Scheduling documentation.

Vortex protocol support for P2P file transfer

Dragonfly introduces Vortex, a new transfer protocol based on the TLV (Tag-Length-Value) format, to improve download performance on internal networks. The lightweight TLV framing replaces gRPC for data transfer between peers. TCP-based Vortex reduces large-file download time by 50% and QUIC-based Vortex by 40% compared to gRPC, and both effectively reduce peak memory usage.

For more information, please refer to the TCP Protocol Support for P2P File Transfer and QUIC Protocol Support for P2P File Transfer.
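
To make the TLV framing concrete, here is a minimal sketch in Go of a Tag-Length-Value frame (a 1-byte tag, a 4-byte big-endian length, then the value bytes). The tag number and layout are illustrative assumptions for this post, not the actual Vortex wire format.

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeTLV builds a frame: 1-byte tag, 4-byte big-endian length, value bytes.
func encodeTLV(tag byte, value []byte) []byte {
	frame := make([]byte, 5+len(value))
	frame[0] = tag
	binary.BigEndian.PutUint32(frame[1:5], uint32(len(value)))
	copy(frame[5:], value)
	return frame
}

// decodeTLV parses a frame produced by encodeTLV.
func decodeTLV(frame []byte) (tag byte, value []byte, err error) {
	if len(frame) < 5 {
		return 0, nil, fmt.Errorf("frame too short")
	}
	n := int(binary.BigEndian.Uint32(frame[1:5]))
	if len(frame) < 5+n {
		return 0, nil, fmt.Errorf("truncated value")
	}
	return frame[0], frame[5 : 5+n], nil
}

func main() {
	const tagPieceContent = 0x01 // hypothetical tag number
	frame := encodeTLV(tagPieceContent, []byte("piece bytes..."))
	tag, value, _ := decodeTLV(frame)
	fmt.Printf("tag=%#x len=%d equal=%v\n", tag, len(value), bytes.Equal(value, []byte("piece bytes...")))
}

The appeal of such a framing is that it carries almost no per-message overhead compared to a full RPC protocol.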

Request SDK

An SDK for routing user requests to Seed Peers using consistent hashing, replacing the previous Kubernetes Service load-balancing approach.

Figure: Request SDK flow: the user's request goes through the Request SDK, which maps chunks 1, 2, and 3 to seed peer 2; the seed peer then fetches layer 1 from the OCI registry.
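
As a rough illustration of the routing idea (not the actual Request SDK), a consistent-hash ring maps each request key, for example a blob digest, to a stable Seed Peer, so adding or removing a peer only remaps a small fraction of keys:

package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring: each seed peer is placed on the ring
// several times (virtual nodes), and a key is routed to the first peer
// clockwise from its hash.
type Ring struct {
	points []uint32
	peers  map[uint32]string
}

func hash(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func NewRing(peers []string, replicas int) *Ring {
	r := &Ring{peers: map[uint32]string{}}
	for _, p := range peers {
		for i := 0; i < replicas; i++ {
			pt := hash(fmt.Sprintf("%s#%d", p, i))
			r.points = append(r.points, pt)
			r.peers[pt] = p
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Pick returns the seed peer responsible for a given request key,
// for example a blob digest or task ID.
func (r *Ring) Pick(key string) string {
	h := hash(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.peers[r.points[i]]
}

func main() {
	ring := NewRing([]string{"seed-peer-1", "seed-peer-2", "seed-peer-3"}, 50)
	fmt.Println(ring.Pick("sha256:3f4e..."))
}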

Simple multi‑cluster Kubernetes deployment with scheduler cluster ID

Dragonfly supports a simplified feature for deploying and managing multiple Kubernetes clusters by explicitly assigning a schedulerClusterID to each cluster. This approach allows users to directly control cluster affinity without relying on location‑based scheduling metadata such as IDC, hostname, or IP.

Using this feature, each Peer, Seed Peer, and Scheduler determines its target scheduler cluster through a clearly defined scheduler cluster ID. This ensures precise separation between clusters and predictable cross‑cluster behavior.

Screenshot: host configuration specifying the scheduler cluster ID.

For more information, please refer to the Create Dragonfly Cluster Simple.

Performance and resource optimization for Manager and Scheduler components

This release enhances service performance and resource utilization across the Manager and Scheduler components while significantly reducing CPU and memory overhead, delivering improved system efficiency and better resource management.

Enhanced preheating

  • Support for IP-based peer selection in preheating jobs with priority-based selection logic where IP specification takes highest priority, followed by count-based and percentage-based selection.
  • Support for preheating multiple URLs in a single request.
  • Support for preheating files and images via the Scheduler gRPC interface.

Screenshot: the 'Create Preheat' form, with fields for information, clusters, URL, and args.

Calculate task ID based on image blob SHA256 to avoid redundant downloads

The Client now supports calculating task IDs directly from the SHA256 hash of image blobs, instead of using the download URL. This enhancement prevents redundant downloads and data duplication when the same blob is accessed from different registry domains.
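
A minimal sketch of the difference, using hypothetical helper names rather than the Client's actual (Rust) API:

package main

import (
	"crypto/sha256"
	"fmt"
)

// taskIDFromURL keys the task on the download URL: the same blob pulled from
// two registry domains yields two different task IDs, so it can be downloaded
// and stored twice.
func taskIDFromURL(url string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(url)))
}

// taskIDFromBlobDigest keys the task on the blob's own SHA256 digest, so the
// same content shares one task ID regardless of which registry served it.
func taskIDFromBlobDigest(digest string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(digest)))
}

func main() {
	urlA := "https://registry-a.example.com/v2/app/blobs/sha256:3f4e..."
	urlB := "https://registry-b.example.com/v2/app/blobs/sha256:3f4e..."
	digest := "sha256:3f4e..."

	fmt.Println(taskIDFromURL(urlA) == taskIDFromURL(urlB))                   // false: two tasks, redundant download
	fmt.Println(taskIDFromBlobDigest(digest) == taskIDFromBlobDigest(digest)) // true: one shared task
}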

Cache HTTP 307 redirects for split downloads

Support for caching HTTP 307 (Temporary Redirect) responses to optimize Dragonfly’s multi-piece download performance. When a download URL is split into multiple pieces, the redirect target is now cached, eliminating redundant redirect requests and reducing latency.
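
A simplified sketch of the idea in Go (the real Client is written in Rust and caches the redirect seen on the first piece request; here an explicit probe is used instead, and the URL is hypothetical):

package main

import (
	"fmt"
	"net/http"
	"sync"
)

// redirectCache remembers where a URL was 307-redirected to, so that when a
// download is split into many pieces, only the first request pays for the
// redirect round trip and later pieces go straight to the target.
type redirectCache struct {
	mu      sync.Mutex
	targets map[string]string
	client  *http.Client
}

func newRedirectCache() *redirectCache {
	return &redirectCache{
		targets: map[string]string{},
		// Don't follow redirects automatically; we want to see the 307 itself.
		client: &http.Client{
			CheckRedirect: func(*http.Request, []*http.Request) error { return http.ErrUseLastResponse },
		},
	}
}

// resolve returns the URL pieces should be fetched from, querying the origin
// only on the first call for a given source URL.
func (c *redirectCache) resolve(url string) (string, error) {
	c.mu.Lock()
	target, ok := c.targets[url]
	c.mu.Unlock()
	if ok {
		return target, nil
	}
	resp, err := c.client.Head(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	target = url
	if resp.StatusCode == http.StatusTemporaryRedirect {
		if loc, err := resp.Location(); err == nil {
			target = loc.String()
		}
	}
	c.mu.Lock()
	c.targets[url] = target
	c.mu.Unlock()
	return target, nil
}

func main() {
	cache := newRedirectCache()
	// Hypothetical source URL; each piece request would add its own Range header.
	target, err := cache.resolve("https://example.com/large-file")
	if err != nil {
		panic(err)
	}
	fmt.Println("download pieces from:", target)
}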

Go Client deprecated and replaced by Rust client

The Go client has been deprecated and replaced by the Rust Client. All future development and maintenance will focus exclusively on the Rust client, which offers improved performance, stability, and reliability.

For more information, please refer to the dragonflyoss/client repository.

Additional enhancements

  • Enable 64K page size support for ARM64 in the Dragonfly Rust client.
  • Fix missing git commit metadata in dfget version output.
  • Support for config_path of io.containerd.cri.v1.images plugin for containerd V3 configuration.
  • Replaces glibc DNS resolver with hickory-dns in reqwest to implement DNS caching and prevent excessive DNS lookups during piece downloads.
  • Support for the --include-files flag to selectively download files from a directory.
  • Add the --no-progress flag to disable the download progress bar output.
  • Support for custom request headers in backend operations, enabling flexible header configuration for HTTP requests.
  • Refactored log output to reduce redundant logging and improve overall logging efficiency.

Significant bug fixes

  • Modified the database field type from text to longtext to support storing preheating job information.
  • Fixed panic on repeated seed peer service stops during Scheduler shutdown.
  • Fixed broker authentication failure when specifying the Redis password without setting a username.

Nydus

New features and enhancements

  • Nydusd: Add CRC32 validation support for both RAFS V5 and V6 formats, enhancing data integrity verification.
  • Nydusd: Support resending FUSE requests during nydusd restoration, improving daemon recovery reliability.
  • Nydusd: Enhance VFS state saving mechanism for daemon hot upgrade and failover.
  • Nydusify: Introduce Nydus-to-OCI reverse conversion capability, enabling seamless migration back to OCI format.
  • Nydusify: Implement zero-disk transfer for image copy, significantly reducing local disk usage during copy operations.
  • Snapshotter: Build blob.meta into the bootstrap to improve blob fetch reliability for RAFS v6 images.

Significant bug fixes

  • Nydusd: Fix auth token fetching for access_token field in registry authentication.
  • Nydusd: Add recursive inode/dentry invalidation for umount API.
  • Nydus Image: Fix multiple issues in optimize subcommand and add backend configuration support.
  • Snapshotter: Implement lazy parent recovery for proxy mode to handle missing parent snapshots.

We encourage you to visit the d7y.io website to find out more.

Others

You can see CHANGELOG for more details.

Links

Dragonfly GitHub

Categories: CNCF Projects

Introducing Node Readiness Controller

Kubernetes Blog - Mon, 02/02/2026 - 21:00

In the standard Kubernetes model, a node’s suitability for workloads hinges on a single binary "Ready" condition. However, in modern Kubernetes environments, nodes require complex infrastructure dependencies—such as network agents, storage drivers, GPU firmware, or custom health checks—to be fully operational before they can reliably host pods.

Today, on behalf of the Kubernetes project, I am announcing the Node Readiness Controller. This project introduces a declarative system for managing node taints, extending the readiness guardrails during node bootstrapping beyond standard conditions. By dynamically managing taints based on custom health signals, the controller ensures that workloads are only placed on nodes that meet all infrastructure-specific requirements.

Why the Node Readiness Controller?

Core Kubernetes Node "Ready" status is often insufficient for clusters with sophisticated bootstrapping requirements. Operators frequently struggle to ensure that specific DaemonSets or local services are healthy before a node enters the scheduling pool.

The Node Readiness Controller fills this gap by allowing operators to define custom scheduling gates tailored to specific node groups. This enables you to enforce distinct readiness requirements across heterogeneous clusters, ensuring, for example, that GPU-equipped nodes only accept pods once specialized drivers are verified, while general-purpose nodes follow a standard path.

It provides three primary advantages:

  • Custom Readiness Definitions: Define what ready means for your specific platform.
  • Automated Taint Management: The controller automatically applies or removes node taints based on condition status, preventing pods from landing on unready infrastructure.
  • Declarative Node Bootstrapping: Manage multi-step node initialization reliably, with clear observability into the bootstrapping process.

Core concepts and features

The controller centers around the NodeReadinessRule (NRR) API, which allows you to define declarative gates for your nodes.

Flexible enforcement modes

The controller supports two distinct operational modes:

Continuous enforcement
Actively maintains the readiness guarantee throughout the node’s entire lifecycle. If a critical dependency (like a device driver) fails later, the node is immediately tainted to prevent new scheduling.
Bootstrap-only enforcement
Specifically for one-time initialization steps, such as pre-pulling heavy images or hardware provisioning. Once conditions are met, the controller marks the bootstrap as complete and stops monitoring that specific rule for the node.

Condition reporting

The controller reacts to Node Conditions rather than performing health checks itself. This decoupled design allows it to integrate seamlessly with other tools existing in the ecosystem as well as custom solutions:

  • Node Problem Detector (NPD): Use existing NPD setups and custom scripts to report node health.
  • Readiness Condition Reporter: A lightweight agent provided by the project that can be deployed to periodically check local HTTP endpoints and patch node conditions accordingly.
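
For illustration only, a minimal reporter along those lines could periodically probe a local endpoint and patch a custom condition onto the Node object; the endpoint, condition type, and node name below are assumptions for the example, not the project's actual agent:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// reportCondition patches a custom condition onto a Node's status, the same
// kind of signal the Node Readiness Controller watches to add or remove taints.
func reportCondition(ctx context.Context, cs kubernetes.Interface, node string, healthy bool) error {
	status := v1.ConditionFalse
	if healthy {
		status = v1.ConditionTrue
	}
	patch := map[string]interface{}{
		"status": map[string]interface{}{
			"conditions": []v1.NodeCondition{{
				Type:               "cniplugin.example.net/NetworkReady", // hypothetical condition type
				Status:             status,
				Reason:             "LocalProbe",
				LastHeartbeatTime:  metav1.Now(),
				LastTransitionTime: metav1.Now(),
			}},
		},
	}
	data, err := json.Marshal(patch)
	if err != nil {
		return err
	}
	// Strategic merge patch on the status subresource merges conditions by type.
	_, err = cs.CoreV1().Nodes().Patch(ctx, node, types.StrategicMergePatchType, data, metav1.PatchOptions{}, "status")
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	for {
		// Hypothetical local health endpoint exposed by the CNI agent.
		resp, err := http.Get("http://127.0.0.1:9099/healthz")
		healthy := err == nil && resp.StatusCode == http.StatusOK
		if resp != nil {
			resp.Body.Close()
		}
		if err := reportCondition(context.Background(), cs, "my-node", healthy); err != nil {
			fmt.Println("patch failed:", err)
		}
		time.Sleep(30 * time.Second)
	}
}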

Operational safety with dry run

Deploying new readiness rules across a fleet carries inherent risk. To mitigate this, dry run mode allows operators to first simulate impact on the cluster. In this mode, the controller logs intended actions and updates the rule's status to show affected nodes without applying actual taints, enabling safe validation before enforcement.

Example: CNI bootstrapping

The following NodeReadinessRule ensures a node remains unschedulable until its CNI agent is functional. The controller monitors a custom cniplugin.example.net/NetworkReady condition and only removes the readiness.k8s.io/acme.com/network-unavailable taint once the status is True.

apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
  name: network-readiness-rule
spec:
  conditions:
  - type: "cniplugin.example.net/NetworkReady"
    requiredStatus: "True"
  taint:
    key: "readiness.k8s.io/acme.com/network-unavailable"
    effect: "NoSchedule"
    value: "pending"
  enforcementMode: "bootstrap-only"
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""


Getting involved

The Node Readiness Controller is just getting started, with our initial releases out, and we are seeking community feedback to refine the roadmap. Following our productive Unconference discussions at KubeCon NA 2025, we are excited to continue the conversation in person.

Join us at KubeCon + CloudNativeCon Europe 2026 for our maintainer track session: Addressing Non-Deterministic Scheduling: Introducing the Node Readiness Controller.

In the meantime, you can contribute or track our progress here:

Categories: CNCF Projects, Kubernetes

New Conversion from cgroup v1 CPU Shares to v2 CPU Weight

Kubernetes Blog - Fri, 01/30/2026 - 11:00

I'm excited to announce the implementation of an improved conversion formula from cgroup v1 CPU shares to cgroup v2 CPU weight. This enhancement addresses critical issues with CPU priority allocation for Kubernetes workloads when running on systems with cgroup v2.

Background

Kubernetes was originally designed with cgroup v1 in mind, where CPU shares were defined simply by assigning the container's CPU requests in millicpu form.

For example, a container requesting 1 CPU (1000m) would get cpu.shares = 1024.

After a while, cgroup v1 started being replaced by its successor, cgroup v2. In cgroup v2, the concept of CPU shares (which range from 2 to 262144, or from 2¹ to 2¹⁸) was replaced with CPU weight (which ranges from 1 to 10000, or from 10⁰ to 10⁴).

With the transition to cgroup v2, KEP-2254 introduced a conversion formula to map cgroup v1 CPU shares to cgroup v2 CPU weight. The conversion formula was defined as: cpu.weight = (1 + ((cpu.shares - 2) * 9999) / 262142)

This formula linearly maps values from [2¹, 2¹⁸] to [10⁰, 10⁴].

Linear conversion formula

While this approach is simple, the linear mapping introduces a few significant problems that impact both performance and configuration granularity.

Problems with previous conversion formula

The current conversion formula creates two major issues:

1. Reduced priority against non-Kubernetes workloads

In cgroup v1, the default value for CPU shares is 1024, meaning a container requesting 1 CPU has equal priority with system processes that live outside of Kubernetes' scope. However, in cgroup v2, the default CPU weight is 100, but the current formula converts 1 CPU (1000m) to only ≈39 weight - less than 40% of the default.

Example:

  • Container requesting 1 CPU (1000m)
  • cgroup v1: cpu.shares = 1024 (equal to default)
  • cgroup v2 (current): cpu.weight = 39 (much lower than default 100)

This means that after moving to cgroup v2, Kubernetes (or OCI) workloads would de-facto reduce their CPU priority against non-Kubernetes processes. The problem can be severe for setups with many system daemons that run outside of Kubernetes' scope and expect Kubernetes workloads to have priority, especially in situations of resource starvation.

2. Unmanageable granularity

The current formula produces very low values for small CPU requests, limiting the ability to create sub-cgroups within containers for fine-grained resource distribution (which will possibly be much easier moving forward, see KEP #5474 for more info).

Example:

  • Container requesting 100m CPU
  • cgroup v1: cpu.shares = 102
  • cgroup v2 (current): cpu.weight = 4 (too low for sub-cgroup configuration)

With cgroup v1, requesting 100m CPU (which led to 102 CPU shares) was manageable in the sense that sub-cgroups could be created inside the main container, assigning fine-grained CPU priorities to different groups of processes. With cgroup v2, however, a weight of 4 is very hard to distribute between sub-cgroups because it is not granular enough.

With plans to allow writable cgroups for unprivileged containers, this becomes even more relevant.

New conversion formula

Description

The new formula is more complicated, but does a much better job mapping between cgroup v1 CPU shares and cgroup v2 CPU weight:

$$cpu.weight = \lceil 10^{(L^{2}/612 + 125L/612 - 7/34)} \rceil, \text{ where: } L = \log_2(cpu.shares)$$

The idea is that this is a quadratic function to cross the following values:

  • (2, 1): The minimum values for both ranges.
  • (1024, 100): The default values for both ranges.
  • (262144, 10000): The maximum values for both ranges.

Visually, the new function looks as follows:

Figure: the new conversion formula plotted across the full range of cpu.shares values.

And if you zoom in to the important part:

Figure: the same curve zoomed in on the lower end of the range.

The new formula is "close to linear", yet it is carefully designed so that the curve passes through the three important points above.
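
For reference, here is a small Go sketch comparing the old linear mapping with the new quadratic one (because of floating-point rounding at the exact crossing points, a runtime's integer implementation may differ by a point or two):

package main

import (
	"fmt"
	"math"
)

// oldCPUWeight is the original linear mapping from KEP-2254:
// cpu.weight = 1 + ((cpu.shares - 2) * 9999) / 262142
func oldCPUWeight(shares uint64) uint64 {
	return 1 + ((shares-2)*9999)/262142
}

// newCPUWeight is the quadratic mapping described above:
// cpu.weight = ceil(10^(L^2/612 + 125L/612 - 7/34)), with L = log2(cpu.shares)
func newCPUWeight(shares uint64) uint64 {
	l := math.Log2(float64(shares))
	exp := l*l/612 + 125*l/612 - 7.0/34
	return uint64(math.Ceil(math.Pow(10, exp)))
}

func main() {
	for _, shares := range []uint64{2, 102, 512, 1024, 2048, 262144} {
		fmt.Printf("shares=%6d  old weight=%5d  new weight=%5d\n",
			shares, oldCPUWeight(shares), newCPUWeight(shares))
	}
}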

How it solves the problems

  1. Better priority alignment:

    • A container requesting 1 CPU (1000m) will now get cpu.weight = 102, which is close to cgroup v2's default of 100. This restores the intended priority relationship between Kubernetes workloads and system processes.
  2. Improved granularity:

    • A container requesting 100m CPU will get cpu.weight = 17 (see here), enabling better fine-grained resource distribution within containers.

Adoption and integration

This change was implemented at the OCI layer. In other words, this is not implemented in Kubernetes itself; therefore the adoption of the new conversion formula depends solely on the OCI runtime adoption.

For example:

  • runc: The new formula is enabled from version 1.3.2.
  • crun: The new formula is enabled from version 1.23.

Impact on existing deployments

Important: Some consumers may be affected if they assume the older linear conversion formula. Applications or monitoring tools that directly calculate expected CPU weight values based on the previous formula may need updates to account for the new quadratic conversion. This is particularly relevant for:

  • Custom resource management tools that predict CPU weight values.
  • Monitoring systems that validate or expect specific weight values.
  • Applications that programmatically set or verify CPU weight values.

The Kubernetes project recommends testing the new conversion formula in non-production environments before upgrading OCI runtimes to ensure compatibility with existing tooling.

Where can I learn more?

For those interested in this enhancement:

How do I get involved?

For those interested in getting involved with Kubernetes node-level features, join the Kubernetes Node Special Interest Group. We always welcome new contributors and diverse perspectives on resource management challenges.

Categories: CNCF Projects, Kubernetes

Ingress NGINX: Statement from the Kubernetes Steering and Security Response Committees

Kubernetes Blog - Wed, 01/28/2026 - 19:00

In March 2026, Kubernetes will retire Ingress NGINX, a piece of critical infrastructure for about half of cloud native environments. The retirement of Ingress NGINX was announced for March 2026, after years of public warnings that the project was in dire need of contributors and maintainers. There will be no more releases for bug fixes, security patches, or any updates of any kind after the project is retired. This cannot be ignored, brushed off, or left until the last minute to address. We cannot overstate the severity of this situation or the importance of beginning migration to alternatives like Gateway API or one of the many third-party Ingress controllers immediately.

To be abundantly clear: choosing to remain with Ingress NGINX after its retirement leaves you and your users vulnerable to attack. None of the available alternatives are direct drop-in replacements. This will require planning and engineering time. Half of you will be affected. You have two months left to prepare.

Existing deployments will continue to work, so unless you proactively check, you may not know you are affected until you are compromised. In most cases, you can check to find out whether or not you rely on Ingress NGINX by running kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx with cluster administrator permissions.

Despite its broad appeal and widespread use by companies of all sizes, and repeated calls for help from the maintainers, the Ingress NGINX project never received the contributors it so desperately needed. According to internal Datadog research, about 50% of cloud native environments currently rely on this tool, and yet for the last several years, it has been maintained solely by one or two people working in their free time. Without sufficient staffing to maintain the tool to a standard both ourselves and our users would consider secure, the responsible choice is to wind it down and refocus efforts on modern alternatives like Gateway API.

We did not make this decision lightly; as inconvenient as it is now, doing so is necessary for the safety of all users and the ecosystem as a whole. Unfortunately, the flexibility Ingress NGINX was designed with, that was once a boon, has become a burden that cannot be resolved. With the technical debt that has piled up, and fundamental design decisions that exacerbate security flaws, it is no longer reasonable or even possible to continue maintaining the tool even if resources did materialize.

We issue this statement together to reinforce the scale of this change and the potential for serious risk to a significant percentage of Kubernetes users if this issue is ignored. It is imperative that you check your clusters now. If you are reliant on Ingress NGINX, you must begin planning for migration.

Thank you,

Kubernetes Steering Committee

Kubernetes Security Response Committee

Categories: CNCF Projects, Kubernetes

Experimenting with Gateway API using kind

Kubernetes Blog - Tue, 01/27/2026 - 19:00

This document will guide you through setting up a local experimental environment with Gateway API on kind. This setup is designed for learning and testing. It helps you understand Gateway API concepts without production complexity.

Caution:

This is an experimental learning setup and should not be used for production. The components used in this document are not suited for production usage. Once you're ready to deploy Gateway API in a production environment, select an implementation that suits your needs.

Overview

In this guide, you will:

  • Set up a local Kubernetes cluster using kind (Kubernetes in Docker)
  • Deploy cloud-provider-kind, which provides both LoadBalancer Services and a Gateway API controller
  • Create a Gateway and HTTPRoute to route traffic to a demo application
  • Test your Gateway API configuration locally

This setup is ideal for learning, development, and experimentation with Gateway API concepts.

Prerequisites

Before you begin, ensure you have the following installed on your local machine:

  • Docker - Required to run kind and cloud-provider-kind
  • kubectl - The Kubernetes command-line tool
  • kind - Kubernetes in Docker
  • curl - Required to test the routes

Create a kind cluster

Create a new kind cluster by running:

kind create cluster

This will create a single-node Kubernetes cluster running in a Docker container.

Install cloud-provider-kind

Next, you need cloud-provider-kind, which provides two key components for this setup:

  • A LoadBalancer controller that assigns addresses to LoadBalancer-type Services
  • A Gateway API controller that implements the Gateway API specification

It also automatically installs the Gateway API Custom Resource Definitions (CRDs) in your cluster.

Run cloud-provider-kind as a Docker container on the same host where you created the kind cluster:

VERSION="$(basename $(curl -s -L -o /dev/null -w '%{url_effective}' https://github.com/kubernetes-sigs/cloud-provider-kind/releases/latest))"
docker run -d --name cloud-provider-kind --rm --network host -v /var/run/docker.sock:/var/run/docker.sock registry.k8s.io/cloud-provider-kind/cloud-controller-manager:${VERSION}

Note: On some systems, you may need elevated privileges to access the Docker socket.

Verify that cloud-provider-kind is running:

docker ps --filter name=cloud-provider-kind

You should see the container listed and in a running state. You can also check the logs:

docker logs cloud-provider-kind

Experimenting with Gateway API

Now that your cluster is set up, you can start experimenting with Gateway API resources.

cloud-provider-kind automatically provisions a GatewayClass called cloud-provider-kind. You'll use this class to create your Gateway.

It is worth noting that while kind is not a cloud provider, the project is named cloud-provider-kind because it provides features that simulate a cloud-enabled environment.

Deploy a Gateway

The following manifest will:

  • Create a new namespace called gateway-infra
  • Deploy a Gateway that listens on port 80
  • Accept HTTPRoutes with hostnames matching the *.exampledomain.example pattern
  • Allow routes from any namespace to attach to the Gateway. Note: In real clusters, prefer Same or Selector values on the allowedRoutes namespace selector field to limit attachments.

Apply the following manifest:

---
apiVersion: v1
kind: Namespace
metadata:
  name: gateway-infra
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: gateway-infra
spec:
  gatewayClassName: cloud-provider-kind
  listeners:
  - name: default
    hostname: "*.exampledomain.example"
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All

Then verify that your Gateway is properly programmed and has an address assigned:

kubectl get gateway -n gateway-infra gateway

Expected output:

NAME      CLASS                 ADDRESS      PROGRAMMED   AGE
gateway   cloud-provider-kind   172.18.0.3   True         5m6s

The PROGRAMMED column should show True, and the ADDRESS field should contain an IP address.

Deploy a demo application

Next, deploy a simple echo application that will help you test your Gateway configuration. This application:

  • Listens on port 3000
  • Echoes back request details including path, headers, and environment variables
  • Runs in a namespace called demo

Apply the following manifest:

apiVersion: v1
kind: Namespace
metadata:
  name: demo
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: echo
  name: echo
  namespace: demo
spec:
  ports:
  - name: http
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app.kubernetes.io/name: echo
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: echo
  name: echo
  namespace: demo
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: echo
  template:
    metadata:
      labels:
        app.kubernetes.io/name: echo
    spec:
      containers:
      - env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: registry.k8s.io/gateway-api/echo-basic:v20251204-v1.4.1
        name: echo-basic

Create an HTTPRoute

Now create an HTTPRoute to route traffic from your Gateway to the echo application. This HTTPRoute will:

  • Respond to requests for the hostname some.exampledomain.example
  • Route traffic to the echo application
  • Attach to the Gateway in the gateway-infra namespace

Apply the following manifest:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: echo
  namespace: demo
spec:
  parentRefs:
  - name: gateway
    namespace: gateway-infra
  hostnames: ["some.exampledomain.example"]
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: echo
      port: 3000

Test your route

The final step is to test your route using curl. You'll make a request to the Gateway's IP address with the hostname some.exampledomain.example. The command below is for POSIX shell only, and may need to be adjusted for your environment:

GW_ADDR=$(kubectl get gateway -n gateway-infra gateway -o jsonpath='{.status.addresses[0].value}')
curl --resolve some.exampledomain.example:80:${GW_ADDR} http://some.exampledomain.example

You should receive a JSON response similar to this:

{
  "path": "/",
  "host": "some.exampledomain.example",
  "method": "GET",
  "proto": "HTTP/1.1",
  "headers": {
    "Accept": [
      "*/*"
    ],
    "User-Agent": [
      "curl/8.15.0"
    ]
  },
  "namespace": "demo",
  "ingress": "",
  "service": "",
  "pod": "echo-dc48d7cf8-vs2df"
}

If you see this response, congratulations! Your Gateway API setup is working correctly.

Troubleshooting

If something isn't working as expected, you can troubleshoot by checking the status of your resources.

Check the Gateway status

First, inspect your Gateway resource:

kubectl get gateway -n gateway-infra gateway -o yaml

Look at the status section for conditions. Your Gateway should have:

  • Accepted: True - The Gateway was accepted by the controller
  • Programmed: True - The Gateway was successfully configured
  • .status.addresses populated with an IP address

Check the HTTPRoute status

Next, inspect your HTTPRoute:

kubectl get httproute -n demo echo -o yaml

Check the status.parents section for conditions. Common issues include:

  • ResolvedRefs set to False with reason BackendNotFound; this means that the backend Service doesn't exist or has the wrong name
  • Accepted set to False; this means that the route couldn't attach to the Gateway (check namespace permissions or hostname matching)

Example error when a backend is not found:

status:
  parents:
  - conditions:
    - lastTransitionTime: "2026-01-19T17:13:35Z"
      message: backend not found
      observedGeneration: 2
      reason: BackendNotFound
      status: "False"
      type: ResolvedRefs
    controllerName: kind.sigs.k8s.io/gateway-controller

Check controller logs

If the resource statuses don't reveal the issue, check the cloud-provider-kind logs:

docker logs -f cloud-provider-kind

This will show detailed logs from both the LoadBalancer and Gateway API controllers.

Cleanup

When you're finished with your experiments, you can clean up the resources:

Remove Kubernetes resources

Delete the namespaces (this will remove all resources within them):

kubectl delete namespace gateway-infra
kubectl delete namespace demo

Stop cloud-provider-kind

Stop and remove the cloud-provider-kind container:

docker stop cloud-provider-kind

Because the container was started with the --rm flag, it will be automatically removed when stopped.

Delete the kind cluster

Finally, delete the kind cluster:

kind delete cluster

Next steps

Now that you've experimented with Gateway API locally, you're ready to explore production-ready implementations:

  • Production Deployments: Review the Gateway API implementations to find a controller that matches your production requirements
  • Learn More: Explore the Gateway API documentation to learn about advanced features like TLS, traffic splitting, and header manipulation
  • Advanced Routing: Experiment with path-based routing, header matching, request mirroring and other features following Gateway API user guides

A final word of caution

This kind setup is for development and learning only. Always use a production-grade Gateway API implementation for real workloads.

Categories: CNCF Projects, Kubernetes

Cluster API v1.12: Introducing In-place Updates and Chained Upgrades

Kubernetes Blog - Tue, 01/27/2026 - 11:00

Cluster API brings declarative management to Kubernetes cluster lifecycle, allowing users and platform teams to define the desired state of clusters and rely on controllers to continuously reconcile toward it.

Similar to how you can use StatefulSets or Deployments in Kubernetes to manage a group of Pods, in Cluster API you can use KubeadmControlPlane to manage a set of control plane Machines, or you can use MachineDeployments to manage a group of worker Nodes.

The Cluster API v1.12.0 release expands what is possible in Cluster API, reducing friction in common lifecycle operations by introducing in-place updates and chained upgrades.

Emphasis on simplicity and usability

With v1.12.0, the Cluster API project demonstrates once again that this community is capable of delivering a great amount of innovation, while at the same time minimizing impact for Cluster API users.

What does this mean in practice?

Users simply have to change the Cluster or the Machine spec (just as with previous Cluster API releases), and Cluster API will automatically trigger in-place updates or chained upgrades when possible and advisable.

In-place Updates

Just as Kubernetes rolls out Pods in Deployments, when the Machine spec changes, Cluster API performs a rollout by creating a new Machine and deleting the old one.

This approach, inspired by the principle of immutable infrastructure, has a set of considerable advantages:

  • It is simple to explain, predictable, consistent and easy to reason about with users and engineers.
  • It is simple to implement, because it relies only on two core primitives, create and delete.
  • Implementation does not depend on Machine-specific choices, like OS, bootstrap mechanism etc.

As a result, Machine rollouts drastically reduce the number of variables to be considered when managing the lifecycle of a host server that is hosting Nodes.

However, while the advantages of immutability are not in question, both Kubernetes and Cluster API are on a similar journey, introducing changes that allow users to minimize workload disruption whenever possible.

Over time, Cluster API has also introduced several improvements to immutable rollouts.

The new in-place update feature in Cluster API is the next step in this journey.

With the v1.12.0 release, Cluster API introduces support for update extensions allowing users to make changes on existing machines in-place, without deleting and re-creating the Machines.

Both KubeadmControlPlane and MachineDeployments support in-place updates based on the new update extension, which significantly expands the boundary of what is possible in Cluster API.

How do in-place updates work?

The simplest way to explain it is this: once the user triggers an update by changing the desired state of Machines, Cluster API chooses the best tool to achieve that state.

What is new is that Cluster API can now choose between immutable rollouts and in-place update extensions to perform the required changes.

In-place updates in Cluster API

Importantly, this is not immutable rollouts vs in-place updates; Cluster API considers both valid options and selects the most appropriate mechanism for a given change.

From the perspective of the Cluster API maintainers, in-place updates are most useful for making changes that don't otherwise require a node drain or pod restart; for example: changing user credentials for the Machine. On the other hand, when the workload will be disrupted anyway, just do a rollout.

Nevertheless, Cluster API remains true to its extensible nature, and everyone can create their own update extension and decide when and how to use in-place updates by trading in some of the benefits of immutable rollouts.

For a deep dive into this feature, make sure to attend the session In-place Updates with Cluster API: The Sweet Spot Between Immutable and Mutable Infrastructure at KubeCon EU in Amsterdam!

Chained Upgrades

ClusterClass and managed topologies in Cluster API jointly provide a powerful and effective framework that acts as a building block for many platforms offering Kubernetes-as-a-Service.

Now with v1.12.0 this feature is making another important step forward, by allowing users to upgrade by more than one Kubernetes minor version in a single operation, commonly referred to as a chained upgrade.

This allows users to declare a target Kubernetes version and let Cluster API safely orchestrate the required intermediate steps, rather than manually managing each minor upgrade.

The simplest way to explain how chained upgrades work is that once the user triggers an upgrade by changing the desired version for a Cluster, Cluster API computes an upgrade plan and then starts executing it. Rather than (for example) updating the Cluster to v1.33.0, then v1.34.0, and then v1.35.0, checking on progress at each step, a chained upgrade lets you go directly to v1.35.0.

Executing an upgrade plan means upgrading control plane and worker machines in a strictly controlled order, repeating this process as many times as needed to reach the desired state. Cluster API is now capable of managing this for you.

Cluster API takes care of optimizing and minimizing the upgrade steps for worker machines, and in fact worker machines will skip upgrades to intermediate Kubernetes minor releases whenever allowed by the Kubernetes version skew policies.

Chained upgrades in Cluster API

Here too, extensibility is at the core of the feature: upgrade plan runtime extensions can be used to influence how the upgrade plan is computed, and lifecycle hooks can be used to automate other tasks that must be performed during an upgrade, e.g. upgrading an addon after the control plane update completes.

From our perspective, chained upgrades are most useful for users who struggle to keep up with Kubernetes minor releases and, for example, want to upgrade only once per year, moving up three versions at a time (n-3 → n). But be warned: the fact that you can now easily upgrade by more than one minor version is not an excuse to not patch your cluster frequently!

Release team

I would like to thank all the contributors, the maintainers, and all the engineers that volunteered for the release team.

The reliability and predictability of Cluster API releases, one of the qualities our users appreciate most, are only possible with the support, commitment, and hard work of the community.

Kudos to the entire Cluster API community for the v1.12.0 release and all the great releases delivered in 2025! If you are interested in getting involved, learn about Cluster API contributing guidelines.

What’s next?

If you read the Cluster API manifesto, you can see how the Cluster API subproject claims the right to remain unfinished, recognizing the need to continuously evolve, improve, and adapt to the changing needs of Cluster API’s users and the broader Cloud Native ecosystem.

As Kubernetes itself continues to evolve, the Cluster API subproject will keep advancing alongside it, focusing on safer upgrades, reduced disruption, and stronger building blocks for platforms managing Kubernetes at scale.

Innovation remains at the heart of Cluster API, stay tuned for an exciting 2026!

Useful links:

Categories: CNCF Projects, Kubernetes

Headlamp in 2025: Project Highlights

Kubernetes Blog - Wed, 01/21/2026 - 21:00

This announcement is a recap from a post originally published on the Headlamp blog.

Headlamp has come a long way in 2025. The project has continued to grow – reaching more teams across platforms, powering new workflows and integrations through plugins, and seeing increased collaboration from the broader community.

We wanted to take a moment to share a few updates and highlight how Headlamp has evolved over the past year.

Updates

Joining Kubernetes SIG UI

This year marked a big milestone for the project: Headlamp is now officially part of Kubernetes SIG UI. This move brings roadmap and design discussions even closer to the core Kubernetes community and reinforces Headlamp’s role as a modern, extensible UI for the project.

As part of that, we’ve also been sharing more about making Kubernetes approachable for a wider audience, including an appearance on Enlightening with Whitney Lee and a talk at KCD New York 2025.

Linux Foundation mentorship

This year, we were excited to work with several students through the Linux Foundation’s Mentorship program, and our mentees have already left a visible mark on Headlamp:

  • Adwait Godbole built the KEDA plugin, adding a UI in Headlamp to view and manage KEDA resources like ScaledObjects and ScaledJobs.
  • Dhairya Majmudar set up an OpenTelemetry-based observability stack for Headlamp, wiring up metrics, logs, and traces so the project is easier to monitor and debug.
  • Aishwarya Ghatole led a UX audit of Headlamp plugins, identifying usability issues and proposing design improvements and personas for plugin users.
  • Anirban Singha developed the Karpenter plugin, giving Headlamp a focused view into Karpenter autoscaling resources and decisions.
  • Aditya Chaudhary improved Gateway API support, so you can see networking relationships on the resource map, as well as improved support for many of the new Gateway API resources.
  • Faakhir Zahid completed a way to easily manage plugin installation with Headlamp deployed in clusters.
  • Saurav Upadhyay worked on backend caching for Kubernetes API calls, reducing load on the API server and improving performance in Headlamp.

New changes

Multi-cluster view

Managing multiple clusters is challenging: teams often switch between tools and lose context when trying to see what runs where. Headlamp solves this by giving you a single view to compare clusters side-by-side. This makes it easier to understand workloads across environments and reduces the time spent hunting for resources.

View of multi-cluster workloads

Projects

Kubernetes apps often span multiple namespaces and resource types, which makes troubleshooting feel like piecing together a puzzle. We’ve added Projects to give you an application-centric view that groups related resources across multiple namespaces – and even clusters. This allows you to reduce sprawl, troubleshoot faster, and collaborate without digging through YAML or cluster-wide lists.

View of the new Projects feature

Changes:

  • New “Projects” feature for grouping namespaces into app- or team-centric projects
  • Extensible Projects details view that plugins can customize with their own tabs and actions

Day-to-day ops in Kubernetes often means juggling logs, terminals, YAML, and dashboards across clusters. We redesigned Headlamp’s navigation to treat these as first-class “activities” you can keep open and come back to, instead of one-off views you lose as soon as you click away.

View of the new task bar

Changes:

  • A new task bar/activities model lets you pin logs, exec sessions, and details as ongoing activities
  • An activity overview with a “Close all” action and cluster information
  • Multi-select and global filters in tables

Thanks to Jan Jansen and Aditya Chaudhary.

Search and map

When something breaks in production, the first two questions are usually “where is it?” and “what is it connected to?” We’ve upgraded both search and the map view so you can get from a high-level symptom to the right set of objects much faster.

View of the new Advanced Search feature

Changes:

  • An Advanced search view that supports rich, expression-based queries over Kubernetes objects
  • Improved global search that understands labels and multiple search items, and can even update your current namespace based on what you find
  • EndpointSlice support in the Network section
  • A richer map view that now includes Custom Resources and Gateway API objects

Thanks to Fabian, Alexander North, and Victor Marcolino from Swisscom, and also to Aditya Chaudhary.

OIDC and authentication

We’ve put real work into making OIDC setup clearer and more resilient, especially for in-cluster deployments.

View of user information for OIDC clusters

Changes:

  • User information displayed in the top bar for OIDC-authenticated users
  • PKCE support for more secure authentication flows, as well as hardened token refresh handling
  • Documentation for using the access token via -oidc-use-access-token=true
  • Improved support for public OIDC clients like AKS and EKS
  • New guide for setting up Headlamp on AKS with Azure Entra-ID using OAuth2Proxy

Thanks to David Dobmeier and Harsh Srivastava.

App Catalog and Helm

We’ve broadened how you deploy and source apps via Headlamp, specifically supporting vanilla Helm repos.

Changes:

  • A more capable Helm chart with optional backend TLS termination, PodDisruptionBudgets, custom pod labels, and more
  • Improved formatting and added missing access token arg in the Helm chart
  • New in-cluster Helm support with an --enable-helm flag and a service proxy

Thanks to Vrushali Shah and Murali Annamneni from Oracle, and also to Pat Riehecky, Joshua Akers, Rostislav Stříbrný, Rick L, and Victor.

Performance, accessibility, and UX

Finally, we’ve spent a lot of time on the things you notice every day but don’t always make headlines: startup time, list views, log viewers, accessibility, and small network UX details. A continuous accessibility self-audit has also helped us identify key issues and make Headlamp easier for everyone to use.

View of the Learn section in the docs

Changes:

  • Significant desktop improvements, with up to 60% faster app loads and much quicker dev-mode reloads for contributors
  • Numerous table and log viewer refinements: persistent sort order, consistent row actions, copy-name buttons, better tooltips, and more forgiving log inputs
  • Accessibility and localization improvements, including fixes for zoom-related layout issues, better color contrast, improved screen reader support, and expanded language coverage
  • More control over resources, with live pod CPU/memory metrics, richer pod details, and inline editing for secrets and CRD fields
  • A refreshed documentation and plugin onboarding experience, including a “Learn” section and plugin showcase
  • A more complete NetworkPolicy UI and network-related polish
  • Nightly builds available for early testing

Thanks to Jaehan Byun and Jan Jansen.

Plugins and extensibility

Discovering plugins is simpler now – no more hopping between Artifact Hub and assorted GitHub repos. Browse our dedicated Plugins page for a curated catalog of Headlamp-endorsed plugins, along with a showcase of featured plugins.

View of the Plugins showcase

Headlamp AI Assistant

Managing Kubernetes often means memorizing commands and juggling tools. Headlamp’s new AI Assistant changes this by adding a natural-language interface built into the UI. Now, instead of typing kubectl or digging through YAML you can ask, “Is my app healthy?” or “Show logs for this deployment,” and get answers in context, speeding up troubleshooting and smoothing onboarding for new users. Learn more about it here.

New plugins additions

Alongside the new AI Assistant, we’ve been growing Headlamp’s plugin ecosystem so you can bring more of your workflows into a single UI, with integrations like Minikube, Karpenter, and more.

Highlights from the latest plugin releases:

  • Minikube plugin, providing a locally stored single node Minikube cluster
  • Karpenter plugin, with support for Azure Node Auto-Provisioning (NAP)
  • KEDA plugin, which you can learn more about here
  • Community-maintained plugins for Gatekeeper and KAITO

Thanks to Vrushali Shah and Murali Annamneni from Oracle, and also to Anirban Singha, Adwait Godbole, Sertaç Özercan, Ernest Wong, and Chloe Lim.

Other plugins updates

Alongside new additions, we’ve also spent time refining plugins that many of you already use, focusing on smoother workflows and better integration with the core UI.

View of the Backstage plugin

Changes:

  • Flux plugin: Updated for Flux v2.7, with support for newer CRDs and navigation fixes so it works smoothly on recent clusters
  • App Catalog: Now supports Helm repos in addition to Artifact Hub, can run in-cluster via /serviceproxy, and shows both current and latest app versions
  • Plugin Catalog: Improved card layout and accessibility, plus dependency and Storybook test updates
  • Backstage plugin: Dependency and build updates, more info here

Plugin development

We’ve focused on making it faster and clearer to build, test, and ship Headlamp plugins, backed by improved documentation and lighter tooling.

View of the Plugin Development guide

Changes:

  • New and expanded guides for plugin architecture and development, including how to publish and ship plugins
  • Added i18n support documentation so plugins can be translated and localized
  • Added example plugins: ui-panels, resource-charts, custom-theme, and projects
  • Improved type checking for Headlamp APIs, restored Storybook support for component testing, and reduced dependencies for faster installs and fewer updates
  • Documented plugin install locations, UI signifiers in Plugin Settings, and labels that differentiate shipped, UI-installed, and dev-mode plugins

Security upgrades

We've also been investing in keeping Headlamp secure – both by tightening how authentication works and by staying on top of upstream vulnerabilities and tooling.

Updates:

  • We've been keeping up with security updates, regularly updating dependencies and addressing upstream security issues.
  • We tightened the Helm chart's default security context and fixed a regression that broke the plugin manager.
  • We've improved OIDC security with PKCE support, helping unblock more secure and standards-compliant OIDC setups when deploying Headlamp in-cluster.

Conclusion

Thank you to everyone who has contributed to Headlamp this year – whether through pull requests, plugins, or simply sharing how you're using the project. Seeing the different ways teams are adopting and extending the project is a big part of what keeps us moving forward. If your organization uses Headlamp, consider adding it to our adopters list.

If you haven't tried Headlamp recently, all these updates are available today. Check out the latest Headlamp release, explore the new views, plugins, and docs, and share your feedback with us on Slack or GitHub – your feedback helps shape where Headlamp goes next.

Categories: CNCF Projects, Kubernetes

Announcing the Checkpoint/Restore Working Group

Kubernetes Blog - Wed, 01/21/2026 - 13:00

The community around Kubernetes includes a number of Special Interest Groups (SIGs) and Working Groups (WGs) facilitating discussions on important topics between interested contributors. Today we would like to announce the new Kubernetes Checkpoint Restore WG focusing on the integration of Checkpoint/Restore functionality into Kubernetes.

Motivation and use cases

There are several high-level scenarios discussed in the working group:

  • Optimizing resource utilization for interactive workloads, such as Jupyter notebooks and AI chatbots
  • Accelerating startup of applications with long initialization times, including Java applications and LLM inference services
  • Using periodic checkpointing to enable fault-tolerance for long-running workloads, such as distributed model training
  • Providing interruption-aware scheduling with transparent checkpoint/restore, allowing lower-priority Pods to be preempted while preserving the runtime state of applications
  • Facilitating Pod migration across nodes for load balancing and maintenance, without disrupting workloads.
  • Enabling forensic checkpointing to investigate and analyze security incidents such as cyberattacks, data breaches, and unauthorized access.

Across these scenarios, the goal is to help facilitate discussions of ideas between the Kubernetes community and the growing Checkpoint/Restore in Userspace (CRIU) ecosystem. The CRIU community includes several projects that support these use cases, including:

  • CRIU - A tool for checkpointing and restoring running applications and containers
  • checkpointctl - A tool for in-depth analysis of container checkpoints
  • criu-coordinator - A tool for coordinated checkpoint/restore of distributed applications with CRIU
  • checkpoint-restore-operator - A Kubernetes operator for managing checkpoints

More information about the checkpoint/restore integration with Kubernetes is also available here.

Following our presentation about transparent checkpointing at KubeCon EU 2025, we are excited to welcome you to our panel discussion and AI + ML session at KubeCon + CloudNativeCon Europe 2026.

Connect with us

If you are interested in contributing to Kubernetes or CRIU, there are several ways to participate:

Categories: CNCF Projects, Kubernetes

Rook v1.19 Storage Enhancements

Rook Blog - Tue, 01/20/2026 - 14:51

The Rook v1.19 release is out! v1.19 is another feature-filled release to improve storage for Kubernetes. Thanks again to the community for all the great support in this journey to deploy storage in production.

The statistics continue to show Rook is widely used in the community, with over 13.3K GitHub stars, and Slack members and X followers constantly increasing.

If your organization deploys Rook in production, we would love to hear about it. Please see the Adopters page to add your submission. As an upstream project, we don’t track our users, but we appreciate the transparency of those who are deploying Rook!

We have a lot of new features for the Ceph storage provider that we hope you’ll be excited about with the v1.19 release!

NVMe-oF Gateway

NVMe over Fabrics allows RBD volumes to be exposed and accessed via the NVMe/TCP protocol. This enables both Kubernetes pods within the cluster and external clients outside the cluster to connect to Ceph block storage using standard NVMe-oF initiators, providing high-performance block storage access over the network.

NVMe-oF is supported by Ceph starting in the recent Ceph Tentacle release. The initial integration with Rook is now complete and ready for testing in experimental mode, which means it is not production-ready and is intended only for testing. As a large new feature, it will take some time before we declare it stable. Please test out the feature and let us know your feedback!

See the NVMe-oF Configuration Guide to get started.

Ceph CSI 3.16

The v3.16 release of Ceph CSI has a range of features and improvements for the RBD, CephFS, and NFS drivers. Similar to v1.18, this release is again supported both by the Ceph CSI operator and Rook’s direct mode of configuration. The Ceph CSI operator is still configured automatically by Rook. We will target v1.20 to fully document the Ceph CSI operator configuration.

In this release, new Ceph CSI features include:

  • NVMe-oF CSI driver for provisioning and mounting volumes over the NVMe over Fabrics protocol
  • Improved fencing for RBD and CephFS volumes during node failure
  • Block volume usage statistics
  • Configurable block encryption cipher

Concurrent Cluster Reconciles

Prior to this release, when multiple Ceph clusters were configured in the same Kubernetes cluster, Rook reconciled them serially. If one cluster had health issues, it would block all subsequent clusters from being reconciled.

To improve the reconcile of multiple clusters, Rook now enables clusters to be reconciled concurrently. Concurrency is enabled by increasing the operator setting ROOK_RECONCILE_CONCURRENT_CLUSTERS (in operator.yaml or the helm setting reconcileConcurrentClusters) to a value greater than 1. If resource requests and limits are set on the operator, they may need to be increased to accommodate the concurrent reconciles.

While this is a relatively small change, to be conservative due to the difficulty of testing the concurrency, we have marked this feature experimental. Please let us know if the concurrency works smoothly for you or report any issues!

When clusters are reconciled concurrently, the Rook operator log will contain intermingled entries from all the clusters being reconciled. To improve troubleshooting, we have updated many of the log entries to include the namespace and/or cluster name.

Breaking changes

There are a few minor changes to be aware of during upgrades.

CephFS

  • The behavior of the activeStandby property in the CephFilesystem CRD has changed. When set to false, the standby MDS daemon deployment will be scaled down and removed, rather than only disabling the standby cache while the daemon remains running.

Helm

  • The rook-ceph-cluster chart has changed where the Ceph image is defined, to allow separate settings for the repository and tag. See the example values.yaml for the new repository and tag settings. If you were previously specifying the ceph image in the cephClusterSpec, remove it at the time of upgrade while specifying the new properties.

External Clusters

  • In external mode, if you specify a Ceph admin keyring (not the default recommendation), Rook will no longer create CSI Ceph clients automatically. The CSI client keyrings will only be created by the external Python script. This removes the duplication between the Python script and the operator from creating the same users.

Versions

Supported Ceph Versions

Rook v1.19 has removed support for Ceph Reef v18 since it has reached end of life. If you are still running Reef, upgrade at least to Ceph Squid v19 before upgrading to Rook v1.19.

Ceph Squid and Ceph Tentacle are the supported versions with Rook v1.19.

Kubernetes v1.30 — v1.35

Kubernetes v1.30 is now the minimum version supported by Rook through the latest K8s release v1.35. Rook CI runs tests against these versions to ensure there are no issues as Kubernetes is updated. If you still require running an older K8s version, we haven’t done anything to prevent running Rook, we simply just do not have test validation on older versions.

What’s Next?

As we continue the journey to develop reliable storage operators for Kubernetes, we look forward to your ongoing feedback. Only with the community is it possible to continue this fantastic momentum.

There are many different ways to get involved in the Rook project, whether as a user or developer. Please join us in helping the project continue to grow on its way beyond the v1.19 milestone!

Rook v1.19 Storage Enhancements was originally published in Rook Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Categories: CNCF Projects

Uniform API server access using clientcmd

Kubernetes Blog - Mon, 01/19/2026 - 13:00

If you've ever wanted to develop a command line client for a Kubernetes API, especially if you've considered making your client usable as a kubectl plugin, you might have wondered how to make your client feel familiar to users of kubectl. A quick glance at the output of kubectl options might put a damper on that: "Am I really supposed to implement all those options?"

Fear not, others have done a lot of the work involved for you. In fact, the Kubernetes project provides two libraries to help you handle kubectl-style command line arguments in Go programs: clientcmd and cli-runtime (which uses clientcmd). This article will show how to use the former.

General philosophy

As might be expected since it's part of client-go, clientcmd's ultimate purpose is to provide an instance of restclient.Config that can issue requests to an API server.

It follows kubectl semantics:

  • defaults are taken from ~/.kube or equivalent;
  • files can be specified using the KUBECONFIG environment variable;
  • all of the above settings can be further overridden using command line arguments.

It doesn't set up a --kubeconfig command line argument, which you might want to do to align with kubectl; you'll see how to do this in the "Bind the flags" section.

Available features

clientcmd allows programs to handle

  • kubeconfig selection (using KUBECONFIG);
  • context selection;
  • namespace selection;
  • client certificates and private keys;
  • user impersonation;
  • HTTP Basic authentication support (username/password).

Configuration merging

In various scenarios, clientcmd supports merging configuration settings: KUBECONFIG can specify multiple files whose contents are combined. This can be confusing, because settings are merged in different directions depending on how they are implemented. If a setting is defined in a map, the first definition wins and subsequent definitions are ignored. If a setting is not defined in a map, the last definition wins.

When settings are retrieved using KUBECONFIG, missing files result in warnings only. If the user explicitly specifies a path (in --kubeconfig style), there must be a corresponding file.

If KUBECONFIG isn't defined, the default configuration file, ~/.kube/config, is used instead, if present.
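
Putting these rules together, a client built on clientcmd behaves like kubectl in the following sketch (the client binary and file names are hypothetical):

# Both files are merged; for map entries such as clusters, contexts, and users,
# the first file that defines a given name wins.
export KUBECONFIG=$HOME/.kube/config:/tmp/extra-cluster.yaml
./myclient get-nodes

# A missing file listed in KUBECONFIG only produces a warning,
# whereas an explicitly specified --kubeconfig path must exist.
./myclient get-nodes --kubeconfig /tmp/does-not-exist.yaml   # error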

Overall process

The general usage pattern is succinctly expressed in the clientcmd package documentation:

loadingRules := clientcmd.NewDefaultClientConfigLoadingRules()
// if you want to change the loading rules (which files in which order), you can do so here

configOverrides := &clientcmd.ConfigOverrides{}
// if you want to change override values or bind them to flags, there are methods to help you

kubeConfig := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(loadingRules, configOverrides)
config, err := kubeConfig.ClientConfig()
if err != nil {
    // Do something
}
client, err := kubernetes.NewForConfig(config)
// ...

In the context of this article, there are six steps:

  1. Configure the loading rules.
  2. Configure the overrides.
  3. Build a set of flags.
  4. Bind the flags.
  5. Build the merged configuration.
  6. Obtain an API client.

Configure the loading rules

clientcmd.NewDefaultClientConfigLoadingRules() builds loading rules which will use either the contents of the KUBECONFIG environment variable, or the default configuration file name (~/.kube/config). In addition, if the default configuration file is used, it is able to migrate settings from the (very) old default configuration file (~/.kube/.kubeconfig).

You can build your own ClientConfigLoadingRules, but in most cases the defaults are fine.
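
If you do need custom rules, a minimal sketch might look like this (the explicit path is hypothetical):

// Merge these two files, with the first taking precedence for map entries;
// KUBECONFIG is not consulted when the rules are built manually like this.
loadingRules := &clientcmd.ClientConfigLoadingRules{
    Precedence: []string{"/etc/myapp/kubeconfig", clientcmd.RecommendedHomeFile},
}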

Configure the overrides

clientcmd.ConfigOverrides is a struct storing overrides which will be applied over the settings loaded from the configuration derived using the loading rules. In the context of this article, its primary purpose is to store values obtained from command line arguments. These are handled using the pflag library, which is a drop-in replacement for Go's flag package, adding support for double-hyphen arguments with long names.

In most cases there's nothing to set in the overrides; I will only bind them to flags.

Build a set of flags

In this context, a flag is a representation of a command line argument, specifying its long name (such as --namespace), its short name if any (such as -n), its default value, and a description shown in the usage information. Flags are stored in instances of the FlagInfo struct.

Three sets of flags are available, representing the following command line arguments:

  • authentication arguments (certificates, tokens, impersonations, username/password);
  • cluster arguments (API server, certificate authority, TLS configuration, proxy, compression)
  • context arguments (cluster name, kubeconfig user name, namespace)

The recommended selection includes all three with a named context selection argument and a timeout argument.

These are all available using the Recommended…Flags functions. The functions take a prefix, which is prepended to all the argument long names.

So calling clientcmd.RecommendedConfigOverrideFlags("") results in command line arguments such as --context, --namespace, and so on. The --timeout argument is given a default value of 0, and the --namespace argument has a corresponding short variant, -n. Adding a prefix, such as "from-", results in command line arguments such as --from-context, --from-namespace, etc. This might not seem particularly useful on commands involving a single API server, but they come in handy when multiple API servers are involved, such as in multi-cluster scenarios.

There's a potential gotcha here: prefixes don't modify the short name, so --namespace needs some care if multiple prefixes are used: only one of the prefixes can be associated with the -n short name. You'll have to clear the short names associated with the other prefixes' --namespace, or perhaps with all prefixes if there's no sensible -n association. Short names can be cleared as follows:

kflags := clientcmd.RecommendedConfigOverrideFlags(prefix)
kflags.ContextOverrideFlags.Namespace.ShortName = ""

In a similar fashion, flags can be disabled entirely by clearing their long name:

kflags.ContextOverrideFlags.Namespace.LongName = ""

Bind the flags

Once a set of flags has been defined, it can be used to bind command line arguments to overrides using clientcmd.BindOverrideFlags. This requires a pflag FlagSet rather than one from Go's flag package.

If you also want to bind --kubeconfig, you should do so now, by binding ExplicitPath in the loading rules:

flags.StringVarP(&loadingRules.ExplicitPath, "kubeconfig", "", "", "absolute path(s) to the kubeconfig file(s)")

Build the merged configuration

Two functions are available to build a merged configuration:

  • clientcmd.NewInteractiveDeferredLoadingClientConfig
  • clientcmd.NewNonInteractiveDeferredLoadingClientConfig

As the names suggest, the difference between the two is that the first can ask for authentication information interactively, using a provided reader, whereas the second only operates on the information given to it by the caller.

The "deferred" mention in these function names refers to the fact that the final configuration will be determined as late as possible. This means that these functions can be called before the command line arguments are parsed, and the resulting configuration will use whatever values have been parsed by the time it's actually constructed.

Obtain an API client

The merged configuration is returned as a ClientConfig instance. An API client can be obtained from that by calling the ClientConfig() method.

If no configuration is given (KUBECONFIG is empty or points to non-existent files, ~/.kube/config doesn't exist, and no configuration is given using command line arguments), the default setup will return an obscure error referring to KUBERNETES_MASTER. This is legacy behaviour; several attempts have been made to get rid of it, but it is preserved for the --local and --dry-run command line arguments in kubectl. You should check for "empty configuration" errors by calling clientcmd.IsEmptyConfig() and provide a more explicit error message.

The Namespace() method is also useful: it returns the namespace that should be used. It also indicates whether the namespace was overridden by the user (using --namespace).

Full example

Here's a complete example.

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/spf13/pflag"
    v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Loading rules, no configuration
    loadingRules := clientcmd.NewDefaultClientConfigLoadingRules()

    // Overrides and flag (command line argument) setup
    configOverrides := &clientcmd.ConfigOverrides{}
    flags := pflag.NewFlagSet("clientcmddemo", pflag.ExitOnError)
    clientcmd.BindOverrideFlags(configOverrides, flags,
        clientcmd.RecommendedConfigOverrideFlags(""))
    flags.StringVarP(&loadingRules.ExplicitPath, "kubeconfig", "", "", "absolute path(s) to the kubeconfig file(s)")
    flags.Parse(os.Args[1:])

    // Client construction
    kubeConfig := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(loadingRules, configOverrides)
    config, err := kubeConfig.ClientConfig()
    if err != nil {
        if clientcmd.IsEmptyConfig(err) {
            panic("Please provide a configuration pointing to the Kubernetes API server")
        }
        panic(err)
    }
    client, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // How to find out what namespace to use
    namespace, overridden, err := kubeConfig.Namespace()
    if err != nil {
        panic(err)
    }
    fmt.Printf("Chosen namespace: %s; overridden: %t\n", namespace, overridden)

    // Let's use the client
    nodeList, err := client.CoreV1().Nodes().List(context.TODO(), v1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, node := range nodeList.Items {
        fmt.Println(node.Name)
    }
}

Happy coding, and thank you for your interest in implementing tools with familiar usage patterns!

Categories: CNCF Projects, Kubernetes

CoreDNS-1.14. Release

CoreDNS Blog - Wed, 01/14/2026 - 19:00
This release primarily addresses security vulnerabilities affecting Go versions prior to Go 1.25.6 and Go 1.24.12 (CVE-2025-61728, CVE-2025-61726, CVE-2025-68121, CVE-2025-61731, CVE-2025-68119). It also includes performance improvements to the proxy plugin via multiplexed connections, along with various documentation updates.

Brought to You By: Alex Massy, Shiv Tyagi, Ville Vesilehto, Yong Tang

Noteworthy Changes:
  • plugin/proxy: Use mutex-based connection pool (https://github.com/coredns/coredns/pull/7790)
Categories: CNCF Projects

OpenCost: Reflecting on 2025 and looking ahead to 2026

CNCF Blog Projects Category - Mon, 01/12/2026 - 06:28

The OpenCost project has had a fruitful year in terms of releases, our wonderful mentees and contributors, and fun gatherings at KubeCons.

 One of a group of technologists at the OpenCost desk in the Project Pavilion at KubeCon; the second, an image of a crowded auditorium; the third featuring three people talking, one of whom is wearing a green OpenCost sweater.

If you’re new to OpenCost, it is an open-source cost and resources management tool that is an Incubating project in the Cloud Native Computing Foundation (CNCF). It was created by IBM Kubecost and continues to be maintained and supported by IBM Kubecost, Randoli, and a wider community of partners, including the major cloud providers.

OpenCost releases

The OpenCost project had 11 releases in 2025. These include new features and capabilities that improve the experience for both users and contributors. Here are a few highlights:

  • Promless: OpenCost can be configured to run without Prometheus, using environment variables that can be set via Helm. Users can run OpenCost with the Collector Datasource (beta), which does not require Prometheus.
  • OpenCost MCP server: AI agents can now query cost data in real-time using natural language. They can analyze spending patterns across namespaces, pods, and nodes, generate cost reports and recommendations automatically, and provide other insights from OpenCost data.
  • Export system: The project now has a generic export framework to make it possible to export cost data in a type-safe way.
  • Diagnostics system: OpenCost has a complete diagnostic framework with an interface, runners, and export capabilities.
  • Heartbeat system: You can do system health tracking with timestamped heartbeat events for export and more.
  • Cloud providers: There are continued improvements for users to track cloud and multi-cloud metrics. We appreciate contributions from Oracle (including providing hosting for our demo) and DigitalOcean (for recent cloud services provider work).

Thanks to our maintainers and contributors who make these releases possible and successful, including our mentees and community contributors as well.

Mentorship and community management

Our project has been committed to mentorship through the Linux Foundation for a while, and we continue to have fantastic mentees who bring innovation and support to the community. Manas Sivakumar was a summer 2025 mentee and worked on writing Integration tests for OpenCost’s enterprise readiness. Manas’ work is now part of the OpenCost integration testing pipeline for all future contributions.

  • Adesh Pal, a mentee, made a big splash with the OpenCost MCP server. The MCP server now comes by default and needs no configuration. It outputs readable markdown on metrics as well as step-by-step suggestions to make improvements.
  • Sparsh Raj has been in our community for a while and has become our most recent mentee. Sparsh has written a blog post on KubeModel, the foundation of OpenCost’s Data Model 2.0. Sparsh’s work will meet the needs for a robust and scalable data model that can handle Kubernetes complexity and constantly shifting resources.
  • On the community side, Tamao Nakahara was brought into the IBM Kubecost team for a few months of open source and developer experience expertise. Tamao helped organize the regular OpenCost community meetings, leading actions around events, the website, and docs. On the website, Tamao improved the UX for new and returning users, and brought in Ginger Walker to help clean up the docs.

Events and talks

As a CNCF incubating project, OpenCost participated in the key KubeCon events. Most recently, the team was at KubeCon + CloudNativeCon Atlanta 2025, where maintainer Matt Bolt from IBM Kubecost kicked off the week with a Project Lightning talk. During a co-located event that day, Rajith Attapattu, CTO of contributing company Randoli, also gave a talk on OpenCost. Dee Zeis, Rajith, and Tamao also answered questions at the OpenCost kiosk in the Project Pavilion.

Earlier in the year, the team was also at both KubeCon + CloudNativeCon in London and Japan, giving talks and running the OpenCost kiosks.

2026!

What’s in store for OpenCost in the coming year? Aside from meeting all of you at future KubeCon + CloudNativeCon events, we’re also excited about a few roadmap highlights. As mentioned, our LFX mentee Sparsh is working on KubeModel, which will be important for improvements to OpenCost’s data model. As AI continues to increase in adoption, the team is also working on building out costing features to track AI usage. Finally, supply chain security improvements are a priority.

We’re looking forward to seeing more of you in the community in the next year!

Categories: CNCF Projects

CoreDNS-1.14.0 Release

CoreDNS Blog - Fri, 01/09/2026 - 19:00
This release focuses on security hardening and operational reliability. Core updates introduce a regex length limit to reduce resource-exhaustion risk. Plugin updates improve error consolidation (show_first), reduce misleading SOA warnings, add Kubernetes API rate limiting, enhance metrics with plugin chain tracking, and fix issues in azure and sign. This release also includes additional security fixes; see the security advisory for details.

Brought to You By: cangming, pasteley, Raisa Kabir, Ross Golder, rusttech, Syed Azeez, Ville Vesilehto, Yong Tang
Categories: CNCF Projects

Kubernetes v1.35: Restricting executables invoked by kubeconfigs via exec plugin allowList added to kuberc

Kubernetes Blog - Fri, 01/09/2026 - 13:30

Did you know that kubectl can run arbitrary executables, including shell scripts, with the full privileges of the invoking user, and without your knowledge? Whenever you download or auto-generate a kubeconfig, the users[n].exec.command field can specify an executable to fetch credentials on your behalf. Don't get me wrong, this is an incredible feature that allows you to authenticate to the cluster with external identity providers. Nevertheless, you probably see the problem: Do you know exactly what executables your kubeconfig is running on your system? Do you trust the pipeline that generated your kubeconfig? If there has been a supply-chain attack on the code that generates the kubeconfig, or if the generating pipeline has been compromised, an attacker might well be doing unsavory things to your machine by tricking your kubeconfig into running arbitrary code.
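
For context, an exec credential plugin entry in a kubeconfig looks roughly like this (the user name and arguments are illustrative; cloudco-login is the example plugin name used later in this post):

users:
- name: cloudco-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1
      # kubectl runs this executable to obtain credentials
      command: cloudco-login
      args:
      - --cluster=prod
      interactiveMode: IfAvailable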

To give the user more control over what gets run on their system, SIG-Auth and SIG-CLI added the credential plugin policy and allowlist as a beta feature to Kubernetes 1.35. This is available to all clients using the client-go library, by filling out the ExecProvider.PluginPolicy struct on a REST config. To broaden the impact of this change, Kubernetes v1.35 also lets you manage this without writing a line of application code. You can configure kubectl to enforce the policy and allowlist by adding two fields to the kuberc configuration file: credentialPluginPolicy and credentialPluginAllowlist. Adding one or both of these fields restricts which credential plugins kubectl is allowed to execute.

How it works

A full description of this functionality is available in our official documentation for kuberc, but this blog post will give a brief overview of the new security knobs. The new features are in beta and available without using any feature gates.

The following example is the simplest one: simply don't specify the new fields.

apiVersion: kubectl.config.k8s.io/v1beta1
kind: Preference

This will keep kubectl acting as it always has, and all plugins will be allowed.

The next example is functionally identical, but it is more explicit and therefore preferred if it's actually what you want:

apiVersion: kubectl.config.k8s.io/v1beta1
kind: Preference
credentialPluginPolicy: AllowAll

If you don't know whether or not you're using exec credential plugins, try setting your policy to DenyAll:

apiVersion: kubectl.config.k8s.io/v1beta1
kind: Preference
credentialPluginPolicy: DenyAll

If you are using credential plugins, you'll quickly find out what kubectl is trying to execute. You'll get an error like the following.

Unable to connect to the server: getting credentials: plugin "cloudco-login" not allowed: policy set to "DenyAll"

If there is insufficient information for you to debug the issue, increase the logging verbosity when you run your next command. For example:

# increase or decrease verbosity if the issue is still unclear
kubectl get pods -v=5

Selectively allowing plugins

What if you need the cloudco-login plugin to do your daily work? That is why there's a third option for your policy, Allowlist. To allow a specific plugin, set the policy and add the credentialPluginAllowlist:

apiVersion: kubectl.config.k8s.io/v1beta1
kind: Preference
credentialPluginPolicy: Allowlist
credentialPluginAllowlist:
 - name: /usr/local/bin/cloudco-login
 - name: get-identity

You'll notice that there are two entries in the allowlist. One of them is specified by full path; the other, get-identity, is just a basename. When you specify just the basename, the full path will be looked up using exec.LookPath, which does not expand globs or handle wildcards; globbing is not supported at this time. Both forms (basename and full path) are acceptable, but the full path is preferable because it narrows the scope of allowed binaries even further.
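
If you're unsure what an allowlisted basename will resolve to on a given machine, here's a tiny sketch of the same lookup (the plugin name is the hypothetical one from the example above):

package main

import (
    "fmt"
    "os/exec"
)

func main() {
    // exec.LookPath searches the directories in $PATH for an executable
    // named exactly "get-identity"; no globbing or wildcard expansion.
    path, err := exec.LookPath("get-identity")
    if err != nil {
        fmt.Println("not found in PATH:", err)
        return
    }
    fmt.Println("would resolve to:", path)
}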

Future enhancements

Currently, an allowlist entry has only one field, name. In the future, we (Kubernetes SIG CLI) want to see other requirements added. One idea that seems useful is checksum verification whereby, for example, a binary would only be allowed to run if it has the sha256 sum b9a3fad00d848ff31960c44ebb5f8b92032dc085020f857c98e32a5d5900ff9c and exists at the path /usr/bin/cloudco-login.

Another possibility is only allowing binaries that have been signed by one of a set of trusted signing keys.

Get involved

The credential plugin policy is still under development and we are very interested in your feedback. We'd love to hear what you like about it and what problems you'd like to see it solve. Or, if you have the cycles to contribute one of the above enhancements, they'd be a great way to get started contributing to Kubernetes. Feel free to join in the discussion on slack:

Categories: CNCF Projects, Kubernetes

Kubernetes v1.35: Mutable PersistentVolume Node Affinity (alpha)

Kubernetes Blog - Thu, 01/08/2026 - 13:30

The PersistentVolume node affinity API dates back to Kubernetes v1.10. It is widely used to express that volumes may not be equally accessible by all nodes in the cluster. This field was previously immutable, and it is now mutable in Kubernetes v1.35 (alpha). This change opens a door to more flexible online volume management.

Why make node affinity mutable?

This raises an obvious question: why make node affinity mutable now? While stateless workloads like Deployments can be changed freely and the changes will be rolled out automatically by re-creating every Pod, PersistentVolumes (PVs) are stateful and cannot be re-created easily without losing data.

However, storage providers evolve and storage requirements change. Most notably, multiple providers are offering regional disks now. Some of them even support live migration from zonal to regional disks, without disrupting the workloads. This change can be expressed through the VolumeAttributesClass API, which recently graduated to GA in 1.34. However, even if the volume is migrated to regional storage, Kubernetes still prevents scheduling Pods to other zones because of the node affinity recorded in the PV object. In this case, you may want to change the PV node affinity from:

spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-east1-b

to:

spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/region
          operator: In
          values:
          - us-east1

As another example, providers sometimes offer new generations of disks. New disks cannot always be attached to older nodes in the cluster. This accessibility can also be expressed through PV node affinity and ensures the Pods can be scheduled to the right nodes. But when the disk is upgraded, new Pods using this disk can still be scheduled to older nodes. To prevent this, you may want to change the PV node affinity from:

spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: provider.com/disktype.gen1
          operator: In
          values:
          - available

to:

spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: provider.com/disktype.gen2
          operator: In
          values:
          - available

So, PV node affinity is now mutable: a first step towards more flexible online volume management. While it is a simple change that removes one validation from the API server, we still have a long way to go to integrate well with the Kubernetes ecosystem.

Try it out

This feature is for you if you are a Kubernetes cluster administrator and your storage provider allows online updates that you want to utilize, but those updates can affect the accessibility of the volume.

Note that changing PV node affinity alone will not actually change the accessibility of the underlying volume. Before using this feature, you must first update the underlying volume in the storage provider, and understand which nodes can access the volume after the update. You can then enable this feature and keep the PV node affinity in sync.

Currently, this feature is in alpha state. It is disabled by default and may be subject to change. To try it out, enable the MutablePVNodeAffinity feature gate on the API server; you can then edit the PV spec.nodeAffinity field. Typically only administrators can edit PVs, so please make sure you have the right RBAC permissions.
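
For example, here is a minimal sketch of widening the affinity from the zonal to the regional selector shown earlier (the PV name is hypothetical, and the underlying disk must already have been migrated by the storage provider):

kubectl patch pv regional-disk-pv --type=merge -p '{
  "spec": {
    "nodeAffinity": {
      "required": {
        "nodeSelectorTerms": [
          {
            "matchExpressions": [
              {
                "key": "topology.kubernetes.io/region",
                "operator": "In",
                "values": ["us-east1"]
              }
            ]
          }
        ]
      }
    }
  }
}'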

Race condition between updating and scheduling

There are only a few factors outside of a Pod that can affect the scheduling decision, and PV node affinity is one of them. It is fine to allow more nodes to access the volume by relaxing node affinity, but there is a race condition when you tighten it: there is no guarantee about when the scheduler will see the modified PV in its cache, so there is a small window where the scheduler may place a Pod on an old node that can no longer access the volume. In this case, the Pod will be stuck in the ContainerCreating state.

One mitigation currently under discussion is for the kubelet to fail Pod startup if the PersistentVolume’s node affinity is violated. This has not landed yet. So if you are trying this out now, please watch subsequent Pods that use the updated PV and make sure they are scheduled onto nodes that can access the volume. If you update the PV and immediately start new Pods in a script, it may not work as intended.

Future integration with CSI (Container Storage Interface)

Currently, it is up to the cluster administrator to modify both the PV's node affinity and the underlying volume in the storage provider. But manual operations are error-prone and time-consuming. It is preferable to eventually integrate this with VolumeAttributesClass, so that an unprivileged user can modify their PersistentVolumeClaim (PVC) to trigger storage-side updates, and PV node affinity is updated automatically when appropriate, without the need for a cluster admin's intervention.

We welcome feedback from users and storage driver developers

As noted earlier, this is only a first step.

If you are a Kubernetes user, we would like to learn how you use (or will use) PV node affinity. Is it beneficial to update it online in your case?

If you are a CSI driver developer, would you be willing to implement this feature? How would you like the API to look?

Please provide your feedback via:

For any inquiries or specific questions related to this feature, please reach out to the SIG Storage community.

Categories: CNCF Projects, Kubernetes

Kubernetes v1.35: A Better Way to Pass Service Account Tokens to CSI Drivers

Kubernetes Blog - Wed, 01/07/2026 - 13:30

If you maintain a CSI driver that uses service account tokens, Kubernetes v1.35 brings a refinement you'll want to know about. Since the introduction of the TokenRequests feature, service account tokens requested by CSI drivers have been passed to them through the volume_context field. While this has worked, it's not the ideal place for sensitive information, and we've seen instances where tokens were accidentally logged in CSI drivers.

Kubernetes v1.35 introduces a beta solution to address this: CSI Driver Opt-in for Service Account Tokens via Secrets Field. This allows CSI drivers to receive service account tokens through the secrets field in NodePublishVolumeRequest, which is the appropriate place for sensitive data in the CSI specification.

Understanding the existing approach

When CSI drivers use the TokenRequests feature, they can request service account tokens for workload identity by configuring the TokenRequests field in the CSIDriver spec. These tokens are passed to drivers as part of the volume attributes map, using the key csi.storage.k8s.io/serviceAccount.tokens.

The volume_context field works, but it's not designed for sensitive data. Because of this, there are a few challenges:

First, the protosanitizer tool that CSI drivers use doesn't treat volume context as sensitive, so service account tokens can end up in logs when gRPC requests are logged. This happened with CVE-2023-2878 in the Secrets Store CSI Driver and CVE-2024-3744 in the Azure File CSI Driver.

Second, each CSI driver that wants to avoid this issue needs to implement its own sanitization logic, which leads to inconsistency across drivers.

The CSI specification already has a secrets field in NodePublishVolumeRequest that's designed exactly for this kind of sensitive information. The challenge is that we can't just change where we put the tokens without breaking existing CSI drivers that expect them in volume context.

How the opt-in mechanism works

Kubernetes v1.35 introduces an opt-in mechanism that lets CSI drivers choose how they receive service account tokens. This way, existing drivers continue working as they do today, and drivers can move to the more appropriate secrets field when they're ready.

CSI drivers can set a new field in their CSIDriver spec:

#
# CAUTION: this is an example configuration.
# Do not use this for your own cluster!
#
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example-csi-driver
spec:
  # ... existing fields ...
  tokenRequests:
  - audience: "example.com"
    expirationSeconds: 3600
  # New field for opting into secrets delivery
  serviceAccountTokenInSecrets: true # defaults to false

The behavior depends on the serviceAccountTokenInSecrets field:

When set to false (the default), tokens are placed in VolumeContext with the key csi.storage.k8s.io/serviceAccount.tokens, just like today. When set to true, tokens are placed only in the Secrets field with the same key.

About the beta release

The CSIServiceAccountTokenSecrets feature gate is enabled by default on both kubelet and kube-apiserver. Since the serviceAccountTokenInSecrets field defaults to false, enabling the feature gate doesn't change any existing behavior. All drivers continue receiving tokens via volume context unless they explicitly opt in. This is why we felt comfortable starting at beta rather than alpha.

Guide for CSI driver authors

If you maintain a CSI driver that uses service account tokens, here's how to adopt this feature.

Adding fallback logic

First, update your driver code to check both locations for tokens. This makes your driver compatible with both the old and new approaches:

const serviceAccountTokenKey = "csi.storage.k8s.io/serviceAccount.tokens"

func getServiceAccountTokens(req *csi.NodePublishVolumeRequest) (string, error) {
    // Check secrets field first (new behavior when driver opts in)
    if tokens, ok := req.Secrets[serviceAccountTokenKey]; ok {
        return tokens, nil
    }

    // Fall back to volume context (existing behavior)
    if tokens, ok := req.VolumeContext[serviceAccountTokenKey]; ok {
        return tokens, nil
    }

    return "", fmt.Errorf("service account tokens not found")
}

This fallback logic is backward compatible and safe to ship in any driver version, even before clusters upgrade to v1.35.

Rollout sequence

CSI driver authors need to follow a specific sequence when adopting this feature to avoid breaking existing volumes.

Driver preparation (can happen anytime)

You can start preparing your driver right away by adding fallback logic that checks both the secrets field and volume context for tokens. This code change is backward compatible and safe to ship in any driver version, even before clusters upgrade to v1.35. We encourage you to add this fallback logic early, cut releases, and even backport to maintenance branches where feasible.

Cluster upgrade and feature enablement

Once your driver has the fallback logic deployed, here's the safe rollout order for enabling the feature in a cluster:

  1. Complete the kube-apiserver upgrade to 1.35 or later
  2. Complete kubelet upgrade to 1.35 or later on all nodes
  3. Ensure CSI driver version with fallback logic is deployed (if not already done in preparation phase)
  4. Fully complete CSI driver DaemonSet rollout across all nodes
  5. Update your CSIDriver manifest to set serviceAccountTokenInSecrets: true

Important constraints

The most important thing to remember is timing. If your CSI driver DaemonSet and CSIDriver object are in the same manifest or Helm chart, you need two separate updates. Deploy the new driver version with fallback logic first, wait for the DaemonSet rollout to complete, then update the CSIDriver spec to set serviceAccountTokenInSecrets: true.

Also, don't update the CSIDriver before all driver pods have rolled out. If you do, volume mounts will fail on nodes still running the old driver version, since those pods only check volume context.

Why this matters

Adopting this feature helps in a few ways:

  • It eliminates the risk of accidentally logging service account tokens as part of volume context in gRPC requests
  • It uses the CSI specification's designated field for sensitive data, which feels right
  • The protosanitizer tool automatically handles the secrets field correctly, so you don't need driver-specific workarounds
  • It's opt-in, so you can migrate at your own pace without breaking existing deployments

Call to action

We (Kubernetes SIG Storage) encourage CSI driver authors to adopt this feature and provide feedback on the migration experience. If you have thoughts on the API design or run into any issues during adoption, please reach out to us on the #csi channel on Kubernetes Slack (for an invitation, visit https://slack.k8s.io/).

You can follow along on KEP-5538 to track progress across the coming Kubernetes releases.

Categories: CNCF Projects, Kubernetes

HolmesGPT: Agentic troubleshooting built for the cloud native era

CNCF Blog Projects Category - Wed, 01/07/2026 - 07:00

If you’ve ever debugged a production incident, you know that the hardest part often isn’t the fix, it’s finding where to begin. Most on-call engineers end up spending hours piecing together clues, fighting time pressure, and trying to make sense of scattered data. You’ve probably run into one or more of these challenges: 

  • Unwritten knowledge and missing context:
    You’re pulled into an outage for a service you barely know. The original owners have changed teams, the documentation is half-written, and the “runbook” is either stale or missing altogether. You spend the first 30 minutes trying to find someone who’s seen this issue before — and if you’re unlucky, this incident is a new one. 
  • Tool overload and context switching:
    Your screen looks like an air traffic control dashboard. You’re running monitoring queries, flipping between Grafana and Application Insights, checking container logs, and scrolling through traces — all while someone’s asking for an ETA in the incident channel. Correlating data across tools is manual, slow, and mentally exhausting. 
  • Overwhelming complexity and knowledge gaps:
    Modern cloud-native systems like Kubernetes are powerful, but they’ve made troubleshooting far more complex. Every layer — nodes, pods, controllers, APIs, networking, autoscalers – introduces its own failure modes. To diagnose effectively, you need deep expertise across multiple domains, something even seasoned engineers can’t always keep up with. 

The challenges require a solution that can look across signals, recall patterns from past incidents, and guide you toward the most likely cause. 

This is where HolmesGPT, a CNCF Sandbox project, could help. 

 
HolmesGPT was accepted as a CNCF Sandbox project in October 2025. It’s built to simplify the chaos of production debugging – bringing together logs, metrics, and traces from different sources, reasoning over them, and surfacing clear, data-backed insights in plain language. 

What is HolmesGPT?

HolmesGPT is an open-source AI troubleshooting agent built for Kubernetes and cloud-native environments. It combines observability telemetry, LLM reasoning, and structured runbooks to accelerate root cause analysis and suggest next actions. 

Unlike static dashboards or chatbots, HolmesGPT is agentic: it actively decides what data to fetch, runs targeted queries, and iteratively refines its hypotheses – all while staying within your environment. 

Key benefits:

  • AI-native control loop: HolmesGPT uses an agentic task list approach  
  • Open architecture: Every integration and toolset is open and extensible, works with existing runbooks and MCP servers 
  • Data privacy: Models can run locally or inside your cluster or on the cloud  
  • Community-driven: Designed around CNCF principles of openness, interoperability, and transparency. 

How it works 

When you run:

holmes ask "Why is my pod in crash loop back off state"

HolmesGPT: 

  1. Understands intent → it recognizes you want to diagnose a pod restart issue 
  2. Creates a task list → breaks down the problem into smaller chunks and executes each of them separately  
  3. Queries data sources → runs Prometheus queries, collects Kubernetes events or logs, and inspects the specs of the relevant pods
  4. Correlates context → detects that a recent deployment updated the image   
  5. Explains and suggests fixes → returns a natural language diagnosis and remediation steps. 

Here’s a simplified overview of the architecture:

HolmesGPT architecture

Extensible by design 

HolmesGPT’s architecture allows contributors to add new components: 

  • Toolsets: Build custom commands for internal observability pipelines or expose existing tools through a Model Context Protocol (MCP) server.
  • Evals: Add custom evals to benchmark the performance, cost, and latency of models
  • Runbooks: Codify best practices (e.g., “diagnose DNS failures” or “debug PVC provisioning”). 

Example of a simple custom tool: 

holmes:
  toolsets:
    kubernetes/pod_status:
      description: "Check the status of a Kubernetes pod."
      tools:
        - name: "get_pod"
          description: "Fetch pod details from a namespace."
          command: "kubectl get pod {{ pod }} -n {{ namespace }}"

Getting started

  1. Install HolmesGPT

There are several ways to install HolmesGPT (for example pip or Homebrew); to get started quickly with Homebrew:

brew tap robusta-dev/homebrew-holmesgpt
brew install holmesgpt

The detailed installation guide has instructions for Helm, the CLI, and the UI.

  2. Set up the LLM (any OpenAI-compatible LLM) by setting the API key

In most cases, this means setting the appropriate environment variable based on the LLM provider.

  3. Run it locally

holmes ask "what is wrong with the user-profile-import pod?" --model="anthropic/claude-sonnet-4-5"

  4. Explore other features

How to get involved 

HolmesGPT is entirely community-driven and welcomes all forms of contribution: 

  • Integrations: Add new toolsets for your observability tools or CI/CD pipelines.
  • Runbooks: Encode operational expertise for others to reuse.
  • Evaluation: Help build benchmarks for AI reasoning accuracy and observability insights.
  • Docs and tutorials: Improve onboarding, create demos, or contribute walkthroughs.
  • Community: Join discussions around governance and CNCF Sandbox progression.

All contributions follow the CNCF Code of Conduct.

Further Resources 

Categories: CNCF Projects

Kubernetes v1.35: Extended Toleration Operators to Support Numeric Comparisons (Alpha)

Kubernetes Blog - Mon, 01/05/2026 - 13:30

Many production Kubernetes clusters blend on-demand (higher-SLA) and spot/preemptible (lower-SLA) nodes to optimize costs while maintaining reliability for critical workloads. Platform teams need a safe default that keeps most workloads away from risky capacity, while allowing specific workloads to opt-in with explicit thresholds like "I can tolerate nodes with failure probability up to 5%".

Today, Kubernetes taints and tolerations can match exact values or check for existence, but they can't compare numeric thresholds. You'd need to create discrete taint categories, use external admission controllers, or accept less-than-optimal placement decisions.

In Kubernetes v1.35, we're introducing Extended Toleration Operators as an alpha feature. This enhancement adds Gt (Greater Than) and Lt (Less Than) operators to spec.tolerations, enabling threshold-based scheduling decisions that unlock new possibilities for SLA-based placement, cost optimization, and performance-aware workload distribution.

The evolution of tolerations

Historically, Kubernetes supported two primary toleration operators:

  • Equal: The toleration matches a taint if the key and value are exactly equal
  • Exists: The toleration matches a taint if the key exists, regardless of value

While these worked well for categorical scenarios, they fell short for numeric comparisons. Starting with v1.35, we are closing this gap.

Consider these real-world scenarios:

  • SLA requirements: Schedule high-availability workloads only on nodes with failure probability below a certain threshold
  • Cost optimization: Allow cost-sensitive batch jobs to run on cheaper nodes that exceed a specific cost-per-hour value
  • Performance guarantees: Ensure latency-sensitive applications run only on nodes with disk IOPS or network bandwidth above minimum thresholds

Without numeric comparison operators, cluster operators have had to resort to workarounds like creating multiple discrete taint values or using external admission controllers, neither of which scale well or provide the flexibility needed for dynamic threshold-based scheduling.

Why extend tolerations instead of using NodeAffinity?

You might wonder: NodeAffinity already supports numeric comparison operators, so why extend tolerations? While NodeAffinity is powerful for expressing pod preferences, taints and tolerations provide critical operational benefits:

  • Policy orientation: NodeAffinity is per-pod, requiring every workload to explicitly opt-out of risky nodes. Taints invert control—nodes declare their risk level, and only pods with matching tolerations may land there. This provides a safer default; most pods stay away from spot/preemptible nodes unless they explicitly opt-in.
  • Eviction semantics: NodeAffinity has no eviction capability. Taints support the NoExecute effect with tolerationSeconds, enabling operators to drain and evict pods when a node's SLA degrades or spot instances receive termination notices.
  • Operational ergonomics: Centralized, node-side policy is consistent with other safety taints like disk-pressure and memory-pressure, making cluster management more intuitive.

This enhancement preserves the well-understood safety model of taints and tolerations while enabling threshold-based placement for SLA-aware scheduling.

Introducing Gt and Lt operators

Kubernetes v1.35 introduces two new operators for tolerations:

  • Gt (Greater Than): The toleration matches if the taint's numeric value is greater than the toleration's value
  • Lt (Less Than): The toleration matches if the taint's numeric value is less than the toleration's value

When a pod tolerates a taint with Lt, it's saying "I can tolerate nodes where this metric is less than my threshold", so it can run on nodes where the taint value is below the toleration's value. Conversely, a Gt toleration says "I tolerate nodes that are above my minimum requirements", so the pod can run on nodes where the taint value exceeds the toleration's value.

These operators work with numeric taint values and enable the scheduler to make sophisticated placement decisions based on continuous metrics rather than discrete categories.

Note:

Numeric values for Gt and Lt operators must be positive 64-bit integers without leading zeros. For example, "100" is valid, but "0100" (with leading zero) and "0" (zero value) are not permitted.

The Gt and Lt operators work with all taint effects: NoSchedule, NoExecute, and PreferNoSchedule.

Use cases and examples

Let's explore how Extended Toleration Operators solve real-world scheduling challenges.

Example 1: Spot instance protection with SLA thresholds

Many clusters mix on-demand and spot/preemptible nodes to optimize costs. Spot nodes offer significant savings but have higher failure rates. You want most workloads to avoid spot nodes by default, while allowing specific workloads to opt-in with clear SLA boundaries.

First, taint spot nodes with their failure probability (for example, 15% annual failure rate):

apiVersion: v1
kind: Node
metadata:
  name: spot-node-1
spec:
  taints:
  - key: "failure-probability"
    value: "15"
    effect: "NoExecute"

On-demand nodes have much lower failure rates:

apiVersion: v1
kind: Node
metadata:
  name: ondemand-node-1
spec:
  taints:
  - key: "failure-probability"
    value: "2"
    effect: "NoExecute"

Critical workloads can specify strict SLA requirements:

apiVersion: v1
kind: Pod
metadata:
  name: payment-processor
spec:
  tolerations:
  - key: "failure-probability"
    operator: "Lt"
    value: "5"
    effect: "NoExecute"
    tolerationSeconds: 30
  containers:
  - name: app
    image: payment-app:v1

This pod will only schedule on nodes with failure-probability less than 5 (meaning ondemand-node-1 with 2% but not spot-node-1 with 15%). The NoExecute effect with tolerationSeconds: 30 means if a node's SLA degrades (for example, cloud provider changes the taint value), the pod gets 30 seconds to gracefully terminate before forced eviction.

Meanwhile, a fault-tolerant batch job can explicitly opt-in to spot instances:

apiVersion: v1
kind: Pod
metadata:
  name: batch-job
spec:
  tolerations:
  - key: "failure-probability"
    operator: "Lt"
    value: "20"
    effect: "NoExecute"
  containers:
  - name: worker
    image: batch-worker:v1

This batch job tolerates nodes with failure probability up to 20%, so it can run on both on-demand and spot nodes, maximizing cost savings while accepting higher risk.

Example 2: AI workload placement with GPU tiers

AI and machine learning workloads often have specific hardware requirements. With Extended Toleration Operators, you can create GPU node tiers and ensure workloads land on appropriately powered hardware.

Taint GPU nodes with their compute capability score:

apiVersion: v1
kind: Node
metadata:
  name: gpu-node-a100
spec:
  taints:
  - key: "gpu-compute-score"
    value: "1000"
    effect: "NoSchedule"
---
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-t4
spec:
  taints:
  - key: "gpu-compute-score"
    value: "500"
    effect: "NoSchedule"

A heavy training workload can require high-performance GPUs:

apiVersion: v1
kind: Pod
metadata:
  name: model-training
spec:
  tolerations:
  - key: "gpu-compute-score"
    operator: "Gt"
    value: "800"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: ml-trainer:v1
    resources:
      limits:
        nvidia.com/gpu: 1

This ensures the training pod only schedules on nodes with compute scores greater than 800 (like the A100 node), preventing placement on lower-tier GPUs that would slow down training.

Meanwhile, inference workloads with less demanding requirements can use any available GPU:

apiVersion: v1
kind: Pod
metadata:
  name: model-inference
spec:
  tolerations:
  - key: "gpu-compute-score"
    operator: "Gt"
    value: "400"
    effect: "NoSchedule"
  containers:
  - name: inference
    image: ml-inference:v1
    resources:
      limits:
        nvidia.com/gpu: 1

Example 3: Cost-optimized workload placement

For batch processing or non-critical workloads, you might want to minimize costs by running on cheaper nodes, even if they have lower performance characteristics.

Nodes can be tainted with their cost rating:

spec:
  taints:
  - key: "cost-per-hour"
    value: "50"
    effect: "NoSchedule"

A cost-sensitive batch job can express its tolerance for expensive nodes:

tolerations:
- key: "cost-per-hour"
  operator: "Lt"
  value: "100"
  effect: "NoSchedule"

This batch job will schedule on nodes costing less than $100/hour but avoid more expensive nodes. Combined with Kubernetes scheduling priorities, this enables sophisticated cost-tiering strategies where critical workloads get premium nodes while batch workloads efficiently use budget-friendly resources.

Example 4: Performance-based placement

Storage-intensive applications often require minimum disk performance guarantees. With Extended Toleration Operators, you can enforce these requirements at the scheduling level.

tolerations:
- key: "disk-iops"
  operator: "Gt"
  value: "3000"
  effect: "NoSchedule"

This toleration ensures the pod only schedules on nodes where disk-iops exceeds 3000. The Gt operator means "I need nodes that are greater than this minimum".

How to use this feature

Extended Toleration Operators is an alpha feature in Kubernetes v1.35. To try it out:

  1. Enable the feature gate on both your API server and scheduler:

    --feature-gates=TaintTolerationComparisonOperators=true
    
  2. Taint your nodes with numeric values representing the metrics relevant to your scheduling needs:

     kubectl taint nodes node-1 failure-probability=5:NoSchedule
     kubectl taint nodes node-2 disk-iops=5000:NoSchedule
    
  3. Use the new operators in your pod specifications:

     spec:
       tolerations:
       - key: "failure-probability"
         operator: "Lt"
         value: "1"
         effect: "NoSchedule"
    

Note:

As an alpha feature, Extended Toleration Operators may change in future releases and should be used with caution in production environments. Always test thoroughly in non-production clusters first.

What's next?

This alpha release is just the beginning. As we gather feedback from the community, we plan to:

  • Add support for CEL (Common Expression Language) expressions in tolerations and node affinity for even more flexible scheduling logic, including semantic versioning comparisons
  • Improve integration with cluster autoscaling for threshold-aware capacity planning
  • Graduate the feature to beta and eventually GA with production-ready stability

We're particularly interested in hearing about your use cases! Do you have scenarios where threshold-based scheduling would solve problems? Are there additional operators or capabilities you'd like to see?

Getting involved

This feature is driven by the SIG Scheduling community. Please join us to connect with the community and share your ideas and feedback around this feature and beyond.

You can reach the maintainers of this feature at:

For questions or specific inquiries related to Extended Toleration Operators, please reach out to the SIG Scheduling community. We look forward to hearing from you!

How can I learn more?

Categories: CNCF Projects, Kubernetes

Kubernetes v1.35: New level of efficiency with in-place Pod restart

Kubernetes Blog - Fri, 01/02/2026 - 13:30

The release of Kubernetes 1.35 introduces a powerful new feature that provides a much-requested capability: the ability to trigger a full, in-place restart of a Pod. This feature, Restart All Containers (alpha in 1.35), offers an efficient way to reset a Pod's state compared to the resource-intensive approach of deleting and recreating the entire Pod. It is especially useful for AI/ML workloads, allowing application developers to concentrate on their core training logic while offloading complex failure-handling and recovery mechanisms to sidecars and declarative Kubernetes configuration. With RestartAllContainers and other planned enhancements, Kubernetes continues to add building blocks for creating the most flexible, robust, and efficient platforms for AI/ML workloads.

This new functionality is available by enabling the RestartAllContainersOnContainerExits feature gate. This alpha feature extends the Container Restart Rules feature, which graduated to beta in Kubernetes 1.35.

The problem: when a single container restart isn't enough and recreating pods is too costly

Kubernetes has long supported restart policies at the Pod level (restartPolicy) and, more recently, at the individual container level. These policies are great for handling crashes in a single, isolated process. However, many modern applications have more complex inter-container dependencies. For instance:

  • An init container prepares the environment by mounting a volume or generating a configuration file. If the main application container corrupts this environment, simply restarting that one container is not enough. The entire initialization process needs to run again.
  • A watcher sidecar monitors system health. If it detects an unrecoverable but retriable error state, it must trigger a restart of the main application container from a clean slate.
  • A sidecar that manages a remote resource fails. Even if the sidecar restarts on its own, the main container may be stuck trying to access an outdated or broken connection.

In all these cases, the desired action is not to restart a single container, but all of them. Previously, the only way to achieve this was to delete the Pod and have a controller (like a Job or ReplicaSet) create a new one. This process is slow and expensive, involving the scheduler, node resource allocation and re-initialization of networking and storage.

This inefficiency becomes even worse when handling large-scale AI/ML workloads (>= 1,000 Nodes with one Pod per Node). A common requirement for these synchronous workloads is that when a failure occurs (such as a Node crash), all Pods in the fleet must be recreated to reset the state before training can resume, even if all the other Pods were not directly affected by the failure. Deleting, creating and scheduling thousands of Pods simultaneously creates a massive bottleneck. The estimated overhead of this failure could cost $100,000 per month in wasted resources.

Handling these failures for AI/ML training jobs requires a complex integration touching both the training framework and Kubernetes, which is often fragile and toilsome. This feature introduces a Kubernetes-native solution, improving system robustness and allowing application developers to concentrate on their core training logic.

Another major benefit of restarting Pods in place is that keeping Pods on their assigned Nodes allows for further optimizations. For example, one can implement node-level caching tied to a specific Pod identity, something that is impossible when Pods are unnecessarily being recreated on different Nodes.

Introducing the RestartAllContainers action

To address this, Kubernetes v1.35 adds a new action to the container restart rules: RestartAllContainers. When a container exits in a way that matches a rule with this action, the kubelet initiates a fast, in-place restart of the Pod.

This in-place restart is highly efficient because it preserves the Pod's most important resources:

  • The Pod's UID, IP address and network namespace.
  • The Pod's sandbox and any attached devices.
  • All volumes, including emptyDir and mounted volumes from PVCs.

After terminating all running containers, the Pod's startup sequence is re-executed from the very beginning. This means all init containers are run again in order, followed by the sidecar and regular containers, ensuring a completely fresh start in a known-good environment. With the exception of ephemeral containers (which are terminated), all other containers—including those that previously succeeded or failed—will be restarted, regardless of their individual restart policies.

Use cases

1. Efficient restarts for ML/Batch jobs

For ML training jobs, rescheduling a worker Pod on failure is a costly operation that wastes valuable compute resources. On a 1,000-node training cluster, rescheduling overhead can waste over $100,000 in compute resources monthly.

With the RestartAllContainers action, you can address this by enabling a much faster, hybrid recovery strategy: recreate only the "bad" Pods (e.g., those on unhealthy Nodes) while triggering RestartAllContainers for the remaining healthy Pods. Benchmarks show this reduces the recovery overhead from minutes to a few seconds.

With in-place restarts, a watcher sidecar can monitor the main training process. If it encounters a specific, retriable error, the watcher can exit with a designated code to trigger a fast reset of the worker Pod, allowing it to restart from the last checkpoint without involving the Job controller. This capability is now natively supported by Kubernetes.

Read more details about future development and JobSet features at KEP-467 JobSet in-place restart.

apiVersion: v1
kind: Pod
metadata:
  name: ml-worker-pod
spec:
  restartPolicy: Never
  initContainers:
  # This init container will re-run on every in-place restart
  - name: setup-environment
    image: my-repo/setup-worker:1.0
  - name: watcher-sidecar
    image: my-repo/watcher:1.0
    restartPolicy: Always
    restartPolicyRules:
    - action: RestartAllContainers
      onExit:
        exitCodes:
          operator: In
          # A specific exit code from the watcher triggers a full pod restart
          values: [88]
  containers:
  - name: main-application
    image: my-repo/training-app:1.0

2. Re-running init containers for a clean state

Imagine a scenario where an init container is responsible for fetching credentials or setting up a shared volume. If the main application fails in a way that corrupts this shared state, you need the init container to rerun.

By configuring the main application to exit with a specific code upon detecting such a corruption, you can trigger the RestartAllContainers action, guaranteeing that the init container provides a clean setup before the application restarts.
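
A minimal sketch of this pattern is shown below; the names, images, and exit code 42 are hypothetical, and the rule structure mirrors the earlier example:

apiVersion: v1
kind: Pod
metadata:
  name: shared-state-worker
spec:
  restartPolicy: Never
  initContainers:
  # Re-runs on every in-place restart, recreating the shared state from scratch
  - name: setup-shared-state
    image: my-repo/state-setup:1.0
    volumeMounts:
    - name: shared-state
      mountPath: /state
  containers:
  - name: main-application
    image: my-repo/app:1.0
    volumeMounts:
    - name: shared-state
      mountPath: /state
    restartPolicyRules:
    - action: RestartAllContainers
      onExit:
        exitCodes:
          operator: In
          # Hypothetical exit code the app uses to signal corrupted shared state
          values: [42]
  volumes:
  - name: shared-state
    emptyDir: {}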

3. Handling high rate of similar tasks execution

There are cases where each task is best represented as a Pod execution and requires a clean environment, for example a game session backend or a queue item processor. If the rate of tasks is high, running the full cycle of Pod creation, scheduling, and initialization is simply too expensive, especially when tasks are short. The ability to restart all containers from scratch enables a Kubernetes-native way to handle this scenario without custom solutions or frameworks.

How to use it

To try this feature, you must enable the RestartAllContainersOnContainerExits feature gate on your Kubernetes cluster components (API server and kubelet) running Kubernetes v1.35+. This alpha feature extends the ContainerRestartRules feature, which graduated to beta in v1.35 and is enabled by default.
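
As a minimal sketch (exact flag placement depends on how your control plane is deployed, for example via static Pod manifests or systemd units):

# kube-apiserver: add the gate to the component's command line, for example:
#   --feature-gates=RestartAllContainersOnContainerExits=true

# kubelet: enable the gate in the kubelet configuration file
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  RestartAllContainersOnContainerExits: true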

Once enabled, you can add restartPolicyRules to any container (init, sidecar, or regular) and use the RestartAllContainers action.

The feature is designed to be easily usable on existing apps. However, if an application does not follow some best practices, it may cause issues for the application or for observability tooling. When enabling the feature, make sure that all containers are reentrant and that external tooling is prepared for init containers to re-run. Also, when restarting all containers, the kubelet does not run preStop hooks. This means containers must be designed to handle abrupt termination without relying on preStop hooks for graceful shutdown.

Observing the restart

To make this process observable, a new Pod condition, AllContainersRestarting, is added to the Pod's status. When a restart is triggered, this condition becomes True and it reverts to False once all containers have terminated and the Pod is ready to start its lifecycle anew. This provides a clear signal to users and other cluster components about the Pod's state.

All containers restarted by this action will have their restart count incremented in the container status.
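
For example, a minimal sketch using kubectl; the Pod name is the hypothetical ml-worker-pod from the earlier spec:

# Check whether an in-place restart is currently in progress
kubectl get pod ml-worker-pod \
  -o jsonpath='{.status.conditions[?(@.type=="AllContainersRestarting")].status}'

# Inspect per-container restart counts
kubectl get pod ml-worker-pod \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}: {.restartCount}{"\n"}{end}'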

We want your feedback!

As an alpha feature, RestartAllContainers is ready for you to experiment with, and your use cases and feedback are welcome. This feature is driven by the SIG Node community. If you are interested in getting involved, sharing your thoughts, or contributing, please join us!

You can reach SIG Node through its Slack channel, mailing list, and regular community meetings.

Categories: CNCF Projects, Kubernetes

Kubernetes 1.35: Enhanced Debugging with Versioned z-pages APIs

Kubernetes Blog - Wed, 12/31/2025 - 13:30

Debugging Kubernetes control plane components can be challenging, especially when you need to quickly understand the runtime state of a component or verify its configuration. With Kubernetes 1.35, we're enhancing the z-pages debugging endpoints with structured, machine-parseable responses that make it easier to build tooling and automate troubleshooting workflows.

What are z-pages?

z-pages are special debugging endpoints exposed by Kubernetes control plane components. Introduced as an alpha feature in Kubernetes 1.32, these endpoints provide runtime diagnostics for components like kube-apiserver, kube-controller-manager, kube-scheduler, kubelet and kube-proxy. The name "z-pages" comes from the convention of using /*z paths for debugging endpoints.

Currently, Kubernetes supports two primary z-page endpoints:

  • /statusz: Displays high-level component information, including version information, start time, uptime, and available debug paths.
  • /flagz: Shows all command-line arguments and their values used to start the component (with confidential values redacted for security).

These endpoints are valuable for human operators who need to quickly inspect component state, but until now, they only returned plain text output that was difficult to parse programmatically.

What's new in Kubernetes 1.35?

Kubernetes 1.35 introduces structured, versioned responses for both /statusz and /flagz endpoints. This enhancement maintains backward compatibility with the existing plain text format while adding support for machine-readable JSON responses.

Backward compatible design

The new structured responses are opt-in. Without specifying an Accept header, the endpoints continue to return the familiar plain text format:

$ curl --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
--key /etc/kubernetes/pki/apiserver-kubelet-client.key \
--cacert /etc/kubernetes/pki/ca.crt \
https://localhost:6443/statusz
kube-apiserver statusz
Warning: This endpoint is not meant to be machine parseable, has no formatting compatibility guarantees and is for debugging purposes only.
Started: Wed Oct 16 21:03:43 UTC 2024
Up: 0 hr 00 min 16 sec
Go version: go1.23.2
Binary version: 1.35.0-alpha.0.1595
Emulation version: 1.35
Paths: /healthz /livez /metrics /readyz /statusz /version

Structured JSON responses

To receive a structured response, include the appropriate Accept header:

Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Statusz

This returns a versioned JSON response:

{
  "kind": "Statusz",
  "apiVersion": "config.k8s.io/v1alpha1",
  "metadata": {
    "name": "kube-apiserver"
  },
  "startTime": "2025-10-29T00:30:01Z",
  "uptimeSeconds": 856,
  "goVersion": "go1.23.2",
  "binaryVersion": "1.35.0",
  "emulationVersion": "1.35",
  "paths": [
    "/healthz",
    "/livez",
    "/metrics",
    "/readyz",
    "/statusz",
    "/version"
  ]
}

Similarly, /flagz supports structured responses with the header:

Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz

Example response:

{
  "kind": "Flagz",
  "apiVersion": "config.k8s.io/v1alpha1",
  "metadata": {
    "name": "kube-apiserver"
  },
  "flags": {
    "advertise-address": "192.168.8.4",
    "allow-privileged": "true",
    "authorization-mode": "[Node,RBAC]",
    "enable-priority-and-fairness": "true",
    "profiling": "true"
  }
}

Why structured responses matter

The addition of structured responses opens up several new possibilities:

1. Automated health checks and monitoring

Instead of parsing plain text, monitoring tools can now easily extract specific fields. For example, you can programmatically check whether a component is running with an unexpected emulation version or verify that critical flags are set correctly.
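
A minimal sketch of such a check, reusing the certificate paths from the curl examples in this post and assuming 1.35 is the expected emulation version:

expected="1.35"
actual=$(curl -s \
  --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
  --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
  --cacert /etc/kubernetes/pki/ca.crt \
  -H "Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Statusz" \
  https://localhost:6443/statusz | jq -r .emulationVersion)

if [ "$actual" != "$expected" ]; then
  echo "kube-apiserver emulation version is $actual, expected $expected" >&2
fi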

2. Better debugging tools

Developers can build sophisticated debugging tools that compare configurations across multiple components or track configuration drift over time. The structured format makes it trivial to diff configurations or validate that components are running with expected settings.
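
For instance, a minimal drift check between two API servers (the hostnames below are hypothetical):

for host in cp-1.example.internal cp-2.example.internal; do
  curl -s \
    --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
    --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
    --cacert /etc/kubernetes/pki/ca.crt \
    -H "Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz" \
    "https://${host}:6443/flagz" | jq -S .flags > "/tmp/flags-${host}.json"
done

# Any output indicates configuration drift between the two API servers
diff /tmp/flags-cp-1.example.internal.json /tmp/flags-cp-2.example.internal.json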

3. API versioning and stability

By introducing versioned APIs (starting with v1alpha1), we provide a clear path to stability. As the feature matures, we'll introduce v1beta1 and eventually v1, giving you confidence that your tooling won't break with future Kubernetes releases.

How to use structured z-pages

Prerequisites

Both endpoints require feature gates to be enabled (a minimal enablement sketch follows this list):

  • /statusz: Enable the ComponentStatusz feature gate
  • /flagz: Enable the ComponentFlagz feature gate
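
A minimal sketch of enabling both gates; exact flag placement depends on how the component is deployed, for example a static Pod manifest or systemd unit:

# Add to the kube-apiserver command line, and likewise to any other
# component whose z-pages you want to query:
#   --feature-gates=ComponentStatusz=true,ComponentFlagz=true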

Example: Getting structured responses

Here's an example using curl to retrieve structured JSON responses from the kube-apiserver:

# Get structured statusz response
curl \
 --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
 --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
 --cacert /etc/kubernetes/pki/ca.crt \
 -H "Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Statusz" \
 https://localhost:6443/statusz | jq .

# Get structured flagz response
curl \
 --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
 --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
 --cacert /etc/kubernetes/pki/ca.crt \
 -H "Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz" \
 https://localhost:6443/flagz | jq .

Note:

The examples above use client certificate authentication and verify the server's certificate using --cacert. If you need to bypass certificate verification in a test environment, you can use --insecure (or -k), but this should never be done in production as it makes you vulnerable to man-in-the-middle attacks.

Important considerations

Alpha feature status

The structured z-page responses are an alpha feature in Kubernetes 1.35. This means:

  • The API format may change in future releases
  • These endpoints are intended for debugging, not production automation
  • You should avoid relying on them for critical monitoring workflows until they reach beta or stable status

Security and access control

z-pages expose internal component information and require proper access controls. Here are the key security considerations:

Authorization: Access to z-page endpoints is restricted to members of the system:monitoring group, which follows the same authorization model as other debugging endpoints like /healthz, /livez, and /readyz. This ensures that only authorized users and service accounts can access debugging information. If your cluster uses RBAC, you can manage access by granting appropriate permissions to this group.
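
For example, a minimal sketch of granting the system:monitoring group read access to the two endpoints via RBAC non-resource URL rules; the ClusterRole and binding names are hypothetical, and whether such a grant is needed depends on your cluster's authorization setup:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: zpages-reader
rules:
- nonResourceURLs: ["/statusz", "/flagz"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: zpages-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: zpages-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:monitoring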

Authentication: The authentication requirements for these endpoints depend on your cluster's configuration. Unless anonymous authentication is enabled for your cluster, you typically need to use authentication mechanisms (such as client certificates) to access these endpoints.

Information disclosure: These endpoints reveal configuration details about your cluster components, including:

  • Component versions and build information
  • All command-line arguments and their values (with confidential values redacted)
  • Available debug endpoints

Only grant access to trusted operators and debugging tools. Avoid exposing these endpoints to unauthorized users or automated systems that don't require this level of access.

Future evolution

As the feature matures, we (Kubernetes SIG Instrumentation) expect to:

  • Introduce v1beta1 and eventually v1 versions of the API
  • Gather community feedback on the response schema
  • Potentially add additional z-page endpoints based on user needs

Try it out

We encourage you to experiment with structured z-pages in a test environment:

  1. Enable the ComponentStatusz and ComponentFlagz feature gates on your control plane components
  2. Try querying the endpoints with both plain text and structured formats
  3. Build a simple tool or script that uses the structured data
  4. Share your feedback with the community

Get involved

We'd love to hear your feedback! The structured z-pages feature is designed to make Kubernetes easier to debug and monitor. Whether you're building internal tooling, contributing to open source projects, or just exploring the feature, your input helps shape the future of Kubernetes observability.

If you have questions, suggestions, or run into issues, please reach out to SIG Instrumentation. You can find us on Slack or at our regular community meetings.

Happy debugging!

Categories: CNCF Projects, Kubernetes
