
Kubernetes v1.35 Sneak Peek

Kubernetes Blog - Tue, 11/25/2025 - 19:00

As the release of Kubernetes v1.35 approaches, the Kubernetes project continues to evolve. Features may be deprecated, removed, or replaced to improve the project's overall health. This blog post outlines planned changes for the v1.35 release that the release team believes you should be aware of to ensure the continued smooth operation of your Kubernetes cluster(s), and to keep you up to date with the latest developments. The information below is based on the current status of the v1.35 release and is subject to change before the final release date.

Deprecations and removals for Kubernetes v1.35

cgroup v1 support

On Linux nodes, container runtimes typically rely on cgroups (short for "control groups"). Support for using cgroup v2 has been stable in Kubernetes since v1.25, providing an alternative to the original v1 cgroup support. While cgroup v1 provided the initial resource control mechanism, it suffered from well-known inconsistencies and limitations. Adding support for cgroup v2 allowed use of a unified control group hierarchy, improved resource isolation, and served as the foundation for modern features, making legacy cgroup v1 support ready for removal. The removal of cgroup v1 support will only impact cluster administrators running nodes on older Linux distributions that do not support cgroup v2; on those nodes, the kubelet will fail to start. Administrators must migrate their nodes to systems with cgroup v2 enabled. More details on compatibility requirements will be available in a blog post soon after the v1.35 release.

To learn more, read about cgroup v2;
you can also track the switchover work via KEP-5573: Remove cgroup v1 support.

Deprecation of ipvs mode in kube-proxy

Many releases ago, the Kubernetes project implemented an ipvs mode in kube-proxy. It was adopted as a way to provide high-performance service load balancing, with better performance than the existing iptables mode. However, maintaining feature parity between ipvs and other kube-proxy modes became difficult, due to technical complexity and diverging requirements. This created significant technical debt and made the ipvs backend impractical to support alongside newer networking capabilities.

The Kubernetes project intends to deprecate kube-proxy ipvs mode in the v1.35 release, to streamline the kube-proxy codebase. For Linux nodes, the recommended kube-proxy mode is already nftables.

You can find more in KEP-5495: Deprecate ipvs mode in kube-proxy

Kubernetes is deprecating containerd v1.y support

While Kubernetes v1.35 still supports containerd 1.7 and other LTS releases of containerd, as a consequence of automated cgroup driver detection, the Kubernetes SIG Node community has formally agreed upon a final support timeline for containerd v1.X. Kubernetes v1.35 is the last release to offer this support (aligned with containerd 1.7 EOL).

This is a final warning: if you are using containerd 1.X, you must switch to containerd 2.0 or later before upgrading Kubernetes to the next version. You can monitor the kubelet_cri_losing_support metric to determine whether any nodes in your cluster are using a containerd version that will soon be unsupported.

You can find more in the official blog post or in KEP-4033: Discover cgroup driver from CRI

The following enhancements are some of those likely to be included in the v1.35 release. This is not a commitment, and the release content is subject to change.

Node declared features

When scheduling Pods, Kubernetes uses node labels, taints, and tolerations to match workload requirements with node capabilities. However, managing feature compatibility becomes challenging during cluster upgrades due to version skew between the control plane and nodes. This can lead to Pods being scheduled on nodes that lack required features, resulting in runtime failures.

The node declared features framework will introduce a standard mechanism for nodes to declare their supported Kubernetes features. With the new alpha feature enabled, a Node reports the features it can support, publishing this information to the control plane through a new .status.declaredFeatures field. Then, the kube-scheduler, admission controllers and third-party components can use these declarations. For example, you can enforce scheduling and API validation constraints, ensuring that Pods run only on compatible nodes.
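To give a sense of the shape of this API, here is a purely illustrative sketch of a Node carrying such declarations; the feature names are hypothetical, and the exact schema of the alpha field may differ from what is shown.

apiVersion: v1
kind: Node
metadata:
  name: worker-1
status:
  # Hypothetical feature names; the real alpha schema may differ.
  declaredFeatures:
  - InPlacePodVerticalScaling
  - UserNamespacesSupport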

This approach reduces manual node labeling, improves scheduling accuracy, and prevents incompatible pod placements proactively. It also integrates with the Cluster Autoscaler for informed scale-up decisions. Feature declarations are temporary and tied to Kubernetes feature gates, enabling safe rollout and cleanup.

Targeting alpha in v1.35, node declared features aims to solve version skew scheduling issues by making node capabilities explicit, enhancing reliability and cluster stability in heterogeneous version environments.

To learn more about this before the official documentation is published, you can read KEP-5328.

In-place update of Pod resources

Kubernetes is graduating in-place updates for Pod resources to General Availability (GA). This feature allows users to adjust CPU and memory resources without restarting Pods or containers. Previously, such modifications required recreating Pods, which could disrupt workloads, particularly for stateful or batch applications. Recent Kubernetes releases already allowed you to change resource settings (requests and limits) for existing Pods as a beta feature. In-place resizing allows for smoother vertical scaling, improves efficiency, and can also simplify solution development.
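As a rough sketch, on a cluster and kubectl version where this feature and the resize subresource are available, a running Pod could be resized with a patch like the one below; the Pod name web and container name app are placeholders.

kubectl patch pod web --subresource resize \
  --patch '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"1"}}}]}}'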

The Container Runtime Interface (CRI) has also been improved, extending the UpdateContainerResources API for Windows and future runtimes while allowing ContainerStatus to report real-time resource configurations. Together, these changes make scaling in Kubernetes faster, more flexible, and disruption-free. The feature was introduced as alpha in v1.27, graduated to beta in v1.33, and is targeting graduation to stable in v1.35.

You can find more in KEP-1287: In-place Update of Pod Resources

Pod certificates

When running microservices, Pods often require a strong cryptographic identity to authenticate with each other using mutual TLS (mTLS). While Kubernetes provides Service Account tokens, these are designed for authenticating to the API server, not for general-purpose workload identity.

Before this enhancement, operators had to rely on complex, external projects like SPIFFE/SPIRE or cert-manager to provision and rotate certificates for their workloads. But what if you could issue a unique, short-lived certificate to your Pods natively and automatically? KEP-4317 is designed to enable such native workload identity. It opens up various possibilities for securing pod-to-pod communication by allowing the kubelet to request and mount certificates for a Pod via a projected volume.
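As a purely illustrative sketch (the field names follow the alpha KEP-4317 API and may change; the signer name is a placeholder), a Pod might mount such a certificate through a projected volume along these lines:

apiVersion: v1
kind: Pod
metadata:
  name: mtls-client
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    volumeMounts:
    - name: workload-cert
      mountPath: /var/run/pod-certificate
      readOnly: true
  volumes:
  - name: workload-cert
    projected:
      sources:
      - podCertificate:                    # alpha projected volume source
          signerName: example.com/spiffe   # placeholder signer
          keyType: ED25519
          credentialBundlePath: credentials.pem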

This provides a built-in mechanism for workload identity, complete with automated certificate rotation, significantly simplifying the setup of service meshes and other zero-trust network policies. This feature was introduced as alpha in v1.34 and is targeting beta in v1.35.

You can find more in KEP-4317: Pod Certificates

Numeric values for taints

Kubernetes is enhancing taints and tolerations by adding numeric comparison operators, such as Gt (Greater Than) and Lt (Less Than).

Previously, tolerations supported only exact (Equal) or existence (Exists) matches, which were not suitable for numeric properties such as reliability SLAs.

With this change, a Pod can use a toleration to "opt-in" to nodes that meet a specific numeric threshold. For example, a Pod can require a Node with an SLA taint value greater than 950 (operator: Gt, value: "950").

This approach is more powerful than Node Affinity because it supports the NoExecute effect, allowing Pods to be automatically evicted if a node's numeric value drops below the tolerated threshold.
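A hedged sketch of what such a toleration could look like; the taint key example.com/sla is a placeholder, and the exact syntax may evolve while the feature is alpha:

tolerations:
- key: "example.com/sla"   # placeholder taint key carrying a numeric value
  operator: Gt
  value: "950"
  effect: NoExecute        # evict the Pod if the node's value no longer exceeds 950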

You can find more in KEP-5471: Enable SLA-based Scheduling

User namespaces

When running Pods, you can use securityContext to drop privileges, but containers inside the pod often still run as root (UID 0). This simplicity poses a significant challenge, as that container UID 0 maps directly to the host's root user.

Before this enhancement, a container breakout vulnerability could grant an attacker full root access to the node. But what if you could dynamically remap the container's root user to a safe, unprivileged user on the host? KEP-127 specifically allows such native support for Linux User Namespaces. It opens up various possibilities for pod security by isolating container and host user/group IDs. This allows a process to have root privileges (UID 0) within its namespace, while running as a non-privileged, high-numbered UID on the host.
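Opting a Pod into a user namespace is a one-line change in the Pod spec: setting hostUsers to false. A minimal sketch, assuming a cluster, kubelet, and container runtime with the feature enabled:

apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false          # run this Pod in its own user namespace
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]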

Released as alpha in v1.25 and beta in v1.30, this feature continues to progress through beta maturity, paving the way for truly "rootless" containers that drastically reduce the attack surface for a whole class of security vulnerabilities.

You can find more in KEP-127: User Namespaces

Support for mounting OCI images as volumes

When provisioning a Pod, you often need to bundle data, binaries, or configuration files for your containers. Before this enhancement, people often included that kind of data directly into the main container image, or required a custom init container to download and unpack files into an emptyDir. You can still take either of those approaches, of course.

But what if you could populate a volume directly from a data-only artifact in an OCI registry, just like pulling a container image? Kubernetes v1.31 added support for the image volume type, allowing Pods to pull and unpack OCI container image artifacts into a volume declaratively.

This allows for seamless distribution of data, binaries, or ML models using standard registry tooling, completely decoupling data from the container image and eliminating the need for complex init containers or startup scripts. This volume type has been in beta since v1.33 and will likely be enabled by default in v1.35.
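A minimal sketch of the beta API; the registry reference is a placeholder for a data-only OCI artifact, and the feature gate must be enabled on clusters where it is not yet on by default:

apiVersion: v1
kind: Pod
metadata:
  name: image-volume-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "ls /models && sleep 3600"]
    volumeMounts:
    - name: models
      mountPath: /models
      readOnly: true
  volumes:
  - name: models
    image:
      reference: registry.example.com/data/ml-models:v1   # placeholder OCI artifact
      pullPolicy: IfNotPresent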

You can try out the beta version of image volumes, or you can learn more about the plans from KEP-4639: OCI Volume Source.

Want to know more?

New features and deprecations are also announced in the Kubernetes release notes. We will formally announce what's new in Kubernetes v1.35 as part of the CHANGELOG for that release.

The Kubernetes v1.35 release is planned for December 17, 2025. Stay tuned for updates!

You can also see the announcements of changes in the release notes for:

Get involved

The simplest way to get involved with Kubernetes is by joining one of the many Special Interest Groups (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and through the channels below. Thank you for your continued feedback and support.

Categories: CNCF Projects, Kubernetes

Kubernetes Configuration Good Practices

Kubernetes Blog - Mon, 11/24/2025 - 19:00

Configuration is one of those things in Kubernetes that seems small until it's not. Configuration is at the heart of every Kubernetes workload. A missing quote, a wrong API version, or a misplaced YAML indent can ruin your entire deployment.

This blog brings together tried-and-tested configuration best practices. The small habits that make your Kubernetes setup clean, consistent and easier to manage. Whether you are just starting out or already deploying apps daily, these are the little things that keep your cluster stable and your future self sane.

This blog is inspired by the original Configuration Best Practices page, which has evolved through contributions from many members of the Kubernetes community.

General configuration practices

Use the latest stable API version

Kubernetes evolves fast. Older APIs eventually get deprecated and stop working. So, whenever you are defining resources, make sure you are using the latest stable API version. You can always check with

kubectl api-resources

This simple step saves you from future compatibility issues.

Store configuration in version control

Never apply manifest files directly from your desktop. Always keep them in a version control system like Git; it's your safety net. If something breaks, you can instantly roll back to a previous commit, compare changes, or recreate your cluster setup without panic.

Write configs in YAML not JSON

Write your configuration files using YAML rather than JSON. Both work technically, but YAML is just easier for humans. It's cleaner to read, less noisy, and widely used in the community.

YAML has some sneaky gotchas with boolean values: Use only true or false. Don't write yes, no, on or off. They might work in one version of YAML but break in another. To be safe, quote anything that looks like a Boolean (for example "yes").

Keep configuration simple and minimal

Avoid setting default values that are already handled by Kubernetes. Minimal manifests are easier to debug, cleaner to review and less likely to break things later.

If your Deployment, Service and ConfigMap all belong to one app, put them in a single manifest file.
It's easier to track changes and apply them as a unit. See the Guestbook all-in-one.yaml file for an example of this syntax.

You can even apply entire directories with:

kubectl apply -f configs/

One command, and boom, everything in that folder gets deployed.

Add helpful annotations

Manifest files are not just for machines, they are for humans too. Use annotations to describe why something exists or what it does. A quick one-liner can save hours when debugging later and also allows better collaboration.

The most helpful annotation to set is kubernetes.io/description. It's like adding a comment, except that it gets stored in the API so that everyone else can see it even after you deploy.
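For example, a short description on a Deployment might look like this; the name and wording are just placeholders:

metadata:
  name: checkout
  annotations:
    kubernetes.io/description: "Serves the storefront checkout API; owned by the payments team."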

Managing Workloads: Pods, Deployments, and Jobs

A common early mistake in Kubernetes is creating Pods directly. Pods work, but they don't reschedule themselves if something goes wrong.

Naked Pods (Pods not managed by a controller, such as a Deployment or a StatefulSet) are fine for testing, but in real setups, they are risky.

Why? Because if the node hosting that Pod dies, the Pod dies with it and Kubernetes won't bring it back automatically.

Use Deployments for apps that should always be running

A Deployment, which both creates a ReplicaSet to ensure that the desired number of Pods is always available, and specifies a strategy to replace Pods (such as RollingUpdate), is almost always preferable to creating Pods directly. You can roll out a new version, and if something breaks, roll back instantly.

Use Jobs for tasks that should finish

A Job is perfect when you need something to run once and then stop, like a database migration or a batch processing task. It will retry if the Pod fails and report success when it's done.
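Here's a minimal sketch of such a one-shot Job; the image and arguments are placeholders for your own migration tool:

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  backoffLimit: 3            # retry a failed Pod up to 3 times
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: registry.example.com/myapp/migrator:1.2.0   # placeholder image
        args: ["--to-latest"]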

Service Configuration and Networking

Services are how your workloads talk to each other inside (and sometimes outside) your cluster. Without them, your pods exist but can't reach anyone. Let's make sure that doesn't happen.

Create Services before workloads that use them

When Kubernetes starts a Pod, it automatically injects environment variables for existing Services. So, if a Pod depends on a Service, create a Service before its corresponding backend workloads (Deployments or StatefulSets), and before any workloads that need to access it.

For example, if a Service named foo exists, all containers will get the following variables in their initial environment:

FOO_SERVICE_HOST=<the host the Service runs on>
FOO_SERVICE_PORT=<the port the Service runs on>

DNS-based discovery doesn't have this problem, but it's a good habit to follow anyway.

Use DNS for Service discovery

If your cluster has the DNS add-on (most do), every Service automatically gets a DNS entry. That means you can access it by name instead of IP:

curl http://my-service.default.svc.cluster.local

It's one of those features that makes Kubernetes networking feel magical.

Avoid hostPort and hostNetwork unless absolutely necessary

You'll sometimes see these options in manifests:

hostPort: 8080
hostNetwork: true

But here's the thing: they tie your Pods to specific nodes, making them harder to schedule and scale, because each <hostIP, hostPort, protocol> combination must be unique. If you don't specify the hostIP and protocol explicitly, Kubernetes will use 0.0.0.0 as the default hostIP and TCP as the default protocol. Unless you're debugging or building something like a network plugin, avoid them.

If you just need local access for testing, try kubectl port-forward:

kubectl port-forward deployment/web 8080:80

See Use Port Forwarding to access applications in a cluster to learn more. Or if you really need external access, use a type: NodePort Service. That's the safer, Kubernetes-native way.

Use headless Services for internal discovery

Sometimes, you don't want Kubernetes to load balance traffic. You want to talk directly to each Pod. That's where headless Services come in.

You create one by setting clusterIP: None. Instead of a single IP, DNS gives you a list of all the Pod IPs, perfect for apps that manage connections themselves.
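A small sketch of a headless Service; the selector and port are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: my-db
spec:
  clusterIP: None        # headless: DNS returns the individual Pod IPs
  selector:
    app: my-db
  ports:
  - port: 5432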

Working with labels effectively

Labels are key/value pairs that are attached to objects such as Pods. Labels help you organize, query and group your resources. They don't do anything by themselves, but they make everything else from Services to Deployments work together smoothly.

Use semantic labels

Good labels help you understand what's what, even months later. Define and use labels that identify semantic attributes of your application or Deployment. For example:

labels:
  app.kubernetes.io/name: myapp
  app.kubernetes.io/component: web
  tier: frontend
  phase: test

  • app.kubernetes.io/name: what the app is
  • tier: which layer it belongs to (frontend/backend)
  • phase: which stage it's in (test/prod)

You can then use these labels to make powerful selectors. For example:

kubectl get pods -l tier=frontend

This will list all frontend Pods across your cluster, no matter which Deployment they came from. Basically you are not manually listing Pod names; you are just describing what you want. See the guestbook app for examples of this approach.

Use common Kubernetes labels

Kubernetes actually recommends a set of common labels. It's a standardized way to name things across your different workloads or projects. Following this convention makes your manifests cleaner, and it means that tools such as Headlamp, dashboard, or third-party monitoring systems can all automatically understand what's running.

Manipulate labels for debugging

Since controllers (like ReplicaSets or Deployments) use labels to manage Pods, you can remove a label to “detach” a Pod temporarily.

Example:

kubectl label pod mypod app-

The app- part removes the label key app. Once that happens, the controller won’t manage that Pod anymore. It’s like isolating it for inspection, a “quarantine mode” for debugging. To interactively remove or add labels, use kubectl label.

You can then check logs, exec into it and once done, delete it manually. That’s a super underrated trick every Kubernetes engineer should know.

Handy kubectl tips

These small tips make life much easier when you are working with multiple manifest files or clusters.

Apply entire directories

Instead of applying one file at a time, apply the whole folder:

# Using server-side apply is also a good practice
kubectl apply -f configs/ --server-side

This command looks for .yaml, .yml and .json files in that folder and applies them all together. It's faster, cleaner and helps keep things grouped by app.

Use label selectors to get or delete resources

You don't always need to type out resource names one by one. Instead, use selectors to act on entire groups at once:

kubectl get pods -l app=myapp
kubectl delete pod -l phase=test

It's especially useful in CI/CD pipelines, where you want to clean up test resources dynamically.

Quickly create Deployments and Services

For quick experiments, you don't always need to write a manifest. You can spin up a Deployment right from the CLI:

kubectl create deployment webapp --image=nginx

Then expose it as a Service:

kubectl expose deployment webapp --port=80

This is great when you just want to test something before writing full manifests. Also, see Use a Service to Access an Application in a cluster for an example.

Conclusion

Cleaner configuration leads to calmer cluster administrators. If you stick to a few simple habits (keep configuration simple and minimal, version-control everything, use consistent labels, and avoid relying on naked Pods), you'll save yourself hours of debugging down the road.

The best part? Clean configurations stay readable. Even after months, you or anyone on your team can glance at them and know exactly what’s happening.

Categories: CNCF Projects, Kubernetes

Helm 4 Released

Helm Blog - Sun, 11/16/2025 - 19:00

On Wednesday November 12th, during the Helm 4 presentation at KubeCon + CloudNativeCon, Helm v4.0.0 was released. This is the first new major version of Helm in 6 years.

What's New

Helm v3 has served the Kubernetes community well for many years. During that time we saw new ways to use Helm, new applications installed via charts, the rise of Artifact Hub, and numerous tools that build on top of Helm. We also saw where we wanted to add features but the internal architecture of Helm didn't provide a path forward without breaking public APIs in the SDK. Helm 4 makes those changes to enable new features now and into the future.

Some of the new features include:

  • Redesigned plugin system that supports WebAssembly-based plugins
  • Post-renderers are now plugins
  • Server side apply is now supported
  • Improved resource watching, to support waiting, based on kstatus
  • Local Content-based caching (e.g. for charts)
  • Logging via slog enabling SDK logging to integrate with modern loggers
  • Reproducible/Idempotent builds of chart archives
  • Updated SDK API including support for multiple chart API versions (new experimental v3 chart API version coming soon)

You can learn about more of the changes in the Helm 4 Overview.

Helm v3 Support

When a major version of software comes out, it takes a while to make the transition. Helm v3 will continue to be supported to enable a clean transition period. The dates of continued support are:

  • Bug fixes until July 8th 2026.
  • Security fixes until November 11th 2026.

Helm releases updates on Wednesdays (typically the 2nd Wednesday in a month) and these dates correspond with release schedule dates. During this time there will be NO features backported other than updates to the Kubernetes client libraries that enable support of new Kubernetes versions.

Learn More

You can learn about the Helm changes in the overview or find all the changes in the full changelog. The documentation shares many more details; there you can find all the ways Helm has stayed the same and the new features you can take advantage of.

Categories: CNCF Projects, Kubernetes

Ingress NGINX Retirement: What You Need to Know

Kubernetes Blog - Tue, 11/11/2025 - 13:30

To prioritize the safety and security of the ecosystem, Kubernetes SIG Network and the Security Response Committee are announcing the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026. Afterward, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. Existing deployments of Ingress NGINX will continue to function and installation artifacts will remain available.

We recommend migrating to one of the many alternatives. Consider migrating to Gateway API, the modern replacement for Ingress. If you must continue using Ingress, many alternative Ingress controllers are listed in the Kubernetes documentation. Continue reading for further information about the history and current state of Ingress NGINX, as well as next steps.

About Ingress NGINX

Ingress is the original user-friendly way to direct network traffic to workloads running on Kubernetes. (Gateway API is a newer way to achieve many of the same goals.) In order for an Ingress to work in your cluster, there must be an Ingress controller running. There are many Ingress controller choices available, which serve the needs of different users and use cases. Some are cloud-provider specific, while others have more general applicability.

Ingress NGINX was an Ingress controller, developed early in the history of the Kubernetes project as an example implementation of the API. It became very popular due to its tremendous flexibility, breadth of features, and independence from any particular cloud or infrastructure provider. Since those days, many other Ingress controllers have been created within the Kubernetes project by community groups, and by cloud native vendors. Ingress NGINX has continued to be one of the most popular, deployed as part of many hosted Kubernetes platforms and within innumerable independent users’ clusters.

History and Challenges

The breadth and flexibility of Ingress NGINX has caused maintenance challenges. Changing expectations about cloud native software have also added complications. What were once considered helpful options have sometimes come to be considered serious security flaws, such as the ability to add arbitrary NGINX configuration directives via the "snippets" annotations. Yesterday’s flexibility has become today’s insurmountable technical debt.

Despite the project’s popularity among users, Ingress NGINX has always struggled with insufficient or barely-sufficient maintainership. For years, the project has had only one or two people doing development work, on their own time, after work hours and on weekends. Last year, the Ingress NGINX maintainers announced their plans to wind down Ingress NGINX and develop a replacement controller together with the Gateway API community. Unfortunately, even that announcement failed to generate additional interest in helping maintain Ingress NGINX or develop InGate to replace it. (InGate development never progressed far enough to create a mature replacement; it will also be retired.)

Current State and Next Steps

Currently, Ingress NGINX is receiving best-effort maintenance. SIG Network and the Security Response Committee have exhausted our efforts to find additional support to make Ingress NGINX sustainable. To prioritize user safety, we must retire the project.

In March 2026, Ingress NGINX maintenance will be halted, and the project will be retired. After that time, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. The GitHub repositories will be made read-only and left available for reference.

Existing deployments of Ingress NGINX will not be broken. Existing project artifacts such as Helm charts and container images will remain available.

In most cases, you can check whether you use Ingress NGINX by running kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx with cluster administrator permissions.

We would like to thank the Ingress NGINX maintainers for their work in creating and maintaining this project; their dedication remains impressive. This Ingress controller has powered billions of requests in datacenters and homelabs all around the world. In a lot of ways, Kubernetes wouldn't be where it is without Ingress NGINX, and we are grateful for so many years of incredible effort.

SIG Network and the Security Response Committee recommend that all Ingress NGINX users begin migration to Gateway API or another Ingress controller immediately. Many options are listed in the Kubernetes documentation: Gateway API, Ingress. Additional options may be available from vendors you work with.

Categories: CNCF Projects, Kubernetes

Announcing the 2025 Steering Committee Election Results

Kubernetes Blog - Sun, 11/09/2025 - 15:10

The 2025 Steering Committee Election is now complete. The Kubernetes Steering Committee consists of 7 seats, 4 of which were up for election in 2025. Incoming committee members serve a term of 2 years, and all members are elected by the Kubernetes Community.

The Steering Committee oversees the governance of the entire Kubernetes project. With that great power comes great responsibility. You can learn more about the steering committee’s role in their charter.

Thank you to everyone who voted in the election; your participation helps support the community’s continued health and success.

Results

Congratulations to the elected committee members whose two-year terms begin immediately (listed in alphabetical order by GitHub handle):

They join continuing members:

Maciej Szulik and Paco Xu are returning Steering Committee Members.

Big thanks!

Thank you and congratulations on a successful election to this round’s election officers:

Thanks to the Emeritus Steering Committee Members. Your service is appreciated by the community:

And thank you to all the candidates who came forward to run for election.

Get involved with the Steering Committee

This governing body, like all of Kubernetes, is open to all. You can follow along with Steering Committee meeting notes and weigh in by filing an issue or creating a PR against their repo. They have an open meeting at 8am PT on the first Monday of every month. They can also be contacted at their public mailing list [email protected].

You can see what the Steering Committee meetings are all about by watching past meetings on the YouTube Playlist.

This post was adapted from one written by the Contributor Comms Subproject. If you want to write stories about the Kubernetes community, learn more about us.

Categories: CNCF Projects, Kubernetes

Gateway API 1.4: New Features

Kubernetes Blog - Thu, 11/06/2025 - 12:00

Gateway API logo

Ready to rock your Kubernetes networking? The Kubernetes SIG Network community has announced the General Availability (GA) release of Gateway API v1.4.0! Released on October 6, 2025, version 1.4.0 reinforces the path for modern, expressive, and extensible service networking in Kubernetes.

Gateway API v1.4.0 brings three new features to the Standard channel (Gateway API's GA release channel):

  • BackendTLSPolicy for TLS between gateways and backends
  • supportedFeatures in GatewayClass status
  • Named rules for Routes

and introduces three new experimental features:

  • Mesh resource for service mesh configuration
  • Default gateways to ease configuration burden
  • externalAuth filter for HTTPRoute

Graduations to Standard Channel

Backend TLS policy

Leads: Candace Holman, Norwin Schnyder, Katarzyna Łach

GEP-1897: BackendTLSPolicy

BackendTLSPolicy is a new Gateway API type for specifying the TLS configuration of the connection from the Gateway to backend Pod(s). Prior to the introduction of BackendTLSPolicy, there was no API specification that allowed encrypted traffic on the hop from Gateway to backend.

The BackendTLSPolicy validation configuration requires a hostname, which serves two purposes: it is used as the SNI header when connecting to the backend, and it is used for authentication, meaning the certificate presented by the backend must match this hostname unless subjectAltNames is explicitly specified.

If subjectAltNames (SANs) are specified, the hostname is only used for SNI, and authentication is performed against the SANs instead. If you still need to authenticate against the hostname value in this case, you MUST add it to the subjectAltNames list.

BackendTLSPolicy validation configuration also requires either caCertificateRefs or wellKnownCACertificates. caCertificateRefs refer to one or more (up to 8) PEM-encoded TLS certificate bundles. If there are no specific certificates to use, then depending on your implementation, you may use wellKnownCACertificates, set to "System" to tell the Gateway to use an implementation-specific set of trusted CA Certificates.

In this example, the BackendTLSPolicy is configured to use certificates defined in the auth-cert ConfigMap to connect with a TLS-encrypted upstream connection where pods backing the auth service are expected to serve a valid certificate for auth.example.com. It uses subjectAltNames with a Hostname type, but you may also use a URI type.

apiVersion: gateway.networking.k8s.io/v1
kind: BackendTLSPolicy
metadata:
  name: tls-upstream-auth
spec:
  targetRefs:
  - kind: Service
    name: auth
    group: ""
    sectionName: "https"
  validation:
    caCertificateRefs:
    - group: "" # core API group
      kind: ConfigMap
      name: auth-cert
    subjectAltNames:
    - type: "Hostname"
      hostname: "auth.example.com"

In this example, the BackendTLSPolicy is configured to use system certificates to connect with a TLS-encrypted backend connection where Pods backing the dev Service are expected to serve a valid certificate for dev.example.com.

apiVersion: gateway.networking.k8s.io/v1
kind: BackendTLSPolicy
metadata:
  name: tls-upstream-dev
spec:
  targetRefs:
  - kind: Service
    name: dev
    group: ""
    sectionName: "btls"
  validation:
    wellKnownCACertificates: "System"
    hostname: dev.example.com

More information on the configuration of TLS in Gateway API can be found in Gateway API - TLS Configuration.

Status information about the features that an implementation supports

Leads: Lior Lieberman, Beka Modebadze

GEP-2162: Supported features in GatewayClass Status

GatewayClass status has a new field, supportedFeatures. This addition allows implementations to declare the set of features they support. This provides a clear way for users and tools to understand the capabilities of a given GatewayClass.

This feature's name for conformance tests (and GatewayClass status reporting) is SupportedFeatures. Implementations must populate the supportedFeatures field in the .status of the GatewayClass before the GatewayClass is accepted, or in the same operation.

Here’s an example of a supportedFeatures published under GatewayClass' .status:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
...
status:
  conditions:
  - lastTransitionTime: "2022-11-16T10:33:06Z"
    message: Handled by Foo controller
    observedGeneration: 1
    reason: Accepted
    status: "True"
    type: Accepted
  supportedFeatures:
  - HTTPRoute
  - HTTPRouteHostRewrite
  - HTTPRoutePortRedirect
  - HTTPRouteQueryParamMatching

The graduation of SupportedFeatures to Standard helped improve the conformance testing process for Gateway API. The conformance test suite will now automatically run tests based on the features populated in the GatewayClass' status. This creates a strong, verifiable link between an implementation's declared capabilities and the test results, making it easier for implementers to run the correct conformance tests and for users to trust the conformance reports.

This means that when the supportedFeatures field is populated in the GatewayClass status, there is no need for additional conformance test flags such as --supported-features, --exempt-features, or --all-features. It's important to note that Mesh features are an exception to this and can be tested for conformance by using Conformance Profiles, or by manually providing any combination of feature-related flags, until the dedicated resource graduates from the experimental channel.

Named rules for Routes

GEP-995: Adding a new name field to all xRouteRule types (HTTPRouteRule, GRPCRouteRule, etc.)

Leads: Guilherme Cassolato

This enhancement enables route rules to be explicitly identified and referenced across the Gateway API ecosystem. Some of the key use cases include:

  • Status: Allowing status conditions to reference specific rules directly by name.
  • Observability: Making it easier to identify individual rules in logs, traces, and metrics.
  • Policies: Enabling policies (GEP-713) to target specific route rules via the sectionName field in their targetRef[s].
  • Tooling: Simplifying filtering and referencing of route rules in tools such as gwctl, kubectl, and general-purpose utilities like jq and yq.
  • Internal configuration mapping: Facilitating the generation of internal configurations that reference route rules by name within gateway and mesh implementations.

This follows the same well-established pattern already adopted for Gateway listeners, Service ports, Pods (and containers), and many other Kubernetes resources.

While the new name field is optional (so existing resources remain valid), its use is strongly encouraged. Implementations are not expected to assign a default value, but they may enforce constraints such as immutability.

Finally, keep in mind that the name format is validated, and other fields (such as sectionName) may impose additional, indirect constraints.
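As a small sketch, a named rule on an HTTPRoute might look like this; the Gateway, path, and backend names are placeholders:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: store
spec:
  parentRefs:
  - name: my-gateway
  rules:
  - name: checkout            # the new, optional rule name
    matches:
    - path:
        type: PathPrefix
        value: /checkout
    backendRefs:
    - name: checkout-svc
      port: 8080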

Experimental channel changes

Enabling external Auth for HTTPRoute

Giving Gateway API the ability to enforce authentication (and possibly authorization) at the Gateway or HTTPRoute level has been a highly requested feature for a long time. (See the GEP-1494 issue for some background.)

This Gateway API release adds an Experimental filter in HTTPRoute that tells the Gateway API implementation to call out to an external service to authenticate (and, optionally, authorize) requests.

This filter is based on the Envoy ext_authz API, and allows talking to an Auth service that uses either gRPC or HTTP for its protocol.

Both methods allow the configuration of what headers to forward to the Auth service, with the HTTP protocol allowing some extra information like a prefix path.

An HTTP example might look like this (noting that this example requires the Experimental channel to be installed and an implementation that supports External Auth to actually understand the config):

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: require-auth
  namespace: default
spec:
  parentRefs:
  - name: your-gateway-here
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /admin
    filters:
    - type: ExternalAuth
      externalAuth:
        protocol: HTTP
        backendRef:
          name: auth-service
        http:
          # These headers are always sent for the HTTP protocol,
          # but are included here for illustrative purposes
          allowedHeaders:
          - Host
          - Method
          - Path
          - Content-Length
          - Authorization
    backendRefs:
    - name: admin-backend
      port: 8080

This allows the backend Auth service to use the supplied headers to make a determination about the authentication for the request.

When a request is allowed, the external Auth service will respond with a 200 HTTP response code, and optionally extra headers to be included in the request that is forwarded to the backend. When the request is denied, the Auth service will respond with a 403 HTTP response.

Since the Authorization header is used in many authentication schemes, this filter can be used with Basic, OAuth, JWT, and other common authentication and authorization methods.

Mesh resource

Lead(s): Flynn

GEP-3949: Mesh-wide configuration and supported features

Gateway API v1.4.0 introduces a new experimental Mesh resource, which provides a way to configure mesh-wide settings and discover the features supported by a given mesh implementation. This resource is analogous to the Gateway resource and will initially be mainly used for conformance testing, with plans to extend its use to off-cluster Gateways in the future.

The Mesh resource is cluster-scoped and, as an experimental feature, is named XMesh and resides in the gateway.networking.x-k8s.io API group. A key field is controllerName, which specifies the mesh implementation responsible for the resource. The resource's status stanza indicates whether the mesh implementation has accepted it and lists the features the mesh supports.

One of the goals of this GEP is to avoid making it more difficult for users to adopt a mesh. To simplify adoption, mesh implementations are expected to create a default Mesh resource upon startup if one with a matching controllerName doesn't already exist. This avoids the need for manual creation of the resource to begin using a mesh.

The new XMesh API kind, within the gateway.networking.x-k8s.io/v1alpha1 API group, provides a central point for mesh configuration and feature discovery (source).

A minimal XMesh object specifies the controllerName:

apiVersion: gateway.networking.x-k8s.io/v1alpha1
kind: XMesh
metadata:
  name: one-mesh-to-mesh-them-all
spec:
  controllerName: one-mesh.example.com/one-mesh

The mesh implementation populates the status field to confirm it has accepted the resource and to list its supported features (source):

status:
  conditions:
  - type: Accepted
    status: "True"
    reason: Accepted
  supportedFeatures:
  - name: MeshHTTPRoute
  - name: OffClusterGateway

Introducing default Gateways

Lead(s): Flynn

GEP-3793: Allowing Gateways to program some routes by default.

For application developers, one common piece of feedback has been the need to explicitly name a parent Gateway for every single north-south Route. While this explicitness prevents ambiguity, it adds friction, especially for developers who just want to expose their application to the outside world without worrying about the underlying infrastructure's naming scheme. To address this, we have introduced the concept of Default Gateways.

For application developers: Just "use the default"

As an application developer, you often don't care about the specific Gateway your traffic flows through, you just want it to work. With this enhancement, you can now create a Route and simply ask it to use a default Gateway.

This is done by setting the new useDefaultGateways field in your Route's spec.

Here’s a simple HTTPRoute that uses a default Gateway:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-route
spec:
  useDefaultGateways: All
  rules:
  - backendRefs:
    - name: my-service
      port: 80

That's it! No more need to hunt down the correct Gateway name for your environment. Your Route is now a "defaulted Route."

For cluster operators: You're still in control

This feature doesn't take control away from cluster operators ("Chihiro"). In fact, they have explicit control over which Gateways can act as a default. A Gateway will only accept these defaulted Routes if it is configured to do so.

You can also use a ValidatingAdmissionPolicy to require, or even forbid, Routes relying on a default Gateway.

As a cluster operator, you can designate a Gateway as a default by setting the (new) .spec.defaultScope field:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-default-gateway
  namespace: default
spec:
  defaultScope: All
  # ... other gateway configuration

Operators can choose to have no default Gateways, or even multiple.

How it works and key details

  • To maintain a clean, GitOps-friendly workflow, a default Gateway does not modify the spec.parentRefs of your Route. Instead, the binding is reflected in the Route's status field. You can always inspect the status.parents stanza of your Route to see exactly which Gateway or Gateways have accepted it. This preserves your original intent and avoids conflicts with CD tools.

  • The design explicitly supports having multiple Gateways designated as defaults within a cluster. When this happens, a defaulted Route will bind to all of them. This enables cluster operators to perform zero-downtime migrations and testing of new default Gateways.

  • You can create a single Route that handles both north-south traffic (traffic entering or leaving the cluster, via a default Gateway) and east-west/mesh traffic (traffic between services within the cluster), by explicitly referencing a Service in parentRefs.

Default Gateways represent a significant step forward in making the Gateway API simpler and more intuitive for everyday use cases, bridging the gap between the flexibility needed by operators and the simplicity desired by developers.

Configuring client certificate validation

Lead(s): Arko Dasgupta, Katarzyna Łach

GEP-91: Address connection coalescing security issue

This release brings updates for configuring client certificate validation, addressing a critical security vulnerability related to connection reuse. HTTP connection coalescing is a web performance optimization that allows a client to reuse an existing TLS connection for requests to different domains. While this reduces the overhead of establishing new connections, it introduces a security risk in the context of API gateways. The ability to reuse a single TLS connection across multiple Listeners brings the need to introduce shared client certificate configuration in order to avoid unauthorized access.

Why SNI-based mTLS is not the answer

One might think that using Server Name Indication (SNI) to differentiate between Listeners would solve this problem. However, TLS SNI is not a reliable mechanism for enforcing security policies in a connection coalescing scenario. A client could use a single TLS connection for multiple peer connections, as long as they are all covered by the same certificate. This means that a client could establish a connection by indicating one peer identity (using SNI), and then reuse that connection to access a different virtual host that is listening on the same IP address and port. That reuse, which is controlled by client side heuristics, could bypass mutual TLS policies that were specific to the second listener configuration.

Here's an example to help explain it:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: wildcard-tls-gateway
spec:
  gatewayClassName: example
  listeners:
  - name: foo-https
    protocol: HTTPS
    port: 443
    hostname: foo.example.com
    tls:
      certificateRefs:
      - group: "" # core API group
        kind: Secret
        name: foo-example-com-cert # SAN: foo.example.com
  - name: wildcard-https
    protocol: HTTPS
    port: 443
    hostname: "*.example.com"
    tls:
      certificateRefs:
      - group: "" # core API group
        kind: Secret
        name: wildcard-example-com-cert # SAN: *.example.com

I have configured a Gateway with two listeners, both having overlapping hostnames. My intention is for the foo-https listener to be accessible only by clients presenting the foo-example-com-cert certificate. In contrast, the wildcard-https listener should allow access to a broader audience using any certificate valid for the *.example.com domain.

Consider a scenario where a client initially connects to foo.example.com. The server requests and successfully validates the foo-example-com-cert certificate, establishing the connection. Subsequently, the same client wishes to access other sites within this domain, such as bar.example.com, which is handled by the wildcard-https listener. Due to connection reuse, clients can access wildcard-https backends without an additional TLS handshake on the existing connection. This process functions as expected.

However, a critical security vulnerability arises when the order of access is reversed. If a client first connects to bar.example.com and presents a valid bar.example.com certificate, the connection is successfully established. If this client then attempts to access foo.example.com, the existing connection's client certificate will not be re-validated. This allows the client to bypass the specific certificate requirement for the foo backend, leading to a serious security breach.

The solution: per-port TLS configuration

The updated Gateway API gains a tls field in the .spec of a Gateway that allows you to define a default client certificate validation configuration for all Listeners and then, if needed, override it on a per-port basis. This provides a flexible and powerful way to manage your TLS policies.

Here’s a look at the updated API definitions (shown as Go source code):

// GatewaySpec defines the desired state of Gateway.
type GatewaySpec struct {
	...
	// GatewayTLSConfig specifies frontend tls configuration for gateway.
	TLS *GatewayTLSConfig `json:"tls,omitempty"`
}

// GatewayTLSConfig specifies frontend tls configuration for gateway.
type GatewayTLSConfig struct {
	// Default specifies the default client certificate validation configuration
	Default TLSConfig `json:"default"`

	// PerPort specifies tls configuration assigned per port.
	PerPort []TLSPortConfig `json:"perPort,omitempty"`
}

// TLSPortConfig describes a TLS configuration for a specific port.
type TLSPortConfig struct {
	// The Port indicates the Port Number to which the TLS configuration will be applied.
	Port PortNumber `json:"port"`

	// TLS stores the configuration that will be applied to all Listeners handling
	// HTTPS traffic and matching the given port.
	TLS TLSConfig `json:"tls"`
}

Breaking changes

Standard GRPCRoute - .spec field required (technicality)

The promotion of GRPCRoute to Standard introduces a minor but technically breaking change regarding the presence of the top-level .spec field. As part of achieving Standard status, the Gateway API has tightened the OpenAPI schema validation within the GRPCRoute CustomResourceDefinition (CRD) to explicitly ensure the spec field is required for all GRPCRoute resources. This change enforces stricter conformance to Kubernetes object standards and enhances the resource's stability and predictability. While it is highly unlikely that users were attempting to define a GRPCRoute without any specification, any existing automation or manifests that might have relied on a relaxed interpretation allowing a completely absent spec field will now fail validation and must be updated to include the .spec field, even if empty.

Experimental CORS support in HTTPRoute - breaking change for allowCredentials field

The Gateway API subproject has introduced a breaking change to the Experimental CORS support in HTTPRoute, concerning the allowCredentials field within the CORS policy. This field's definition has been strictly aligned with the upstream CORS specification, which dictates that the corresponding Access-Control-Allow-Credentials header must represent a Boolean value. Previously, the implementation might have been overly permissive, potentially accepting non-standard string representations such as "true" due to relaxed schema validation. Users who were configuring CORS rules must now review their manifests and ensure the value for allowCredentials strictly conforms to the new, more restrictive schema. Any existing HTTPRoute definitions that do not adhere to this stricter validation will now be rejected by the API server, requiring a configuration update to maintain functionality.

Improving the development and usage experience

As part of this release, we have improved some of the developer experience workflow:

  • Added Kube API Linter to the CI/CD pipelines, reducing the burden on API reviewers and the number of common mistakes.
  • Improved the execution time of CRD tests by using envtest.

Additionally, as part of the effort to improve the Gateway API usage experience, we also removed some ambiguities and old tech debt from our documentation website:

  • The API reference is now explicit when a field is experimental.
  • The GEP (GatewayAPI Enhancement Proposal) navigation bar is automatically generated, reflecting the real status of the enhancements.

Try it out

Unlike other Kubernetes APIs, you don't need to upgrade to the latest version of Kubernetes to get the latest version of Gateway API. As long as you're running Kubernetes 1.26 or later, you'll be able to get up and running with this version of Gateway API.

To try out the API, follow the Getting Started Guide.

As of this writing, seven implementations are already conformant with Gateway API v1.4.0. In alphabetical order:

Get involved

Wondering when a feature will be added? There are lots of opportunities to get involved and help define the future of Kubernetes routing APIs for both ingress and service mesh.

The maintainers would like to thank everyone who's contributed to Gateway API, whether in the form of commits to the repo, discussion, ideas, or general support. We could never have made this kind of progress without the support of this dedicated and active community.

Categories: CNCF Projects, Kubernetes

Helm @ KubeCon + CloudNativeCon NA '25

Helm Blog - Mon, 11/03/2025 - 19:00

The Helm team is headed to KubeCon + CloudNativeCon NA '25 in Atlanta, Georgia next week and it's truly a special one for us! This time around, as we celebrate our 10th birthday (fun fact, Helm was launched at the first KubeCon in 2015), we will also be releasing the highly anticipated Helm 4! Join us for a series of exciting activities throughout the week -- read on for more details!

Helm Booth in Project Pavilion

Don't miss out on meeting our project maintainers at the Helm booth - we'll be hanging out the second half of each day. Drop by to ask questions, learn about what's in Helm 4, and pick up special Hazel swag celebrating Helm's 10th birthday!

  • TUES Nov 11 @ 03:30 PM - 07:45 PM ET

  • WED Nov 12 @ 02:00 PM - 05:00 PM ET

  • THUR Nov 13 @ 12:30 PM - 02:00 PM ET

LOCATION: 8B, back side of Flux and across from Argo

Tuesday November 11, 2025

Simplifying Advanced AI Model Serving on Kubernetes Using Helm Charts

TIME: 12:00 PM - 12:30 PM ET

LOCATION: Building B | Level 4 | B401-402

SPEAKERS: Ajay Vohra & Caron Zhang

The AI model serving landscape on Kubernetes presents practitioners with an overwhelming array of technology choices: from inference servers like Ray Serve and Triton Inference Server, to inference engines like vLLM, to orchestration platforms like Ray Cluster and KServe. While this diversity drives innovation, it also creates complexity. Teams often prematurely standardize on limited technology stacks to manage this complexity.

This talk introduces an innovative Helm-based approach that abstracts the complexity of AI model serving while preserving the flexibility to leverage the best tools for each use case. Our solution is accelerator agnostic, and provides a consistent YAML interface for deploying and experimenting with various serving technologies.

We'll demonstrate this approach through two concrete examples of multi-node, multi-accelerator model serving with auto scaling: 1/ Ray Serve + vLLM + Ray Cluster, and 2/ LeaderWorkerSet + Triton Inference Server + vLLM + Ray Cluster + HPA.

Wednesday November 12, 2025

Maintainer Track: Introducing Helm 4

TIME: 11:00 AM - 11:30 AM ET

LOCATION: Building C | Level 3 | Georgia Ballroom 1

SPEAKERS: Matt Farina & Robert Sirchia (Helm Maintainers)

The wait is over! After six years with Helm v3, Helm v4 is finally here. In this session you'll learn about Helm v4, why there were six years between major versions (from backwards-compatible feature development to maintainer ups and downs), what's new in Helm v4, how long Helm v3 will still be supported, and what comes next. Could that include a Helm v5?

Contribfest: Hands-On With Helm 4: Wasm Plugins, OCI, and Resource Sequencing. Oh My!

TIME: 02:15pm - 03:30 PM ET

LOCATION: Building B | Level 2 | B207

SPEAKERS: Andrew Block, Scott Rigby, & George Jenkins (Helm Maintainers)

Join Helm maintainers for an interactive session contributing to core Helm and building integrations with some of Helm 4's emerging features. We'll guide contributors through creating Helm 4's newest enhancements including WebAssembly plugins, enhancements to how OCI content is managed, and implementing resource sequencing for controlled deployment order. Attendees will explore how to build Download/Postrender/CLI plugins in WebAssembly, develop capabilities related to changes to Helm's management of OCI content including repository prefixes and aliases, and use approaches for sequencing chart deployments beyond Helm's traditional mechanisms.

This session is geared toward anyone interested in Helm development including leveraging and building upon some of the latest features associated with Helm 4!

Helm 4 Release Party

TIME: 06:00 PM - 09:00 PM EST

LOCATION: Max Lager's Wood-Fired Grill & Brewery

Replicated and the CNCF are throwing a Helm 4 Release Party to celebrate the release of Helm 4! Drop by for a low country boil and hang out with the Helm project maintainers for the night! See the invitation here, and don't forget to save your spot – RSVP here.

Thursday November 13, 2025

Mission Abort: Intercepting Dangerous Deletes Before Helm Hits Apply

TIME: 01:45 PM - 02:15 PM ET

LOCATION: Building B | Level 5 | Thomas Murphy Ballroom 4

SPEAKERS: Payal Godhani

What if your next Helm deployment silently deletes a LoadBalancer, a Gateway, or an entire namespace? We’ve lived that nightmare—multiple times. In this talk, we’ll share how we turned painful Sev1 outages into a resilient, guardrail-first deployment strategy. By integrating Helm Diff and Argo CD Diff, we built a system that scans every deployment for destructive changes—like the removal of LoadBalancers, KGateways, Services, PVCs, or Namespaces—and blocks them unless explicitly approved. This second-layer approval acts as a safety circuit for your release pipelines. No guesswork. No blind deploys. Just real-time visibility into what’s about to break—before it actually does. Whether you’re managing a single cluster or an entire fleet, this talk will show you how to stop fearing Helm and start trusting it again. Because resilience isn’t about avoiding failure—it’s about learning, adapting, and building guardrails that protect everyone.

Categories: CNCF Projects, Kubernetes

7 Common Kubernetes Pitfalls (and How I Learned to Avoid Them)

Kubernetes Blog - Mon, 10/20/2025 - 11:30

It’s no secret that Kubernetes can be both powerful and frustrating at times. When I first started dabbling with container orchestration, I made more than my fair share of mistakes, enough to compile a whole list of pitfalls. In this post, I want to walk through seven big gotchas I’ve encountered (or seen others run into) and share some tips on how to avoid them. Whether you’re just kicking the tires on Kubernetes or already managing production clusters, I hope these insights spare you a little extra stress.

1. Skipping resource requests and limits

The pitfall: Not specifying CPU and memory requirements in Pod specifications. This typically happens because Kubernetes does not require these fields, and workloads can often start and run without them—making the omission easy to overlook in early configurations or during rapid deployment cycles.

Context: In Kubernetes, resource requests and limits are critical for efficient cluster management. Resource requests ensure that the scheduler reserves the appropriate amount of CPU and memory for each pod, guaranteeing that it has the necessary resources to operate. Resource limits cap the amount of CPU and memory a pod can use, preventing any single pod from consuming excessive resources and potentially starving other pods. When resource requests and limits are not set:

  1. Resource Starvation: Pods may get insufficient resources, leading to degraded performance or failures. This is because Kubernetes schedules pods based on these requests. Without them, the scheduler might place too many pods on a single node, leading to resource contention and performance bottlenecks.
  2. Resource Hoarding: Conversely, without limits, a pod might consume more than its fair share of resources, impacting the performance and stability of other pods on the same node. This can lead to issues such as other pods getting evicted or killed by the Out-Of-Memory (OOM) killer due to lack of available memory.

How to avoid it:

  • Start with modest requests (for example 100m CPU, 128Mi memory) and see how your app behaves.
  • Monitor real-world usage and refine your values; the HorizontalPodAutoscaler can help automate scaling based on metrics.
  • Keep an eye on kubectl top pods or your logging/monitoring tool to confirm you’re not over- or under-provisioning.

My reality check: Early on, I never thought about memory limits. Things seemed fine on my local cluster. Then, on a larger environment, Pods got OOMKilled left and right. Lesson learned. For detailed instructions on configuring resource requests and limits for your containers, please refer to Assign Memory Resources to Containers and Pods (part of the official Kubernetes documentation).
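
To make that concrete, here's a minimal sketch of a Pod with modest starting values; the name, image, and numbers are placeholders you'd tune based on observed usage:

apiVersion: v1
kind: Pod
metadata:
  name: web-demo                  # illustrative name
spec:
  containers:
  - name: web
    image: nginx:1.27             # pin a specific tag rather than :latest
    resources:
      requests:
        cpu: 100m                 # modest starting request
        memory: 128Mi
      limits:
        cpu: 500m                 # cap so one Pod can't starve its neighbours
        memory: 256Mi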

2. Underestimating liveness and readiness probes

The pitfall: Deploying containers without explicitly defining how Kubernetes should check their health or readiness. This tends to happen because Kubernetes will consider a container “running” as long as the process inside hasn’t exited. Without additional signals, Kubernetes assumes the workload is functioning—even if the application inside is unresponsive, initializing, or stuck.

Context:
Liveness, readiness, and startup probes are mechanisms Kubernetes uses to monitor container health and availability.

  • Liveness probes determine if the application is still alive. If a liveness check fails, the container is restarted.
  • Readiness probes control whether a container is ready to serve traffic. Until the readiness probe passes, the container is removed from Service endpoints.
  • Startup probes help distinguish between long startup times and actual failures.

How to avoid it:

  • Add a simple HTTP livenessProbe to check a health endpoint (for example /healthz) so Kubernetes can restart a hung container.
  • Use a readinessProbe to ensure traffic doesn’t reach your app until it’s warmed up.
  • Keep probes simple. Overly complex checks can create false alarms and unnecessary restarts.

My reality check: I once forgot a readiness probe for a web service that took a while to load. Users hit it prematurely, got weird timeouts, and I spent hours scratching my head. A 3-line readiness probe would have saved the day.

For comprehensive instructions on configuring liveness, readiness, and startup probes for containers, please refer to Configure Liveness, Readiness and Startup Probes in the official Kubernetes documentation.
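
To illustrate, here's a rough sketch of what those probes can look like; the image, port, and the /healthz and /ready paths are assumptions, so substitute whatever endpoints your application actually serves:

apiVersion: v1
kind: Pod
metadata:
  name: web-probed                            # illustrative name
spec:
  containers:
  - name: web
    image: registry.example.com/my-app:1.2.3  # hypothetical application image
    ports:
    - containerPort: 8080
    livenessProbe:                            # restart the container if this check fails
      httpGet:
        path: /healthz                        # assumed health endpoint
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:                           # keep the Pod out of Service endpoints until it's warmed up
      httpGet:
        path: /ready                          # assumed readiness endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5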

3. “We’ll just look at container logs” (famous last words)

The pitfall: Relying solely on container logs retrieved via kubectl logs. This often happens because the command is quick and convenient, and in many setups, logs appear accessible during development or early troubleshooting. However, kubectl logs only retrieves logs from currently running or recently terminated containers, and those logs are stored on the node’s local disk. As soon as the container is deleted, evicted, or the node is restarted, the log files may be rotated out or permanently lost.

How to avoid it:

  • Centralize logs using CNCF tools like Fluentd or Fluent Bit to aggregate output from all Pods.
  • Adopt OpenTelemetry for a unified view of logs, metrics, and (if needed) traces. This lets you spot correlations between infrastructure events and app-level behavior.
  • Pair logs with Prometheus metrics to track cluster-level data alongside application logs. If you need distributed tracing, consider CNCF projects like Jaeger.

My reality check: The first time I lost Pod logs to a quick restart, I realized how flimsy “kubectl logs” can be on its own. Since then, I’ve set up a proper pipeline for every cluster to avoid missing vital clues.
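
As one illustration of the node-level log-collection pattern, here's a stripped-down Fluent Bit DaemonSet skeleton; in practice you'd install it via the project's Helm chart or add a ConfigMap with real inputs and outputs, and the namespace, version tag, and mounts here are assumptions:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging                   # assumed namespace
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:3.1   # pin a specific version in practice
        volumeMounts:
        - name: varlog
          mountPath: /var/log          # node logs the collector tails
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log

Running one collector per node like this is what keeps logs available even after a Pod restart or eviction.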

4. Treating dev and prod exactly the same

The pitfall: Deploying the same Kubernetes manifests with identical settings across development, staging, and production environments. This often occurs when teams aim for consistency and reuse, but overlook that environment-specific factors—such as traffic patterns, resource availability, scaling needs, or access control—can differ significantly. Without customization, configurations optimized for one environment may cause instability, poor performance, or security gaps in another.

How to avoid it:

  • Use environment overlays or kustomize to maintain a shared base while customizing resource requests, replicas, or config for each environment.
  • Extract environment-specific configuration into ConfigMaps and/or Secrets. You can use a specialized tool such as Sealed Secrets to manage confidential data.
  • Plan for scale in production. Your dev cluster can probably get away with minimal CPU/memory, but prod might need significantly more.

My reality check: One time, I scaled up replicaCount from 2 to 10 in a tiny dev environment just to “test.” I promptly ran out of resources and spent half a day cleaning up the aftermath. Oops.
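
Here's a rough sketch of what a production overlay can look like with kustomize; the base path, Deployment name, and values are illustrative:

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base                        # shared manifests used by every environment
patches:
- target:
    kind: Deployment
    name: web                       # illustrative Deployment name from the base
  patch: |-
    - op: replace
      path: /spec/replicas
      value: 5
    - op: replace
      path: /spec/template/spec/containers/0/resources/requests/cpu
      value: 500m

The dev overlay keeps its own kustomization.yaml with smaller values, so the shared base never has to guess which environment it's running in.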

5. Leaving old stuff floating around

The pitfall: Leaving unused or outdated resources—such as Deployments, Services, ConfigMaps, or PersistentVolumeClaims—running in the cluster. This often happens because Kubernetes does not automatically remove resources unless explicitly instructed, and there is no built-in mechanism to track ownership or expiration. Over time, these forgotten objects can accumulate, consuming cluster resources, increasing cloud costs, and creating operational confusion, especially when stale Services or LoadBalancers continue to route traffic.

How to avoid it:

  • Label everything with a purpose or owner label. That way, you can easily query resources you no longer need.
  • Regularly audit your cluster: run kubectl get all -n <namespace> to see what’s actually running, and confirm it’s all legit.
  • Adopt Kubernetes’ Garbage Collection: K8s docs show how to remove dependent objects automatically.
  • Leverage policy automation: Tools like Kyverno can automatically delete or block stale resources after a certain period, or enforce lifecycle policies so you don’t have to remember every single cleanup step.

My reality check: After a hackathon, I forgot to tear down a “test-svc” pinned to an external load balancer. Three weeks later, I realized I’d been paying for that load balancer the entire time. Facepalm.
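
One small habit that makes the first bullet workable: stamp every object with ownership labels so cleanup queries are trivial later. The label keys here are just a convention, not something Kubernetes enforces:

apiVersion: v1
kind: Service
metadata:
  name: test-svc
  labels:
    owner: team-payments            # who to ask before deleting it
    purpose: hackathon-demo         # why it exists
    expires: "2025-12-01"           # informal reminder only; nothing enforces this automatically
spec:
  selector:
    app: test
  ports:
  - port: 80

Afterwards, a label selector such as purpose=hackathon-demo lets you list (and delete) everything the experiment left behind in one go.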

6. Diving too deep into networking too soon

The pitfall: Introducing advanced networking solutions—such as service meshes, custom CNI plugins, or multi-cluster communication—before fully understanding Kubernetes' native networking primitives. This commonly occurs when teams implement features like traffic routing, observability, or mTLS using external tools without first mastering how core Kubernetes networking works: including Pod-to-Pod communication, ClusterIP Services, DNS resolution, and basic ingress traffic handling. As a result, network-related issues become harder to troubleshoot, especially when overlays introduce additional abstractions and failure points.

How to avoid it:

  • Start small: a Deployment, a Service, and a basic ingress controller such as one based on NGINX (e.g., Ingress-NGINX).
  • Make sure you understand how traffic flows within the cluster, how service discovery works, and how DNS is configured.
  • Only move to a full-blown mesh or advanced CNI features when you actually need them; complex networking adds overhead.

My reality check: I tried Istio on a small internal app once, then spent more time debugging Istio itself than the actual app. Eventually, I stepped back, removed Istio, and everything worked fine.
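
For reference, the "start small" setup can be as plain as a Service plus an Ingress handled by an already-installed NGINX-based ingress controller; the names, hostname, and ingress class below are assumptions:

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                         # matches the Pods of your Deployment
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx            # must match the installed ingress controller
  rules:
  - host: web.example.com            # illustrative hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80

Once you can explain how a request travels from that Ingress to a Pod, you'll know whether a mesh is solving a problem you actually have.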

7. Going too light on security and RBAC

The pitfall: Deploying workloads with insecure configurations, such as running containers as the root user, using the latest image tag, disabling security contexts, or assigning overly broad RBAC roles like cluster-admin. These practices persist because Kubernetes does not enforce strict security defaults out of the box, and the platform is designed to be flexible rather than opinionated. Without explicit security policies in place, clusters can remain exposed to risks like container escape, unauthorized privilege escalation, or accidental production changes due to unpinned images.

How to avoid it:

  • Use RBAC to define roles and permissions within Kubernetes. While RBAC is the default and most widely supported authorization mechanism, Kubernetes also allows the use of alternative authorizers. For more advanced or external policy needs, consider solutions like OPA Gatekeeper (based on Rego), Kyverno, or custom webhooks using policy languages such as CEL or Cedar.
  • Pin images to specific versions (no more :latest!). This helps you know what’s actually deployed.
  • Look into Pod Security Admission (or other solutions like Kyverno) to enforce non-root containers, read-only filesystems, etc.

My reality check: I never had a huge security breach, but I’ve heard plenty of cautionary tales. If you don’t tighten things up, it’s only a matter of time before something goes wrong.
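
As a starting point, here's a minimal sketch combining a hardened Pod spec with a namespace-scoped Role instead of cluster-admin; every name, UID, and image tag is illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
  namespace: team-a
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001                          # any non-root UID your image supports
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: registry.example.com/app:1.4.2     # pinned tag, never :latest (hypothetical image)
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-deployer
  namespace: team-a                           # permissions stay scoped to one namespace
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-deployer-binding
  namespace: team-a
subjects:
- kind: ServiceAccount
  name: ci-deployer                           # illustrative CI ServiceAccount
  namespace: team-a
roleRef:
  kind: Role
  name: app-deployer
  apiGroup: rbac.authorization.k8s.io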

Final thoughts

Kubernetes is amazing, but it’s not psychic: it won’t magically do the right thing if you don’t tell it what you need. By keeping these pitfalls in mind, you’ll avoid a lot of headaches and wasted time. Mistakes happen (trust me, I’ve made my share), but each one is a chance to learn more about how Kubernetes truly works under the hood. If you’re curious to dive deeper, the official docs and the community Slack are excellent next steps. And of course, feel free to share your own horror stories or success tips, because at the end of the day, we’re all in this cloud native adventure together.

Happy Shipping!

Categories: CNCF Projects, Kubernetes

Helm Turns 10

Helm Blog - Sat, 10/18/2025 - 20:00

Ten years ago, in a hackathon shortly after the release of Kubernetes 1.1.0, Helm was born.

commit ecad6e2ef9523a0218864ec552bbfc724f0b9d3d
Author: Matt Butcher <[email protected]>
Date: Mon Oct 19 17:43:26 2015 -0600

initial add

The first commit can be found on the helm-classic Git repository where the codebase for Helm v1 is located. This is the original Helm, before it merged with Deployment Manager and was folded into the Kubernetes project.

This commit was just the beginning. Helm would be shown off at the first KubeCon, just a few weeks later. From there Helm development would take off and a community of developers and charts would follow.

Happy 10th Birthday, Helm!

Happy 10th Birthday Helm

Categories: CNCF Projects, Kubernetes

Spotlight on Policy Working Group

Kubernetes Blog - Fri, 10/17/2025 - 20:00

(Note: The Policy Working Group has completed its mission and is no longer active. This article reflects its work, accomplishments, and insights into how a working group operates.)

In the complex world of Kubernetes, policies play a crucial role in managing and securing clusters. But have you ever wondered how these policies are developed, implemented, and standardized across the Kubernetes ecosystem? To answer that, let's take a look back at the work of the Policy Working Group.

The Policy Working Group was dedicated to a critical mission: providing an overall architecture that encompasses both current policy-related implementations and future policy proposals in Kubernetes. Their goal was both ambitious and essential: to develop a universal policy architecture that benefits developers and end-users alike.

Through collaborative methods, this working group strove to bring clarity and consistency to the often complex world of Kubernetes policies. By focusing on both existing implementations and future proposals, they ensured that the policy landscape in Kubernetes remains coherent and accessible as the technology evolves.

This blog post dives deeper into the work of the Policy Working Group, guided by insights from its former co-chairs: Jim Bugwadia, Andy Suderman, and Poonam Lamba.

Interviewed by Arujjwal Negi.

These co-chairs explained what the Policy Working Group was all about.

Introduction

Hello, thank you for the time! Let’s start with some introductions, could you tell us a bit about yourself, your role, and how you got involved in Kubernetes?

Jim Bugwadia: My name is Jim Bugwadia, and I am a co-founder and the CEO at Nirmata which provides solutions that automate security and compliance for cloud-native workloads. At Nirmata, we have been working with Kubernetes since it started in 2014. We initially built a Kubernetes policy engine in our commercial platform and later donated it to CNCF as the Kyverno project. I joined the CNCF Kubernetes Policy Working Group to help build and standardize various aspects of policy management for Kubernetes and later became a co-chair.

Andy Suderman: My name is Andy Suderman and I am the CTO of Fairwinds, a managed Kubernetes-as-a-Service provider. I began working with Kubernetes in 2016 building a web conferencing platform. I am an author and/or maintainer of several Kubernetes-related open-source projects such as Goldilocks, Pluto, and Polaris. Polaris is a JSON-schema-based policy engine, which started Fairwinds' journey into the policy space and my involvement in the Policy Working Group.

Poonam Lamba: My name is Poonam Lamba, and I currently work as a Product Manager for Google Kubernetes Engine (GKE) at Google. My journey with Kubernetes began back in 2017 when I was building an SRE platform for a large enterprise, using a private cloud built on Kubernetes. Intrigued by its potential to revolutionize the way we deployed and managed applications at the time, I dove headfirst into learning everything I could about it. Since then, I've had the opportunity to build the policy and compliance products for GKE. I lead and contribute to GKE CIS benchmarks. I am involved with the Gatekeeper project, and I contributed to the Policy WG for over two years and served as a co-chair for the group.

Responses to the following questions represent an amalgamation of insights from the former co-chairs.

About Working Groups

One thing even I am not aware of is the difference between a working group and a SIG. Can you help us understand what a working group is and how it is different from a SIG?

Unlike SIGs, working groups are temporary and focused on tackling specific, cross-cutting issues or projects that may involve multiple SIGs. Their lifespan is defined, and they disband once they've achieved their objective. Generally, working groups don't own code or have long-term responsibility for managing a particular area of the Kubernetes project.

(To know more about SIGs, visit the list of Special Interest Groups)

You mentioned that working groups involve multiple SIGs. What SIGs was the Policy WG closely involved with, and how did you coordinate with them?

The group collaborated closely with Kubernetes SIG Auth throughout our existence, and more recently, the group also worked with SIG Security since its formation. Our collaboration occurred in a few ways. We provided periodic updates during the SIG meetings to keep them informed of our progress and activities. Additionally, we utilized other community forums to maintain open lines of communication and ensured our work aligned with the broader Kubernetes ecosystem. This collaborative approach helped the group stay coordinated with related efforts across the Kubernetes community.

Policy WG

Why was the Policy Working Group created?

To enable a broad set of use cases, we recognize that Kubernetes is powered by a highly declarative, fine-grained, and extensible configuration management system. We've observed that a Kubernetes configuration manifest may have different portions that are important to various stakeholders. For example, some parts may be crucial for developers, while others might be of particular interest to security teams or address operational concerns. Given this complexity, we believe that policies governing the usage of these intricate configurations are essential for success with Kubernetes.

Our Policy Working Group was created specifically to research the standardization of policy definitions and related artifacts. We saw a need to bring consistency and clarity to how policies are defined and implemented across the Kubernetes ecosystem, given the diverse requirements and stakeholders involved in Kubernetes deployments.

Can you give me an idea of the work you did in the group?

We worked on several Kubernetes policy-related projects. Our initiatives included:

  • We worked on a Kubernetes Enhancement Proposal (KEP) for the Kubernetes Policy Reports API. This aims to standardize how policy reports are generated and consumed within the Kubernetes ecosystem.
  • We conducted a CNCF survey to better understand policy usage in the Kubernetes space. This helped gauge the practices and needs across the community at the time.
  • We wrote a paper that will guide users in achieving PCI-DSS compliance for containers. This is intended to help organizations meet important security standards in their Kubernetes environments.
  • We also worked on a paper highlighting how shifting security down can benefit organizations. This focuses on the advantages of implementing security measures earlier in the development and deployment process.

Can you tell us what were the main objectives of the Policy Working Group and some of your key accomplishments?

The charter of the Policy WG was to help standardize policy management for Kubernetes and educate the community on best practices.

To accomplish this we updated the Kubernetes documentation (Policies | Kubernetes), produced several whitepapers (Kubernetes Policy Management, Kubernetes GRC), and created the Policy Reports API (API reference) which standardizes reporting across various tools. Several popular tools such as Falco, Trivy, Kyverno, kube-bench, and others support the Policy Report API. A major milestone for the Policy WG was promoting the Policy Reports API to a SIG-level API or finding it a stable home.

Beyond that, as ValidatingAdmissionPolicy and MutatingAdmissionPolicy approached GA in Kubernetes, a key goal of the WG was to guide and educate the community on the tradeoffs and appropriate usage patterns for these built-in API objects and other CNCF policy management solutions like OPA/Gatekeeper and Kyverno.

Challenges

What were some of the major challenges that the Policy Working Group worked on?

During our work in the Policy Working Group, we encountered several challenges:

  • One of the main issues we faced was finding time to consistently contribute. Given that many of us have other professional commitments, it can be difficult to dedicate regular time to the working group's initiatives.

  • Another challenge we experienced was related to our consensus-driven model. While this approach ensures that all voices are heard, it can sometimes lead to slower decision-making processes. We valued thorough discussion and agreement, but this can occasionally delay progress on our projects.

  • We've also encountered occasional differences of opinion among group members. These situations require careful navigation to ensure that we maintain a collaborative and productive environment while addressing diverse viewpoints.

  • Lastly, we've noticed that newcomers to the group may find it difficult to contribute effectively without consistent attendance at our meetings. The complex nature of our work often requires ongoing context, which can be challenging for those who aren't able to participate regularly.

Can you tell me more about those challenges? How did you discover each one? What has the impact been? What were some strategies you used to address them?

There are no easy answers, but having more contributors and maintainers greatly helps! Overall the CNCF community is great to work with and is very welcoming to beginners. So, if folks out there are hesitating to get involved, I highly encourage them to attend a WG or SIG meeting and just listen in.

It often takes a few meetings to fully understand the discussions, so don't feel discouraged if you don't grasp everything right away. We made a point to emphasize this and encouraged new members to review documentation as a starting point for getting involved.

Additionally, differences of opinion were valued and encouraged within the Policy WG. We adhered to the CNCF core values and resolved disagreements by maintaining respect for one another. We also strove to timebox our decisions and assign clear responsibilities to keep things moving forward.

This is where our discussion about the Policy Working Group ends. The working group, and especially the people who took part in this article, hope this gave you some insights into the group's aims and workings. You can get more info about Working Groups here.

Categories: CNCF Projects, Kubernetes

Introducing Headlamp Plugin for Karpenter - Scaling and Visibility

Kubernetes Blog - Sun, 10/05/2025 - 20:00

Headlamp is an open‑source, extensible Kubernetes SIG UI project designed to let you explore, manage, and debug cluster resources.

Karpenter is a Kubernetes Autoscaling SIG node provisioning project that helps clusters scale quickly and efficiently. It launches new nodes in seconds, selects appropriate instance types for workloads, and manages the full node lifecycle, including scale-down.

The new Headlamp Karpenter Plugin adds real-time visibility into Karpenter’s activity directly from the Headlamp UI. It shows how Karpenter resources relate to Kubernetes objects, displays live metrics, and surfaces scaling events as they happen. You can inspect pending pods during provisioning, review scaling decisions, and edit Karpenter-managed resources with built-in validation. The Karpenter plugin was made as part of an LFX mentorship project.

The Karpenter plugin for Headlamp aims to make it easier for Kubernetes users and operators to understand, debug, and fine-tune autoscaling behavior in their clusters. Now we will give a brief tour of the Headlamp plugin.

Map view of Karpenter Resources and how they relate to Kubernetes resources

Easily see how Karpenter Resources like NodeClasses, NodePool and NodeClaims connect with core Kubernetes resources like Pods, Nodes etc.

Map view showing relationships between resources

Visualization of Karpenter Metrics

Get instant insights into Resource Usage vs. Limits, Allowed Disruptions, Pending Pods, Provisioning Latency, and more.

NodePool default metrics shown with controls to see different frequencies

NodeClaim default metrics shown with controls to see different frequencies

Scaling decisions

See which instances are being provisioned for your workloads and understand why Karpenter made those choices. Helpful while debugging.

Pod Placement Decisions data including reason, from, pod, message, and age

Node decision data including Type, Reason, Node, From, Message

Config editor with validation support

Make live edits to Karpenter configurations. The editor includes diff previews and resource validation for safer adjustments.
Config editor with validation support

Real time view of Karpenter resources

View and track Karpenter specific resources in real time such as “NodeClaims” as your cluster scales up and down.

Node claims data including Name, Status, Instance Type, CPU, Zone, Age, and Actions

Node Pools data including Name, NodeClass, CPU, Memory, Nodes, Status, Age, Actions

EC2 Node Classes data including Name, Cluster, Instance Profile, Status, IAM Role, Age, and Actions

Dashboard for Pending Pods

View all pending pods with unmet scheduling requirements or FailedScheduling events, with highlights explaining why they couldn't be scheduled.

Pending Pods data including Name, Namespace, Type, Reason, From, and Message

Karpenter Providers

This plugin should work with most Karpenter providers, but it has so far only been tested on the providers listed below. Additionally, some providers expose extra provider-specific information; the list below also shows which of those the plugin displays.

  • AWS: tested ✅, extra provider-specific info supported ✅
  • Azure: tested ✅, extra provider-specific info supported ✅
  • AlibabaCloud: tested ❌, extra provider-specific info supported ❌
  • Bizfly Cloud: tested ❌, extra provider-specific info supported ❌
  • Cluster API: tested ❌, extra provider-specific info supported ❌
  • GCP: tested ❌, extra provider-specific info supported ❌
  • Proxmox: tested ❌, extra provider-specific info supported ❌
  • Oracle Cloud Infrastructure (OCI): tested ❌, extra provider-specific info supported ❌

Please submit an issue if you test one of the untested providers or if you want support for this provider (PRs also gladly accepted).

How to use

Please see the plugins/karpenter/README.md for instructions on how to use.

Feedback and Questions

Please submit an issue if you use Karpenter and have any other ideas or feedback. Or come to the Kubernetes slack headlamp channel for a chat.

Categories: CNCF Projects, Kubernetes

GKE 10 years and SIG Networking, With Antonio Ojea

Kubernetes Podcast - Wed, 10/01/2025 - 05:19

Today we talk to Antonio Ojea. Antonio is a software engineer at Google and one of the core maintainers of Kubernetes. He is one of the Tech Leads of SIG Networking and SIG Testing, and a member of the Steering Committee.

 

Do you have something cool to share? Some questions? Let us know:

- web: kubernetespodcast.com

- mail: [email protected]

- twitter: @kubernetespod

- bluesky: @kubernetespodcast.com

 

News of the week

Links from the interview
Categories: Kubernetes

Announcing Changed Block Tracking API support (alpha)

Kubernetes Blog - Thu, 09/25/2025 - 09:00

We're excited to announce the alpha support for a changed block tracking mechanism. This enhances the Kubernetes storage ecosystem by providing an efficient way for CSI storage drivers to identify changed blocks in PersistentVolume snapshots. With a driver that can use the feature, you could benefit from faster and more resource-efficient backup operations.

If you're eager to try this feature, you can skip to the Getting Started section.

What is changed block tracking?

Changed block tracking enables storage systems to identify and track modifications at the block level between snapshots, eliminating the need to scan entire volumes during backup operations. The improvement is a change to the Container Storage Interface (CSI), and also to the storage support in Kubernetes itself. With the alpha feature enabled, your cluster can:

  • Identify allocated blocks within a CSI volume snapshot
  • Determine changed blocks between two snapshots of the same volume
  • Streamline backup operations by focusing only on changed data blocks

For Kubernetes users managing large datasets, this API enables significantly more efficient backup processes. Backup applications can now focus only on the blocks that have changed, rather than processing entire volumes.

Note:

As of now, the Changed Block Tracking API is supported only for block volumes and not for file volumes. CSI drivers that manage file-based storage systems will not be able to implement this capability.

Benefits of changed block tracking support in Kubernetes

As Kubernetes adoption grows for stateful workloads managing critical data, the need for efficient backup solutions becomes increasingly important. Traditional full backup approaches face challenges with:

  • Long backup windows: Full volume backups can take hours for large datasets, making it difficult to complete within maintenance windows.
  • High resource utilization: Backup operations consume substantial network bandwidth and I/O resources, especially for large data volumes and data-intensive applications.
  • Increased storage costs: Repetitive full backups store redundant data, causing storage requirements to grow linearly even when only a small percentage of data actually changes between backups.

The Changed Block Tracking API addresses these challenges by providing native Kubernetes support for incremental backup capabilities through the CSI interface.

Key components

The implementation consists of three primary components:

  1. CSI SnapshotMetadata Service API: A gRPC API that provides volume snapshot and changed block data.
  2. SnapshotMetadataService API: A Kubernetes CustomResourceDefinition (CRD) that advertises CSI driver metadata service availability and connection details to cluster clients.
  3. External Snapshot Metadata Sidecar: An intermediary component that connects CSI drivers to backup applications via a standardized gRPC interface.

Implementation requirements

Storage provider responsibilities

If you're an author of a storage integration with Kubernetes and want to support the changed block tracking feature, you must implement specific requirements:

  1. Implement CSI RPCs: Storage providers need to implement the SnapshotMetadata service as defined in the CSI specifications protobuf. This service requires server-side streaming implementations for the following RPCs:

    • GetMetadataAllocated: For identifying allocated blocks in a snapshot
    • GetMetadataDelta: For determining changed blocks between two snapshots
  2. Storage backend capabilities: Ensure the storage backend has the capability to track and report block-level changes.

  3. Deploy external components: Integrate with the external-snapshot-metadata sidecar to expose the snapshot metadata service.

  4. Register custom resource: Register the SnapshotMetadataService resource using a CustomResourceDefinition and create a SnapshotMetadataService custom resource that advertises the availability of the metadata service and provides connection details.

  5. Support error handling: Implement proper error handling for these RPCs according to the CSI specification requirements.

Backup solution responsibilities

A backup solution looking to leverage this feature must:

  1. Set up authentication: The backup application must provide a Kubernetes ServiceAccount token when using the Kubernetes SnapshotMetadataService API. Appropriate access grants, such as RBAC RoleBindings, must be established to authorize the backup application ServiceAccount to obtain such tokens.

  2. Implement streaming client-side code: Develop clients that implement the streaming gRPC APIs defined in the schema.proto file. Specifically:

    • Implement streaming client code for GetMetadataAllocated and GetMetadataDelta methods
    • Handle server-side streaming responses efficiently as the metadata comes in chunks
    • Process the SnapshotMetadataResponse message format with proper error handling

    The external-snapshot-metadata GitHub repository provides a convenient iterator support package to simplify client implementation.

  3. Handle large dataset streaming: Design clients to efficiently handle large streams of block metadata that could be returned for volumes with significant changes.

  4. Optimize backup processes: Modify backup workflows to use the changed block metadata to identify and only transfer changed blocks to make backups more efficient, reducing both backup duration and resource consumption.

Getting started

To use changed block tracking in your cluster:

  1. Ensure your CSI driver supports volume snapshots and implements the snapshot metadata capabilities with the required external-snapshot-metadata sidecar
  2. Make sure the SnapshotMetadataService custom resource is registered using CRD
  3. Verify the presence of a SnapshotMetadataService custom resource for your CSI driver
  4. Create clients that can access the API using appropriate authentication (via Kubernetes ServiceAccount tokens)

The API provides two main functions:

  • GetMetadataAllocated: Lists blocks allocated in a single snapshot
  • GetMetadataDelta: Lists blocks changed between two snapshots

What’s next?

Depending on feedback and adoption, the Kubernetes developers hope to promote the CSI Snapshot Metadata implementation to beta in a future release.

Where can I learn more?

For those interested in trying out this new feature:

How do I get involved?

This project, like all of Kubernetes, is the result of hard work by many contributors from diverse backgrounds working together. On behalf of SIG Storage, I would like to offer a huge thank you to the contributors who helped review the design and implementation of the project, including but not limited to the following:

Thanks also to everyone who has contributed to the project, including others who helped review the KEP and the CSI spec PR.

For those interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system, join the Kubernetes Storage Special Interest Group (SIG). We always welcome new contributors.

The SIG also holds regular Data Protection Working Group meetings. New attendees are welcome to join our discussions.

Categories: CNCF Projects, Kubernetes

Kubernetes v1.34: Pod Level Resources Graduated to Beta

Kubernetes Blog - Mon, 09/22/2025 - 14:30

On behalf of the Kubernetes community, I am thrilled to announce that the Pod Level Resources feature has graduated to Beta in the Kubernetes v1.34 release and is enabled by default! This significant milestone introduces a new layer of flexibility for defining and managing resource allocation for your Pods. This flexibility stems from the ability to specify CPU and memory resources for the Pod as a whole. Pod level resources can be combined with the container-level specifications to express the exact resource requirements and limits your application needs.

Pod-level specification for resources

Until recently, resource specifications that applied to Pods were primarily defined at the individual container level. While effective, this approach sometimes required duplicating or meticulously calculating resource needs across multiple containers within a single Pod. As a beta feature, Kubernetes allows you to specify the CPU, memory and hugepages resources at the Pod-level. This means you can now define resource requests and limits for an entire Pod, enabling easier resource sharing without requiring granular, per-container management of these resources where it's not needed.

Why does Pod-level specification matter?

This feature enhances resource management in Kubernetes by offering flexible resource management at both the Pod and container levels.

  • It provides a consolidated approach to resource declaration, reducing the need for meticulous, per-container management, especially for Pods with multiple containers.

  • Pod-level resources enable containers within a pod to share unused resources amongst themselves, promoting efficient utilization within the pod. For example, it prevents sidecar containers from becoming performance bottlenecks. Previously, a sidecar (e.g., a logging agent or service mesh proxy) hitting its individual CPU limit could be throttled and slow down the entire Pod, even if the main application container had plenty of spare CPU. With pod-level resources, the sidecar and the main container can share the Pod's resource budget, ensuring smooth operation during traffic spikes - either the whole Pod is throttled or all containers work.

  • When both pod-level and container-level resources are specified, pod-level requests and limits take precedence. This gives you – and cluster administrators - a powerful way to enforce overall resource boundaries for your Pods.

    For scheduling, if a pod-level request is explicitly defined, the scheduler uses that specific value to find a suitable node, instead of the aggregated requests of the individual containers. At runtime, the pod-level limit acts as a hard ceiling for the combined resource usage of all containers. Crucially, this pod-level limit is the absolute enforcer; even if the sum of the individual container limits is higher, the total resource consumption can never exceed the pod-level limit.

  • Pod-level resources are prioritized in influencing the Quality of Service (QoS) class of the Pod.

  • For Pods running on Linux nodes, the Out-Of-Memory (OOM) score adjustment calculation considers both pod-level and container-level resources requests.

  • Pod-level resources are designed to be compatible with existing Kubernetes functionalities, ensuring a smooth integration into your workflows.

How to specify resources for an entire Pod

Using the PodLevelResources feature gate requires Kubernetes v1.34 or newer for all cluster components, including the control plane and every node. This feature gate is in beta and enabled by default in v1.34.

Example manifest

You can specify CPU, memory and hugepages resources directly in the Pod spec manifest at the resources field for the entire Pod.

Here’s an example demonstrating a Pod with both CPU and memory requests and limits defined at the Pod level:

apiVersion: v1
kind: Pod
metadata:
  name: pod-resources-demo
  namespace: pod-resources-example
spec:
  # The 'resources' field at the Pod specification level defines the overall
  # resource budget for all containers within this Pod combined.
  resources: # Pod-level resources
    # 'limits' specifies the maximum amount of resources the Pod is allowed to use.
    # The sum of the limits of all containers in the Pod cannot exceed these values.
    limits:
      cpu: "1"        # The entire Pod cannot use more than 1 CPU core.
      memory: "200Mi" # The entire Pod cannot use more than 200 MiB of memory.
    # 'requests' specifies the minimum amount of resources guaranteed to the Pod.
    # This value is used by the Kubernetes scheduler to find a node with enough capacity.
    requests:
      cpu: "1"        # The Pod is guaranteed 1 CPU core when scheduled.
      memory: "100Mi" # The Pod is guaranteed 100 MiB of memory when scheduled.
  containers:
  - name: main-app-container
    image: nginx
    ...
    # This container has no resource requests or limits specified.
  - name: auxiliary-container
    image: fedora
    command: ["sleep", "inf"]
    ...
    # This container has no resource requests or limits specified.

In this example, the pod-resources-demo Pod as a whole requests 1 CPU and 100 MiB of memory, and is limited to 1 CPU and 200 MiB of memory. The containers within will operate under these overall Pod-level constraints, as explained in the next section.

Interaction with container-level resource requests or limits

When both pod-level and container-level resources are specified, pod-level requests and limits take precedence. This means the node allocates resources based on the pod-level specifications.

Consider a Pod with two containers where pod-level CPU and memory requests and limits are defined, and only one container has its own explicit resource definitions:

apiVersion: v1
kind: Pod
metadata:
  name: pod-resources-demo
  namespace: pod-resources-example
spec:
  resources:
    limits:
      cpu: "1"
      memory: "200Mi"
    requests:
      cpu: "1"
      memory: "100Mi"
  containers:
  - name: main-app-container
    image: nginx
    resources:
      requests:
        cpu: "0.5"
        memory: "50Mi"
  - name: auxiliary-container
    image: fedora
    command: ["sleep", "inf"]
    # This container has no resource requests or limits specified.

  • Pod-Level Limits: The pod-level limits (cpu: "1", memory: "200Mi") establish an absolute boundary for the entire Pod. The sum of resources consumed by all its containers is enforced at this ceiling and cannot be surpassed.

  • Resource Sharing and Bursting: Containers can dynamically borrow any unused capacity, allowing them to burst as needed, so long as the Pod's aggregate usage stays within the overall limit.

  • Pod-Level Requests: The pod-level requests (cpu: "1", memory: "100Mi") serve as the foundational resource guarantee for the entire Pod. This value informs the scheduler's placement decision and represents the minimum resources the Pod can rely on during node-level contention.

  • Container-Level Requests: Container-level requests create a priority system within the Pod's guaranteed budget. Because main-app-container has an explicit request (cpu: "0.5", memory: "50Mi"), it is given precedence for its share of resources under resource pressure over the auxiliary-container, which has no such explicit claim.

Limitations

  • First of all, in-place resize of pod-level resources is not supported for Kubernetes v1.34 (or earlier). Attempting to modify the pod-level resource limits or requests on a running Pod results in an error: the resize is rejected. The v1.34 implementation of Pod level resources focuses on allowing initial declaration of an overall resource envelope that applies to the entire Pod. That is distinct from in-place pod resize, which (despite what the name might suggest) allows you to make dynamic adjustments to container resource requests and limits, within a running Pod, and potentially without a container restart. In-place resizing is also not yet a stable feature; it graduated to Beta in the v1.33 release.

  • Only CPU, memory, and hugepages resources can be specified at pod-level.

  • Pod-level resources are not supported for Windows pods. If the Pod specification explicitly targets Windows (e.g., by setting spec.os.name: "windows"), the API server will reject the Pod during the validation step. If the Pod is not explicitly marked for Windows but is scheduled to a Windows node (e.g., via a nodeSelector), the Kubelet on that Windows node will reject the Pod during its admission process.

  • The Topology Manager, Memory Manager and CPU Manager do not align pods and containers based on pod-level resources as these resource managers don't currently support pod-level resources.

Getting started and providing feedback

Ready to explore Pod Level Resources feature? You'll need a Kubernetes cluster running version 1.34 or later. Remember to enable the PodLevelResources feature gate across your control plane and all nodes.

As this feature moves through Beta, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels:

Categories: CNCF Projects, Kubernetes

Kubernetes v1.34: Recovery From Volume Expansion Failure (GA)

Kubernetes Blog - Fri, 09/19/2025 - 14:30

Have you ever made a typo when expanding your persistent volumes in Kubernetes? Meant to specify 2TB but specified 20TiB? This seemingly innocuous problem was surprisingly hard to address - and took the project almost 5 years to fix. Automated recovery from storage expansion failures has been around for a while in beta; however, with the v1.34 release, we have graduated this to general availability.

While it was always possible to recover from failing volume expansions manually, it usually required cluster-admin access and was tedious to do (see the aforementioned link for more information).

What if you make a mistake and then realize it immediately? With Kubernetes v1.34, as long as the expansion to the previously requested size hasn't finished, you can reduce the requested size of the PersistentVolumeClaim (PVC) and Kubernetes will automatically work to correct it. Any quota consumed by the failed expansion is returned to you, and the associated PersistentVolume is resized to the latest size you specified.

I'll walk through an example of how all of this works.

Reducing PVC size to recover from failed expansion

Imagine that you are running out of disk space for one of your database servers, and you want to expand the PVC from the previously specified 10TB to 100TB - but you make a typo and specify 1000TB.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1000TB # newly specified size - but incorrect!

Now, you may be out of disk space on your disk array, or you may simply have run out of allocated quota with your cloud provider. Either way, assume that expansion to 1000TB is never going to succeed.

In Kubernetes v1.34, you can simply correct your mistake and request a new PVC size that is smaller than the mistaken one, provided it is still larger than the original size of the actual PersistentVolume.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100TB # Corrected size; has to be greater than 10TB.
      # You cannot shrink the volume below its actual size.

This requires no admin intervention. Even better, any surplus Kubernetes quota that you temporarily consumed will be automatically returned.

This fault recovery mechanism does have a caveat: whatever new size you specify for the PVC, it must still be higher than the original size in .status.capacity. Since Kubernetes doesn't support shrinking your PV objects, you can never go below the size that was originally allocated for your PVC request.

Improved error handling and observability of volume expansion

Implementing what might look like a relatively minor change also required us to almost fully redo how volume expansion works under the hood in Kubernetes. There are new API fields available in PVC objects which you can monitor to observe progress of volume expansion.

Improved observability of in-progress expansion

You can query .status.allocatedResourceStatus['storage'] of a PVC to monitor progress of a volume expansion operation. For a typical block volume, this should transition between ControllerResizeInProgress, NodeResizePending and NodeResizeInProgress and become nil/empty when volume expansion has finished.

If, for some reason, volume expansion to the requested size is not feasible, the status should instead report states such as ControllerResizeInfeasible or NodeResizeInfeasible.

You can also observe the size towards which Kubernetes is working by watching pvc.status.allocatedResources.
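
For illustration, a PVC mid-expansion might report status fields shaped roughly like this (sizes follow the example above; the exact values depend on your storage driver):

status:
  capacity:
    storage: 10TB                              # size the volume actually has right now
  allocatedResources:
    storage: 100TB                             # size Kubernetes is currently working towards
  allocatedResourceStatus:
    storage: ControllerResizeInProgress        # clears once the expansion completes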

Improved error handling and reporting

Kubernetes now retries your failed volume expansions at a slower rate, and it makes fewer requests to both the storage system and the Kubernetes API server.

Errors observed during volume expansion are now reported as conditions on PVC objects and persist, unlike events. Kubernetes will now populate pvc.status.conditions with the error keys ControllerResizeError or NodeResizeError when volume expansion fails.

Fixes long-standing bugs in resizing workflows

This feature also allowed us to fix long-standing bugs in the resizing workflow, such as Kubernetes issue #115294. If you observe anything broken, please report your bugs to https://github.com/kubernetes/kubernetes/issues, along with details about how to reproduce the problem.

Working on this feature through its lifecycle was challenging and it wouldn't have been possible to reach GA without feedback from @msau42, @jsafrane and @xing-yang.

All of the contributors who worked on this also appreciate the input provided by @thockin and @liggitt at various Kubernetes contributor summits.

Categories: CNCF Projects, Kubernetes

Kubernetes v1.34: DRA Consumable Capacity

Kubernetes Blog - Thu, 09/18/2025 - 14:30

Dynamic Resource Allocation (DRA) is a Kubernetes API for managing scarce resources across Pods and containers. It enables flexible resource requests, going beyond simply allocating N number of devices to support more granular usage scenarios. With DRA, users can request specific types of devices based on their attributes, define custom configurations tailored to their workloads, and even share the same resource among multiple containers or Pods.

In this blog, we focus on the device sharing feature and dive into a new capability introduced in Kubernetes 1.34: DRA consumable capacity, which extends DRA to support finer-grained device sharing.

Background: device sharing via ResourceClaims

From the beginning, DRA introduced the ability for multiple Pods to share a device by referencing the same ResourceClaim. This design decouples resource allocation from specific hardware, allowing for more dynamic and reusable provisioning of devices.

In Kubernetes 1.33, the new support for partitionable devices allowed resource drivers to advertise slices of a device that are available, rather than exposing the entire device as an all-or-nothing resource. This enabled Kubernetes to model shareable hardware more accurately.

But there was still a missing piece: it didn't yet support scenarios where the device driver manages fine-grained, dynamic portions of a device resource, such as network bandwidth, based on user demand, or shares those resources independently of ResourceClaims, which are restricted by their spec and namespace.

That’s where consumable capacity for DRA comes in.

Benefits of DRA consumable capacity support

Here's a taste of what you get in a cluster with the DRAConsumableCapacity feature gate enabled.

Device sharing across multiple ResourceClaims or DeviceRequests

Resource drivers can now support sharing the same device — or even a slice of a device — across multiple ResourceClaims or across multiple DeviceRequests.

This means that Pods from different namespaces can simultaneously share the same device, if permitted and supported by the specific DRA driver.

Device resource allocation

Kubernetes extends the allocation algorithm in the scheduler to support allocating a portion of a device's resources, as defined in the capacity field. The scheduler ensures that the total allocated capacity across all consumers never exceeds the device’s total capacity, even when shared across multiple ResourceClaims or DeviceRequests. This is very similar to the way the scheduler allows Pods and containers to share allocatable resources on Nodes; in this case, it allows them to share allocatable (consumable) resources on Devices.

This feature expands support for scenarios where the device driver is able to manage resources within a device and on a per-process basis — for example, allocating a specific amount of memory (e.g., 8 GiB) from a virtual GPU, or setting bandwidth limits on virtual network interfaces allocated to specific Pods. This aims to provide safe and efficient resource sharing.

DistinctAttribute constraint

This feature also introduces a new constraint: DistinctAttribute, which is the complement of the existing MatchAttribute constraint.

The primary goal of DistinctAttribute is to prevent the same underlying device from being allocated multiple times within a single ResourceClaim, which could happen since we are allocating shares (or subsets) of devices. This constraint ensures that each allocation refers to a distinct resource, even if they belong to the same device class.

It is useful for use cases such as allocating network devices connecting to different subnets to expand coverage or provide redundancy across failure domains.

How to use consumable capacity?

DRAConsumableCapacity is introduced as an alpha feature in Kubernetes 1.34. The feature gate DRAConsumableCapacity must be enabled in kubelet, kube-apiserver, kube-scheduler and kube-controller-manager.

--feature-gates=...,DRAConsumableCapacity=true

As a DRA driver developer

As a DRA driver developer writing in Go, you can make a device within a ResourceSlice allocatable to multiple ResourceClaims (or devices.requests) by setting AllowMultipleAllocations to true:

Device{
    // ...
    AllowMultipleAllocations: ptr.To(true),
    // ...
}

Additionally, you can define a policy that restricts how each device's capacity may be consumed by each DeviceRequest, by setting the RequestPolicy field of the DeviceCapacity. The example below shows how to define a policy that requires a GPU with 40 GiB of memory to allocate at least 5 GiB per request, with each allocation in multiples of 5 GiB.

DeviceCapacity{
    Value: resource.MustParse("40Gi"),
    RequestPolicy: &CapacityRequestPolicy{
        Default: ptr.To(resource.MustParse("5Gi")),
        ValidRange: &CapacityRequestPolicyRange{
            Min:  ptr.To(resource.MustParse("5Gi")),
            Step: ptr.To(resource.MustParse("5Gi")),
        },
    },
}

This will be published to the ResourceSlice, as partially shown below:

apiVersion: resource.k8s.io/v1
kind: ResourceSlice
...
spec:
  devices:
  - name: gpu0
    allowMultipleAllocations: true
    capacity:
      memory:
        value: 40Gi
        requestPolicy:
          default: 5Gi
          validRange:
            min: 5Gi
            step: 5Gi

An allocated device with a specified portion of consumed capacity will have a ShareID field set in the allocation status.

claim.Status.Allocation.Devices.Results[i].ShareID

This ShareID allows the driver to distinguish between different allocations that refer to the same device or same statically-partitioned slice but come from different ResourceClaim requests.
It acts as a unique identifier for each shared slice, enabling the driver to manage and enforce resource limits independently across multiple consumers.
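As a rough illustration (the driver and pool names are placeholders and the ShareID value is invented; the exact field set follows the v1.34 alpha API shape as an assumption), the relevant part of a claim's status for a 10Gi share of the gpu0 device above might look like:

status:
  allocation:
    devices:
      results:
      - request: req0
        driver: gpu.example.com   # placeholder driver name
        pool: node-1              # placeholder pool name
        device: gpu0
        shareID: 2c3a5f68-0000-0000-0000-000000000000   # illustrative value only
        consumedCapacity:
          memory: 10Gi            # the portion of gpu0's memory taken by this share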

As a consumer

As a consumer (or user), you can request a portion of a device's resources with a ResourceClaim like this:

apiVersion: resource.k8s.io/v1
kind: ResourceClaim
...
spec:
  devices:
    requests: # for devices
    - name: req0
      exactly:
        deviceClassName: resource.example.com
        capacity:
          requests: # for resources which must be provided by those devices
            memory: 10Gi

This configuration ensures that the requested device can provide at least 10 GiB of memory.

Note that any resource.example.com device with at least 10 GiB of memory can be allocated. If a device that does not support multiple allocations is chosen, the allocation consumes the entire device. To filter for only those devices that support multiple allocations, you can define a selector like this:

selectors:
- cel:
    expression: |-
      device.allowMultipleAllocations == true

Integration with DRA device status

In device sharing, general device information is provided through the resource slice. However, some details are set dynamically after allocation. These can be conveyed using the .status.devices field of a ResourceClaim. That field is only published in clusters where the DRAResourceClaimDeviceStatus feature gate is enabled.

If you do have device status support available, a driver can expose additional device-specific information beyond the ShareID. One particularly useful case is virtual networking, where a driver can include the assigned IP address(es) in the status. This is valuable for both network service operations and troubleshooting.
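For example (a sketch only: the driver, pool, device, interface name, and address are hypothetical, and exactly what a driver publishes is up to that driver), a network DRA driver might report something like this in the claim's .status.devices:

status:
  devices:
  - driver: cni.example.com   # hypothetical network DRA driver
    pool: node-1
    device: eth-shared-0
    shareID: 2c3a5f68-0000-0000-0000-000000000000   # identifies which share of the device this status describes
    networkData:
      interfaceName: net1
      ips:
      - 10.9.8.7/24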

You can find more information by watching our recording at: KubeCon Japan 2025 - Reimagining Cloud Native Networks: The Critical Role of DRA.

What can you do next?

  • Check out the CNI DRA Driver project for an example of DRA integration in Kubernetes networking. Try integrating with network resources like macvlan, ipvlan, or smart NICs.

  • Start enabling the DRAConsumableCapacity feature gate and experimenting with virtualized or partitionable devices. Specify your workloads with consumable capacity (for example: fractional bandwidth or memory).

  • Let us know your feedback:

    • ✅ What worked well?
    • ⚠️ What didn’t?

    If you encounter issues or see opportunities for enhancement, please file a new issue and reference KEP-5075 there, or reach out via Slack (#wg-device-management).

Conclusion

Consumable capacity support enhances the device sharing capability of DRA by allowing effective device sharing across namespaces, across claims, and tailored to each Pod’s actual needs. It also empowers drivers to enforce capacity limits, improves scheduling accuracy, and unlocks new use cases like bandwidth-aware networking and multi-tenant device sharing.

Try it out, experiment with consumable resources, and help shape the future of dynamic resource allocation in Kubernetes!
