Feed aggregator
Kubernetes v1.35: Introducing Workload Aware Scheduling
Scheduling large workloads is a much more complex and fragile operation than scheduling a single Pod, as it often requires considering all Pods together instead of scheduling each one independently. For example, when scheduling a machine learning batch job, you often need to place each worker strategically, such as on the same rack, to make the entire process as efficient as possible. At the same time, the Pods that are part of such a workload are very often identical from the scheduling perspective, which fundamentally changes how this process should look.
There are many custom schedulers adapted to perform workload scheduling efficiently,
but considering how common and important workload scheduling is to Kubernetes users,
especially in the AI era with the growing number of use cases,
it is high time to make workloads a first-class citizen for kube-scheduler and support them natively.
Workload aware scheduling
The recent 1.35 release of Kubernetes delivered the first tranche of workload aware scheduling improvements. These are part of a wider effort that aims to improve the scheduling and management of workloads. The effort will span many SIGs and releases, gradually expanding the capabilities of the system toward the north star goal: seamless workload scheduling and management in Kubernetes, including, but not limited to, preemption and autoscaling.
Kubernetes v1.35 introduces the Workload API that you can use to describe the desired shape
as well as the scheduling-oriented requirements of the workload. It comes with an initial implementation
of gang scheduling that instructs the kube-scheduler to schedule gang Pods in an all-or-nothing fashion.
Finally, we improved the scheduling of identical Pods (which typically make up a gang) to speed up the process,
thanks to the opportunistic batching feature.
Workload API
The new Workload API resource is part of the scheduling.k8s.io/v1alpha1
API group.
This resource acts as a structured, machine-readable definition of the scheduling requirements
of a multi-Pod application. While user-facing workloads like Jobs define what to run, the Workload resource
determines how a group of Pods should be scheduled and how its placement should be managed
throughout its lifecycle.
A Workload allows you to define a group of Pods and apply a scheduling policy to them.
Here is what a gang scheduling configuration looks like. You can define a podGroup named workers
and apply the gang policy with a minCount of 4.
apiVersion: scheduling.k8s.io/v1alpha1
kind: Workload
metadata:
  name: training-job-workload
  namespace: some-ns
spec:
  podGroups:
  - name: workers
    policy:
      gang:
        # The gang is schedulable only if 4 pods can run at once
        minCount: 4
When you create your Pods, you link them to this Workload using the new workloadRef field:
apiVersion: v1
kind: Pod
metadata:
  name: worker-0
  namespace: some-ns
spec:
  workloadRef:
    name: training-job-workload
    podGroup: workers
  ...
How gang scheduling works
The gang policy enforces all-or-nothing placement. Without gang scheduling,
a Job might be partially scheduled, consuming resources without being able to run,
leading to resource wastage and potential deadlocks.
When you create Pods that are part of a gang-scheduled pod group, the scheduler's GangScheduling
plugin manages the lifecycle independently for each pod group (or replica key):
- When you create your Pods (or a controller makes them for you), the scheduler blocks them from scheduling until:
  - The referenced Workload object is created.
  - The referenced pod group exists in a Workload.
  - The number of pending Pods in that group meets your minCount.
- Once enough Pods arrive, the scheduler tries to place them. However, instead of binding them to nodes immediately, the Pods wait at a Permit gate.
- The scheduler checks whether it has found valid assignments for the entire group (at least the minCount).
  - If there is room for the group, the gate opens, and all Pods are bound to nodes.
  - If only a subset of the group's Pods was successfully scheduled within a timeout (set to 5 minutes), the scheduler rejects all of the Pods in the group. They go back to the queue, freeing up the reserved resources for other workloads.
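Putting this together, a controller-managed gang might look like the following sketch. The Job name, image, and resource values are hypothetical, and the sketch assumes the Job controller copies pod-template fields, including workloadRef, verbatim into each Pod it creates:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job            # hypothetical name
  namespace: some-ns
spec:
  completions: 4
  parallelism: 4
  template:
    spec:
      # Assumption: workloadRef in the pod template is propagated to each
      # created Pod, linking it to the Workload defined earlier.
      workloadRef:
        name: training-job-workload
        podGroup: workers
      restartPolicy: Never
      containers:
      - name: worker
        image: example.com/training-worker:latest   # placeholder image
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
```

With parallelism equal to the gang's minCount, either all four workers are placed together or none of them consumes cluster resources.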
We'd like to point out that while this is a first implementation, the Kubernetes project firmly intends to improve and expand the gang scheduling algorithm in future releases. Benefits we hope to deliver include a single-cycle scheduling phase for a whole gang, workload-level preemption, and more, moving toward the north star goal.
Opportunistic batching
In addition to explicit gang scheduling, v1.35 introduces opportunistic batching. This is a Beta feature that improves scheduling latency for identical Pods.
Unlike gang scheduling, this feature does not require the Workload API or any explicit opt-in on the user's part. It works opportunistically within the scheduler by identifying Pods that have identical scheduling requirements (container images, resource requests, affinities, etc.). When the scheduler processes a Pod, it can reuse the feasibility calculations for subsequent identical Pods in the queue, significantly speeding up the process.
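As an illustration, the replicas of a plain Deployment like the sketch below are identical from the scheduler's perspective and can be batched. The names, image, and resource values here are placeholders, and the sketch assumes none of the Pods uses a feature that disables batching:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: identical-workers      # hypothetical name
spec:
  replicas: 50
  selector:
    matchLabels:
      app: identical-workers
  template:
    metadata:
      labels:
        app: identical-workers
    spec:
      containers:
      - name: worker
        image: registry.k8s.io/pause:3.10   # placeholder image
        # All replicas share the same images, requests, and (absent)
        # affinities, so feasibility results computed for the first Pod
        # can be reused for the rest.
        resources:
          requests:
            cpu: "500m"
            memory: 1Gi
```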
Most users will benefit from this optimization automatically, without taking any special steps, provided their Pods meet the following criteria.
Restrictions
Opportunistic batching works under specific conditions. All fields used by the kube-scheduler
to find a placement must be identical between Pods. Additionally, using some features
disables the batching mechanism for those Pods to ensure correctness.
Note that you may need to review your kube-scheduler configuration
to ensure it is not implicitly disabling batching for your workloads.
See the docs for more details about restrictions.
The north star vision
The project has a broad ambition to deliver workload aware scheduling. These new APIs and scheduling enhancements are just the first steps. In the near future, the effort aims to tackle:
- Introducing a workload scheduling phase
- Improved support for multi-node DRA and topology aware scheduling
- Workload-level preemption
- Improved integration between scheduling and autoscaling
- Improved interaction with external workload schedulers
- Managing placement of workloads throughout their entire lifecycle
- Multi-workload scheduling simulations
And more. The priority and implementation order of these focus areas are subject to change. Stay tuned for further updates.
Getting started
To try the workload aware scheduling improvements:
- Workload API: Enable the GenericWorkload feature gate on both kube-apiserver and kube-scheduler, and ensure the scheduling.k8s.io/v1alpha1 API group is enabled.
- Gang scheduling: Enable the GangScheduling feature gate on kube-scheduler (requires the Workload API to be enabled).
- Opportunistic batching: As a Beta feature, it is enabled by default in v1.35. You can disable it using the OpportunisticBatching feature gate on kube-scheduler if needed.
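For clusters that run the control plane as static Pods, enabling the gates might look like this sketch. File paths and the rest of the command line depend on how your cluster is deployed, so treat the fragment as illustrative only:

```yaml
# Fragment of a kube-scheduler static Pod manifest (sketch)
spec:
  containers:
  - name: kube-scheduler
    command:
    - kube-scheduler
    - --feature-gates=GenericWorkload=true,GangScheduling=true
# For kube-apiserver, additionally:
#   - --feature-gates=GenericWorkload=true
#   - --runtime-config=scheduling.k8s.io/v1alpha1=true
```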
We encourage you to try out workload aware scheduling in your test clusters and share your experiences to help shape the future of Kubernetes scheduling. You can send your feedback by:
- Reaching out via Slack (#sig-scheduling).
- Commenting on the workload aware scheduling tracking issue
- Filing a new issue in the Kubernetes repository.
Learn more
- Read the KEPs for the Workload API and gang scheduling, and for opportunistic batching.
- Track the Workload aware scheduling issue for recent updates.
Are We Ready to Be Governed by Artificial Intelligence?
Artificial Intelligence (AI) overlords are a common trope in science-fiction dystopias, but the reality looks much more prosaic. The technologies of artificial intelligence are already pervading many aspects of democratic government, affecting our lives in ways both large and small. This has occurred largely without our notice or consent. The result is a government incrementally transformed by AI rather than the singular technological overlord of the big screen.
Let us begin with the executive branch. One of the most important functions of this branch of government is to administer the law, including the human services on which so many Americans rely. Many of these programs have long been operated by a mix of humans and machines, even if not previously using modern AI tools such as ...
Friday Squid Blogging: Squid Camouflage
New research:
Abstract: Coleoid cephalopods have the most elaborate camouflage system in the animal kingdom. This enables them to hide from or deceive both predators and prey. Most studies have focused on benthic species of octopus and cuttlefish, while studies on squid focused mainly on the chromatophore system for communication. Camouflage adaptations to the substrate while moving has been recently described in the semi-pelagic oval squid (Sepioteuthis lessoniana). Our current study focuses on the same squid’s complex camouflage to substrate in a stationary, motionless position. We observed disruptive, uniform, and mottled chromatic body patterns, and we identified a threshold of contrast between dark and light chromatic components that simplifies the identification of disruptive chromatic body pattern. We found that arm postural components are related to the squid position in the environment, either sitting directly on the substrate or hovering just few centimeters above the substrate. Several of these context-dependent body patterns have not yet been observed in ...
IoT Hack
Someone hacked an Italian ferry.
It looks like the malware was installed by someone on the ferry, and not remotely.
Urban VPN Proxy Surreptitiously Intercepts AI Chats
This is pretty scary:
Urban VPN Proxy targets conversations across ten AI platforms: ChatGPT, Claude, Gemini, Microsoft Copilot, Perplexity, DeepSeek, Grok (xAI), Meta AI.
For each platform, the extension includes a dedicated “executor” script designed to intercept and capture conversations. The harvesting is enabled by default through hardcoded flags in the extension’s configuration.
There is no user-facing toggle to disable this. The only way to stop the data collection is to uninstall the extension entirely.
[…]
The data collection operates independently of the VPN functionality. Whether the VPN is connected or not, the harvesting runs continuously in the background...
Kubernetes v1.35: Fine-grained Supplemental Groups Control Graduates to GA
On behalf of Kubernetes SIG Node, we are pleased to announce the graduation of fine-grained supplemental groups control to General Availability (GA) in Kubernetes v1.35!
The new Pod field, supplementalGroupsPolicy, was introduced as an opt-in alpha feature in Kubernetes v1.31, and then graduated to beta in v1.33.
Now, the feature is generally available.
This feature allows you to implement more precise control over supplemental groups in Linux containers, which can strengthen your security posture, particularly when accessing volumes.
Moreover, it also enhances the transparency of UID/GID details in containers, offering improved security oversight.
If you are planning to upgrade your cluster from v1.32 or an earlier version, please be aware that some breaking behavioral changes were introduced since beta (v1.33). For more details, see the behavioral changes introduced in beta and the upgrade considerations sections of the previous blog post for the graduation to beta.
Motivation: Implicit group memberships defined in /etc/group in the container image
Even though the majority of Kubernetes cluster admins/users may not be aware of this,
by default Kubernetes merges group information from the Pod with information defined in /etc/group in the container image.
Here's an example: a Pod manifest that specifies spec.securityContext.runAsUser: 1000, spec.securityContext.runAsGroup: 3000 and spec.securityContext.supplementalGroups: [4000] as part of the Pod's security context.
apiVersion: v1
kind: Pod
metadata:
  name: implicit-groups-example
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000]
  containers:
  - name: example-container
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      allowPrivilegeEscalation: false
What is the result of the id command in the example-container container? The output should be similar to this:
uid=1000 gid=3000 groups=3000,4000,50000
Where does group ID 50000 in the supplementary groups (the groups field) come from, even though 50000 is not defined in the Pod's manifest at all? The answer is the /etc/group file in the container image.
The /etc/group file in the container image contains something like the following:
user-defined-in-image:x:1000:
group-defined-in-image:x:50000:user-defined-in-image
The last entry shows that the container's primary user (UID 1000) belongs to the group 50000.
Thus, the group membership defined in /etc/group in the container image for the container's primary user is implicitly merged to the information from the Pod. Please note that this was a design decision the current CRI implementations inherited from Docker, and the community never really reconsidered it until now.
What's wrong with it?
The implicitly merged group information from /etc/group in the container image poses a security risk. These implicit GIDs can't be detected or validated by policy engines because there's no record of them in the Pod manifest. This can lead to unexpected access control issues, particularly when accessing volumes (see kubernetes/kubernetes#112879 for details) because file permission is controlled by UID/GIDs in Linux.
Fine-grained supplemental groups control in a Pod: supplementalGroupsPolicy
To tackle this problem, a Pod's .spec.securityContext now includes a supplementalGroupsPolicy field.
This field lets you control how Kubernetes calculates the supplementary groups for container processes within a Pod. The available policies are:
- Merge: The group membership defined in /etc/group for the container's primary user will be merged. If not specified, this policy is applied (i.e. the as-is behavior, for backward compatibility).
- Strict: Only the group IDs specified in fsGroup, supplementalGroups, or runAsGroup are attached as supplementary groups to the container processes. Group memberships defined in /etc/group for the container's primary user are ignored.
I'll explain how the Strict policy works. The following Pod manifest specifies supplementalGroupsPolicy: Strict:
apiVersion: v1
kind: Pod
metadata:
  name: strict-supplementalgroups-policy-example
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000]
    supplementalGroupsPolicy: Strict
  containers:
  - name: example-container
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      allowPrivilegeEscalation: false
The result of the id command in the example-container container should be similar to this:
uid=1000 gid=3000 groups=3000,4000
You can see that the Strict policy excludes group 50000 from groups!
Thus, ensuring supplementalGroupsPolicy: Strict (enforced by some policy mechanism) helps prevent implicit supplementary groups from being attached to a Pod.
Note:
A container with sufficient privileges can change its process identity.
The supplementalGroupsPolicy only affects the initial process identity.
Read on for more details.
Attached process identity in Pod status
This feature also exposes the process identity attached to the first container process of the container
via the .status.containerStatuses[].user.linux field. This is helpful for checking whether implicit group IDs are attached.
...
status:
  containerStatuses:
  - name: ctr
    user:
      linux:
        gid: 3000
        supplementalGroups:
        - 3000
        - 4000
        uid: 1000
...
Note:
Please note that the value in the status.containerStatuses[].user.linux field is the process identity initially attached
to the first container process in the container. If the container has sufficient privilege
to call system calls related to process identity (e.g. setuid(2), setgid(2), setgroups(2), etc.), the container process can change its identity. Thus, the actual process identity can be dynamic.
There are several ways to restrict these permissions in containers. We suggest the following simple solutions:
- setting privileged: false and allowPrivilegeEscalation: false in your container's securityContext, or
- conforming your Pod to the Restricted policy in the Pod Security Standards.
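Combining these suggestions with the Strict policy, a locked-down Pod might look like this sketch. The Pod name is hypothetical; the image and IDs are reused from the examples above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: strict-locked-down-example   # hypothetical name
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000]
    # Only the groups declared in this manifest are attached.
    supplementalGroupsPolicy: Strict
  containers:
  - name: example-container
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      # Prevent the process from regaining identity-changing privileges.
      privileged: false
      allowPrivilegeEscalation: false
```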
Also, the kubelet has no visibility into NRI plugins or container runtime internals. A cluster administrator configuring nodes, or a highly privileged workload with local administrator permissions, may change supplemental groups for any Pod. However, this is outside the scope of Kubernetes control and should not be a concern for security-hardened nodes.
Strict policy requires up-to-date container runtimes
The high-level container runtime (e.g. containerd, CRI-O) plays a key role in calculating the supplementary group IDs
that will be attached to the containers. Thus, supplementalGroupsPolicy: Strict requires a CRI runtime that supports this feature.
The old behavior (supplementalGroupsPolicy: Merge) can work with a CRI runtime that does not support this feature,
because this policy is fully backward compatible.
Here are some CRI runtimes that support this feature, and the versions you need to be running:
- containerd: v2.0 or later
- CRI-O: v1.31 or later
And, you can see whether the feature is supported in the Node's .status.features.supplementalGroupsPolicy field. Please note that this field is different from status.declaredFeatures introduced in KEP-5328: Node Declared Features (formerly Node Capabilities).
apiVersion: v1
kind: Node
...
status:
  features:
    supplementalGroupsPolicy: true
As container runtime support for this feature becomes universal, various security policies may start enforcing the Strict behavior as the more secure option. It is best practice to ensure that your Pods are ready for this enforcement and that all supplemental groups are transparently declared in the Pod spec, rather than in images.
Getting involved
This enhancement was driven by the SIG Node community. Please join us to connect with the community and share your ideas and feedback around the above feature and beyond. We look forward to hearing from you!
How can I learn more?
- Configure a Security Context for a Pod or Container, for further details of supplementalGroupsPolicy
- KEP-3619: Fine-grained SupplementalGroups control
Denmark Accuses Russia of Conducting Two Cyberattacks
News:
The Danish Defence Intelligence Service (DDIS) announced on Thursday that Moscow was behind a cyber-attack on a Danish water utility in 2024 and a series of distributed denial-of-service (DDoS) attacks on Danish websites in the lead-up to the municipal and regional council elections in November.
The first, it said, was carried out by the pro-Russian group known as Z-Pentest and the second by NoName057(16), which has links to the Russian state.
Slashdot thread.
Kubernetes v1.35: Kubelet Configuration Drop-in Directory Graduates to GA
With the recent v1.35 release of Kubernetes, support for a kubelet configuration drop-in directory is generally available. The newly stable feature simplifies the management of kubelet configuration across large, heterogeneous clusters.
With v1.35, the kubelet command line argument --config-dir is production-ready and fully supported,
allowing you to specify a directory containing kubelet configuration drop-in files.
All files in that directory will be automatically merged with your main kubelet configuration.
This allows cluster administrators to maintain a cohesive base configuration for kubelets while enabling targeted customizations for different node groups or use cases, and without complex tooling or manual configuration management.
The problem: managing kubelet configuration at scale
As Kubernetes clusters grow larger and more complex, they often include heterogeneous node pools with different hardware capabilities, workload requirements, and operational constraints. This diversity necessitates different kubelet configurations across node groups—yet managing these varied configurations at scale becomes increasingly challenging. Several pain points emerge:
- Configuration drift: Different nodes may have slightly different configurations, leading to inconsistent behavior
- Node group customization: GPU nodes, edge nodes, and standard compute nodes often require different kubelet settings
- Operational overhead: Maintaining separate, complete configuration files for each node type is error-prone and difficult to audit
- Change management: Rolling out configuration changes across heterogeneous node pools requires careful coordination
Before this support was added to Kubernetes, cluster administrators had to choose between using a single monolithic configuration file for all nodes, manually maintaining multiple complete configuration files, or relying on separate tooling. Each approach had its own drawbacks. This graduation to stable gives cluster administrators a fully supported fourth way to solve that challenge.
Example use cases
Managing heterogeneous node pools
Consider a cluster with multiple node types: standard compute nodes, high-capacity nodes (such as those with GPUs or large amounts of memory), and edge nodes with specialized requirements.
Base configuration
File: 00-base.conf
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- "10.96.0.10"
clusterDomain: cluster.local
High-capacity node override
File: 50-high-capacity-nodes.conf
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 50
systemReserved:
  memory: "4Gi"
  cpu: "1000m"
Edge node override
File: 50-edge-nodes.conf (edge compute typically has lower capacity)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "5%"
With this structure, high-capacity nodes apply both the base configuration and the capacity-specific overrides, while edge nodes apply the base configuration with edge-specific settings.
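For example, a high-capacity node that loads 00-base.conf and then 50-high-capacity-nodes.conf ends up with the union of the two files. This is a sketch of the effective result; fields from later files win on conflicts, though the fields in these two files do not overlap:

```yaml
# Effective kubelet configuration after merging (sketch)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# From 00-base.conf:
clusterDNS:
- "10.96.0.10"
clusterDomain: cluster.local
# From 50-high-capacity-nodes.conf:
maxPods: 50
systemReserved:
  memory: "4Gi"
  cpu: "1000m"
```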
Gradual configuration rollouts
When rolling out configuration changes, you can:
- Add a new drop-in file with a high numeric prefix (e.g., 99-new-feature.conf)
- Test the changes on a subset of nodes
- Gradually roll out to more nodes
- Once stable, merge changes into the base configuration
Viewing the merged configuration
Since configuration is now spread across multiple files, you can inspect the final merged configuration using the kubelet's /configz endpoint:
# Start kubectl proxy
kubectl proxy
# In another terminal, fetch the merged configuration
# Change the '<node-name>' placeholder before running the curl command
curl -X GET http://127.0.0.1:8001/api/v1/nodes/<node-name>/proxy/configz | jq .
This shows the actual configuration the kubelet is using after all merging has been applied. The merged configuration also includes any configuration settings that were specified via kubelet command-line arguments.
For detailed setup instructions, configuration examples, and merging behavior, see the official documentation.
Good practices
When using the kubelet configuration drop-in directory:
- Test configurations incrementally: Always test new drop-in configurations on a subset of nodes before rolling out cluster-wide to minimize risk
- Version control your drop-ins: Store your drop-in configuration files in version control (or the configuration source from which these are generated) alongside your infrastructure as code to track changes and enable easy rollbacks
- Use numeric prefixes for predictable ordering: Name files with numeric prefixes (e.g., 00-, 50-, 90-) to explicitly control merge order and make the configuration layering obvious to other administrators
- Be mindful of temporary files: Some text editors automatically create backup files (such as .bak, .swp, or files with a ~ suffix) in the same directory when editing. Ensure these temporary or backup files are not left in the configuration directory, as they may be processed by the kubelet
Acknowledgments
This feature was developed through the collaborative efforts of SIG Node. Special thanks to all contributors who helped design, implement, test, and document this feature across its journey from alpha in v1.28, through beta in v1.30, to GA in v1.35.
To provide feedback on this feature, join the Kubernetes Node Special Interest Group, participate in discussions on the public Slack channel (#sig-node), or file an issue on GitHub.
Get involved
If you have feedback or questions about kubelet configuration management, or want to share your experience using this feature, join the discussion:
- SIG Node community page
- Kubernetes Slack in the #sig-node channel
- SIG Node mailing list
SIG Node would love to hear about your experiences using this feature in production!
Microsoft Is Finally Killing RC4
After twenty-six years, Microsoft is finally upgrading the last remaining instance of the encryption algorithm RC4 in Windows.
One of the most visible holdouts in supporting RC4 has been Microsoft. Eventually, Microsoft upgraded Active Directory to support the much more secure AES encryption standard. But by default, Windows servers have continued to respond to RC4-based authentication requests and return an RC4-based response. The RC4 fallback has been a favorite weakness hackers have exploited to compromise enterprise networks. Use of RC4 played a ...
Kubernetes 1.35: Timbernetes, with Drew Hagen
Drew Hagen, the release lead for Kubernetes 1.35, discusses the theme of the release, Timbernetes, which symbolizes resilience and diversity in the Kubernetes community. He shares insights from his experience as a release lead, highlights key features and enhancements in the new version, and addresses the importance of coordination in release management. Drew also touches on the deprecations in the release and the future of Kubernetes, including its applications in edge computing.
Do you have something cool to share? Some questions? Let us know:
- web: kubernetespodcast.com
- mail: [email protected]
- twitter: @kubernetespod
- bluesky: @kubernetespodcast.com
Links from the interview
Avoiding Zombie Cluster Members When Upgrading to etcd v3.6
This article is a mirror of an original that was recently published to the official etcd blog. The key takeaway? Always upgrade to etcd v3.5.26 or later before moving to v3.6. This ensures your cluster is automatically repaired, and avoids zombie members.
Issue summary
Recently, the etcd community addressed an issue that may appear when users upgrade from v3.5 to v3.6. This bug can cause the cluster to report "zombie members", which are etcd nodes that were removed from the database cluster some time ago, and are re-appearing and joining database consensus. The etcd cluster is then inoperable until these zombie members are removed.
In etcd v3.5 and earlier, the v2store was the source of truth for membership data, even though the v3store was also present. As a part of our v2store deprecation plan, in v3.6 the v3store is the source of truth for cluster membership. Through a bug report we found out that, in some older clusters, v2store and v3store could become inconsistent. This inconsistency manifests after upgrading as seeing old, removed "zombie" cluster members re-appearing in the cluster.
The fix and upgrade path
We’ve added a mechanism in etcd v3.5.26 to automatically sync v3store from v2store, ensuring that affected clusters are repaired before upgrading to 3.6.x.
To support the many users currently upgrading to 3.6, we have provided the following safe upgrade path:
- Upgrade your cluster to v3.5.26 or later.
- Wait and confirm that all members are healthy post-update.
- Upgrade to v3.6.
We are unable to provide a safe workaround path for users who have some obstacle preventing updating to v3.5.26. As such, if v3.5.26 is not available from your packaging source or vendor, you should delay upgrading to v3.6 until it is.
Additional technical detail
Information below is offered for reference only. Users can follow the safe upgrade path without knowledge of the following details.
This issue is encountered with clusters that have been running in production on etcd v3.5.25 or earlier. It is a side effect of adding and removing members from the cluster, or recovering the cluster from failure. This means that the issue is more likely the older the etcd cluster is, but it cannot be ruled out for any user regardless of the age of the cluster.
etcd maintainers, working with issue reporters, have found three possible triggers for the issue based on symptoms and an analysis of etcd code and logs:
- Bug in etcdctl snapshot restore (v3.4 and older versions): When restoring a snapshot using etcdctl snapshot restore, etcdctl was supposed to remove existing members before adding the new ones. In v3.4, due to a bug, old members were not removed, resulting in zombie members. Refer to the comment on etcdctl.
- --force-new-cluster in v3.5 and earlier versions: In rare cases, forcibly creating a new single-member cluster did not fully remove old members, leaving zombies. The issue was resolved in v3.5.22. Please refer to this PR in the Raft project for detailed technical information.
- --unsafe-no-sync enabled: If --unsafe-no-sync is enabled, in rare cases etcd might persist a membership change to v3store but crash before writing it to the WAL, causing inconsistency between v2store and v3store. This is a problem for single-member clusters. For multi-member clusters, forcibly creating a new single-member cluster from the crashed node's data may lead to zombie members.
Note
--unsafe-no-sync is generally not recommended, as it may break the guarantees given by the consensus protocol.
Importantly, there may be other triggers for v2store and v3store membership data becoming inconsistent that we have not yet found. This means that you cannot assume that you are safe just because you have not performed any of the three actions above. Once users are upgraded to etcd v3.6, v3store becomes the source of membership data, and further inconsistency is not possible.
Advanced users who want to verify the consistency between v2store and v3store can follow the steps described in this comment. This check is not required to fix the issue, nor does SIG etcd recommend bypassing the v3.5.26 update regardless of the results of the check.
Key takeaway
Always upgrade to v3.5.26 or later before moving to v3.6. This ensures your cluster is automatically repaired and avoids zombie members.
Acknowledgements
We would like to thank Christian Baumann for reporting this long-standing upgrade issue. His report and follow-up work helped bring the issue to our attention so that we could investigate and resolve it upstream.
Friday Squid Blogging: Petting a Squid
Video from Reddit shows what could go wrong when you try to pet a—looks like a Humboldt—squid.
As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.
Kubernetes 1.35: In-Place Pod Resize Graduates to Stable
This release marks a major step: more than 6 years after its initial conception, the In-Place Pod Resize feature (also known as In-Place Pod Vertical Scaling), first introduced as alpha in Kubernetes v1.27 and graduated to beta in v1.33, is now stable (GA) in Kubernetes 1.35!
This graduation is a major milestone for improving resource efficiency and flexibility for workloads running on Kubernetes.
What is in-place Pod Resize?
In the past, the CPU and memory resources allocated to a container in a Pod were immutable. This meant changing them required deleting and recreating the entire Pod. For stateful services, batch jobs, or latency-sensitive workloads, this was an incredibly disruptive operation.
In-Place Pod Resize makes CPU and memory requests and limits mutable, allowing you to adjust these resources within a running Pod, often without requiring a container restart.
Key concepts:
- Desired resources: A container's spec.containers[*].resources field now represents the desired resources. For CPU and memory, these fields are now mutable.
- Actual resources: The status.containerStatuses[*].resources field reflects the resources currently configured for a running container.
- Triggering a resize: You can request a resize by updating the desired requests and limits in the Pod's specification, using the new resize subresource.
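For example, a resize can be requested with kubectl by patching the Pod's resize subresource. This is a minimal sketch, assuming kubectl v1.32 or later (where --subresource supports resize); the Pod name "resize-demo" and container name "app" are hypothetical:

```yaml
# Request a CPU increase for a running Pod via the resize subresource:
#   kubectl patch pod resize-demo --subresource resize --patch-file patch.yaml
#
# patch.yaml -- only the fields being changed need to be listed:
spec:
  containers:
  - name: app
    resources:
      requests:
        cpu: "800m"
      limits:
        cpu: "800m"
```

After applying the patch, spec.containers[*].resources reflects the new desired values immediately, while status.containerStatuses[*].resources updates once the Kubelet has actually applied the change.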
How can I start using in-place Pod Resize?
Detailed usage instructions and examples are provided in the official documentation: Resize CPU and Memory Resources assigned to Containers.
How does this help me?
In-place Pod Resize is a foundational building block that unlocks seamless, vertical autoscaling and improvements to workload efficiency.
- Resources adjusted without disruption: Workloads sensitive to latency or restarts can have their resources modified in-place without downtime or loss of state.
- More powerful autoscaling: Autoscalers can now adjust resources with less impact. For example, the Vertical Pod Autoscaler (VPA)'s InPlaceOrRecreate update mode, which leverages this feature, has graduated to beta. This allows resources to be adjusted automatically and seamlessly based on usage with minimal disruption. See AEP-4016 for more details.
- Address transient resource needs: Workloads that temporarily need more resources can be adjusted quickly. This enables features like CPU Startup Boost (AEP-7862), where applications can request more CPU during startup and then automatically scale back down.
Here are a few examples of some use cases:
- A game server that needs to adjust its size with shifting player count.
- A pre-warmed worker that can be shrunk while unused but inflated with the first request.
- Dynamically scale with load for efficient bin-packing.
- Increased resources for JIT compilation on startup.
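To illustrate the autoscaling integration, here is a sketch of a VerticalPodAutoscaler using the InPlaceOrRecreate update mode. The Deployment name is hypothetical, and the exact mode availability depends on your VPA version (the mode is beta as of recent VPA releases):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical target workload
  updatePolicy:
    # Attempt an in-place resize first; fall back to recreating the
    # Pod only if the in-place update is not possible.
    updateMode: "InPlaceOrRecreate"
```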
Changes between beta (1.33) and stable (1.35)
Since the initial beta in v1.33, development effort has primarily been around stabilizing the feature and improving its usability based on community feedback. Here are the primary changes for the stable release:
- Memory limit decrease Decreasing memory limits was previously prohibited. This restriction has been lifted, and memory limit decreases are now permitted. The Kubelet attempts to prevent OOM-kills by allowing the resize only if the current memory usage is below the new desired limit. However, this check is best-effort and not guaranteed.
- Prioritized resizes: If a node doesn't have enough room to accept all resize requests, deferred resizes are reattempted based on the following priority:
  - PriorityClass
  - QoS class
  - Time spent deferred, with older requests prioritized first.
- Pod Level Resources (Alpha) Support for in-place Pod Resize with Pod Level Resources has been introduced behind its own feature gate, which is alpha in v1.35.
- Increased observability: There are now new Kubelet metrics and Pod events specifically associated with In-Place Pod Resize to help users track and debug resource changes.
What's next?
The graduation of In-Place Pod Resize to stable opens the door for powerful integrations across the Kubernetes ecosystem. There are several areas for further improvement that are currently planned.
Integration with autoscalers and other projects
There are planned integrations with several autoscalers and other projects to improve workload efficiency at a larger scale. Some projects under discussion:
- VPA CPU startup boost (AEP-7862): Allows applications to request more CPU at startup and scale back down after a specified period of time.
- VPA support for in-place updates (AEP-4016): VPA support for InPlaceOrRecreate has recently graduated to beta, with the eventual goal of graduating the feature to stable. Support for InPlace mode is still being worked on; see this pull request.
- Ray autoscaler: Plans to leverage In-Place Pod Resize to improve workload efficiency. See this Google Cloud blog post for more details.
- Agent-sandbox "Soft-Pause": Investigating leveraging In-Place Pod Resize for improved latency. See the GitHub issue for more details.
- Runtime support: Java and Python runtimes do not currently support resizing memory without a restart. There is an open conversation with the Java developers; see the bug.
If you have a project that could benefit from integration with in-place pod resize, please reach out using the channels listed in the feedback section!
Feature expansion
Today, In-Place Pod Resize is prohibited when used in combination with: swap, the static CPU Manager, and the static Memory Manager. Additionally, resources other than CPU and memory are still immutable. Expanding the set of supported features and resources is under consideration as more feedback about community needs comes in.
There are also plans to support workload preemption; if there is not enough room on the node for the resize of a high priority pod, the goal is to enable policies to automatically evict a lower-priority pod or upsize the node.
Improved stability
- Resolve kubelet-scheduler race conditions: There are known race conditions between the kubelet and the scheduler with regard to In-Place Pod Resize. Work is underway to resolve these issues over the next few releases. See the issue for more details.
- Safer memory limit decrease: The Kubelet's best-effort check for OOM-kill prevention can be made even safer by moving the memory usage check into the container runtime itself. See the issue for more details.
Providing feedback
As we look to build further on this foundational feature, please share your feedback on how to improve and extend it. You can share feedback through GitHub issues, mailing lists, or the Slack channels of the Kubernetes #sig-node and #sig-autoscaling communities.
Thank you to everyone who contributed to making this long-awaited feature a reality!
Dismantling Defenses: Trump 2.0 Cyber Year in Review
The Trump administration has pursued a staggering range of policy pivots this past year that threaten to weaken the nation’s ability and willingness to address a broad spectrum of technology challenges, from cybersecurity and privacy to countering disinformation, fraud and corruption. These shifts, along with the president’s efforts to restrict free speech and freedom of the press, have come at such a rapid clip that many readers probably aren’t even aware of them all.
FREE SPEECH
President Trump has repeatedly claimed that a primary reason he lost the 2020 election was that social media and Big Tech companies had conspired to silence conservative voices and stifle free speech. Naturally, the president’s impulse in his second term has been to use the levers of the federal government in an effort to limit the speech of everyday Americans, as well as foreigners wishing to visit the United States.
In September, Donald Trump signed a national security directive known as NSPM-7, which directs federal law enforcement officers and intelligence analysts to target “anti-American” activity, including any “tax crimes” involving extremist groups who defrauded the IRS. According to extensive reporting by journalist Ken Klippenstein, the focus of the order is on those expressing “opposition to law and immigration enforcement; extreme views in favor of mass migration and open borders; adherence to radical gender ideology,” as well as “anti-Americanism,” “anti-capitalism,” and “anti-Christianity.”
Earlier this month, Attorney General Pam Bondi issued a memo advising the FBI to compile a list of Americans whose activities “may constitute domestic terrorism.” Bondi also ordered the FBI to establish a “cash reward system” to encourage the public to report suspected domestic terrorist activity. The memo states that domestic terrorism could include “opposition to law and immigration enforcement” or support for “radical gender ideology.”
The Trump administration also is planning to impose social media restrictions on tourists as the president continues to ramp up travel restrictions for foreign visitors. According to a notice from U.S. Customs and Border Protection (CBP), tourists — including those from Britain, Australia, France, and Japan — will soon be required to provide five years of their social media history.
The CBP said it will also collect “several high value data fields,” including applicants’ email addresses from the past 10 years, their telephone numbers used in the past five years, and names and details of family members. Wired reported in October that the US CBP executed more device searches at the border in the first three months of the year than in any previous quarter.
The new requirements from CBP add meat to the bones of Executive Order 14161, which in the name of combating “foreign terrorist and public safety threats” granted broad new authority that civil rights groups warn could enable a renewed travel ban and expanded visa denials or deportations based on perceived ideology. Critics alleged the order’s vague language around “public safety threats,” creates latitude for targeting individuals based on political views, national origin, or religion. At least 35 nations are now under some form of U.S. travel restrictions.
CRIME AND CORRUPTION
In February, Trump ordered executive branch agencies to stop enforcing the U.S. Foreign Corrupt Practices Act, which froze foreign bribery investigations, and even allows for “remedial actions” of past enforcement actions deemed “inappropriate.”
The White House also disbanded the Kleptocracy Asset Recovery Initiative and KleptoCapture Task Force — units which proved their value in corruption cases and in seizing the assets of sanctioned Russian oligarchs — and diverted resources away from investigating white-collar crime.
Also in February, Attorney General Pam Bondi dissolved the FBI’s Foreign Influence Task Force, an entity created during Trump’s first term designed to counter the influence of foreign governments on American politics.
In March 2025, Reuters reported that several U.S. national security agencies had halted work on a coordinated effort to counter Russian sabotage, disinformation and cyberattacks. Former President Joe Biden had ordered his national security team to establish working groups to monitor the issue amid warnings from U.S. intelligence that Russia was escalating a shadow war against Western nations.
In a test of prosecutorial independence, Trump’s Justice Department ordered prosecutors to drop the corruption case against New York Mayor Eric Adams. The fallout was immediate: Multiple senior officials resigned in protest, the case was reassigned, and chaos engulfed the Southern District of New York (SDNY) – historically one of the nation’s most aggressive offices for pursuing public corruption, white-collar crime, and cybercrime cases.
When it comes to cryptocurrency, the administration has shifted regulators at the U.S. Securities and Exchange Commission (SEC) away from enforcement to cheerleading an industry that has consistently been plagued by scams, fraud and rug-pulls. The SEC in 2025 systematically retreated from enforcement against cryptocurrency operators, dropping major cases against Coinbase, Binance, and others.
Perhaps the most troubling example involves Justin Sun, the Chinese-born founder of crypto currency company Tron. In 2023, the SEC charged Sun with fraud and market manipulation. Sun subsequently invested $75 million in the Trump family’s World Liberty Financial (WLF) tokens, became the top holder of the $TRUMP memecoin, and secured a seat at an exclusive dinner with the president.
In late February 2025, the SEC dropped its lawsuit. Sun promptly took Tron public through a reverse merger arranged by Dominari Securities, a firm with Trump family ties. Democratic lawmakers have urged the SEC to investigate what they call “concerning ties to President Trump and his family” as potential conflicts of interest and foreign influence.
In October, President Trump pardoned Changpeng Zhao, the founder of the world’s largest cryptocurrency exchange Binance. In 2023, Zhao and his company pled guilty to failing to prevent money laundering on the platform. Binance paid a $4 billion fine, and Zhao served a four-month sentence. As CBS News observed last month, shortly after Zhao’s pardon application, he was at the center of a blockbuster deal that put the Trump’s family’s WLF on the map.
“Zhao is a citizen of the United Arab Emirates in the Persian Gulf and in May, an Emirati fund put $2 billion in Zhao’s Binance,” 60 Minutes reported. “Of all the currencies in the world, the deal was done in World Liberty crypto.”
SEC Chairman Paul Atkins has made the agency’s new posture towards crypto explicit, stating “most crypto tokens are not securities.” At the same time, President Trump has directed the Department of Labor and the SEC to expand 401(k) access to private equity and crypto — assets that regulators have historically restricted for retail investors due to high risk, fees, opacity, and illiquidity. The executive order explicitly prioritizes “curbing ERISA litigation,” and reducing accountability for fiduciaries while shifting risk onto ordinary workers’ retirement savings.
At the White House’s behest, the U.S. Treasury in March suspended the Corporate Transparency Act, a law that required companies to reveal their real owners. Finance experts warned the suspension would bring back shell companies and “open the flood gates of dirty money” through the US, such as funds from drug gangs, human traffickers, and fraud groups.
Trump’s clemency decisions have created a pattern of freed criminals committing new offenses. Among them is Jonathan Braun, whose sentence for drug trafficking was commuted during Trump’s first term and who was found guilty in 2025 of violating supervised release; he now faces new charges.
Eliyahu Weinstein, who received a commutation in January 2021 for running a Ponzi scheme, was sentenced in November 2025 to 37 years for running a new Ponzi scheme. The administration has also granted clemency to a growing list of white-collar criminals: David Gentile, a private equity executive sentenced to seven years for securities and wire fraud (functionally a ponzi-like scheme), and Trevor Milton, the Nikola founder sentenced to four years for defrauding investors over electric vehicle technology. The message: Financial crimes against ordinary investors are no big deal.
At least 10 of the January 6 insurrectionists pardoned by President Trump have already been rearrested, charged or sentenced for other crimes, including plotting the murder of FBI agents, child sexual assault, possession of child sexual abuse material and reckless homicide while driving drunk.
The administration also imposed sanctions against the International Criminal Court (ICC). On February 6, 2025, Executive Order 14203 authorized asset freezes and visa restrictions against ICC officials investigating U.S. citizens or allies, primarily in response to the ICC’s arrest warrants for Israeli Prime Minister Benjamin Netanyahu over alleged war crimes in Gaza.
Earlier this month the president launched the “Gold Card,” a visa scheme established by an executive order in September that offers wealthy individuals and corporations expedited paths to U.S. residency and citizenship in exchange for $1 million for individuals and $2 million for companies, plus ongoing fees. The administration says it is also planning to offer a “platinum” version of the card that offers special tax breaks — for a cool $5 million.
FEDERAL CYBERSECURITY
President Trump campaigned for a second term insisting that the previous election was riddled with fraud and had been stolen from him. Shortly after Mr. Trump took the oath of office for a second time, he fired the head of the Cybersecurity and Infrastructure Security Agency (CISA) — Chris Krebs (no relation) — for having the audacity to state publicly that the 2020 election was the most secure in U.S. history.
Mr. Trump revoked Krebs’s security clearances, ordered a Justice Department investigation into his election security work, and suspended the security clearances of employees at SentinelOne, the cybersecurity firm where Krebs worked as chief intelligence and public policy officer. The executive order was the first direct presidential action against any US cybersecurity company. Krebs subsequently resigned from SentinelOne, telling The Wall Street Journal he was leaving to push back on Trump’s efforts “to go after corporate interests and corporate relationships.”
The president also dismissed all 15 members of the Cyber Safety Review Board (CSRB), a nonpartisan government entity established in 2022 with a mandate to investigate the security failures behind major cybersecurity events — likely because those advisors included Chris Krebs.
At the time, the CSRB was in the middle of compiling a much-anticipated report on the root causes of Chinese government-backed digital intrusions into at least nine U.S. telecommunications providers. Not to be outdone, the Federal Communication Commission quickly moved to roll back a previous ruling that required U.S. telecom carriers to implement stricter cybersecurity measures.
Meanwhile, CISA has lost roughly a third of its workforce this year amid mass layoffs and deferred resignations. When the government shutdown began in October, CISA laid off even more employees and furloughed 65 percent of the remaining staff, leaving only 900 employees working without pay.
Additionally, the Department of Homeland Security has reassigned CISA cyber specialists to jobs supporting the president’s deportation agenda. As Bloomberg reported earlier this year, CISA employees were given a week to accept the new roles or resign, and some of the reassignments included relocations to new geographic areas.
The White House has signaled that it plans to cut an additional $491 million from CISA’s budget next year, cuts that primarily target CISA programs focused on international affairs and countering misinformation and foreign propaganda. The president’s budget proposal justified the cuts by repeating debunked claims about CISA engaging in censorship.
The Trump administration has pursued a similar reorganization at the FBI: The Washington Post reported in October that a quarter of all FBI agents have now been reassigned from national security threats to immigration enforcement. Reuters reported last week that the replacement of seasoned leaders at the FBI and Justice Department with Trump loyalists has led to an unprecedented number of prosecutorial missteps, resulting in a 21 percent dismissal rate of the D.C. U.S. attorney’s office criminal complaints over eight weeks, compared to a mere 0.5 percent dismissal rate over the prior 10 years.
“These mistakes are causing department attorneys to lose credibility with federal courts, with some judges quashing subpoenas, threatening criminal contempt and issuing opinions that raise questions about their conduct,” Reuters reported. “Grand juries have also in some cases started rejecting indictments, a highly unusual event since prosecutors control what evidence gets presented.”
In August, the DHS banned state and local governments from using cyber grants on services provided by the Multi-State Information Sharing and Analysis Center (MS-ISAC), a group that for more than 20 years has shared critical cybersecurity intelligence across state lines and provided software and other resources at free or heavily discounted rates. Specifically, DHS barred states from spending funds on services offered by the Elections Infrastructure ISAC, which was effectively shuttered after DHS pulled its funding in February.
Cybersecurity Dive reports that the Trump administration’s massive workforce cuts, along with widespread mission uncertainty and a persistent leadership void, have interrupted federal agencies’ efforts to collaborate with the businesses and local utilities that run and protect healthcare facilities, water treatment plants, energy companies and telecommunications networks. The publication said the changes came after the US government eliminated CIPAC — a framework that allowed private companies to share cyber and threat intel without legal penalties.
“Government leaders have canceled meetings with infrastructure operators, forced out their longtime points of contact, stopped attending key industry events and scrapped a coordination program that made companies feel comfortable holding sensitive talks about cyberattacks and other threats with federal agencies,” Cybersecurity Dive’s Eric Geller wrote.
Both the National Security Agency (NSA) and U.S. Cyber Command have been without a leader since Trump dismissed Air Force General Timothy Haugh in April, allegedly for disloyalty to the president and at the suggestion of far-right conspiracy theorist Laura Loomer. The nomination of Army Lt. Gen. William Hartman for the same position fell through in October. The White House has ordered the NSA to cut 8 percent of its civilian workforce (between 1,500 and 2,000 employees).
As The Associated Press reported in August, the Office of the Director of National Intelligence plans to dramatically reduce its workforce and cut its budget by more than $700 million annually. Director of National Intelligence Tulsi Gabbard said the cuts were warranted because ODNI had become “bloated and inefficient, and the intelligence community is rife with abuse of power, unauthorized leaks of classified intelligence, and politicized weaponization of intelligence.”
The firing or forced retirements of so many federal employees has been a boon to foreign intelligence agencies. Chinese intelligence agencies, for example, reportedly moved quickly to take advantage of the mass layoffs, using a network of front companies to recruit laid-off U.S. government employees for “consulting work.” Former workers with the Defense Department’s Defense Digital Service who resigned en-masse earlier this year thanks to DOGE encroaching on their mission have been approached by the United Arab Emirates to work on artificial intelligence for the oil kingdom’s armed forces, albeit reportedly with the blessing of the Trump administration.
PRESS FREEDOM
President Trump has filed multibillion-dollar lawsuits against a number of major news outlets over news segments or interviews that allegedly portrayed him in a negative light, suing the networks ABC, the BBC, the CBS parent company Paramount, The Wall Street Journal, and The New York Times, among others.
The president signed an executive order aimed at slashing public subsidies to PBS and NPR, alleging “bias” in the broadcasters’ reporting. In July, Congress approved a request from Trump to cut $1.1 billion in federal funding for the Corporation for Public Broadcasting, the nonprofit entity that funds PBS and NPR.
Brendan Carr, the president’s pick to run the Federal Communications Commission (FCC), initially pledged to “dismantle the censorship cartel and restore free speech rights for everyday Americans.” But on January 22, 2025, the FCC reopened complaints against ABC, CBS and NBC over their coverage of the 2024 election. The previous FCC chair had dismissed the complaints as attacks on the First Amendment and an attempt to weaponize the agency for political purposes.
President Trump in February seized control of the White House Correspondents’ Association, the nonprofit entity that decides which media outlets should have access to the White House and the press pool that follows the president. The president invited an additional 32 media outlets, mostly conservative or right-wing organizations.
According to the journalism group Poynter.org, there are three religious networks, all of which lean conservative, as well as a mix of outlets that includes a legacy paper, television networks, and a digital outlet powered by artificial intelligence. Trump also barred The Associated Press from the White House over their refusal to refer to the Gulf of Mexico as the Gulf of America.
Under Trump appointee Kari Lake, the U.S. Agency for Global Media moved to dismantle Voice of America, Radio Free Europe/Radio Liberty, and other networks that for decades served as credible news sources behind authoritarian lines. Courts blocked shutdown orders, but the damage continues through administrative leave, contract terminations, and funding disputes.
President Trump this term has fired most of the people involved in processing Freedom of Information Act (FOIA) requests for government agencies. FOIA is an indispensable tool used by journalists and the public to request government records, and to hold leaders accountable.
Petitioning the government, particularly when it ignores your requests, often requires challenging federal agencies in court. But that becomes far more difficult if the most competent law firms start to shy away from cases that may involve crossing the president and his administration. On March 22, the president issued a memorandum that directs heads of the Justice and Homeland Security Departments to “seek sanctions against attorneys and law firms who engage in frivolous, unreasonable and vexatious litigation against the United States,” or in matters that come before federal agencies.
The Trump administration announced increased vetting of applicants for H-1B visas for highly skilled workers, with an internal State Department memo saying that anyone involved in “censorship” of free speech should be considered for rejection.
Executive Order 14161, issued in 2025 on “foreign terrorist and public safety threats,” granted broad new authority that civil rights groups warn could enable a renewed travel ban and expanded visa denials or deportations based on perceived ideology. Critics charged that the order’s vague language around “public safety threats” creates latitude for targeting individuals based on political views, national origin, or religion.
CONSUMER PROTECTION, PRIVACY
At the beginning of this year, President Trump ordered staffers at the Consumer Financial Protection Bureau (CFPB) to stop most work. Created by Congress in 2011 to be a clearinghouse of consumer complaints, the CFPB has sued some of the nation’s largest financial institutions for violating consumer protection laws. The CFPB says its actions have put nearly $18 billion back in Americans’ pockets in the form of monetary compensation or canceled debts, and imposed $4 billion in civil money penalties against violators.
The Trump administration said it planned to fire up to 90 percent of all CFPB staff, but a recent federal appeals court ruling in Washington tossed out an earlier decision that would have allowed the firings to proceed. Reuters reported this week that an employee union and others have fought the firings in court for ten months, during which the agency has been almost completely idled.
The CFPB’s acting director is Russell Vought, a key architect of the GOP policy framework Project 2025. Under Vought’s direction, the CFPB in May quietly withdrew a data broker protection rule intended to limit the ability of U.S. data brokers to sell personal information on Americans.
Despite the Federal Reserve’s own post-mortem explicitly blaming Trump-era deregulation for the 2023 Silicon Valley Bank collapse, which triggered a fast-moving crisis requiring emergency weekend bailouts of banks, Trump’s banking regulators in 2025 doubled down. They loosened capital requirements, narrowed definitions of “unsafe” banking practices, and stripped specific risk categories from supervisory frameworks. The setup for another banking crisis requiring taxpayer intervention is now in place.
The Privacy Act of 1974, one of the few meaningful federal privacy laws, was built on the principles of consent and separation in response to the abuses of power that came to light during the Watergate era. The law states that when an individual provides personal information to a federal agency to receive a particular service, that data must be used solely for its original purpose.
Nevertheless, it emerged in June that the Trump administration has built a central database of all US citizens. According to NPR, the White House plans to use the new platform during upcoming elections to verify the identity and citizenship status of US voters. The database was built by the Department of Homeland Security and the Department of Governmental Efficiency and is being rolled out in phases to US states.
DOGE
Probably the biggest ungotten scoop of 2025 is the inside story of what happened to all of the personal, financial and other sensitive data that was accessed by workers at the so-called Department of Government Efficiency (DOGE). President Trump tapped Elon Musk to lead the newly created department, which was mostly populated by current and former employees of Musk’s various technology companies (including a former denizen of the cybercrime community known as the “Com”). It soon emerged that the DOGE team was using artificial intelligence to surveil at least one federal agency’s communications for hostility to Mr. Trump and his agenda.
DOGE employees were able to access and synthesize data taken from a large number of previously separate and highly guarded federal databases, including those at the Social Security Administration, the Department of Homeland Security, the Office of Personnel Management, and the U.S. Department of the Treasury. DOGE staffers did so largely by circumventing or dismantling security measures designed to detect and prevent misuse of federal databases, including standard incident response protocols, auditing, and change-tracking mechanisms.
For example, an IT expert with the National Labor Relations Board (NLRB) alleges that DOGE employees likely downloaded gigabytes of data from agency case files in early March, using short-lived accounts that were configured to leave few traces of network activity. The NLRB whistleblower said the large data outflows coincided with multiple blocked login attempts from addresses in Russia, which attempted to use valid credentials for a newly-created DOGE user account.
The stated goal of DOGE was to reduce bureaucracy and to massively cut costs — mainly by eliminating funding for a raft of federal initiatives that had already been approved by Congress. The DOGE website claimed those efforts reduced “wasteful” and “fraudulent” federal spending by more than $200 billion. However, multiple independent reviews by news organizations determined that the true “savings” DOGE achieved were off by roughly two orders of magnitude, and were likely closer to $2 billion.
At the same time DOGE was slashing federal programs, President Trump fired at least 17 inspectors general at federal agencies — the very people tasked with actually identifying and stopping waste, fraud and abuse at the federal level. Those included several agencies (such as the NLRB) that had open investigations into one or more of Mr. Musk’s companies for allegedly failing to comply with protocols aimed at protecting state secrets. In September, a federal judge found the president unlawfully fired the agency watchdogs, but none of them have been reinstated.
Where is DOGE now? Reuters reported last month that as far as the White House is concerned, DOGE no longer exists, even though it technically has more than half a year left to its charter. Meanwhile, who exactly retains access to federal agency data that was fed by DOGE into AI tools is anyone’s guess.
KrebsOnSecurity would like to thank the anonymous researcher NatInfoSec for assisting with the research on this story.
AI Advertising Company Hacked
At least some of this is coming to light:
Doublespeed, a startup backed by Andreessen Horowitz (a16z) that uses a phone farm to manage at least hundreds of AI-generated social media accounts and promote products has been hacked. The hack reveals what products the AI-generated accounts are promoting, often without the required disclosure that these are advertisements, and allowed the hacker to take control of more than 1,000 smartphones that power the company.
The hacker, who asked for anonymity because he feared retaliation from the company, said he reported the vulnerability to Doublespeed on October 31. At the time of writing, the hacker said he still has access to the company’s backend, including the phone farm itself. ...
Turning automation spend into a measurable advantage
Attestation vs. integrity in a zero-trust world
Kubernetes v1.35: Job Managed By Goes GA
In Kubernetes v1.35, the ability to specify an external Job controller (through .spec.managedBy) graduates to General Availability.
This feature allows external controllers to take full responsibility for Job reconciliation, unlocking powerful scheduling patterns like multi-cluster dispatching with MultiKueue.
Why delegate Job reconciliation?
The primary motivation for this feature is to support multi-cluster batch scheduling architectures, such as MultiKueue.
The MultiKueue architecture distinguishes between a Management Cluster and a pool of Worker Clusters:
- The Management Cluster is responsible for dispatching Jobs but not executing them. It needs to accept Job objects to track status, but it skips the creation and execution of Pods.
- The Worker Clusters receive the dispatched Jobs and execute the actual Pods.
- Users usually interact with the Management Cluster. Because the status is automatically propagated back, they can observe the Job's progress "live" without accessing the Worker Clusters.
- In the Worker Clusters, the dispatched Jobs run as regular Jobs managed by the built-in Job controller, with no .spec.managedBy set.
By using .spec.managedBy, the MultiKueue controller on the Management Cluster can take over the reconciliation of a Job. It copies the status from the "mirror" Job running on the Worker Cluster back to the Management Cluster.
Why not just disable the Job controller? While one could theoretically achieve this by disabling the built-in Job controller entirely, this is often impossible or impractical for two reasons:
- Managed Control Planes: In many cloud environments, the Kubernetes control plane is locked, and users cannot modify controller manager flags.
- Hybrid Cluster Role: Users often need a "hybrid" mode where the Management Cluster dispatches some heavy workloads to remote clusters but still executes smaller or control-plane-related Jobs in the Management Cluster.
.spec.managedBy allows this granularity on a per-Job basis.
How .spec.managedBy works
The .spec.managedBy field indicates which controller is responsible for the Job. There are two modes of operation:
- Standard: if the field is unset, or set to the reserved value kubernetes.io/job-controller, the built-in Job controller reconciles the Job as usual.
- Delegation: if the field is set to any other value, the built-in Job controller skips reconciliation entirely for that Job.
To prevent orphaned Pods or resource leaks, this field is immutable. You cannot transfer a running Job from one controller to another.
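As a minimal sketch, a Job delegated to an external controller might look like the manifest below. The managedBy value and the Pod template here are illustrative: any value other than the reserved kubernetes.io/job-controller triggers delegation (MultiKueue, for instance, uses kueue.x-k8s.io/multikueue).

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-delegated-job
spec:
  # Any value other than kubernetes.io/job-controller tells the
  # built-in Job controller to skip this Job entirely.
  # This field is immutable once the Job is created.
  managedBy: example.com/custom-controller
  completions: 3
  parallelism: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo done"]
```

After creation, the built-in controller will not create Pods for this Job; the external controller named in managedBy is expected to reconcile it and keep its status conformant with the Job API.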
If you are looking into implementing an external controller, be aware that it must conform to the Job API definitions. To enforce that conformance, a significant part of this effort went into introducing extensive Job status validation rules. See the How can you learn more? section for more details.
Ecosystem Adoption
The .spec.managedBy field is rapidly becoming the standard interface for delegating control in the Kubernetes batch ecosystem.
Various custom workload controllers are adding this field (or an equivalent) to allow MultiKueue to take over their reconciliation and orchestrate them across clusters.
While it is possible to use .spec.managedBy to implement a custom Job controller from scratch, we haven't observed that yet. The feature is specifically designed to support delegation patterns, like MultiKueue, without reinventing the wheel.
How can you learn more?
If you want to dig deeper:
Read the user-facing documentation for:
Deep dive into the design history:
- The Kubernetes Enhancement Proposal (KEP) Job's managed-by mechanism including introduction of the extensive Job status validation rules.
- The Kueue KEP for MultiKueue.
Explore how MultiKueue uses .spec.managedBy in practice in the task guide for running Jobs across clusters.
Acknowledgments
As with any Kubernetes feature, a lot of people helped shape this one through design discussions, reviews, test runs, and bug reports.
We would like to thank, in particular:
- Maciej Szulik - for guidance, mentorship, and reviews.
- Filip Křepinský - for guidance, mentorship, and reviews.
Get involved
This work was sponsored by the Kubernetes Batch Working Group in close collaboration with SIG Apps, and with strong input from the SIG Scheduling community.
If you are interested in batch scheduling, multi-cluster solutions, or further improving the Job API:
- Join us in the Batch WG and SIG Apps meetings.
- Subscribe to the WG Batch Slack channel.
Someone Boarded a Plane at Heathrow Without a Ticket or Passport
I’m sure there’s a story here:
Sources say the man had tailgated his way through to security screening and passed security, meaning he was not detected carrying any banned items.
The man deceived the BA check-in agent by posing as a family member who had their passports and boarding passes inspected in the usual way.
Cilium releases 2025 annual report: A decade of cloud native networking
A decade on from its first commit in 2015, 2025 marks a significant milestone for the Cilium project. The community has published the 2025 Cilium Annual Report: A Decade of Cloud Native Networking, which reflects on the project’s evolution, key milestones, and notable developments over the past year.
What began as an experimental container networking effort has grown into a mature, widely adopted platform, bringing together cloud native networking, observability, and security through an eBPF-based architecture. As Cilium enters its second decade, the community continues to grow in both size and momentum, with sustained high-volume development, widespread production adoption, and expanding use cases including virtual machines and large-scale AI infrastructure.
We invite you to explore the 2025 Annual Report and celebrate a decade of cloud native networking with the community.
For any questions or feedback, please reach out to [email protected].