You are here

Feed aggregator

Introducing Node Readiness Controller

Kubernetes Blog - Mon, 02/02/2026 - 21:00
Logo for node readiness controller

In the standard Kubernetes model, a node’s suitability for workloads hinges on a single binary "Ready" condition. However, in modern Kubernetes environments, nodes require complex infrastructure dependencies—such as network agents, storage drivers, GPU firmware, or custom health checks—to be fully operational before they can reliably host pods.

Today, on behalf of the Kubernetes project, I am announcing the Node Readiness Controller. This project introduces a declarative system for managing node taints, extending the readiness guardrails during node bootstrapping beyond standard conditions. By dynamically managing taints based on custom health signals, the controller ensures that workloads are only placed on nodes that met all infrastructure-specific requirements.

Why the Node Readiness Controller?

Core Kubernetes Node "Ready" status is often insufficient for clusters with sophisticated bootstrapping requirements. Operators frequently struggle to ensure that specific DaemonSets or local services are healthy before a node enters the scheduling pool.

The Node Readiness Controller fills this gap by allowing operators to define custom scheduling gates tailored to specific node groups. This enables you to enforce distinct readiness requirements across heterogeneous clusters, ensuring for example, that GPU equipped nodes only accept pods once specialized drivers are verified, while general purpose nodes follow a standard path.

It provides three primary advantages:

  • Custom Readiness Definitions: Define what ready means for your specific platform.
  • Automated Taint Management: The controller automatically applies or removes node taints based on condition status, preventing pods from landing on unready infrastructure.
  • Declarative Node Bootstrapping: Manage multi-step node initialization reliably, with a clear observability into the bootstrapping process.

Core concepts and features

The controller centers around the NodeReadinessRule (NRR) API, which allows you to define declarative gates for your nodes.

Flexible enforcement modes

The controller supports two distinct operational modes:

Continuous enforcement
Actively maintains the readiness guarantee throughout the node’s entire lifecycle. If a critical dependency (like a device driver) fails later, the node is immediately tainted to prevent new scheduling.
Bootstrap-only enforcement
Specifically for one-time initialization steps, such as pre-pulling heavy images or hardware provisioning. Once conditions are met, the controller marks the bootstrap as complete and stops monitoring that specific rule for the node.

Condition reporting

The controller reacts to Node Conditions rather than performing health checks itself. This decoupled design allows it to integrate seamlessly with other tools existing in the ecosystem as well as custom solutions:

  • Node Problem Detector (NPD): Use existing NPD setups and custom scripts to report node health.
  • Readiness Condition Reporter: A lightweight agent provided by the project that can be deployed to periodically check local HTTP endpoints and patch node conditions accordingly.

Operational safety with dry run

Deploying new readiness rules across a fleet carries inherent risk. To mitigate this, dry run mode allows operators to first simulate impact on the cluster. In this mode, the controller logs intended actions and updates the rule's status to show affected nodes without applying actual taints, enabling safe validation before enforcement.

Example: CNI bootstrapping

The following NodeReadinessRule ensures a node remains unschedulable until its CNI agent is functional. The controller monitors a custom cniplugin.example.net/NetworkReady condition and only removes the readiness.k8s.io/acme.com/network-unavailable taint once the status is True.

apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
 name: network-readiness-rule
spec:
 conditions:
 - type: "cniplugin.example.net/NetworkReady"
 requiredStatus: "True"
 taint:
 key: "readiness.k8s.io/acme.com/network-unavailable"
 effect: "NoSchedule"
 value: "pending"
 enforcementMode: "bootstrap-only"
 nodeSelector:
 matchLabels:
 node-role.kubernetes.io/worker: ""

Demo:

Getting involved

The Node Readiness Controller is just getting started, with our initial releases out, and we are seeking community feedback to refine the roadmap. Following our productive Unconference discussions at KubeCon NA 2025, we are excited to continue the conversation in person.

Join us at KubeCon + CloudNativeCon Europe 2026 for our maintainer track session: Addressing Non-Deterministic Scheduling: Introducing the Node Readiness Controller.

In the meantime, you can contribute or track our progress here:

Categories: CNCF Projects, Kubernetes

You own your availability: resilience in the age of third-party dependencies

Fastly Blog (Security) - Mon, 02/02/2026 - 19:00
When third-party services fail, your website — and revenue — can fail with them. Learn how to uncover hidden dependencies and use edge proxying to maintain uptime, performance, and control during outages.
Categories: Software Security

Please Don’t Feed the Scattered Lapsus ShinyHunters

Krebs on Security - Mon, 02/02/2026 - 11:15

A prolific data ransom gang that calls itself Scattered Lapsus ShinyHunters (SLSH) has a distinctive playbook when it seeks to extort payment from victim firms: Harassing, threatening and even swatting executives and their families, all while notifying journalists and regulators about the extent of the intrusion. Some victims reportedly are paying — perhaps as much to contain the stolen data as to stop the escalating personal attacks. But a top SLSH expert warns that engaging at all beyond a “We’re not paying” response only encourages further harassment, noting that the group’s fractious and unreliable history means the only winning move is not to pay.

Image: Shutterstock.com, @Mungujakisa

Unlike traditional, highly regimented Russia-based ransomware affiliate groups, SLSH is an unruly and somewhat fluid English-language extortion gang that appears uninterested in building a reputation of consistent behavior whereby victims might have some measure of confidence that the criminals will keep their word if paid.

That’s according to Allison Nixon, director of research at the New York City based security consultancy Unit 221B. Nixon has been closely tracking the criminal group and individual members as they bounce between various Telegram channels used to extort and harass victims, and she said SLSH differs from traditional data ransom groups in other important ways that argue against trusting them to do anything they say they’ll do — such as destroying stolen data.

Like SLSH, many traditional Russian ransomware groups have employed high-pressure tactics to force payment in exchange for a decryption key and/or a promise to delete stolen data, such as publishing a dark web shaming blog with samples of stolen data next to a countdown clock, or notifying journalists and board members of the victim company. But Nixon said the extortion from SLSH quickly escalates way beyond that — to threats of physical violence against executives and their families, DDoS attacks on the victim’s website, and repeated email-flooding campaigns.

SLSH is known for breaking into companies by phishing employees over the phone, and using the purloined access to steal sensitive internal data. In a January 30 blog post, Google’s security forensics firm Mandiant said SLSH’s most recent extortion attacks stem from incidents spanning early to mid-January 2026, when SLSH members pretended to be IT staff and called employees at targeted victim organizations claiming that the company was updating MFA settings.

“The threat actor directed the employees to victim-branded credential harvesting sites to capture their SSO credentials and MFA codes, and then registered their own device for MFA,” the blog post explained.

Victims often first learn of the breach when their brand name is uttered on whatever ephemeral new public Telegram group chat SLSH is using to threaten, extort and harass their prey. According to Nixon, the coordinated harassment on the SLSH Telegram channels is part of a well-orchestrated strategy to overwhelm the victim organization by manufacturing humiliation that pushes them over the threshold to pay.

Nixon said multiple executives at targeted organizations have been subject to “swatting” attacks, wherein SLSH communicated a phony bomb threat or hostage situation at the target’s address in the hopes of eliciting a heavily armed police response at their home or place of work.

“A big part of what they’re doing to victims is the psychological aspect of it, like harassing executives’ kids and threatening the board of the company,” Nixon told KrebsOnSecurity. “And while these victims are getting extortion demands, they’re simultaneously getting outreach from media outlets saying, ‘Hey, do you have any comments on the bad things we’re going to write about you.”

In a blog post today, Unit 221B argues that no one should negotiate with SLSH because the group has demonstrated a willingness to extort victims based on promises that it has no intention to keep. Nixon points out that all of SLSH’s known members hail from The Com, shorthand for a constellation of cybercrime-focused Discord and Telegram communities which serve as a kind of distributed social network that facilitates instant collaboration.

Nixon said Com-based extortion groups tend to instigate feuds and drama between group members, leading to lying, betrayals, credibility destroying behavior, backstabbing, and sabotaging each other.

“With this type of ongoing dysfunction, often compounding by substance abuse, these threat actors often aren’t able to act with the core goal in mind of completing a successful, strategic ransom operation,” Nixon wrote. “They continually lose control with outbursts that put their strategy and operational security at risk, which severely limits their ability to build a professional, scalable, and sophisticated criminal organization network for continued successful ransoms – unlike other, more tenured and professional criminal organizations focused on ransomware alone.”

Intrusions from established ransomware groups typically center around encryption/decryption malware that mostly stays on the affected machine. In contrast, Nixon said, ransom from a Com group is often structured the same as violent sextortion schemes against minors, wherein members of The Com will steal damaging information, threaten to release it, and “promise” to delete it if the victim complies without any guarantee or technical proof point that they will keep their word. She writes:

A key component of SLSH’s efforts to convince victims to pay, Nixon said, involves manipulating the media into hyping the threat posed by this group. This approach also borrows a page from the playbook of sextortion attacks, she said, which encourages predators to keep targets continuously engaged and worrying about the consequences of non-compliance.

“On days where SLSH had no substantial criminal ‘win’ to announce, they focused on announcing death threats and harassment to keep law enforcement, journalists, and cybercrime industry professionals focused on this group,” she said.

An excerpt from a sextortion tutorial from a Com-based Telegram channel. Image: Unit 221B.

Nixon knows a thing or two about being threatened by SLSH: For the past several months, the group’s Telegram channels have been replete with threats of physical violence against her, against Yours Truly, and against other security researchers. These threats, she said, are just another way the group seeks to generate media attention and achieve a veneer of credibility, but they are useful as indicators of compromise because SLSH members tend to name drop and malign security researchers even in their communications with victims.

“Watch for the following behaviors in their communications to you or their public statements,” Unit 221B’s advisory reads. “Repeated abusive mentions of Allison Nixon (or “A.N”), Unit 221B, or cybersecurity journalists—especially Brian Krebs—or any other cybersecurity employee, or cybersecurity company. Any threats to kill, or commit terrorism, or violence against internal employees, cybersecurity employees, investigators, and journalists.”

Unit 221B says that while the pressure campaign during an extortion attempt may be traumatizing to employees, executives, and their family members, entering into drawn-out negotiations with SLSH incentivizes the group to increase the level of harm and risk, which could include the physical safety of employees and their families.

“The breached data will never go back to the way it was, but we can assure you that the harassment will end,” Nixon said. “So, your decision to pay should be a separate issue from the harassment. We believe that when you separate these issues, you will objectively see that the best course of action to protect your interests, in both the short and long term, is to refuse payment.”

Categories: Software Security

AI Coding Assistants Secretly Copying All Code to China

Schneier on Security - Mon, 02/02/2026 - 07:05

There’s a new report about two AI coding assistants, used by 1.5 million developers, that are surreptitiously sending a copy of everything they ingest to China.

Maybe avoid using them.

Categories: Software Security

IT automation with agentic AI: Introducing the MCP server for Red Hat Ansible Automation Platform

Red Hat Security - Sun, 02/01/2026 - 19:00
As we continue to expand intelligence capabilities in Red Hat Ansible Automation Platform, we’ve made the MCP server available as a technology preview feature in Ansible Automation Platform 2.6.4. The MCP server acts as a bridge between your MCP client of choice and Ansible Automation Platform. This integration helps you manage your entire infrastructure estate with exciting new tools like Cursor and Claude. What is MCP server for Ansible Automation Platform?The MCP server for Ansible Automation Platform is a Model Context Protocol (MCP) server implementation that enables Large Language Mod
Categories: Software Security

Friday Squid Blogging: New Squid Species Discovered

Schneier on Security - Fri, 01/30/2026 - 17:05

A new species of squid. pretends to be a plant:

Scientists have filmed a never-before-seen species of deep-sea squid burying itself upside down in the seafloor—a behavior never documented in cephalopods. They captured the bizarre scene while studying the depths of the Clarion-Clipperton Zone (CCZ), an abyssal plain in the Pacific Ocean targeted for deep-sea mining.

The team described the encounter in a study published Nov. 25 in the journal Ecology, writing that the animal appears to be an undescribed species of whiplash squid. At a depth of roughly 13,450 feet (4,100 meters), the squid had buried almost its entire body in sediment and was hanging upside down, with its siphon and two long ...

Categories: Software Security

New Conversion from cgroup v1 CPU Shares to v2 CPU Weight

Kubernetes Blog - Fri, 01/30/2026 - 11:00

I'm excited to announce the implementation of an improved conversion formula from cgroup v1 CPU shares to cgroup v2 CPU weight. This enhancement addresses critical issues with CPU priority allocation for Kubernetes workloads when running on systems with cgroup v2.

Background

Kubernetes was originally designed with cgroup v1 in mind, where CPU shares were defined simply by assigning the container's CPU requests in millicpu form.

For example, a container requesting 1 CPU (1024m) would get (cpu.shares = 1024).

After a while, cgroup v1 started being replaced by its successor, cgroup v2. In cgroup v2, the concept of CPU shares (which ranges from 2 to 262144, or from 2¹ to 2¹⁸) was replaced with CPU weight (which ranges from [1, 10000], or 10⁰ to 10⁴).

With the transition to cgroup v2, KEP-2254 introduced a conversion formula to map cgroup v1 CPU shares to cgroup v2 CPU weight. The conversion formula was defined as: cpu.weight = (1 + ((cpu.shares - 2) * 9999) / 262142)

This formula linearly maps values from [2¹, 2¹⁸] to [10⁰, 10⁴].

Linear conversion formula

While this approach is simple, the linear mapping imposes a few significant problems and impacts both performance and configuration granularity.

Problems with previous conversion formula

The current conversion formula creates two major issues:

1. Reduced priority against non-Kubernetes workloads

In cgroup v1, the default value for CPU shares is 1024, meaning a container requesting 1 CPU has equal priority with system processes that live outside of Kubernetes' scope. However, in cgroup v2, the default CPU weight is 100, but the current formula converts 1 CPU (1024m) to only ≈39 weight - less than 40% of the default.

Example:

  • Container requesting 1 CPU (1024m)
  • cgroup v1: cpu.shares = 1024 (equal to default)
  • cgroup v2 (current): cpu.weight = 39 (much lower than default 100)

This means that after moving to cgroup v2, Kubernetes (or OCI) workloads would de-facto reduce their CPU priority against non-Kubernetes processes. The problem can be severe for setups with many system daemons that run outside of Kubernetes' scope and expect Kubernetes workloads to have priority, especially in situations of resource starvation.

2. Unmanageable granularity

The current formula produces very low values for small CPU requests, limiting the ability to create sub-cgroups within containers for fine-grained resource distribution (which will possibly be much easier moving forward, see KEP #5474 for more info).

Example:

  • Container requesting 100m CPU
  • cgroup v1: cpu.shares = 102
  • cgroup v2 (current): cpu.weight = 4 (too low for sub-cgroup configuration)

With cgroup v1, requesting 100m CPU which led to 102 CPU shares was manageable in the sense that sub-cgroups could have been created inside the main container, assigning fine-grained CPU priorities for different groups of processes. With cgroup v2 however, having 4 shares is very hard to distribute between sub-cgroups since it's not granular enough.

With plans to allow writable cgroups for unprivileged containers, this becomes even more relevant.

New conversion formula

Description

The new formula is more complicated, but does a much better job mapping between cgroup v1 CPU shares and cgroup v2 CPU weight:

$$cpu.weight = \lceil 10^{(L^{2}/612 + 125L/612 - 7/34)} \rceil, \text{ where: } L = \log_2(cpu.shares)$$

The idea is that this is a quadratic function to cross the following values:

  • (2, 1): The minimum values for both ranges.
  • (1024, 100): The default values for both ranges.
  • (262144, 10000): The maximum values for both ranges.

Visually, the new function looks as follows:

2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion.png

And if you zoom in to the important part:

2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion-zoom.png

The new formula is "close to linear", yet it is carefully designed to map the ranges in a clever way so the three important points above would cross.

How it solves the problems

  1. Better priority alignment:

    • A container requesting 1 CPU (1024m) will now get a cpu.weight = 102. This value is close to cgroup v2's default 100. This restores the intended priority relationship between Kubernetes workloads and system processes.
  2. Improved granularity:

    • A container requesting 100m CPU will get cpu.weight = 17, (see here). Enables better fine-grained resource distribution within containers.

Adoption and integration

This change was implemented at the OCI layer. In other words, this is not implemented in Kubernetes itself; therefore the adoption of the new conversion formula depends solely on the OCI runtime adoption.

For example:

  • runc: The new formula is enabled from version 1.3.2.
  • crun: The new formula is enabled from version 1.23.

Impact on existing deployments

Important: Some consumers may be affected if they assume the older linear conversion formula. Applications or monitoring tools that directly calculate expected CPU weight values based on the previous formula may need updates to account for the new quadratic conversion. This is particularly relevant for:

  • Custom resource management tools that predict CPU weight values.
  • Monitoring systems that validate or expect specific weight values.
  • Applications that programmatically set or verify CPU weight values.

The Kubernetes project recommends testing the new conversion formula in non-production environments before upgrading OCI runtimes to ensure compatibility with existing tooling.

Where can I learn more?

For those interested in this enhancement:

How do I get involved?

For those interested in getting involved with Kubernetes node-level features, join the Kubernetes Node Special Interest Group. We always welcome new contributors and diverse perspectives on resource management challenges.

Categories: CNCF Projects, Kubernetes

AIs Are Getting Better at Finding and Exploiting Security Vulnerabilities

Schneier on Security - Fri, 01/30/2026 - 10:35

From an Anthropic blog post:

In a recent evaluation of AI models’ cyber capabilities, current Claude models can now succeed at multistage attacks on networks with dozens of hosts using only standard, open-source tools, instead of the custom tools needed by previous generations. This illustrates how barriers to the use of AI in relatively autonomous cyber workflows are rapidly coming down, and highlights the importance of security fundamentals like promptly patching known vulnerabilities.

[…]

A notable development during the testing of Claude Sonnet 4.5 is that the model can now succeed on a minority of the networks without the custom cyber toolkit needed by previous generations. In particular, Sonnet 4.5 can now exfiltrate all of the (simulated) personal information in a high-fidelity simulation of the Equifax data breach—one of the costliest cyber attacks in history­­using only a Bash shell on a widely-available Kali Linux host (standard, open-source tools for penetration testing; not a custom toolkit). Sonnet 4.5 accomplishes this by instantly recognizing a publicized CVE and writing code to exploit it without needing to look it up or iterate on it. Recalling that the original Equifax breach happened by exploiting a publicized CVE that had not yet been patched, the prospect of highly competent and fast AI agents leveraging this approach underscores the pressing need for security best practices like prompt updates and patches...

Categories: Software Security

How Banco do Brasil uses hyperautomation and platform engineering to drive efficiency

Red Hat Security - Wed, 01/28/2026 - 19:00
At the recent OpenShift Commons gathering in Atlanta, we had the opportunity to hear from Gustavo Fiuza, IT leader, and Welton Felipe, DevOps engineer, about the remarkable digital transformation at Banco do Brasil. As the second-largest bank in Latin America, they manage a massive scale, serving 87 million customers and processing over 900 million business transactions daily. We learned how they evolved from a siloed community Kubernetes environment to a highly efficient, hybrid multicloud platform powered by Red Hat OpenShift. Scalability through capabilities and hyperautomationA primary tak
Categories: Software Security

From if to how: A year of post-quantum reality

Red Hat Security - Wed, 01/28/2026 - 19:00
For the last 5 years, post-quantum cryptography (PQC) has largely been discussed as a research topic. It was a question of if—if the standards are ratified, if the algorithms perform, if the threat is real.In 2025, Red Hat changed the conversation. We stopped asking “if” and started defining “how.” This past year, we moved PQC out of the laboratory and into the operating system (OS). It wasn’t just about upgrading libraries, it was about pushing the entire modern software supply chain. We found that while the foundation is ready, the ecosystem has a long way to go.Here is the story
Categories: Software Security

Ingress NGINX: Statement from the Kubernetes Steering and Security Response Committees

Kubernetes Blog - Wed, 01/28/2026 - 19:00

In March 2026, Kubernetes will retire Ingress NGINX, a piece of critical infrastructure for about half of cloud native environments. The retirement of Ingress NGINX was announced for March 2026, after years of public warnings that the project was in dire need of contributors and maintainers. There will be no more releases for bug fixes, security patches, or any updates of any kind after the project is retired. This cannot be ignored, brushed off, or left until the last minute to address. We cannot overstate the severity of this situation or the importance of beginning migration to alternatives like Gateway API or one of the many third-party Ingress controllers immediately.

To be abundantly clear: choosing to remain with Ingress NGINX after its retirement leaves you and your users vulnerable to attack. None of the available alternatives are direct drop-in replacements. This will require planning and engineering time. Half of you will be affected. You have two months left to prepare.

Existing deployments will continue to work, so unless you proactively check, you may not know you are affected until you are compromised. In most cases, you can check to find out whether or not you rely on Ingress NGINX by running kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx with cluster administrator permissions.

Despite its broad appeal and widespread use by companies of all sizes, and repeated calls for help from the maintainers, the Ingress NGINX project never received the contributors it so desperately needed. According to internal Datadog research, about 50% of cloud native environments currently rely on this tool, and yet for the last several years, it has been maintained solely by one or two people working in their free time. Without sufficient staffing to maintain the tool to a standard both ourselves and our users would consider secure, the responsible choice is to wind it down and refocus efforts on modern alternatives like Gateway API.

We did not make this decision lightly; as inconvenient as it is now, doing so is necessary for the safety of all users and the ecosystem as a whole. Unfortunately, the flexibility Ingress NGINX was designed with, that was once a boon, has become a burden that cannot be resolved. With the technical debt that has piled up, and fundamental design decisions that exacerbate security flaws, it is no longer reasonable or even possible to continue maintaining the tool even if resources did materialize.

We issue this statement together to reinforce the scale of this change and the potential for serious risk to a significant percentage of Kubernetes users if this issue is ignored. It is imperative that you check your clusters now. If you are reliant on Ingress NGINX, you must begin planning for migration.

Thank you,

Kubernetes Steering Committee

Kubernetes Security Response Committee

Categories: CNCF Projects, Kubernetes

Experimenting with Gateway API using kind

Kubernetes Blog - Tue, 01/27/2026 - 19:00

This document will guide you through setting up a local experimental environment with Gateway API on kind. This setup is designed for learning and testing. It helps you understand Gateway API concepts without production complexity.

Caution:

This is an experimentation learning setup, and should not be used for production. The components used on this document are not suited for production usage. Once you're ready to deploy Gateway API in a production environment, select an implementation that suits your needs.

Overview

In this guide, you will:

  • Set up a local Kubernetes cluster using kind (Kubernetes in Docker)
  • Deploy cloud-provider-kind, which provides both LoadBalancer Services and a Gateway API controller
  • Create a Gateway and HTTPRoute to route traffic to a demo application
  • Test your Gateway API configuration locally

This setup is ideal for learning, development, and experimentation with Gateway API concepts.

Prerequisites

Before you begin, ensure you have the following installed on your local machine:

  • Docker - Required to run kind and cloud-provider-kind
  • kubectl - The Kubernetes command-line tool
  • kind - Kubernetes in Docker
  • curl - Required to test the routes

Create a kind cluster

Create a new kind cluster by running:

kind create cluster

This will create a single-node Kubernetes cluster running in a Docker container.

Install cloud-provider-kind

Next, you need cloud-provider-kind, which provides two key components for this setup:

  • A LoadBalancer controller that assigns addresses to LoadBalancer-type Services
  • A Gateway API controller that implements the Gateway API specification

It also automatically installs the Gateway API Custom Resource Definitions (CRDs) in your cluster.

Run cloud-provider-kind as a Docker container on the same host where you created the kind cluster:

VERSION="$(basename $(curl -s -L -o /dev/null -w '%{url_effective}' https://github.com/kubernetes-sigs/cloud-provider-kind/releases/latest))"
docker run -d --name cloud-provider-kind --rm --network host -v /var/run/docker.sock:/var/run/docker.sock registry.k8s.io/cloud-provider-kind/cloud-controller-manager:${VERSION}

Note: On some systems, you may need elevated privileges to access the Docker socket.

Verify that cloud-provider-kind is running:

docker ps --filter name=cloud-provider-kind

You should see the container listed and in a running state. You can also check the logs:

docker logs cloud-provider-kind

Experimenting with Gateway API

Now that your cluster is set up, you can start experimenting with Gateway API resources.

cloud-provider-kind automatically provisions a GatewayClass called cloud-provider-kind. You'll use this class to create your Gateway.

It is worth noticing that while kind is not a cloud provider, the project is named as cloud-provider-kind as it provides features that simulate a cloud-enabled environment.

Deploy a Gateway

The following manifest will:

  • Create a new namespace called gateway-infra
  • Deploy a Gateway that listens on port 80
  • Accept HTTPRoutes with hostnames matching the *.exampledomain.example pattern
  • Allow routes from any namespace to attach to the Gateway. Note: In real clusters, prefer Same or Selector values on the allowedRoutes namespace selector field to limit attachments.

Apply the following manifest:

---
apiVersion: v1
kind: Namespace
metadata:
 name: gateway-infra
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
 name: gateway
 namespace: gateway-infra
spec:
 gatewayClassName: cloud-provider-kind
 listeners:
 - name: default
 hostname: "*.exampledomain.example"
 port: 80
 protocol: HTTP
 allowedRoutes:
 namespaces:
 from: All

Then verify that your Gateway is properly programmed and has an address assigned:

kubectl get gateway -n gateway-infra gateway

Expected output:

NAME CLASS ADDRESS PROGRAMMED AGE
gateway cloud-provider-kind 172.18.0.3 True 5m6s

The PROGRAMMED column should show True, and the ADDRESS field should contain an IP address.

Deploy a demo application

Next, deploy a simple echo application that will help you test your Gateway configuration. This application:

  • Listens on port 3000
  • Echoes back request details including path, headers, and environment variables
  • Runs in a namespace called demo

Apply the following manifest:

apiVersion: v1
kind: Namespace
metadata:
 name: demo
---
apiVersion: v1
kind: Service
metadata:
 labels:
 app.kubernetes.io/name: echo
 name: echo
 namespace: demo
spec:
 ports:
 - name: http
 port: 3000
 protocol: TCP
 targetPort: 3000
 selector:
 app.kubernetes.io/name: echo
 type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
 labels:
 app.kubernetes.io/name: echo
 name: echo
 namespace: demo
spec:
 selector:
 matchLabels:
 app.kubernetes.io/name: echo
 template:
 metadata:
 labels:
 app.kubernetes.io/name: echo
 spec:
 containers:
 - env:
 - name: POD_NAME
 valueFrom:
 fieldRef:
 apiVersion: v1
 fieldPath: metadata.name
 - name: NAMESPACE
 valueFrom:
 fieldRef:
 apiVersion: v1
 fieldPath: metadata.namespace
 image: registry.k8s.io/gateway-api/echo-basic:v20251204-v1.4.1
 name: echo-basic

Create an HTTPRoute

Now create an HTTPRoute to route traffic from your Gateway to the echo application. This HTTPRoute will:

  • Respond to requests for the hostname some.exampledomain.example
  • Route traffic to the echo application
  • Attach to the Gateway in the gateway-infra namespace

Apply the following manifest:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
 name: echo
 namespace: demo
spec:
 parentRefs:
 - name: gateway
 namespace: gateway-infra
 hostnames: ["some.exampledomain.example"]
 rules:
 - matches:
 - path:
 type: PathPrefix
 value: /
 backendRefs:
 - name: echo
 port: 3000

Test your route

The final step is to test your route using curl. You'll make a request to the Gateway's IP address with the hostname some.exampledomain.example. The command below is for POSIX shell only, and may need to be adjusted for your environment:

GW_ADDR=$(kubectl get gateway -n gateway-infra gateway -o jsonpath='{.status.addresses[0].value}')
curl --resolve some.exampledomain.example:80:${GW_ADDR} http://some.exampledomain.example

You should receive a JSON response similar to this:

{
 "path": "/",
 "host": "some.exampledomain.example",
 "method": "GET",
 "proto": "HTTP/1.1",
 "headers": {
 "Accept": [
 "*/*"
 ],
 "User-Agent": [
 "curl/8.15.0"
 ]
 },
 "namespace": "demo",
 "ingress": "",
 "service": "",
 "pod": "echo-dc48d7cf8-vs2df"
}

If you see this response, congratulations! Your Gateway API setup is working correctly.

Troubleshooting

If something isn't working as expected, you can troubleshoot by checking the status of your resources.

Check the Gateway status

First, inspect your Gateway resource:

kubectl get gateway -n gateway-infra gateway -o yaml

Look at the status section for conditions. Your Gateway should have:

  • Accepted: True - The Gateway was accepted by the controller
  • Programmed: True - The Gateway was successfully configured
  • .status.addresses populated with an IP address

Check the HTTPRoute status

Next, inspect your HTTPRoute:

kubectl get httproute -n demo echo -o yaml

Check the status.parents section for conditions. Common issues include:

  • ResolvedRefs set to False with reason BackendNotFound; this means that the backend Service doesn't exist or has the wrong name
  • Accepted set to False; this means that the route couldn't attach to the Gateway (check namespace permissions or hostname matching)

Example error when a backend is not found:

status:
 parents:
 - conditions:
 - lastTransitionTime: "2026-01-19T17:13:35Z"
 message: backend not found
 observedGeneration: 2
 reason: BackendNotFound
 status: "False"
 type: ResolvedRefs
 controllerName: kind.sigs.k8s.io/gateway-controller

Check controller logs

If the resource statuses don't reveal the issue, check the cloud-provider-kind logs:

docker logs -f cloud-provider-kind

This will show detailed logs from both the LoadBalancer and Gateway API controllers.

Cleanup

When you're finished with your experiments, you can clean up the resources:

Remove Kubernetes resources

Delete the namespaces (this will remove all resources within them):

kubectl delete namespace gateway-infra
kubectl delete namespace demo

Stop cloud-provider-kind

Stop and remove the cloud-provider-kind container:

docker stop cloud-provider-kind

Because the container was started with the --rm flag, it will be automatically removed when stopped.

Delete the kind cluster

Finally, delete the kind cluster:

kind delete cluster

Next steps

Now that you've experimented with Gateway API locally, you're ready to explore production-ready implementations:

  • Production Deployments: Review the Gateway API implementations to find a controller that matches your production requirements
  • Learn More: Explore the Gateway API documentation to learn about advanced features like TLS, traffic splitting, and header manipulation
  • Advanced Routing: Experiment with path-based routing, header matching, request mirroring and other features following Gateway API user guides

A final word of caution

This kind setup is for development and learning only. Always use a production-grade Gateway API implementation for real workloads.

Categories: CNCF Projects, Kubernetes

Cluster API v1.12: Introducing In-place Updates and Chained Upgrades

Kubernetes Blog - Tue, 01/27/2026 - 11:00

Cluster API brings declarative management to Kubernetes cluster lifecycle, allowing users and platform teams to define the desired state of clusters and rely on controllers to continuously reconcile toward it.

Similar to how you can use StatefulSets or Deployments in Kubernetes to manage a group of Pods, in Cluster API you can use KubeadmControlPlane to manage a set of control plane Machines, or you can use MachineDeployments to manage a group of worker Nodes.

The Cluster API v1.12.0 release expands what is possible in Cluster API, reducing friction in common lifecycle operations by introducing in-place updates and chained upgrades.

Emphasis on simplicity and usability

With v1.12.0, the Cluster API project demonstrates once again that this community is capable of delivering a great amount of innovation, while at the same time minimizing impact for Cluster API users.

What does this mean in practice?

Users simply have to change the Cluster or the Machine spec (just as with previous Cluster API releases), and Cluster API will automatically trigger in-place updates or chained upgrades when possible and advisable.

In-place Updates

Like Kubernetes does for Pods in Deployments, when the Machine spec changes also Cluster API performs rollouts by creating a new Machine and deleting the old one.

This approach, inspired by the principle of immutable infrastructure, has a set of considerable advantages:

  • It is simple to explain, predictable, consistent and easy to reason about with users and engineers.
  • It is simple to implement, because it relies only on two core primitives, create and delete.
  • Implementation does not depend on Machine-specific choices, like OS, bootstrap mechanism etc.

As a result, Machine rollouts drastically reduce the number of variables to be considered when managing the lifecycle of a host server that is hosting Nodes.

However, while advantages of immutability are not under discussion, both Kubernetes and Cluster API are undergoing a similar journey, introducing changes that allow users to minimize workload disruption whenever possible.

Over time, also Cluster API has introduced several improvements to immutable rollouts, including:

The new in-place update feature in Cluster API is the next step in this journey.

With the v1.12.0 release, Cluster API introduces support for update extensions allowing users to make changes on existing machines in-place, without deleting and re-creating the Machines.

Both KubeadmControlPlane and MachineDeployments support in-place updates based on the new update extension, and this means that the boundary of what is possible in Cluster API is now changed in a significant way.

How do in-place updates work?

The simplest way to explain it is that once the user triggers an update by changing the desired state of Machines, then Cluster API chooses the best tool to achieve the desired state.

The news is that now Cluster API can choose between immutable rollouts and in-place update extensions to perform required changes.

In-place updates in Cluster API

Importantly, this is not immutable rollouts vs in-place updates; Cluster API considers both valid options and selects the most appropriate mechanism for a given change.

From the perspective of the Cluster API maintainers, in-place updates are most useful for making changes that don't otherwise require a node drain or pod restart; for example: changing user credentials for the Machine. On the other hand, when the workload will be disrupted anyway, just do a rollout.

Nevertheless, Cluster API remains true to its extensible nature, and everyone can create their own update extension and decide when and how to use in-place updates by trading in some of the benefits of immutable rollouts.

For a deep dive into this feature, make sure to attend the session In-place Updates with Cluster API: The Sweet Spot Between Immutable and Mutable Infrastructure at KubeCon EU in Amsterdam!

Chained Upgrades

ClusterClass and managed topologies in Cluster API jointly provided a powerful and effective framework that acts as a building block for many platforms offering Kubernetes-as-a-Service.

Now with v1.12.0 this feature is making another important step forward, by allowing users to upgrade by more than one Kubernetes minor version in a single operation, commonly referred to as a chained upgrade.

This allows users to declare a target Kubernetes version and let Cluster API safely orchestrate the required intermediate steps, rather than manually managing each minor upgrade.

The simplest way to explain how chained upgrades work, is that once the user triggers an update by changing the desired version for a Cluster, Cluster API computes an upgrade plan, and then starts executing it. Rather than (for example) update the Cluster to v1.33.0 and then v1.34.0 and then v1.35.0, checking on progress at each step, a chained upgrade lets you go directly to v1.35.0.

Executing an upgrade plan means upgrading control plane and worker machines in a strictly controlled order, repeating this process as many times as needed to reach the desired state. The Cluster API is now capable of managing this for you.

Cluster API takes care of optimizing and minimizing the upgrade steps for worker machines, and in fact worker machines will skip upgrades to intermediate Kubernetes minor releases whenever allowed by the Kubernetes version skew policies.

Chained upgrades in Cluster API

Also in this case extensibility is at the core of this feature, and upgrade plan runtime extensions can be used to influence how the upgrade plan is computed; similarly, lifecycle hooks can be used to automate other tasks that must be performed during an upgrade, e.g. upgrading an addon after the control plane update completed.

From our perspective, chained upgrades are most useful for users that struggle to keep up with Kubernetes minor releases, and e.g. they want to upgrade only once per year and then upgrade by three versions (n-3 → n). But be warned: the fact that you can now easily upgrade by more than one minor version is not an excuse to not patch your cluster frequently!

Release team

I would like to thank all the contributors, the maintainers, and all the engineers that volunteered for the release team.

The reliability and predictability of Cluster API releases, which is one of the most appreciated features from our users, is only possible with the support, commitment, and hard work of its community.

Kudos to the entire Cluster API community for the v1.12.0 release and all the great releases delivered in 2025! ​​ If you are interested in getting involved, learn about Cluster API contributing guidelines.

What’s next?

If you read the Cluster API manifesto, you can see how the Cluster API subproject claims the right to remain unfinished, recognizing the need to continuously evolve, improve, and adapt to the changing needs of Cluster API’s users and the broader Cloud Native ecosystem.

As Kubernetes itself continues to evolve, the Cluster API subproject will keep advancing alongside it, focusing on safer upgrades, reduced disruption, and stronger building blocks for platforms managing Kubernetes at scale.

Innovation remains at the heart of Cluster API, stay tuned for an exciting 2026!

Useful links:

Categories: CNCF Projects, Kubernetes

The Constitutionality of Geofence Warrants

Schneier on Security - Tue, 01/27/2026 - 07:01

The US Supreme Court is considering the constitutionality of geofence warrants.

The case centers on the trial of Okello Chatrie, a Virginia man who pleaded guilty to a 2019 robbery outside of Richmond and was sentenced to almost 12 years in prison for stealing $195,000 at gunpoint.

Police probing the crime found security camera footage showing a man on a cell phone near the credit union that was robbed and asked Google to produce anonymized location data near the robbery site so they could determine who committed the crime. They did so, providing police with subscriber data for three people, one of whom was Chatrie. Police then searched Chatrie’s home and allegedly surfaced a gun, almost $100,000 in cash and incriminating notes...

Categories: Software Security

Who Operates the Badbox 2.0 Botnet?

Krebs on Security - Mon, 01/26/2026 - 11:11

The cybercriminals in control of Kimwolf — a disruptive botnet that has infected more than 2 million devices — recently shared a screenshot indicating they’d compromised the control panel for Badbox 2.0, a vast China-based botnet powered by malicious software that comes pre-installed on many Android TV streaming boxes. Both the FBI and Google say they are hunting for the people behind Badbox 2.0, and thanks to bragging by the Kimwolf botmasters we may now have a much clearer idea about that.

Our first story of 2026, The Kimwolf Botnet is Stalking Your Local Network, detailed the unique and highly invasive methods Kimwolf uses to spread. The story warned that the vast majority of Kimwolf infected systems were unofficial Android TV boxes that are typically marketed as a way to watch unlimited (pirated) movie and TV streaming services for a one-time fee.

Our January 8 story, Who Benefitted from the Aisuru and Kimwolf Botnets?, cited multiple sources saying the current administrators of Kimwolf went by the nicknames “Dort” and “Snow.” Earlier this month, a close former associate of Dort and Snow shared what they said was a screenshot the Kimwolf botmasters had taken while logged in to the Badbox 2.0 botnet control panel.

That screenshot, a portion of which is shown below, shows seven authorized users of the control panel, including one that doesn’t quite match the others: According to my source, the account “ABCD” (the one that is logged in and listed in the top right of the screenshot) belongs to Dort, who somehow figured out how to add their email address as a valid user of the Badbox 2.0 botnet.

The control panel for the Badbox 2.0 botnet lists seven authorized users and their email addresses. Click to enlarge.

Badbox has a storied history that well predates Kimwolf’s rise in October 2025. In July 2025, Google filed a “John Doe” lawsuit (PDF) against 25 unidentified defendants accused of operating Badbox 2.0, which Google described as a botnet of over ten million unsanctioned Android streaming devices engaged in advertising fraud. Google said Badbox 2.0, in addition to compromising multiple types of devices prior to purchase, also can infect devices by requiring the download of malicious apps from unofficial marketplaces.

Google’s lawsuit came on the heels of a June 2025 advisory from the Federal Bureau of Investigation (FBI), which warned that cyber criminals were gaining unauthorized access to home networks by either configuring the products with malware prior to the user’s purchase, or infecting the device as it downloads required applications that contain backdoors — usually during the set-up process.

The FBI said Badbox 2.0 was discovered after the original Badbox campaign was disrupted in 2024. The original Badbox was identified in 2023, and primarily consisted of Android operating system devices (TV boxes) that were compromised with backdoor malware prior to purchase.

KrebsOnSecurity was initially skeptical of the claim that the Kimwolf botmasters had hacked the Badbox 2.0 botnet. That is, until we began digging into the history of the qq.com email addresses in the screenshot above.

CATHEAD

An online search for the address [email protected] (pictured in the screenshot above as the user “Chen“) shows it is listed as a point of contact for a number of China-based technology companies, including:

Beijing Hong Dake Wang Science & Technology Co Ltd.
Beijing Hengchuang Vision Mobile Media Technology Co. Ltd.
Moxin Beijing Science and Technology Co. Ltd.

The website for Beijing Hong Dake Wang Science is asmeisvip[.]net, a domain that was flagged in a March 2025 report by HUMAN Security as one of several dozen sites tied to the distribution and management of the Badbox 2.0 botnet. Ditto for moyix[.]com, a domain associated with Beijing Hengchuang Vision Mobile.

A search at the breach tracking service Constella Intelligence finds [email protected] at one point used the password “cdh76111.” Pivoting on that password in Constella shows it is known to have been used by just two other email accounts: [email protected] and [email protected].

Constella found [email protected] registered an account at jd.com (China’s largest online retailer) in 2021 under the name “陈代海,” which translates to “Chen Daihai.” According to DomainTools.com, the name Chen Daihai is present in the original registration records (2008) for moyix[.]com, along with the email address cathead@astrolink[.]cn.

Incidentally, astrolink[.]cn also is among the Badbox 2.0 domains identified in HUMAN Security’s 2025 report. DomainTools finds cathead@astrolink[.]cn was used to register more than a dozen domains, including vmud[.]net, yet another Badbox 2.0 domain tagged by HUMAN Security.

XAVIER

A cached copy of astrolink[.]cn preserved at archive.org shows the website belongs to a mobile app development company whose full name is Beijing Astrolink Wireless Digital Technology Co. Ltd. The archived website reveals a “Contact Us” page that lists a Chen Daihai as part of the company’s technology department. The other person featured on that contact page is Zhu Zhiyu, and their email address is listed as xavier@astrolink[.]cn.

A Google-translated version of Astrolink’s website, circa 2009. Image: archive.org.

Astute readers will notice that the user Mr.Zhu in the Badbox 2.0 panel used the email address [email protected]. Searching this address in Constella reveals a jd.com account registered in the name of Zhu Zhiyu. A rather unique password used by this account matches the password used by the address [email protected], which DomainTools finds was the original registrant of astrolink[.]cn.

ADMIN

The very first account listed in the Badbox 2.0 panel — “admin,” registered in November 2020 — used the email address [email protected]. DomainTools shows this email is found in the 2022 registration records for the domain guilincloud[.]cn, which includes the registrant name “Huang Guilin.”

Constella finds [email protected] is associated with the China phone number 18681627767. The open-source intelligence platform osint.industries reveals this phone number is connected to a Microsoft profile created in 2014 under the name Guilin Huang (桂林 黄). The cyber intelligence platform Spycloud says that phone number was used in 2017 to create an account at the Chinese social media platform Weibo under the username “h_guilin.”

The public information attached to Guilin Huang’s Microsoft account, according to the breach tracking service osintindustries.com.

The remaining three users and corresponding qq.com email addresses were all connected to individuals in China. However, none of them (nor Mr. Huang) had any apparent connection to the entities created and operated by Chen Daihai and Zhu Zhiyu — or to any corporate entities for that matter. Also, none of these individuals responded to requests for comment.

The mind map below includes search pivots on the email addresses, company names and phone numbers that suggest a connection between Chen Daihai, Zhu Zhiyu, and Badbox 2.0.

This mind map includes search pivots on the email addresses, company names and phone numbers that appear to connect Chen Daihai and Zhu Zhiyu to Badbox 2.0. Click to enlarge.

UNAUTHORIZED ACCESS

The idea that the Kimwolf botmasters could have direct access to the Badbox 2.0 botnet is a big deal, but explaining exactly why that is requires some background on how Kimwolf spreads to new devices. The botmasters figured out they could trick residential proxy services into relaying malicious commands to vulnerable devices behind the firewall on the unsuspecting user’s local network.

The vulnerable systems sought out by Kimwolf are primarily Internet of Things (IoT) devices like unsanctioned Android TV boxes and digital photo frames that have no discernible security or authentication built-in. Put simply, if you can communicate with these devices, you can compromise them with a single command.

Our January 2 story featured research from the proxy-tracking firm Synthient, which alerted 11 different residential proxy providers that their proxy endpoints were vulnerable to being abused for this kind of local network probing and exploitation.

Most of those vulnerable proxy providers have since taken steps to prevent customers from going upstream into the local networks of residential proxy endpoints, and it appeared that Kimwolf would no longer be able to quickly spread to millions of devices simply by exploiting some residential proxy provider.

However, the source of that Badbox 2.0 screenshot said the Kimwolf botmasters had an ace up their sleeve the whole time: Secret access to the Badbox 2.0 botnet control panel.

“Dort has gotten unauthorized access,” the source said. “So, what happened is normal proxy providers patched this. But Badbox doesn’t sell proxies by itself, so it’s not patched. And as long as Dort has access to Badbox, they would be able to load” the Kimwolf malware directly onto TV boxes associated with Badbox 2.0.

The source said it isn’t clear how Dort gained access to the Badbox botnet panel. But it’s unlikely that Dort’s existing account will persist for much longer: All of our notifications to the qq.com email addresses listed in the control panel screenshot received a copy of that image, as well as questions about the apparently rogue ABCD account.

Categories: Software Security

Ireland Proposes Giving Police New Digital Surveillance Powers

Schneier on Security - Mon, 01/26/2026 - 07:04

This is coming:

The Irish government is planning to bolster its police’s ability to intercept communications, including encrypted messages, and provide a legal basis for spyware use.

Categories: Software Security

End-to-end security for AI: Integrating AltaStata Storage with Red Hat OpenShift confidential containers

Red Hat Security - Sun, 01/25/2026 - 19:00
Confidential computing represents the next frontier in hybrid and multicloud security, offering hardware-level memory protection (data in use) through technologies such as AMD SEV and Intel TDX. However, implementing storage solutions in these environments presents unique challenges that traditional approaches can't address.In this article, we'll explore different approaches to adding storage to Red Hat OpenShift confidential container environments, what to watch out for, and how AltaStata—a Red Hat partner—simplifies the process with encryption and protection for AI.The challenge: Storage
Categories: Software Security

What happens inside the Kubernetes API server?

Learnk8s blog - Sun, 01/25/2026 - 19:00
The Kubernetes API server handles all requests to your cluster. But how does it actually work? Learn how requests flow through authentication, authorization, admission controllers, and into etcd.
Categories: Kubernetes

A Gentle Introduction to multiclaude

Dan Lorenc - Sat, 01/24/2026 - 13:21

*Or: How I Learned to Stop Worrying and Let the Robots Fight*

Alternate titles:

Why tell Claude what to do when you can tell Claude to tell Claude what to do?
My Claude starts itself, parks itself, and autotunes.

You know that feeling when you’re playing an MMO and you realize the NPCs are having more fun than you are? They’re off doing quests, farming gold, living their little digital lives while you’re stuck in a loading screen wondering if you should touch grass.

That’s basically what happened when I built multiclaude.

The Problem: You Are the Bottleneck

Here’s a dirty secret about AI coding assistants: they’re fast, you’re slow.

Claude can write a feature in 30 seconds. You take 5 minutes to read the PR. Claude fixes the bug. You take a bathroom break. Claude refactors the module. You’re still thinking about whether that bathroom break was really necessary or if you just needed to escape your screen for a moment.

The math doesn’t math. You have an infinitely patient, extremely competent coding partner who works at the speed of thought, and you’re… *you*. No offense. I’m also me. It’s fine. We’re all dealing with the human condition.

But what if you just… stopped being the constraint?

The Solution: Controlled Chaos

multiclaude is what happens when you give up on the illusion that software engineering needs to be orderly.

Here’s the pitch: spawn a bunch of Claude Code instances, give them each a task, let them work in parallel, and use CI as a bouncer. If their code passes the tests, it ships. If it doesn’t, they try again. You? You can go touch that grass. Come back to merged PRs.

multiclaude start
multiclaude repo init https://github.com/your/repo
multiclaude worker create "Add dark mode"
multiclaude worker create "Fix that auth bug"
multiclaude worker create "Write those tests nobody wrote"

That’s it. You now have three AI agents working simultaneously while you debate your Chipotle order.

The Philosophy: Brownian Ratchet

Ever heard of a Brownian ratchet? It’s a physics thing that turns out to be impossible but feels like it shouldn’t be.Random molecular motion gets converted into directional progress through a one-way mechanism. Chaos in, progress out.

multiclaude works the same way.

Multiple agents work at once. They might duplicate effort. Two of them might both try to fix the same bug. One might break what another just fixed. *This is fine.* In fact, this is the point.

**CI is the ratchet.** Every PR that passes tests gets merged. Progress is permanent. We never go backward. The randomness of parallel agents, filtered through the one-way gate of your test suite, produces steady forward motion.

Think of it like evolution. Mutations are random. Most fail. The ones that survive get kept. Over time: progress. You don’t need a grand plan. You need good selection pressure.

The core beliefs:

- **Chaos is expected** — Redundant work is cheaper than blocked work

- **CI is king** — If tests pass, ship it. If tests fail, fix it.

- **Forward beats perfect** — Three okay PRs beat one perfect PR that never lands

- **Humans approve, agents execute** — You’re still in charge. You’re just not *busy*.

The Cast: Meet Your Robot Employees

When you fire up multiclaude, you get a whole org chart of AI agents. Each one runs in its own tmux window with its own git worktree. They can see each other. They send messages. It’s like a tiny company, except nobody needs health insurance.

**The Supervisor** is air traffic control. It watches all the workers, notices when someone’s stuck, sends helpful nudges. “Hey swift-eagle, you’ve been on that auth bug for 20 minutes. The tests are in `auth_test.go`. Try mocking the clock.”

**The Merge Queue** is the bouncer. It watches PRs. When CI goes green, it merges. When CI goes red, it spawns a fix-it worker. It doesn’t ask permission. It doesn’t schedule meetings. Green means go.

**Workers** are the grunts. You give them a task, they do it, they make a PR, they self-destruct. Each one gets a cute animal name. swift-eagle. calm-deer. clever-fox. Like a startup that generates its own culture.

  • *Your Workspace** is home base. This is where you talk to your personal Claude, spawn workers, check status. It’s like the command tent in a war movie, except the war is against your own backlog.

Attach with `tmux attach -t mc-repo`. Watch them work. It’s hypnotic.

The Machinery: Loops, Nudges, and Messages

Under the hood, multiclaude is refreshingly dumb. No fancy orchestration framework. No distributed consensus algorithms. Just files, tmux, and Go.

**The daemon runs four loops**, each ticking every two minutes:

1. **Health check** — Are the agents still alive? Did someone close their tmux window? If so, try to resurrect them. If resurrection fails, clean up the body.

2. **Message router** — Agents talk via JSON files on disk. The daemon notices new messages, types them into the recipient’s tmux window. Low-tech? Yes. Robust? Incredibly.

3. **Wake/nudge** — Agents can get… contemplative. The daemon pokes them periodically. “Status check: how’s that feature coming?” It’s like a Slack ping, but from a robot to another robot.

4. **Worktree refresh** — Keep everyone’s branches up to date with main. Rebase conflicts before they become merge conflicts.

That’s it. Four loops. Two-minute intervals. The whole system is observable, restartable, and fits in your head.

**Messages** flow through the filesystem:

~/.multiclaude/messages/my-repo/supervisor/msg-abc123.json
{
"from": "clever-fox",
"body": "I need help with the database schema",
"status": "pending"
}

The daemon sees it, sends it to supervisor’s tmux window, marks it delivered. The supervisor reads it, helps clever-fox, moves on. No Kafka. No Redis. Just files.

**Nudges** keep agents from getting stuck in thought loops. Every two minutes, the daemon asks “how’s it going?” Not nagging — more like a gentle reminder that work exists and time is passing. Without nudges, agents sometimes disappear into analysis paralysis. With nudges, they ship.

The MMO Model

Here’s my favorite way to think about it: multiclaude is an MMO, not a single-player game.

Your workspace is your character. Workers are party members you summon. The supervisor is your guild leader. The merge queue is the raid boss guarding main.

Log off. The game keeps running. Come back to progress.

This is what software engineering *should* feel like. Not you typing while Claude watches. Not Claude typing while you watch. Both of you doing things, in parallel, with an army of helpers. You’re the raid leader. You’re not tanking every mob yourself.

Getting Started: The Five-Minute Setup

Prerequisites: Go, tmux, git, gh (authenticated with GitHub).

# Install
go install github.com/dlorenc/multiclaude/cmd/multiclaude@latest
# Fire it up
multiclaude start
multiclaude repo init https://github.com/your/repo
# Spawn some workers and walk away
multiclaude worker create "Implement feature X from issue #42"
multiclaude worker create "Add tests for the payment module"
multiclaude worker create "Fix that CSS bug that's been open for six months"
# Watch the chaos
tmux attach -t mc-your-repo

Detach with `Ctrl-b d`. They keep working. Come back tomorrow. Check `gh pr list`. Feel mildly unsettled that software is writing itself. Merge what looks good.

## Extending: Build Your Own Agents

The built-in agents are just markdown files. Seriously. Look:

# Worker
You are a worker. Complete your task, make a PR, signal done.
## Your Job
1. Do the task you were assigned
2. Create a PR with detailed summary
3. Run `multiclaude agent complete`

Want a docs-reviewer agent? Write a markdown file:

# Docs Reviewer
You review documentation changes. Focus on:
- Accuracy - does the docs match the code?
- Clarity - can a new developer understand this?
- Completeness - are edge cases documented?
When you find issues, leave helpful PR comments.

Spawn it:

multiclaude agents spawn - name docs-bot - class docs-reviewer - prompt-file docs-reviewer.md

Boom. Custom agent. No code changes. No recompilation. Just markdown and vibes.

Want to share agents with your team? Drop them in `.multiclaude/agents/` in your repo. Everyone gets them automatically.

The Vision: Software Projects That Write Themselves

Here’s where I get philosophical.

The bottleneck in software development has always been humans. Not compute, not tooling, not process. Humans. We’re slow. We get tired. We have meetings.

What if the humans became the *selection pressure* instead of the *labor*?

You define what good looks like (tests, CI, review standards). Agents propose changes. Good changes get merged. Bad changes don’t. You curate. You approve. You set direction. But you don’t type every character.

This isn’t about replacing developers. It’s about changing what developers *do*. Less typing, more thinking. Less implementation, more architecture. Less grunt work, more judgment.

multiclaude is a bet that the future of programming looks more like managing a team than writing code. Your job becomes: hire good robots (define good prompts), give them clear objectives (tasks with context), and maintain quality standards (CI that actually tests things).

The robots do the rest.

Self-Hosting Since Day One

One more thing: multiclaude builds itself. The agents in this codebase wrote the code you’re reading. PRs get created by workers, reviewed by reviewers, merged by merge-queue, coordinated by supervisor.

We eat our own dogfood so aggressively that we’re basically drowning in it. At some point the dogfood started cooking itself, and we just… let it?

Is this a good idea? Unclear! Is it fun? Absolutely. Does it work? Well, you’re reading this, so… yes?

**Ready to stop being the bottleneck?**

go install github.com/dlorenc/multiclaude/cmd/multiclaude@latest
multiclaude start

Let the robots fight. You have grass to touch.

Categories: Software Security

Friday Squid Blogging: Giant Squid in the Star Trek Universe

Schneier on Security - Fri, 01/23/2026 - 17:03

Spock befriends a giant space squid in the comic Star Trek: Strange New Worlds: The Seeds of Salvation #5.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Blog moderation policy.

Categories: Software Security

Pages

Subscribe to articles.innovatingtomorrow.net aggregator