
Feed aggregator

Abusing Notion’s AI Agent for Data Theft

Schneier on Security - Mon, 09/29/2025 - 07:07

Notion just released version 3.0, complete with AI agents. Because the system contains Simon Willison’s lethal trifecta, it’s vulnerable to data theft through prompt injection.

First, the trifecta:

The lethal trifecta of capabilities is:

  • Access to your private data—one of the most common purposes of tools in the first place!
  • Exposure to untrusted content—any mechanism by which text (or images) controlled by a malicious attacker could become available to your LLM
  • The ability to externally communicate in a way that could be used to steal your data (I often call this “exfiltration” but I’m not confident that term is widely understood.)...
Categories: Software Security

Digital Threat Modeling Under Authoritarianism

Schneier on Security - Fri, 09/26/2025 - 07:04

Today’s world requires us to make complex and nuanced decisions about our digital security. Evaluating when to use a secure messaging app like Signal or WhatsApp, which passwords to store on your smartphone, or what to share on social media requires us to assess risks and make judgments accordingly. Arriving at any conclusion is an exercise in threat modeling.

In security, threat modeling is the process of determining what security measures make sense in your particular situation. It’s a way to think about potential risks, possible defenses, and the costs of both. It’s how experts avoid being distracted by irrelevant risks or overburdened by undue costs...

Categories: Software Security

Autonomous Testing of etcd’s Robustness

CNCF Blog Projects Category - Thu, 09/25/2025 - 11:25

As a critical component of many production systems, including Kubernetes, the etcd project’s first priority is reliability. Ensuring consistency and data safety requires our project contributors to continuously improve testing methodologies. In this article, we describe how we use advanced simulation testing to uncover subtle bugs, validate the robustness of our releases, and increase our confidence in etcd’s stability. We’ll share our key findings and how they have improved etcd.

Enhancing etcd’s Robustness Testing

Many critical software systems depend on etcd to be correct and consistent, most notably as the primary datastore for Kubernetes. After some issues with the v3.5 release, the etcd maintainers developed a new robustness testing framework to better test for correctness under various failure scenarios. To further enhance our testing capabilities, we integrated a deterministic simulation testing platform from Antithesis into our workflow.

The platform works by running the entire etcd cluster inside a deterministic hypervisor. This specialized environment gives the testing software complete control over every source of non-determinism, such as network behavior, thread scheduling, and system clocks. This means any bug it discovers can be perfectly and reliably reproduced.

Within this simulated environment, the testing methodology shifts away from traditional, scenario-based tests. Instead of writing tests imperatively with strict assertions for one specific outcome, this approach uses declarative, property-based assertions about system behavior. These properties are high-level invariants about the system that must always hold true. For example, “data consistency is never violated” or “a watch event is never dropped.”
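To make that concrete, here is a minimal, illustrative sketch in Go of what such an invariant check might look like. It is not code from the etcd robustness framework; the Event type and the specific property are assumptions chosen purely for illustration.

package main

import "fmt"

// Event is a minimal stand-in for a watch event; the real robustness framework
// uses richer types, so treat this purely as an illustration.
type Event struct {
    Key         string
    ModRevision int64
}

// revisionsNeverDecrease expresses a declarative property: for any sequence of
// events observed on a single watch stream, revisions must never go backwards.
// A testing platform's job is then to search for an execution that violates it.
func revisionsNeverDecrease(events []Event) error {
    var last int64
    for i, e := range events {
        if e.ModRevision < last {
            return fmt.Errorf("event %d (key %q) has revision %d after seeing %d", i, e.Key, e.ModRevision, last)
        }
        last = e.ModRevision
    }
    return nil
}

func main() {
    // Revision 4 arriving after 6 violates the property.
    fmt.Println(revisionsNeverDecrease([]Event{{"a", 5}, {"b", 6}, {"a", 4}}))
}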

The platform then treats these properties not as passive checks, but as targets to break. It combines automated exploration with targeted fault injection, actively searching for the precise sequence of events and failures that will cause a property to be violated. This active search for violations is what allows the platform to uncover subtle bugs that result from complex combinations of factors. Antithesis refers to this approach as Autonomous Testing.

This builds upon etcd’s existing robustness tests, which also use a property-based approach. However, without a deterministic environment or automated exploration, the original framework resembled throwing darts while blindfolded and hoping to hit the bullseye. A bug might be found, but the process relied heavily on random chance and was difficult to reproduce. Antithesis’s deterministic simulation and active exploration remove the blindfold, enabling a systematic and reproducible search for bugs.

How We Tested

Our goals for this testing effort were to:

  1. Validate the robustness of etcd v3.6.
  2. Improve etcd’s software quality by finding and fixing bugs.
  3. Enhance our existing testing framework with autonomous testing.

We ran our existing robustness tests on the Antithesis simulation platform, testing a 3-node and a 1-node etcd cluster against a variety of faults, including:

  • Network faults: latency, congestion, and partitions.
  • Container-level faults: thread pauses, process kills, clock jitter, and CPU throttling.

We tested older versions of etcd with known bugs to validate the testing methodology, as well as our stable releases (3.4, 3.5, 3.6) and the main development branch. In total, we ran 830 wall-clock hours of testing, which simulated 4.5 years of usage.

What We Found

The results were impressive. The simulation testing not only found all the known bugs we tested for but also uncovered several new issues in our main development branch.

Here are some of the key findings:

  • A critical watch bug was discovered that our existing tests had missed. This bug was present in all stable releases of etcd.
  • All known bugs were found, giving us confidence in the ability of the combined testing approach to find regressions.
  • Our own testing was improved by revealing a flaw in our linearization checker model.

Issues in the Main Development Branch

Description | Report Link | Status | Impact | Details
Watch on future revision might receive old events | Triage Report | Fixed in 3.6.2 (#20281) | Medium | New bug discovered by Antithesis
Watch on future revision might receive old notifications | Triage Report | Fixed in 3.6.2 (#20221) | Medium | New bug discovered by both Antithesis and robustness tests
Panic when two snapshots are received in a short period | Triage Report | Open | Low | Previously discovered by robustness
Panic from db page expected to be 5 | Triage Report | Open | Low | New bug discovered by Antithesis
Operation time based on watch response is incorrect | Triage Report | Fixed test on main branch (#19998) | Low | Bug in robustness tests discovered by Antithesis

Known Issues

Antithesis also successfully found and reproduced these known issues in older releases – the “Brown M&Ms” set by the etcd maintainers.

Description | Report Link
Watch dropping an event when compacting on delete | Triage Report
Revision decreasing caused by crash during compaction | Triage Report
Watch progress notification not synced with stream | Triage Report
Inconsistent revision caused by crash during defrag | Triage Report
Watchable runlock bug | Triage Report

Conclusion

The integration of this advanced simulation testing into our development workflow has been a success. It has allowed us to find and fix critical bugs, improve our existing testing framework, and increase our confidence in the reliability of etcd. We will continue to leverage this technology to ensure that etcd remains a stable and trusted distributed key-value store for the community.

Categories: CNCF Projects

Announcing Changed Block Tracking API support (alpha)

Kubernetes Blog - Thu, 09/25/2025 - 09:00

We're excited to announce the alpha support for a changed block tracking mechanism. This enhances the Kubernetes storage ecosystem by providing an efficient way for CSI storage drivers to identify changed blocks in PersistentVolume snapshots. With a driver that can use the feature, you could benefit from faster and more resource-efficient backup operations.

If you're eager to try this feature, you can skip to the Getting Started section.

What is changed block tracking?

Changed block tracking enables storage systems to identify and track modifications at the block level between snapshots, eliminating the need to scan entire volumes during backup operations. The improvement is a change to the Container Storage Interface (CSI), and also to the storage support in Kubernetes itself. With the alpha feature enabled, your cluster can:

  • Identify allocated blocks within a CSI volume snapshot
  • Determine changed blocks between two snapshots of the same volume
  • Streamline backup operations by focusing only on changed data blocks

For Kubernetes users managing large datasets, this API enables significantly more efficient backup processes. Backup applications can now focus only on the blocks that have changed, rather than processing entire volumes.

Note:

As of now, the Changed Block Tracking API is supported only for block volumes and not for file volumes. CSI drivers that manage file-based storage systems will not be able to implement this capability.

Benefits of changed block tracking support in Kubernetes

As Kubernetes adoption grows for stateful workloads managing critical data, the need for efficient backup solutions becomes increasingly important. Traditional full backup approaches face challenges with:

  • Long backup windows: Full volume backups can take hours for large datasets, making it difficult to complete within maintenance windows.
  • High resource utilization: Backup operations consume substantial network bandwidth and I/O resources, especially for large data volumes and data-intensive applications.
  • Increased storage costs: Repetitive full backups store redundant data, causing storage requirements to grow linearly even when only a small percentage of data actually changes between backups.

The Changed Block Tracking API addresses these challenges by providing native Kubernetes support for incremental backup capabilities through the CSI interface.

Key components

The implementation consists of three primary components:

  1. CSI SnapshotMetadata Service API: A gRPC service API that provides volume snapshot and changed block metadata.
  2. SnapshotMetadataService API: A Kubernetes CustomResourceDefinition (CRD) that advertises CSI driver metadata service availability and connection details to cluster clients.
  3. External Snapshot Metadata Sidecar: An intermediary component that connects CSI drivers to backup applications via a standardized gRPC interface.

Implementation requirements

Storage provider responsibilities

If you're an author of a storage integration with Kubernetes and want to support the changed block tracking feature, you must implement specific requirements:

  1. Implement CSI RPCs: Storage providers need to implement the SnapshotMetadata service as defined in the CSI specifications protobuf. This service requires server-side streaming implementations for the following RPCs:

    • GetMetadataAllocated: For identifying allocated blocks in a snapshot
    • GetMetadataDelta: For determining changed blocks between two snapshots
  2. Storage backend capabilities: Ensure the storage backend has the capability to track and report block-level changes.

  3. Deploy external components: Integrate with the external-snapshot-metadata sidecar to expose the snapshot metadata service.

  4. Register custom resource: Install the SnapshotMetadataService CustomResourceDefinition and create a SnapshotMetadataService custom resource that advertises the availability of the metadata service and provides connection details (a sketch of such a resource follows this list).

  5. Support error handling: Implement proper error handling for these RPCs according to the CSI specification requirements.
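To make step 4 concrete, here is a rough sketch of what such a custom resource might look like. The API group, version, and field names are assumptions based on the alpha design; consult the external-snapshot-metadata repository for the authoritative schema.

apiVersion: cbt.storage.k8s.io/v1alpha1  # assumed group/version for the alpha API
kind: SnapshotMetadataService
metadata:
  # By convention, the resource is named after the CSI driver it describes.
  name: hostpath.csi.k8s.io
spec:
  # gRPC endpoint of the external-snapshot-metadata sidecar.
  address: snapshot-metadata.csi-driver-namespace.svc:6443
  # CA bundle used by clients to validate the sidecar's TLS certificate.
  caCert: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...
  # Audience that backup applications must request in their ServiceAccount tokens.
  audience: snapshot-metadata-example-audience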

Backup solution responsibilities

A backup solution looking to leverage this feature must:

  1. Set up authentication: The backup application must provide a Kubernetes ServiceAccount token when using the Kubernetes SnapshotMetadataService API. Appropriate access grants, such as RBAC RoleBindings, must be established to authorize the backup application ServiceAccount to obtain such tokens (an illustrative RBAC sketch follows this list).

  2. Implement streaming client-side code: Develop clients that implement the streaming gRPC APIs defined in the schema.proto file. Specifically:

    • Implement streaming client code for GetMetadataAllocated and GetMetadataDelta methods
    • Handle server-side streaming responses efficiently as the metadata comes in chunks
    • Process the SnapshotMetadataResponse message format with proper error handling

    The external-snapshot-metadata GitHub repository provides a convenient iterator support package to simplify client implementation.

  3. Handle large dataset streaming: Design clients to efficiently handle large streams of block metadata that could be returned for volumes with significant changes.

  4. Optimize backup processes: Modify backup workflows to use the changed block metadata to identify and only transfer changed blocks to make backups more efficient, reducing both backup duration and resource consumption.
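As a rough illustration of the access grants mentioned in step 1 above, a cluster administrator might give the backup application's ServiceAccount read access to SnapshotMetadataService objects so it can discover the metadata service. The names, namespace, and API group below are assumptions; the full set of grants required (including permission to read VolumeSnapshots and to obtain tokens with the right audience) depends on your deployment.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: snapshot-metadata-client  # hypothetical name
rules:
  # Assumed API group for the SnapshotMetadataService CRD; verify against the
  # external-snapshot-metadata repository before relying on it.
  - apiGroups: ["cbt.storage.k8s.io"]
    resources: ["snapshotmetadataservices"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backup-app-snapshot-metadata-client  # hypothetical name
subjects:
  - kind: ServiceAccount
    name: backup-app          # hypothetical backup application ServiceAccount
    namespace: backup-system  # hypothetical namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: snapshot-metadata-client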

Getting started

To use changed block tracking in your cluster:

  1. Ensure your CSI driver supports volume snapshots and implements the snapshot metadata capabilities with the required external-snapshot-metadata sidecar
  2. Make sure the SnapshotMetadataService custom resource is registered using CRD
  3. Verify the presence of a SnapshotMetadataService custom resource for your CSI driver
  4. Create clients that can access the API using appropriate authentication (via Kubernetes ServiceAccount tokens)

The API provides two main functions:

  • GetMetadataAllocated: Lists blocks allocated in a single snapshot
  • GetMetadataDelta: Lists blocks changed between two snapshots
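To illustrate the streaming pattern a backup client needs, here is a minimal Go sketch of the receive loop. The message types are simplified stand-ins for the generated schema.proto stubs (the real requests also carry fields such as the security token, namespace, snapshot names, and starting offset), so treat this as the shape of the client logic rather than working code against the real API.

package main

import (
    "fmt"
    "io"
)

// BlockMetadata and SnapshotMetadataResponse are simplified stand-ins for the
// streamed messages (assumed field subset, for illustration only).
type BlockMetadata struct {
    ByteOffset int64 // start of the changed range, in bytes
    SizeBytes  int64 // length of the changed range, in bytes
}

type SnapshotMetadataResponse struct {
    VolumeCapacityBytes int64
    BlockMetadata       []BlockMetadata
}

// metadataStream models the server-side streaming response; in a real client
// this would be the generated gRPC stream returned by GetMetadataDelta.
type metadataStream interface {
    Recv() (*SnapshotMetadataResponse, error)
}

// collectChangedBlocks drains the stream chunk by chunk until io.EOF,
// accumulating the block ranges a backup tool would then read and transfer.
func collectChangedBlocks(s metadataStream) ([]BlockMetadata, error) {
    var blocks []BlockMetadata
    for {
        resp, err := s.Recv()
        if err == io.EOF {
            return blocks, nil // server finished streaming metadata
        }
        if err != nil {
            return nil, fmt.Errorf("receiving snapshot metadata: %w", err)
        }
        blocks = append(blocks, resp.BlockMetadata...)
    }
}

// fakeStream simulates two streamed chunks so the example runs without a live
// CSI driver or sidecar.
type fakeStream struct{ chunks []*SnapshotMetadataResponse }

func (f *fakeStream) Recv() (*SnapshotMetadataResponse, error) {
    if len(f.chunks) == 0 {
        return nil, io.EOF
    }
    c := f.chunks[0]
    f.chunks = f.chunks[1:]
    return c, nil
}

func main() {
    s := &fakeStream{chunks: []*SnapshotMetadataResponse{
        {BlockMetadata: []BlockMetadata{{ByteOffset: 0, SizeBytes: 4096}}},
        {BlockMetadata: []BlockMetadata{{ByteOffset: 1 << 20, SizeBytes: 8192}}},
    }}
    blocks, err := collectChangedBlocks(s)
    fmt.Println(blocks, err)
}

In a real client, Recv would be called on the stream returned by the generated GetMetadataDelta (or GetMetadataAllocated) method, and the accumulated ranges would drive which blocks are read from the snapshot and written to backup storage.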

What’s next?

Depending on feedback and adoption, the Kubernetes developers hope to push the CSI Snapshot Metadata implementation to beta in a future release.

Where can I learn more?

For those interested in trying out this new feature:

How do I get involved?

This project, like all of Kubernetes, is the result of hard work by many contributors from diverse backgrounds working together. On behalf of SIG Storage, I would like to offer a huge thank you to the contributors who helped review the design and implementation of the project, including but not limited to the following:

Thanks also to everyone who has contributed to the project, including others who helped review the KEP and the CSI spec PR.

For those interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system, join the Kubernetes Storage Special Interest Group (SIG). We always welcome new contributors.

The SIG also holds regular Data Protection Working Group meetings. New attendees are welcome to join our discussions.

Categories: CNCF Projects, Kubernetes

Malicious-Looking URL Creation Service

Schneier on Security - Thu, 09/25/2025 - 07:02

This site turns your URL into something sketchy-looking.

For example, www.schneier.com becomes
https://cheap-bitcoin.online/firewall-snatcher/cipher-injector/phishing_sniffer_tool.html?form=inject&host=spoof&id=bb1bc121&parameter=inject&payload=%28function%28%29%7B+return+%27+hi+%27.trim%28%29%3B+%7D%29%28%29%3B&port=spoof.

Found on Boing Boing.

Categories: Software Security

CNCF’s Helm Project Remains Fully Open Source and Unaffected by Recent Vendor Deprecations

CNCF Blog Projects Category - Wed, 09/24/2025 - 12:00

Recently, users may have seen the news about Broadcom (Bitnami) regarding upcoming deprecations of their publicly available container images and Helm Charts. These changes, which will take effect by September 29, 2025, mark a shift to a paid subscription model for Bitnami Secure Images and the removal of many free-to-use artifacts from public registries.

We want to be clear: these changes do not impact the Helm project itself.

Helm is a graduated project that will remain under the CNCF. It continues to be fully open source, Apache 2.0 licensed, and governed by a neutral community. The CNCF community retains ownership of all project intellectual property per our IP policy, ensuring no single vendor can alter its open governance model.

While “Helm charts” refer broadly to a packaging format that anyone can use to deploy applications on Kubernetes, Bitnami Helm Charts are a specific vendor-maintained implementation. Developed and maintained by the Bitnami team (now part of Broadcom), these charts are known for their ease of use, security features, and reliability. Bitnami’s decision to deprecate its public chart and image repositories is entirely separate from the Helm project itself.

Users currently depending on Bitnami Helm Charts should begin exploring migration or mirroring strategies to avoid potential disruption.
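As one illustration of a mirroring strategy, many Bitnami-packaged charts let you redirect image pulls to your own registry through values overrides. The snippet below is a rough sketch only; the exact keys vary from chart to chart, so check the chart's values.yaml before relying on them.

# values-mirror.yaml (illustrative): point a Bitnami-style chart at a self-hosted mirror
global:
  imageRegistry: registry.example.com  # hypothetical internal mirror
image:
  repository: bitnami/postgresql       # repository path as mirrored into your registry
  tag: "16.4.0"                        # pin a tag you have actually mirrored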

The Helm community is actively working to support users during this transition, including guidance on:

  • Updating chart dependencies
  • Exploring alternative chart sources
  • Migrating to maintained open image repositories

We encourage users to follow the Helm blog and Helm GitHub for updates and support resources.

CNCF remains committed to maintaining the integrity of our open source projects and supporting communities through transitions like this. This event also reinforces the importance of vendor neutrality and resilient infrastructure design—a principle at the heart of our mission.

For any media inquiries, please contact: [email protected]

Categories: CNCF Projects

Feds Tie ‘Scattered Spider’ Duo to $115M in Ransoms

Krebs on Security - Wed, 09/24/2025 - 07:48

U.S. prosecutors last week levied criminal hacking charges against 19-year-old U.K. national Thalha Jubair for allegedly being a core member of Scattered Spider, a prolific cybercrime group blamed for extorting at least $115 million in ransom payments from victims. The charges came as Jubair and an alleged co-conspirator appeared in a London court to face accusations of hacking into and extorting several large U.K. retailers, the London transit system, and healthcare providers in the United States.

At a court hearing last week, U.K. prosecutors laid out a litany of charges against Jubair and 18-year-old Owen Flowers, accusing the teens of involvement in an August 2024 cyberattack that crippled Transport for London, the entity responsible for the public transport network in the Greater London area.

A court artist sketch of Owen Flowers (left) and Thalha Jubair appearing at Westminster Magistrates’ Court last week. Credit: Elizabeth Cook, PA Wire.

On July 10, 2025, KrebsOnSecurity reported that Flowers and Jubair had been arrested in the United Kingdom in connection with recent Scattered Spider ransom attacks against the retailers Marks & Spencer and Harrods, and the British food retailer Co-op Group.

That story cited sources close to the investigation saying Flowers was the Scattered Spider member who anonymously gave interviews to the media in the days after the group’s September 2023 ransomware attacks disrupted operations at Las Vegas casinos operated by MGM Resorts and Caesars Entertainment.

The story also noted that Jubair’s alleged handles on cybercrime-focused Telegram channels had far lengthier rap sheets involving some of the more consequential and headline-grabbing data breaches over the past four years. What follows is an account of cybercrime activities that prosecutors have attributed to Jubair’s alleged hacker handles, as told by those accounts in posts to public Telegram channels that are closely monitored by multiple cyber intelligence firms.

EARLY DAYS (2021-2022)

Jubair is alleged to have been a core member of the LAPSUS$ cybercrime group that broke into dozens of technology companies beginning in late 2021, stealing source code and other internal data from tech giants including Microsoft, Nvidia, Okta, Rockstar Games, Samsung, T-Mobile, and Uber.

That is, according to the former leader of the now-defunct LAPSUS$. In April 2022, KrebsOnSecurity published internal chat records taken from a server that LAPSUS$ used, and those chats indicate Jubair was working with the group using the nicknames Amtrak and Asyntax. In the middle of the gang’s cybercrime spree, Asyntax told the LAPSUS$ leader not to share T-Mobile’s logo in images sent to the group because he’d been previously busted for SIM-swapping and his parents would suspect he was back at it again.

The leader of LAPSUS$ responded by gleefully posting Asyntax’s real name, phone number, and other hacker handles into a public chat room on Telegram:

In March 2022, the leader of the LAPSUS$ data extortion group exposed Thalha Jubair’s name and hacker handles in a public chat room on Telegram.

That story about the leaked LAPSUS$ chats also connected Amtrak/Asyntax to several previous hacker identities, including “Everlynn,” who in April 2021 began offering a cybercriminal service that sold fraudulent “emergency data requests” targeting the major social media and email providers.

In these so-called “fake EDR” schemes, the hackers compromise email accounts tied to police departments and government agencies, and then send unauthorized demands for subscriber data (e.g. username, IP/email address), while claiming the information being requested can’t wait for a court order because it relates to an urgent matter of life and death.

The roster of the now-defunct “Infinity Recursion” hacking team, which sold fake EDRs between 2021 and 2022. The founder “Everlynn” has been tied to Jubair. The member listed as “Peter” became the leader of LAPSUS$ who would later post Jubair’s name, phone number and hacker handles into LAPSUS$’s chat channel.

EARTHTOSTAR

Prosecutors in New Jersey last week alleged Jubair was part of a threat group variously known as Scattered Spider, 0ktapus, and UNC3944, and that he used the nicknames EarthtoStar, Brad, Austin, and Austistic.

Beginning in 2022, EarthtoStar co-ran a bustling Telegram channel called Star Chat, which was home to a prolific SIM-swapping group that relentlessly used voice- and SMS-based phishing attacks to steal credentials from employees at the major wireless providers in the U.S. and U.K.

Jubair allegedly used the handle “Earth2Star,” a core member of a prolific SIM-swapping group operating in 2022. This ad produced by the group lists various prices for SIM swaps.

The group would then use that access to sell a SIM-swapping service that could redirect a target’s phone number to a device the attackers controlled, allowing them to intercept the victim’s phone calls and text messages (including one-time codes). Members of Star Chat targeted multiple wireless carriers with SIM-swapping attacks, but they focused mainly on phishing T-Mobile employees.

In February 2023, KrebsOnSecurity scrutinized more than seven months of these SIM-swapping solicitations on Star Chat, which almost daily peppered the public channel with “Tmo up!” and “Tmo down!” notices indicating periods wherein the group claimed to have active access to T-Mobile’s network.

A redacted receipt from Star Chat’s SIM-swapping service targeting a T-Mobile customer after the group gained access to internal T-Mobile employee tools.

The data showed that Star Chat — along with two other SIM-swapping groups operating at the same time — collectively broke into T-Mobile over a hundred times in the last seven months of 2022. However, Star Chat was by far the most prolific of the three, responsible for at least 70 of those incidents.

The 104 days in the latter half of 2022 in which different known SIM-swapping groups claimed access to T-Mobile employee tools. Star Chat was responsible for a majority of these incidents. Image: krebsonsecurity.com.

A review of EarthtoStar’s messages on Star Chat as indexed by the threat intelligence firm Flashpoint shows this person also sold “AT&T email resets” and AT&T call forwarding services for up to $1,200 per line. EarthtoStar explained the purpose of this service in a post on Telegram:

“Ok people are confused, so you know when u login to chase and it says ‘2fa required’ or whatever the fuck, well it gives you two options, SMS or Call. If you press call, and I forward the line to you then who do you think will get said call?”

New Jersey prosecutors allege Jubair also was involved in a mass SMS phishing campaign during the summer of 2022 that stole single sign-on credentials from employees at hundreds of companies. The text messages asked users to click a link and log in at a phishing page that mimicked their employer’s Okta authentication page, saying recipients needed to review pending changes to their upcoming work schedules.

The phishing websites used a Telegram instant message bot to forward any submitted credentials in real-time, allowing the attackers to use the phished username, password and one-time code to log in as that employee at the real employer website.

That weeks-long SMS phishing campaign led to intrusions and data thefts at more than 130 organizations, including LastPass, DoorDash, Mailchimp, Plex and Signal.

A visual depiction of the attacks by the SMS phishing group known as 0ktapus, ScatterSwine, and Scattered Spider. Image: Amitai Cohen twitter.com/amitaico.

DA, COMRADE

EarthtoStar’s group Star Chat specialized in phishing their way into business process outsourcing (BPO) companies that provide customer support for a range of multinational companies, including a number of the world’s largest telecommunications providers. In May 2022, EarthtoStar posted to the Telegram channel “Frauwudchat”:

“Hi, I am looking for partners in order to exfiltrate data from large telecommunications companies/call centers/alike, I have major experience in this field, [including] a massive call center which houses 200,000+ employees where I have dumped all user credentials and gained access to the [domain controller] + obtained global administrator I also have experience with REST API’s and programming. I have extensive experience with VPN, Citrix, cisco anyconnect, social engineering + privilege escalation. If you have any Citrix/Cisco VPN or any other useful things please message me and lets work.”

At around the same time in the Summer of 2022, at least two different accounts tied to Star Chat — “RocketAce” and “Lopiu” — introduced the group’s services to denizens of the Russian-language cybercrime forum Exploit, including:

-SIM-swapping services targeting Verizon and T-Mobile customers;
-Dynamic phishing pages targeting customers of single sign-on providers like Okta;
-Malware development services;
-The sale of extended validation (EV) code signing certificates.

The user “Lopiu” on the Russian cybercrime forum Exploit advertised many of the same unique services offered by EarthtoStar and other Star Chat members. Image source: ke-la.com.

These two accounts on Exploit created multiple sales threads in which they claimed administrative access to U.S. telecommunications providers and asked other Exploit members for help in monetizing that access. In June 2022, RocketAce, which appears to have been just one of EarthtoStar’s many aliases, posted to Exploit:

Hello. I have access to a telecommunications company’s citrix and vpn. I would like someone to help me break out of the system and potentially attack the domain controller so all logins can be extracted we can discuss payment and things leave your telegram in the comments or private message me ! Looking for someone with knowledge in citrix/privilege escalation

On Nov. 15, 2022, EarthtoStar posted to their Star Sanctuary Telegram channel that they were hiring malware developers with a minimum of three years of experience and the ability to develop rootkits, backdoors and malware loaders.

“Optional: Endorsed by advanced APT Groups (e.g. Conti, Ryuk),” the ad concluded, referencing two of Russia’s most rapacious and destructive ransomware affiliate operations. “Part of a nation-state / ex-3l (3 letter-agency).”

2023-PRESENT DAY

The Telegram and Discord chat channels wherein Flowers and Jubair allegedly planned and executed their extortion attacks are part of a loose-knit network known as the Com, an English-speaking cybercrime community consisting mostly of individuals living in the United States, the United Kingdom, Canada and Australia.

Many of these Com chat servers have hundreds to thousands of members each, and some of the more interesting solicitations on these communities are job offers for in-person assignments and tasks that can be found if one searches for posts titled, “If you live near,” or “IRL job” — short for “in real life” job.

These “violence-as-a-service” solicitations typically involve “brickings,” where someone is hired to toss a brick through the window at a specified address. Other IRL jobs for hire include tire-stabbings, molotov cocktail hurlings, drive-by shootings, and even home invasions. The people targeted by these services are typically other criminals within the community, but it’s not unusual to see Com members asking others for help in harassing or intimidating security researchers and even the very law enforcement officers who are investigating their alleged crimes.

It remains unclear what precipitated this incident or what followed directly after, but on January 13, 2023, a Star Sanctuary account used by EarthtoStar solicited the home invasion of a sitting U.S. federal prosecutor from New York. That post included a photo of the prosecutor taken from the Justice Department’s website, along with the message:

“Need irl niggas, in home hostage shit no fucking pussies no skinny glock holding 100 pound niggas either”

Throughout late 2022 and early 2023, EarthtoStar’s alias “Brad” (a.k.a. “Brad_banned”) frequently advertised Star Chat’s malware development services, including custom malicious software designed to hide the attacker’s presence on a victim machine:

We can develop KERNEL malware which will achieve persistence for a long time,
bypass firewalls and have reverse shell access.

This shit is literally like STAGE 4 CANCER FOR COMPUTERS!!!

Kernel meaning the highest level of authority on a machine.
This can range to simple shells to Bootkits.

Bypass all major EDR’s (SentinelOne, CrowdStrike, etc)
Patch EDR’s scanning functionality so it’s rendered useless!

Once implanted, extremely difficult to remove (basically impossible to even find)
Development Experience of several years and in multiple APT Groups.

Be one step ahead of the game. Prices start from $5,000+. Message @brad_banned to get a quote

In September 2023, both MGM Resorts and Caesars Entertainment suffered ransomware attacks at the hands of a Russian ransomware affiliate program known as ALPHV, also called BlackCat. Caesars reportedly paid a $15 million ransom in that incident.

Within hours of MGM publicly acknowledging the 2023 breach, members of Scattered Spider were claiming credit and telling reporters they’d broken in by social engineering a third-party IT vendor. At a hearing in London last week, U.K. prosecutors told the court Jubair was found in possession of more than $50 million in ill-gotten cryptocurrency, including funds that were linked to the Las Vegas casino hacks.

The Star Chat channel was finally banned by Telegram on March 9, 2025. But U.S. prosecutors say Jubair and fellow Scattered Spider members continued their hacking, phishing and extortion activities up until September 2025.

In April 2025, the Com was buzzing about the publication of “The Com Cast,” a lengthy screed detailing Jubair’s alleged cybercriminal activities and nicknames over the years. This account included photos and voice recordings allegedly of Jubair, and asserted that in his early days on the Com Jubair used the nicknames Clark and Miku (these are both aliases used by Everlynn in connection with their fake EDR services).

Thalha Jubair (right), without his large-rimmed glasses, in an undated photo posted in The Com Cast.

More recently, the anonymous Com Cast author(s) claimed, Jubair had used the nickname “Operator,” which corresponds to a Com member who ran an automated Telegram-based doxing service that pulled consumer records from hacked data broker accounts. That public outing came after Operator allegedly seized control over the Doxbin, a long-running and highly toxic community that is used to “dox” or post deeply personal information on people.

“Operator/Clark/Miku: A key member of the ransomware group Scattered Spider, which consists of a diverse mix of individuals involved in SIM swapping and phishing,” the Com Cast account stated. “The group is an amalgamation of several key organizations, including Infinity Recursion (owned by Operator), True Alcorians (owned by earth2star), and Lapsus, which have come together to form a single collective.”

The New Jersey complaint (PDF) alleges Jubair and other Scattered Spider members committed computer fraud, wire fraud, and money laundering in relation to at least 120 computer network intrusions involving 47 U.S. entities between May 2022 and September 2025. The complaint alleges the group’s victims paid at least $115 million in ransom payments.

U.S. authorities say they traced some of those payments to Scattered Spider to an Internet server controlled by Jubair. The complaint states that a cryptocurrency wallet discovered on that server was used to purchase several gift cards, one of which was used at a food delivery company to send food to his apartment. Another gift card purchased with cryptocurrency from the same server was allegedly used to fund online gaming accounts under Jubair’s name. U.S. prosecutors said that when they seized that server they also seized $36 million in cryptocurrency.

The complaint also charges Jubair with involvement in a hacking incident in January 2025 against the U.S. courts system that targeted a U.S. magistrate judge overseeing a related Scattered Spider investigation. That other investigation appears to have been the prosecution of Noah Michael Urban, a 20-year-old Florida man charged in November 2024 by prosecutors in Los Angeles as one of five alleged Scattered Spider members.

Urban pleaded guilty in April 2025 to wire fraud and conspiracy charges, and in August he was sentenced to 10 years in federal prison. Speaking with KrebsOnSecurity from jail after his sentencing, Urban asserted that the judge in his case gave him more time than prosecutors requested because the judge was angry that Scattered Spider had hacked his email account.

Noah “Kingbob” Urban, posting to Twitter/X around the time of his sentencing on Aug. 20.

A court transcript (PDF) from a status hearing in February 2025 shows Urban was telling the truth about the hacking incident that happened while he was in federal custody. The judge told attorneys for both sides that a co-defendant in the California case was trying to find out about Mr. Urban’s activity in the Florida case, and that the hacker accessed the account by impersonating a judge over the phone and requesting a password reset.

Allison Nixon is chief research officer at the New York-based security firm Unit 221B, and easily one of the world’s leading experts on Com-based cybercrime activity. Nixon said the core problem with legally prosecuting well-known cybercriminals from the Com has traditionally been that the top offenders tend to be under the age of 18, and thus difficult to charge under federal hacking statutes.

In the United States, prosecutors typically wait until an underage cybercrime suspect becomes an adult to charge them. But until that day comes, she said, Com actors often feel emboldened to continue committing — and very often bragging about — serious cybercrime offenses.

“Here we have a special category of Com offenders that effectively enjoy legal immunity,” Nixon told KrebsOnSecurity. “Most get recruited to Com groups when they are older, but of those that join very young, such as 12 or 13, they seem to be the most dangerous because at that age they have no grounding in reality and so much longevity before they exit their legal immunity.”

Nixon said U.K. authorities face the same challenge when they briefly detain and search the homes of underage Com suspects: Namely, the teen suspects simply go right back to their respective cliques in the Com and start robbing and hurting people again the minute they’re released.

Indeed, the U.K. court heard from prosecutors last week that both Scattered Spider suspects were detained and/or searched by local law enforcement on multiple occasions, only to return to the Com less than 24 hours after being released each time.

“What we see is these young Com members become vectors for perpetrators to commit enormously harmful acts and even child abuse,” Nixon said. “The members of this special category of people who enjoy legal immunity are meeting up with foreign nationals and conducting these sometimes heinous acts at their behest.”

Nixon said many of these individuals have few friends in real life because they spend virtually all of their waking hours on Com channels, and so their entire sense of identity, community and self-worth gets wrapped up in their involvement with these online gangs. She said if the law was such that prosecutors could treat these people commensurate with the amount of harm they cause society, that would probably clear up a lot of this problem.

“If law enforcement was allowed to keep them in jail, they would quit reoffending,” she said.

The Times of London reports that Flowers is facing three charges under the Computer Misuse Act: two of conspiracy to commit an unauthorized act in relation to a computer causing/creating risk of serious damage to human welfare/national security and one of attempting to commit the same act. Maximum sentences for these offenses can range from 14 years to life in prison, depending on the impact of the crime.

Jubair is reportedly facing two charges in the U.K.: One of conspiracy to commit an unauthorized act in relation to a computer causing/creating risk of serious damage to human welfare/national security and one of failing to comply with a section 49 notice to disclose the key to protected information.

In the United States, Jubair is charged with computer fraud conspiracy, two counts of computer fraud, wire fraud conspiracy, two counts of wire fraud, and money laundering conspiracy. If extradited to the U.S., tried and convicted on all charges, he faces a maximum penalty of 95 years in prison.

In July 2025, the United Kingdom followed Australia’s example in banning victims of hacking from paying ransoms to cybercriminal groups unless approved by officials. U.K. organizations that are considered part of critical infrastructure reportedly will face a complete ban, as will the entire public sector. U.K. victims of a hack are now required to notify officials to better inform policymakers on the scale of Britain’s ransomware problem.

For further reading (bless you), check out Bloomberg’s poignant story last week based on a year’s worth of jailhouse interviews with convicted Scattered Spider member Noah Urban.

Categories: Software Security

US Disrupts Massive Cell Phone Array in New York

Schneier on Security - Wed, 09/24/2025 - 07:09

This is a weird story:

The US Secret Service disrupted a network of telecommunications devices that could have shut down cellular systems as leaders gather for the United Nations General Assembly in New York City.

The agency said on Tuesday that last month it found more than 300 SIM servers and 100,000 SIM cards that could have been used for telecom attacks within the area encompassing parts of New York, New Jersey and Connecticut.

“This network had the power to disable cell phone towers and essentially shut down the cellular network in New York City,” said special agent in charge Matt McCool...

Categories: Software Security

Apple’s New Memory Integrity Enforcement

Schneier on Security - Tue, 09/23/2025 - 07:07

Apple has introduced a new hardware/software security feature in the iPhone 17: “Memory Integrity Enforcement,” targeting the memory safety vulnerabilities that spyware products like Pegasus tend to use to get unauthorized system access. From Wired:

In recent years, a movement has been steadily growing across the global tech industry to address a ubiquitous and insidious type of bugs known as memory-safety vulnerabilities. A computer’s memory is a shared resource among all programs, and memory safety issues crop up when software can pull data that should be off limits from a computer’s memory or manipulate data in memory that shouldn’t be accessible to the program. When developers—even experienced and security-conscious developers—write software in ubiquitous, historic programming languages, like C and C++, it’s easy to make mistakes that lead to memory safety vulnerabilities. That’s why proactive tools like ...

Categories: Software Security

Introducing Headlamp Plugin for Karpenter - Scaling and Visibility

Kubernetes Blog - Mon, 09/22/2025 - 20:00

Headlamp is an open‑source, extensible Kubernetes SIG UI project designed to let you explore, manage, and debug cluster resources.

Karpenter is a Kubernetes Autoscaling SIG node provisioning project that helps clusters scale quickly and efficiently. It launches new nodes in seconds, selects appropriate instance types for workloads, and manages the full node lifecycle, including scale-down.

The new Headlamp Karpenter Plugin adds real-time visibility into Karpenter’s activity directly from the Headlamp UI. It shows how Karpenter resources relate to Kubernetes objects, displays live metrics, and surfaces scaling events as they happen. You can inspect pending pods during provisioning, review scaling decisions, and edit Karpenter-managed resources with built-in validation. The Karpenter plugin was made as part of an LFX Mentorship project.

The Karpenter plugin for Headlamp aims to make it easier for Kubernetes users and operators to understand, debug, and fine-tune autoscaling behavior in their clusters. Now we will give a brief tour of the Headlamp plugin.

Map view of Karpenter Resources and how they relate to Kubernetes resources

Easily see how Karpenter resources like NodeClasses, NodePools, and NodeClaims connect with core Kubernetes resources such as Pods and Nodes.
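For readers less familiar with these resources, here is a minimal sketch of a NodePool manifest for the AWS provider; exact fields vary by Karpenter version and provider, so treat it as illustrative only.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # Links this NodePool to a provider-specific NodeClass.
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      # Constraints Karpenter uses when picking instance types for pending pods.
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  # Upper bound on the total resources this NodePool may provision.
  limits:
    cpu: "100"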

Map view showing relationships between resources

Visualization of Karpenter Metrics

Get instant insight into resource usage versus limits, allowed disruptions, pending pods, provisioning latency, and more.

NodePool default metrics shown with controls to see different frequencies

NodeClaim default metrics shown with controls to see different frequencies

Scaling decisions

See which instances are being provisioned for your workloads and understand why Karpenter made those choices. Helpful when debugging.

Pod Placement Decisions data including reason, from, pod, message, and age

Node decision data including Type, Reason, Node, From, Message

Config editor with validation support

Make live edits to Karpenter configurations. The editor includes diff previews and resource validation for safer adjustments.
Config editor with validation support

Real time view of Karpenter resources

View and track Karpenter-specific resources, such as NodeClaims, in real time as your cluster scales up and down.

Node claims data including Name, Status, Instance Type, CPU, Zone, Age, and Actions

Node Pools data including Name, NodeClass, CPU, Memory, Nodes, Status, Age, Actions

EC2 Node Classes data including Name, Cluster, Instance Profile, Status, IAM Role, Age, and Actions

Dashboard for Pending Pods

View all pending pods with unmet scheduling requirements (FailedScheduling), with highlights explaining why they couldn't be scheduled.

Pending Pods data including Name, Namespace, Type, Reason, From, and Message

Karpenter Providers

This plugin should work with most Karpenter providers, but so far it has only been tested with the ones listed in the table. Additionally, each provider exposes some extra provider-specific information; the table below shows which of these the plugin displays.

Provider Name | Tested | Extra provider specific info supported
AWS | ✅ | ✅
Azure | ✅ | ✅
AlibabaCloud | ❌ | ❌
Bizfly Cloud | ❌ | ❌
Cluster API | ❌ | ❌
GCP | ❌ | ❌
Proxmox | ❌ | ❌
Oracle Cloud Infrastructure (OCI) | ❌ | ❌

Please submit an issue if you test one of the untested providers or if you want support for this provider (PRs also gladly accepted).

How to use

Please see the plugins/karpenter/README.md for instructions on how to use.

Feedback and Questions

Please submit an issue if you use Karpenter and have any other ideas or feedback. Or come to the #headlamp channel on the Kubernetes Slack for a chat.

Categories: CNCF Projects, Kubernetes

Kubernetes v1.34: Pod Level Resources Graduated to Beta

Kubernetes Blog - Mon, 09/22/2025 - 14:30

On behalf of the Kubernetes community, I am thrilled to announce that the Pod Level Resources feature has graduated to Beta in the Kubernetes v1.34 release and is enabled by default! This significant milestone introduces a new layer of flexibility for defining and managing resource allocation for your Pods. This flexibility stems from the ability to specify CPU and memory resources for the Pod as a whole. Pod level resources can be combined with the container-level specifications to express the exact resource requirements and limits your application needs.

Pod-level specification for resources

Until recently, resource specifications that applied to Pods were primarily defined at the individual container level. While effective, this approach sometimes required duplicating or meticulously calculating resource needs across multiple containers within a single Pod. As a beta feature, Kubernetes allows you to specify the CPU, memory and hugepages resources at the Pod-level. This means you can now define resource requests and limits for an entire Pod, enabling easier resource sharing without requiring granular, per-container management of these resources where it's not needed.

Why does Pod-level specification matter?

This feature enhances resource management in Kubernetes by offering flexible resource management at both the Pod and container levels.

  • It provides a consolidated approach to resource declaration, reducing the need for meticulous, per-container management, especially for Pods with multiple containers.

  • Pod-level resources enable containers within a pod to share unused resources amongst themselves, promoting efficient utilization within the pod. For example, this prevents sidecar containers from becoming performance bottlenecks. Previously, a sidecar (e.g., a logging agent or service mesh proxy) hitting its individual CPU limit could be throttled and slow down the entire Pod, even if the main application container had plenty of spare CPU. With pod-level resources, the sidecar and the main container can share the Pod's resource budget, ensuring smooth operation during traffic spikes: either all containers keep working, or the whole Pod is throttled together.

  • When both pod-level and container-level resources are specified, pod-level requests and limits take precedence. This gives you, and cluster administrators, a powerful way to enforce overall resource boundaries for your Pods.

    For scheduling, if a pod-level request is explicitly defined, the scheduler uses that specific value to find a suitable node, instead of the aggregated requests of the individual containers. At runtime, the pod-level limit acts as a hard ceiling for the combined resource usage of all containers. Crucially, this pod-level limit is the absolute enforcer; even if the sum of the individual container limits is higher, the total resource consumption can never exceed the pod-level limit.

  • Pod-level resources are prioritized in influencing the Quality of Service (QoS) class of the Pod.

  • For Pods running on Linux nodes, the Out-Of-Memory (OOM) score adjustment calculation considers both pod-level and container-level resource requests.

  • Pod-level resources are designed to be compatible with existing Kubernetes functionalities, ensuring a smooth integration into your workflows.

How to specify resources for an entire Pod

Using the PodLevelResources feature gate requires Kubernetes v1.34 or newer for all cluster components, including the control plane and every node. This feature gate is in beta and enabled by default in v1.34.

Example manifest

You can specify CPU, memory and hugepages resources directly in the Pod spec manifest at the resources field for the entire Pod.

Here’s an example demonstrating a Pod with both CPU and memory requests and limits defined at the Pod level:

apiVersion: v1
kind: Pod
metadata:
  name: pod-resources-demo
  namespace: pod-resources-example
spec:
  # The 'resources' field at the Pod specification level defines the overall
  # resource budget for all containers within this Pod combined.
  resources: # Pod-level resources
    # 'limits' specifies the maximum amount of resources the Pod is allowed to use.
    # The sum of the limits of all containers in the Pod cannot exceed these values.
    limits:
      cpu: "1" # The entire Pod cannot use more than 1 CPU core.
      memory: "200Mi" # The entire Pod cannot use more than 200 MiB of memory.
    # 'requests' specifies the minimum amount of resources guaranteed to the Pod.
    # This value is used by the Kubernetes scheduler to find a node with enough capacity.
    requests:
      cpu: "1" # The Pod is guaranteed 1 CPU core when scheduled.
      memory: "100Mi" # The Pod is guaranteed 100 MiB of memory when scheduled.
  containers:
  - name: main-app-container
    image: nginx
    ...
    # This container has no resource requests or limits specified.
  - name: auxiliary-container
    image: fedora
    command: ["sleep", "inf"]
    ...
    # This container has no resource requests or limits specified.

In this example, the pod-resources-demo Pod as a whole requests 1 CPU and 100 MiB of memory, and is limited to 1 CPU and 200 MiB of memory. The containers within will operate under these overall Pod-level constraints, as explained in the next section.

Interaction with container-level resource requests or limits

When both pod-level and container-level resources are specified, pod-level requests and limits take precedence. This means the node allocates resources based on the pod-level specifications.

Consider a Pod with two containers where pod-level CPU and memory requests and limits are defined, and only one container has its own explicit resource definitions:

apiVersion: v1
kind: Pod
metadata:
  name: pod-resources-demo
  namespace: pod-resources-example
spec:
  resources:
    limits:
      cpu: "1"
      memory: "200Mi"
    requests:
      cpu: "1"
      memory: "100Mi"
  containers:
  - name: main-app-container
    image: nginx
    resources:
      requests:
        cpu: "0.5"
        memory: "50Mi"
  - name: auxiliary-container
    image: fedora
    command: [ "sleep", "inf"]
    # This container has no resource requests or limits specified.

  • Pod-Level Limits: The pod-level limits (cpu: "1", memory: "200Mi") establish an absolute boundary for the entire Pod. The sum of resources consumed by all its containers is enforced at this ceiling and cannot be surpassed.

  • Resource Sharing and Bursting: Containers can dynamically borrow any unused capacity, allowing them to burst as needed, so long as the Pod's aggregate usage stays within the overall limit.

  • Pod-Level Requests: The pod-level requests (cpu: "1", memory: "100Mi") serve as the foundational resource guarantee for the entire Pod. This value informs the scheduler's placement decision and represents the minimum resources the Pod can rely on during node-level contention.

  • Container-Level Requests: Container-level requests create a priority system within the Pod's guaranteed budget. Because main-app-container has an explicit request (cpu: "0.5", memory: "50Mi"), it is given precedence for its share of resources under resource pressure over the auxiliary-container, which has no such explicit claim.

Limitations

  • First of all, in-place resize of pod-level resources is not supported for Kubernetes v1.34 (or earlier). Attempting to modify the pod-level resource limits or requests on a running Pod results in an error: the resize is rejected. The v1.34 implementation of Pod level resources focuses on allowing initial declaration of an overall resource envelope, that applies to the entire Pod. That is distinct from in-place pod resize, which (despite what the name might suggest) allows you to make dynamic adjustments to container resource requests and limits, within a running Pod, and potentially without a container restart. In-place resizing is also not yet a stable feature; it graduated to Beta in the v1.33 release.

  • Only CPU, memory, and hugepages resources can be specified at pod-level.

  • Pod-level resources are not supported for Windows pods. If the Pod specification explicitly targets Windows (e.g., by setting spec.os.name: "windows"), the API server will reject the Pod during the validation step. If the Pod is not explicitly marked for Windows but is scheduled to a Windows node (e.g., via a nodeSelector), the Kubelet on that Windows node will reject the Pod during its admission process.

  • The Topology Manager, Memory Manager and CPU Manager do not align pods and containers based on pod-level resources as these resource managers don't currently support pod-level resources.

Getting started and providing feedback

Ready to explore the Pod Level Resources feature? You'll need a Kubernetes cluster running version 1.34 or later. Remember to enable the PodLevelResources feature gate across your control plane and all nodes.
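If your cluster manages feature gates explicitly, a minimal sketch of the kubelet side looks like the following; control-plane components take the equivalent --feature-gates=PodLevelResources=true command-line flag. Since the gate is on by default in v1.34, this is only needed where gates are configured explicitly.

# KubeletConfiguration excerpt
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodLevelResources: true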

As this feature moves through Beta, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels:

Categories: CNCF Projects, Kubernetes

Details About Chinese Surveillance and Propaganda Companies

Schneier on Security - Mon, 09/22/2025 - 07:03

Details from leaked documents:

While people often look at China’s Great Firewall as a single, all-powerful government system unique to China, the actual process of developing and maintaining it works the same way as surveillance technology in the West. Geedge collaborates with academic institutions on research and development, adapts its business strategy to fit different clients’ needs, and even repurposes leftover infrastructure from its competitors.

[…]

The parallels with the West are hard to miss. A number of American surveillance and propaganda firms also started as academic projects before they were spun out into startups and grew by chasing government contracts. The difference is that in China, these companies operate with far less transparency. Their work comes to light only when a trove of documents slips onto the internet...

Categories: Software Security

PromCon is Only One Month Away; See You in Person or via Live Stream!

Prometheus Blog - Sun, 09/21/2025 - 20:00

It's that time of the year again! With just a few weeks to go until PromCon EU 2025, the Prometheus community is buzzing with excitement, preparations, and passionate conference-driven development. This tenth iteration of the conference dedicated to the CNCF Prometheus monitoring ecosystem continues the tradition of a cozy, community-led, single-track, 2-day intense Prometheus user and developer learning time!

The conference is taking place on October 21–22 at the Google Office in Munich, with an incredible lineup of talks and discussions you won't want to miss!

Content Highlights

The schedule is packed with sessions for everyone, from beginners to seasoned veterans. Topics range from:

  • Exciting development of Prometheus features and protocols, e.g. OpenMetrics 2.0, downsampling, delta metric, Parquet-based storage.
  • Insightful user community case studies and integrations, e.g. agentic AI, OpenTelemetry Schema and Resource Attributes adoption, auto aggregations, operator improvements, Perses novelties and Rust ecosystem updates.
  • General best practices and guidelines for navigating complex observability ecosystems, e.g. Prometheus and OpenTelemetry instrumentation, new alerting methodologies, monitoring CPU hardware, and new PromQL aspects.

...and more!

Last Tickets and Live Recordings

A few last tickets are still available, but they will likely sell out soon. You can register for the conference here: https://promcon.io/2025-munich/register/.

However, if you can't make it to Munich, fear not! All talks will be live-streamed and available as recordings after the event.

Feel free to bookmark livestream links below:

Lightning Talks Form is Now Open

Following our tradition, we're offering space for any PromCon in-person participant (capacity limits apply) to deliver a 5-minute talk related to the Prometheus ecosystem.

If you have a last-minute learning, case study or idea to share with the community, feel free to prepare a few slides and sign up here: https://forms.gle/2JF13tSBGzrcuZD7A

The form will be open until the start of the PromCon conference.

Credits

This conference wouldn't be possible without the hard work of the Prometheus team's core organisers (Goutham, Richi and Basti), and the amazing CNCF team!

We also wouldn't be able to do it without our sponsors:

We can't wait to see you there, either in person or online!

Categories: CNCF Projects

Kubernetes v1.34: Recovery From Volume Expansion Failure (GA)

Kubernetes Blog - Fri, 09/19/2025 - 14:30

Have you ever made a typo when expanding your persistent volumes in Kubernetes? Meant to specify 2TB but specified 20TiB? This seemingly innocuous problem was surprisingly hard to fix - and it took the project almost 5 years to do so. Automated recovery from failed storage expansion has been available in beta for a while; with the v1.34 release, we have graduated it to general availability.

While it was always possible to recover from failing volume expansions manually, it usually required cluster-admin access and was tedious to do (see the aforementioned link for more information).

What if you make a mistake and realize it immediately? With Kubernetes v1.34, you can reduce the requested size of the PersistentVolumeClaim (PVC): as long as the expansion to the previously requested size hasn't finished, you can amend the requested size and Kubernetes will automatically work to correct it. Any quota consumed by the failed expansion is returned to you, and the associated PersistentVolume is resized to the latest size you specified.

I'll walk through an example of how all of this works.

Reducing PVC size to recover from failed expansion

Imagine that you are running out of disk space for one of your database servers, and you want to expand the PVC from the previously specified 10TB to 100TB - but you make a typo and specify 1000TB.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1000T # newly specified size (1000 TB) - but incorrect!

Now, you may be out of disk space on your disk array, or you may simply have run out of allocated quota on your cloud provider. Either way, assume that the expansion to 1000TB is never going to succeed.

In Kubernetes v1.34, you can simply correct your mistake and request a new, smaller PVC size, provided it is still larger than the original size of the actual PersistentVolume.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100T # Corrected size (100 TB); has to be greater than 10TB.
                    # You cannot shrink the volume below its actual size.

This requires no admin intervention. Even better, any surplus Kubernetes quota that you temporarily consumed will be automatically returned.

This fault recovery mechanism does have a caveat: whatever new size you specify for the PVC, it must still be higher than the original size in .status.capacity. Since Kubernetes doesn't support shrinking PV objects, you can never go below the size that was originally allocated for your PVC request.

Improved error handling and observability of volume expansion

Implementing what might look like a relatively minor change required us to almost completely rework how volume expansion works under the hood in Kubernetes. There are new API fields on PVC objects which you can use to monitor the progress of volume expansion.

Improved observability of in-progress expansion

You can query .status.allocatedResourceStatus['storage'] of a PVC to monitor the progress of a volume expansion operation. For a typical block volume, this should transition through ControllerResizeInProgress, NodeResizePending and NodeResizeInProgress, and become nil/empty when volume expansion has finished.

If, for some reason, expansion to the requested size is not feasible, the status should instead report states such as ControllerResizeInfeasible or NodeResizeInfeasible.

You can also observe the size towards which Kubernetes is working by watching pvc.status.allocatedResources.
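
As a rough illustration, here is a sketch of what the status of the example PVC might look like while the corrected expansion is still in progress; the exact values are illustrative:

status:
  capacity:
    storage: 10T # actual size of the underlying volume
  allocatedResources:
    storage: 100T # size Kubernetes is currently working towards
  allocatedResourceStatus:
    storage: ControllerResizeInProgress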

Improved error handling and reporting

Kubernetes now retries failed volume expansions at a slower rate, making fewer requests to both the storage system and the Kubernetes API server.

Errors observed during volume expansion are now reported as conditions on PVC objects and, unlike events, they persist. Kubernetes will populate pvc.status.conditions with the error keys ControllerResizeError or NodeResizeError when volume expansion fails.
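
For example, a failed controller-side resize might surface on the PVC roughly like this (a sketch; the message text is invented):

status:
  conditions:
  - type: ControllerResizeError
    status: "True"
    message: "resize failed: requested size exceeds the capacity available in the storage pool" # illustrative message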

Fixes long-standing bugs in resizing workflows

This feature has also allowed us to fix long-standing bugs in the resizing workflow, such as Kubernetes issue #115294. If you observe anything broken, please report your bugs to https://github.com/kubernetes/kubernetes/issues, along with details about how to reproduce the problem.

Working on this feature through its lifecycle was challenging, and it wouldn't have been possible to reach GA without feedback from @msau42, @jsafrane and @xing-yang.

All of the contributors who worked on this also appreciate the input provided by @thockin and @liggitt at various Kubernetes contributor summits.

Categories: CNCF Projects, Kubernetes

Surveying the Global Spyware Market

Schneier on Security - Fri, 09/19/2025 - 07:01

The Atlantic Council has published its second annual report: “Mythical Beasts: Diving into the depths of the global spyware market.”

Too much good detail to summarize, but here are two items:

First, the authors found that the number of US-based investors in spyware has notably increased in the past year, when compared with the sample size of the spyware market captured in the first Mythical Beasts project. In the first edition, the United States was the second-largest investor in the spyware market, following Israel. In that edition, twelve investors were observed to be domiciled within the United States—whereas in this second edition, twenty new US-based investors were observed investing in the spyware industry in 2024. This indicates a significant increase of US-based investments in spyware in 2024, catapulting the United States to being the largest investor in this sample of the spyware market. This is significant in scale, as US-based investment from 2023 to 2024 largely outpaced that of other major investing countries observed in the first dataset, including Italy, Israel, and the United Kingdom. It is also significant in the disparity it points to: the visible enforcement gap between the flow of US dollars and US policy initiatives. Despite numerous US policy actions, such as the addition of spyware vendors on the ...

Categories: Software Security

Kubernetes v1.34: DRA Consumable Capacity

Kubernetes Blog - Thu, 09/18/2025 - 14:30

Dynamic Resource Allocation (DRA) is a Kubernetes API for managing scarce resources across Pods and containers. It enables flexible resource requests, going beyond simply allocating N number of devices to support more granular usage scenarios. With DRA, users can request specific types of devices based on their attributes, define custom configurations tailored to their workloads, and even share the same resource among multiple containers or Pods.

In this blog, we focus on the device sharing feature and dive into a new capability introduced in Kubernetes 1.34: DRA consumable capacity, which extends DRA to support finer-grained device sharing.

Background: device sharing via ResourceClaims

From the beginning, DRA introduced the ability for multiple Pods to share a device by referencing the same ResourceClaim. This design decouples resource allocation from specific hardware, allowing for more dynamic and reusable provisioning of devices.

In Kubernetes 1.33, the new support for partitionable devices allowed resource drivers to advertise slices of a device that are available, rather than exposing the entire device as an all-or-nothing resource. This enabled Kubernetes to model shareable hardware more accurately.

But there was still a missing piece: DRA didn't yet support scenarios where the device driver manages fine-grained, dynamic portions of a device resource — like network bandwidth — based on user demand, or where those resources are shared independently of ResourceClaims, which are restricted by their spec and namespace.

That’s where consumable capacity for DRA comes in.

Benefits of DRA consumable capacity support

Here's a taste of what you get in a cluster with the DRAConsumableCapacity feature gate enabled.

Device sharing across multiple ResourceClaims or DeviceRequests

Resource drivers can now support sharing the same device — or even a slice of a device — across multiple ResourceClaims or across multiple DeviceRequests.

This means that Pods from different namespaces can simultaneously share the same device, if permitted and supported by the specific DRA driver.

Device resource allocation

Kubernetes extends the allocation algorithm in the scheduler to support allocating a portion of a device's resources, as defined in the capacity field. The scheduler ensures that the total allocated capacity across all consumers never exceeds the device’s total capacity, even when shared across multiple ResourceClaims or DeviceRequests. This is very similar to the way the scheduler allows Pods and containers to share allocatable resources on Nodes; in this case, it allows them to share allocatable (consumable) resources on Devices.

This feature expands support for scenarios where the device driver is able to manage resources within a device and on a per-process basis — for example, allocating a specific amount of memory (e.g., 8 GiB) from a virtual GPU, or setting bandwidth limits on virtual network interfaces allocated to specific Pods. This aims to provide safe and efficient resource sharing.

DistinctAttribute constraint

This feature also introduces a new constraint: DistinctAttribute, which is the complement of the existing MatchAttribute constraint.

The primary goal of DistinctAttribute is to prevent the same underlying device from being allocated multiple times within a single ResourceClaim, which could happen since we are allocating shares (or subsets) of devices. This constraint ensures that each allocation refers to a distinct resource, even if they belong to the same device class.

It is useful for use cases such as allocating network devices connecting to different subnets to expand coverage or provide redundancy across failure domains.
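
As a sketch only (the claim name, device class and attribute are hypothetical, and the constraint field name is an assumption to verify against the v1.34 API, where it is expected to sit alongside the existing matchAttribute), such a claim might look roughly like:

apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: redundant-nics # hypothetical name
spec:
  devices:
    requests:
    - name: nic0
      exactly:
        deviceClassName: net.example.com # hypothetical device class
    - name: nic1
      exactly:
        deviceClassName: net.example.com
    constraints:
    - requests: ["nic0", "nic1"]
      distinctAttribute: net.example.com/subnet # assumed field name and attribute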

How to use consumable capacity?

DRAConsumableCapacity is introduced as an alpha feature in Kubernetes 1.34. The feature gate DRAConsumableCapacity must be enabled in kubelet, kube-apiserver, kube-scheduler and kube-controller-manager.

--feature-gates=...,DRAConsumableCapacity=true

As a DRA driver developer

As a DRA driver developer writing in Golang, you can make a device within a ResourceSlice allocatable to multiple ResourceClaims (or devices.requests) by setting AllowMultipleAllocations to true.

Device{
    // ...
    AllowMultipleAllocations: ptr.To(true),
    // ...
}

Additionally, you can restrict how each device's Capacity may be consumed by each DeviceRequest by setting the RequestPolicy field in the DeviceCapacity. The example below shows how to define a policy that requires a GPU with 40 GiB of memory to allocate at least 5 GiB per request, with each allocation in multiples of 5 GiB.

DeviceCapacity{
    Value: resource.MustParse("40Gi"),
    RequestPolicy: &CapacityRequestPolicy{
        Default: ptr.To(resource.MustParse("5Gi")),
        ValidRange: &CapacityRequestPolicyRange{
            Min:  ptr.To(resource.MustParse("5Gi")),
            Step: ptr.To(resource.MustParse("5Gi")),
        },
    },
}

This will be published to the ResourceSlice, as partially shown below:

apiVersion: resource.k8s.io/v1
kind: ResourceSlice
...
spec:
  devices:
  - name: gpu0
    allowMultipleAllocations: true
    capacity:
      memory:
        value: 40Gi
        requestPolicy:
          default: 5Gi
          validRange:
            min: 5Gi
            step: 5Gi

An allocated device with a specified portion of consumed capacity will have a ShareID field set in the allocation status.

claim.Status.Allocation.Devices.Results[i].ShareID

This ShareID allows the driver to distinguish between different allocations that refer to the same device or same statically-partitioned slice but come from different ResourceClaim requests.
It acts as a unique identifier for each shared slice, enabling the driver to manage and enforce resource limits independently across multiple consumers.
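
As a hedged sketch of where this appears (all values are placeholders, and the consumedCapacity field name is an assumption to verify against the v1.34 API), an allocation result for a shared device might look roughly like:

status:
  allocation:
    devices:
      results:
      - request: req0
        driver: resource.example.com
        pool: pool-a # hypothetical pool
        device: gpu0
        shareID: example-share-id # unique identifier generated at allocation time
        consumedCapacity: # assumed field recording the allocated portion
          memory: 10Gi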

As a consumer

As a consumer (or user), you can request the device resource with a ResourceClaim like this:

apiVersion: resource.k8s.io/v1
kind: ResourceClaim
...
spec:
  devices:
    requests: # for devices
    - name: req0
      exactly:
        deviceClassName: resource.example.com
        capacity:
          requests: # for resources which must be provided by those devices
            memory: 10Gi

This configuration ensures that the requested device can provide at least 10GiB of memory.

Notably, any resource.example.com device that has at least 10GiB of memory can be allocated. If a device that does not support multiple allocations is chosen, the allocation will consume the entire device. To filter for only devices that support multiple allocations, you can define a selector like this:

selectors:
- cel:
    expression: |-
      device.allowMultipleAllocations == true
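
Once the claim exists, a Pod consumes it through the usual DRA references. Here is a minimal sketch, assuming the ResourceClaim above was created with the hypothetical name my-claim; the Pod name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: shared-device-consumer # hypothetical name
spec:
  resourceClaims:
  - name: shared-device
    resourceClaimName: my-claim # hypothetical name of the ResourceClaim above
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10 # placeholder image
    resources:
      claims:
      - name: shared-device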

Integration with DRA device status

In device sharing, general device information is provided through the resource slice. However, some details are set dynamically after allocation. These can be conveyed using the .status.devices field of a ResourceClaim. That field is only published in clusters where the DRAResourceClaimDeviceStatus feature gate is enabled.

If you do have device status support available, a driver can expose additional device-specific information beyond the ShareID. One particularly useful use case is for virtual networks, where a driver can include the assigned IP address(es) in the status. This is valuable for both network service operations and troubleshooting.
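
As a hedged sketch (the values are invented and the exact layout should be checked against the DRAResourceClaimDeviceStatus API), a network driver might publish something like:

status:
  devices:
  - driver: net.example.com # hypothetical driver
    pool: pool-a
    device: nic0
    networkData:
      interfaceName: eth1 # assigned dynamically by the driver
      ips:
      - 10.9.8.7/24 # illustrative address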

You can find more information by watching our recording at: KubeCon Japan 2025 - Reimagining Cloud Native Networks: The Critical Role of DRA.

What can you do next?

  • Check out the CNI DRA Driver project for an example of DRA integration in Kubernetes networking. Try integrating with network resources like macvlan, ipvlan, or smart NICs.

  • Start enabling the DRAConsumableCapacity feature gate and experimenting with virtualized or partitionable devices. Specify your workloads with consumable capacity (for example: fractional bandwidth or memory).

  • Let us know your feedback:

    • ✅ What worked well?
    • ⚠️ What didn’t?

    If you encountered issues to fix or opportunities to enhance, please file a new issue and reference KEP-5075 there, or reach out via Slack (#wg-device-management).

Conclusion

Consumable capacity support enhances the device sharing capability of DRA by allowing effective device sharing across namespaces, across claims, and tailored to each Pod’s actual needs. It also empowers drivers to enforce capacity limits, improves scheduling accuracy, and unlocks new use cases like bandwidth-aware networking and multi-tenant device sharing.

Try it out, experiment with consumable resources, and help shape the future of dynamic resource allocation in Kubernetes!

Further Reading

Categories: CNCF Projects, Kubernetes

Time-of-Check Time-of-Use Attacks Against LLMs

Schneier on Security - Thu, 09/18/2025 - 07:06

This is a nice piece of research: “Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents”:

Abstract: Large Language Model (LLM)-enabled agents are rapidly emerging across a wide range of applications, but their deployment introduces vulnerabilities with security implications. While prior work has examined prompt-based attacks (e.g., prompt injection) and data-oriented threats (e.g., data exfiltration), time-of-check to time-of-use (TOCTOU) remain largely unexplored in this context. TOCTOU arises when an agent validates external state (e.g., a file or API response) that is later modified before use, enabling practical attacks such as malicious configuration swaps or payload injection. In this work, we present the first study of TOCTOU vulnerabilities in LLM-enabled agents. We introduce TOCTOU-Bench, a benchmark with 66 realistic user tasks designed to evaluate this class of vulnerabilities. As countermeasures, we adapt detection and mitigation techniques from systems security to this setting and propose prompt rewriting, state integrity monitoring, and tool-fusing. Our study highlights challenges unique to agentic workflows, where we achieve up to 25% detection accuracy using automated detection methods, a 3% decrease in vulnerable plan generation, and a 95% reduction in the attack window. When combining all three approaches, we reduce the TOCTOU vulnerabilities from an executed trajectory from 12% to 8%. Our findings open a new research direction at the intersection of AI safety and systems security...

Categories: Software Security
