Feed aggregator

Friday Squid Blogging: Squid Cartoon

Schneier on Security - Fri, 02/20/2026 - 17:05

I like this one.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Blog moderation policy.

Categories: Software Security

‘Starkiller’ Phishing Service Proxies Real Login Pages, MFA

Krebs on Security - Fri, 02/20/2026 - 15:00

Most phishing websites are little more than static copies of login pages for popular online destinations, and they are often quickly taken down by anti-abuse activists and security firms. But a stealthy new phishing-as-a-service offering lets customers sidestep both of these pitfalls: It uses cleverly disguised links to load the target brand’s real website, and then acts as a relay between the target and the legitimate site — forwarding the victim’s username, password and multi-factor authentication (MFA) code to the legitimate site and returning its responses.

There are countless phishing kits that would-be scammers can use to get started, but successfully wielding them requires some modicum of skill in configuring servers, domain names, certificates, proxy services, and other repetitive tech drudgery. Enter Starkiller, a new phishing service that dynamically loads a live copy of the real login page and records everything the user types, proxying the data from the legitimate site back to the victim.

According to an analysis of Starkiller by the security firm Abnormal AI, the service lets customers select a brand to impersonate (e.g., Apple, Facebook, Google, Microsoft, et al.) and generates a deceptive URL that visually mimics the legitimate domain while routing traffic through the attacker’s infrastructure.

For example, a phishing link targeting Microsoft customers appears as “login.microsoft.com@[malicious/shortened URL here].” The “@”-sign trick is an oldie but goodie: everything before the “@” in a URL is treated as username (userinfo) data, and the real destination is whatever comes after the “@” sign. Here’s what it looks like in the target’s browser:
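The trick is easy to see with a standard URL parser. A quick Python sketch (the hostnames below are made up purely for illustration):

```python
# Demonstration of the "@" URL trick using only Python's standard library.
# The hostnames are hypothetical, chosen only to mirror the example above.
from urllib.parse import urlsplit

url = "https://login.microsoft.com@evil-proxy.example.ru/login"
parts = urlsplit(url)

# Everything before "@" is parsed as userinfo (a username), not the host:
print(parts.username)  # login.microsoft.com
# The browser actually connects to whatever follows the "@":
print(parts.hostname)  # evil-proxy.example.ru
```

Browsers apply the same RFC 3986 userinfo rule, which is why the link lands on the attacker’s domain while displaying the trusted brand name up front.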

Image: Abnormal AI. The actual malicious landing page is blurred out in this picture, but we can see it ends in .ru. The service also offers the ability to insert links from different URL-shortening services.

Once Starkiller customers select the URL to be phished, the service spins up a Docker container running a headless Chrome browser instance that loads the real login page, Abnormal found.

“The container then acts as a man-in-the-middle reverse proxy, forwarding the end user’s inputs to the legitimate site and returning the site’s responses,” Abnormal researchers Callie Baron and Piotr Wojtyla wrote in a blog post on Thursday. “Every keystroke, form submission, and session token passes through attacker-controlled infrastructure and is logged along the way.”

Starkiller in effect offers cybercriminals real-time session monitoring, allowing them to live-stream the target’s screen as they interact with the phishing page, the researchers said.

“The platform also includes keylogger capture for every keystroke, cookie and session token theft for direct account takeover, geo-tracking of targets, and automated Telegram alerts when new credentials come in,” they wrote. “Campaign analytics round out the operator experience with visit counts, conversion rates, and performance graphs—the same kind of metrics dashboard a legitimate SaaS [software-as-a-service] platform would offer.”

Abnormal said the service also deftly intercepts and relays the victim’s MFA credentials, since the recipient who clicks the link is actually authenticating with the real site through a proxy, and any authentication tokens submitted are then forwarded to the legitimate service in real time.

“The attacker captures the resulting session cookies and tokens, giving them authenticated access to the account,” the researchers wrote. “When attackers relay the entire authentication flow in real time, MFA protections can be effectively neutralized despite functioning exactly as designed.”

The “URL Masker” feature of the Starkiller phishing service features options for configuring the malicious link. Image: Abnormal.

Starkiller is just one of several cybercrime services offered by a threat group calling itself Jinkusu, which maintains an active user forum where customers can discuss techniques, request features and troubleshoot deployments. One à-la-carte feature harvests email addresses and contact information from compromised sessions; its sellers advise that the data can be used to build target lists for follow-on phishing campaigns.

This service strikes me as a remarkable evolution in phishing, and its apparent success is likely to be copied by other enterprising cybercriminals (assuming the service performs as well as it claims). After all, phishing users this way avoids the upfront costs and constant hassles associated with juggling multiple phishing domains, and it throws a wrench in traditional phishing detection methods like domain blocklisting and static page analysis.

It also massively lowers the barrier to entry for novice cybercriminals, Abnormal researchers observed.

“Starkiller represents a significant escalation in phishing infrastructure, reflecting a broader trend toward commoditized, enterprise-style cybercrime tooling,” their report concludes. “Combined with URL masking, session hijacking, and MFA bypass, it gives low-skill cybercriminals access to attack capabilities that were previously out of reach.”

Categories: Software Security

Ring Cancels Its Partnership with Flock

Schneier on Security - Fri, 02/20/2026 - 07:08

It’s a demonstration of how toxic the surveillance-tech company Flock has become that Amazon’s Ring has canceled the partnership between the two companies.

As Hamilton Nolan advises, remove your Ring doorbell.

Categories: Software Security

Malicious AI

Schneier on Security - Thu, 02/19/2026 - 07:05

Interesting:

Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats.

Part 2 of the story. And a Wall Street Journal article.

Categories: Software Security

Multi-CDN: A Critical Decision for a Resilient Architecture

Fastly Blog (Security) - Wed, 02/18/2026 - 19:00
Learn why Multi-CDN is a critical architecture for resilience & performance. Explore DNS, Layer 7, CDN chaining, and client-side steering with Fastly.
Categories: Software Security

Announcing Kyverno 1.17!

CNCF Blog Projects Category - Wed, 02/18/2026 - 07:11

Kyverno 1.17 is a landmark release that marks the stabilization of our next-generation Common Expression Language (CEL) policy engine.

While 1.16 introduced the “CEL-first” vision in beta, 1.17 promotes these capabilities to v1, offering a high-performance, future-proof path for policy as code.

This release focuses on “completing the circle” for CEL policies by introducing namespaced mutation and generation, expanding the available function libraries for complex logic, and enhancing supply chain security with upcoming Cosign v3 support.

A new look for kyverno.io

The first thing you’ll notice with 1.17 is our completely redesigned website. We’ve moved beyond a simple documentation site to create a modern, high-performance portal for platform engineers. Let’s be honest: the Kyverno website redesign was long overdue. As the project evolved into the industry standard for unified policy as code, our documentation needed to reflect that maturity. We are proud to finally unveil the new experience at https://kyverno.io.

Image: “A new era, a new website,” showcasing the changes from the old Kyverno website to the new one shipped with Kyverno 1.17.

  • Modern redesign
    Built on the Starlight framework, the new site is faster, fully responsive, and features a clean, professional aesthetic that makes long-form reading much easier on the eyes.
  • Enhanced documentation structure
    We’ve reorganized our docs from the ground up. Information is now tiered by “User Journey”—from a simplified Quick Start for beginners to deep-dive Reference material for advanced policy authors.
  • Fully redesigned policy catalog
    Our library of 300+ sample policies has a new interface. It features improved filtering and a dedicated search that allows you to find policies by Category (Best Practices, Security, etc.) or Type (CEL vs. JMESPath) instantly.
  • Enhanced search capabilities
    We’ve integrated a more intelligent search engine that indexes both documentation and policy code, ensuring you get the right answer on the first try.
  • Brand new blog
    The Kyverno blog has been refreshed to better showcase technical deep dives, community case studies, and release announcements like this one!

Namespaced mutating and generating policies

In 1.16, we introduced namespaced variants for validation, cleanup, and image verification.

Kyverno 1.17 completes this by adding:

  • NamespacedMutatingPolicy
  • NamespacedGeneratingPolicy

This enables true multi-tenancy. Namespace owners can now define their own mutation and generation logic (e.g., automatically injecting sidecars or creating default ConfigMaps) without requiring cluster-wide permissions or affecting other tenants.

CEL policy types reach v1 (GA)

The headline for 1.17 is the promotion of CEL-based policy types to v1. This signifies that the API is now stable and production-ready.

The promotion includes:

  • ValidatingPolicy and NamespacedValidatingPolicy
  • MutatingPolicy and NamespacedMutatingPolicy
  • GeneratingPolicy and NamespacedGeneratingPolicy
  • ImageValidatingPolicy and NamespacedImageValidatingPolicy
  • DeletingPolicy and NamespacedDeletingPolicy
  • PolicyException

With this graduation, platform teams can confidently migrate from JMESPath-based policies to CEL to take advantage of significantly improved evaluation performance and better alignment with upstream Kubernetes ValidatingAdmissionPolicies / MutatingAdmissionPolicies.

New CEL capabilities and functions

To ensure CEL policies are as powerful as the original Kyverno engine, 1.17 introduces several new function libraries:

  • Hash Functions
    Built-in support for md5(value), sha1(value), and sha256(value) hashing.
  • Math Functions
    Use math.round(value, precision) to round numbers to a specific decimal or integer precision.
  • X509 Decoding
    Policies can now inspect and validate the contents of x509 certificates directly within a CEL expression using x509.decode(pem).
  • Random String Generation
    Generate random strings with random() (default pattern) or random(pattern) for custom regex-based patterns.
  • Transform Utilities
    Use listObjToMap(list1, list2, keyField, valueField) to merge two object lists into a map.
  • JSON Parsing
    Parse JSON strings into structured data with json.unmarshal(jsonString).
  • YAML Parsing
    Parse YAML strings into structured data with yaml.parse(yamlString).
  • Time-based Logic
    New time.now(), time.truncate(timestamp, duration), and time.toCron(timestamp) functions allow for time-since or “maintenance window” style policies.
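To make a few of the helpers above concrete, here are rough Python-stdlib analogues of what they compute. This is illustrative Python, not Kyverno/CEL syntax, and the input values are invented:

```python
# Illustrative Python-stdlib analogues of three of the new CEL helpers.
# NOT CEL or Kyverno code -- this only shows what the functions compute.
import hashlib
import json

# sha256(value): hex digest of a string
digest = hashlib.sha256(b"ghcr.io/example/app:v1").hexdigest()

# math.round(value, precision): round to a given decimal precision
rounded = round(2.71828, 2)

# json.unmarshal(jsonString): parse a JSON string into structured data
parsed = json.loads('{"registry": "ghcr.io", "tag": "v1"}')

print(len(digest), rounded, parsed["tag"])  # 64 2.72 v1
```

Inside a CEL policy these same operations run directly in the expression, so a rule can, for example, hash a value or unpack a JSON-encoded annotation without a separate controller.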

The deprecation of legacy APIs

As Kyverno matures and aligns more closely with upstream Kubernetes standards, we are making the strategic shift to a CEL-first architecture. This means that the legacy Policy and ClusterPolicy types (which served the community for years using JMESPath) are now entering their sunset phase.

The deprecation schedule

Kyverno 1.17 officially marks ClusterPolicy and CleanupPolicy as Deprecated. While they remain functional in this release, the clock has started on their removal to make way for the more performant, standardized CEL-based engines.

Release   Date (estimated)   Status
v1.17     Jan 2026           Marked for deprecation
v1.18     Apr 2026           Critical fixes only
v1.19     Jul 2026           Critical fixes only
v1.20     Oct 2026           Planned for removal

Why the change?

By standardizing on the Common Expression Language (CEL), Kyverno significantly improves its performance and aligns with the native validation logic used by the Kubernetes API server itself.

For platform teams, this means one less language to learn and a more predictable and scalable policy-as-code experience.

Note for authors

From this point forward, we strongly recommend that every new policy you write be based on the new CEL APIs. Choosing the legacy APIs for new work today simply adds to your migration workload later this year.

Migration tips

We understand that many of you have hundreds of existing policies. To ensure a smooth transition, we have provided comprehensive resources:

  • The Migration Guide
    Our new Migration to CEL Guide provides a side-by-side mapping of legacy ClusterPolicy fields to their new equivalents (e.g., mapping validate.pattern to ValidatingPolicy expressions).
  • New Policy Types
    You can now begin moving your rules into specialized types like ValidatingPolicy, MutatingPolicy, and GeneratingPolicy. You can see the full breakdown of these new v1 APIs in the Policy Types Overview.

Enhanced supply chain security

Supply chain security remains a core pillar of Kyverno.

  • Cosign v3 Support
    1.17 adds support for the latest Cosign features, ensuring your image verification remains compatible with the evolving Sigstore ecosystem.
  • Expanded Attestation Parsing
    New capabilities to deserialize YAML and JSON strings within CEL policies make it easier to verify complex metadata and SBOMs.

Observability and reporting upgrades

We have refined how Kyverno communicates policy results:

  • Granular Reporting Control
    A new --allowedResults flag allows you to filter which results (e.g., only “Fail”) are stored in reports, significantly reducing etcd pressure in large clusters.
  • Enhanced Metrics
    More detailed latency and execution metrics for CEL policies are now included by default to help you monitor the “hidden” cost of policy enforcement.

For developers and integrators

To support the broader ecosystem and make it easier to build integrations, we have decoupled our core components:

  • New API Repository
    Our CEL-based APIs now live in a dedicated repository: kyverno/api. This makes it significantly lighter to import Kyverno types into your own Go projects.
  • Kyverno SDK
    For developers building custom controllers or tools that interact with Kyverno, the SDK project is now housed at kyverno/sdk.

Getting started and backward compatibility

Upgrading from 1.16 is straightforward. However, since the CEL policy types have moved to v1, we recommend updating your manifests to the new API version. Kyverno will continue to support v1beta1 for a transition period.

helm repo update
helm upgrade --install kyverno kyverno/kyverno -n kyverno --version 3.7.0

Looking ahead: The Kyverno roadmap

As we move past the 1.17 milestone, our focus shifts toward long-term sustainability and the “Kyverno Platform” experience. Our goal is to ensure that Kyverno remains the most user-friendly and performant governance tool in the cloud-native ecosystem.

  • Growing the community
    We are doubling down on our commitment to the community. Expect more frequent office hours, improved contributor onboarding, and a renewed focus on making the Kyverno community the most welcoming space in CNCF.
  • A unified tooling experience
    Over the years, we’ve built several powerful sub-projects (like the CLI, Policy Reporter, and Kyverno-Authz). A major goal on our roadmap is to unify these tools into a cohesive experience, reducing fragmentation and making it easier to manage the entire policy lifecycle from a single vantage point.
  • Performance and scalability guardrails
    As clusters grow, performance becomes paramount. We are shifting our focus toward rigorous automated performance testing and will be providing more granular metrics regarding throughput and latency. We want to give platform engineers the data they need to understand exactly what Kyverno can handle in high-scale production environments.
  • Continuous UX improvement
    The website redesign was just the first step. We will continue to iterate on our user interfaces, documentation, and error messaging to ensure that Kyverno remains “Simplified” by design, not just in name.

Conclusion

Kyverno 1.17 is the most robust version yet, blending the flexibility of our original engine with the performance and standardization of CEL.

But this release is about more than just code—it’s about the total user experience. Whether you’re browsing the new policy catalog or scaling thousands of CEL-based rules, we hope this release makes your Kubernetes journey smoother.

A massive thank you to our contributors for making this release (and the new website!) a reality.

Categories: CNCF Projects

AI Found Twelve New Vulnerabilities in OpenSSL

Schneier on Security - Wed, 02/18/2026 - 07:03

The title of the post is “What AI Security Research Looks Like When It Works,” and I agree:

In the latest OpenSSL security release on January 27, 2026, twelve new zero-day vulnerabilities (meaning unknown to the maintainers at time of disclosure) were announced. Our AI system is responsible for the original discovery of all twelve, each found and responsibly disclosed to the OpenSSL team during the fall and winter of 2025. Of those, 10 were assigned CVE-2025 identifiers and 2 received CVE-2026 identifiers. Adding the 10 to the three we already found in the ...

Categories: Software Security

Side-Channel Attacks Against LLMs

Schneier on Security - Tue, 02/17/2026 - 07:01

Here are three papers describing different side-channel attacks against LLMs.

“Remote Timing Attacks on Efficient Language Model Inference”:

Abstract: Scaling up language models has significantly increased their capabilities. But larger models are slower models, and so there is now an extensive body of work (e.g., speculative sampling or parallel decoding) that improves the (average case) efficiency of language model generation. But these techniques introduce data-dependent timing characteristics. We show it is possible to exploit these timing differences to mount a timing attack. By monitoring the (encrypted) network traffic between a victim user and a remote language model, we can learn information about the content of messages by noting when responses are faster or slower. With complete black-box access, on open source systems we show how it is possible to learn the topic of a user’s conversation (e.g., medical advice vs. coding assistance) with 90%+ precision, and on production systems like OpenAI’s ChatGPT and Anthropic’s Claude we can distinguish between specific messages or infer the user’s language. We further show that an active adversary can leverage a boosting attack to recover PII placed in messages (e.g., phone numbers or credit card numbers) for open source systems. We conclude with potential defenses and directions for future work...

Categories: Software Security

Redefining automation governance: From execution to observability at Bradesco

Red Hat Security - Mon, 02/16/2026 - 19:00
At Bradesco, one of the largest financial institutions in Brazil and Latin America, the ability to scale is crucial. Automation plays a central role in this journey, and Red Hat Ansible Automation Platform has become the foundation supporting thousands of jobs executed daily across mission-critical environments.As automation expanded across teams, systems, and domains, Bradesco reached a new stage of maturity. Execution at scale was already well established, delivering efficiency and speed. However, operating automation at this level within a highly regulated financial environment introduced a
Categories: Software Security

The Promptware Kill Chain

Schneier on Security - Mon, 02/16/2026 - 07:04

The promptware kill chain: initial access, privilege escalation, reconnaissance, persistence, command & control, lateral movement, and action on objective.

Attacks against modern generative artificial intelligence (AI) large language models (LLMs) pose a real threat. Yet discussions around these attacks and their potential defenses are dangerously myopic. The dominant narrative focuses on “prompt injection,” a set of techniques for embedding instructions into LLM inputs to perform malicious activity. This term suggests a simple, singular vulnerability. This framing obscures a more complex and dangerous reality. Attacks on LLM-based systems have evolved into a distinct class of malware execution mechanisms, which we term “promptware.” In a ...

Categories: Software Security

Not affected by cross-ns privilege escalation via policy api call

Kubewarden Blog - Sun, 02/15/2026 - 19:00
Why Kubewarden is not affected by CVE-2026-22039 The recent vulnerability CVE-2026-22039 is doing the rounds in the Kubernetes security community, with dramatic titles such as “How an admission controller vulnerability turned Kubernetes namespaces into a security illusion”. You can read about people doubting admission controllers, claiming they have too much power, or they represent too high a value target. In this blogpost, we reassure Kubewarden users that they aren’t affected thanks to our architecture, and explain why.
Categories: Web Assembly

Upcoming Speaking Engagements

Schneier on Security - Sat, 02/14/2026 - 12:04

This is a current list of where and when I am scheduled to speak:

  • I’m speaking at Ontario Tech University in Oshawa, Ontario, Canada, at 2 PM ET on Thursday, February 26, 2026.
  • I’m speaking at the Personal AI Summit in Los Angeles, California, USA, on Thursday, March 5, 2026.
  • I’m speaking at Tech Live: Cybersecurity in New York City, USA, on Wednesday, March 11, 2026.
  • I’m giving the Ross Anderson Lecture at the University of Cambridge’s Churchill College at 5:30 PM GMT on Thursday, March 19, 2026.
  • I’m speaking at RSAC 2026 in San Francisco, California, USA, on Wednesday, March 25, 2026...
Categories: Software Security

Modernizing Prometheus: Native Storage for Composite Types

Prometheus Blog - Fri, 02/13/2026 - 19:00

Over the last year, the Prometheus community has been working hard on several interesting and ambitious changes that would previously have been seen as controversial or infeasible. While there might be little visibility into those from the outside (e.g., it's not an OpenClaw Prometheus plugin, sorry!), Prometheus developers are, organically, steering Prometheus into a certain, coherent future. Piece by piece, we unexpectedly get closer to goals we never dreamed we would achieve as an open-source project!

This post starts (hopefully!) a series of blog posts sharing a few ambitious shifts that might be exciting to new and existing Prometheus users and developers. In this post, I'd love to focus on the idea of native storage for composite types, which tidies up a lot of challenges that have piled up over time. Make sure to check the inlined links on how you can adopt some of these changes early or contribute!

CAUTION: Disclaimer: This post is intended as a fun overview, from my own personal point of view as a Prometheus maintainer. Some of the mentioned changes haven't (yet) been officially approved by the Prometheus Team; some have not been proven in production.

NOTE: This post was written by humans; AI was used only for cosmetic and grammar fixes.

Classic Representation: Primitive Samples

As you might know, the Prometheus data model (so server, PromQL, protocols) supports gauges, counters, histograms and summaries. OpenMetrics 1.0 extended this with gaugehistogram, info and stateset types.

Impressively, for a long time Prometheus' TSDB storage implementation had an explicitly clean and simple data model. The TSDB allowed the storage and retrieval of string-labelled primitive samples containing only float64 values and int64 timestamps. It was completely metric-type-agnostic.

The metric types were implied on top of the TSDB, for humans and for best-effort PromQL tooling. For simplicity, let's call this way of storing types the classic model or representation. In this model:

We have primitive types:

  • gauge is a "default" type with no special rules, just a float sample with labels.

  • counter that should have a _total suffix in the name for humans to understand its semantics.

    foo_total 17.0
    
  • info that needs an _info suffix in the metric name and always has a value of 1.

We have composite types. This is where the fun begins. In the classic representation, composite metrics are represented as a set of primitive float samples:

  • histogram is a group of counters with certain mandatory suffixes and le labels:

    foo_bucket{le="0.0"} 0
    foo_bucket{le="1e-05"} 0
    foo_bucket{le="0.0001"} 5
    foo_bucket{le="0.1"} 8
    foo_bucket{le="1.0"} 10
    foo_bucket{le="10.0"} 11
    foo_bucket{le="100000.0"} 11
    foo_bucket{le="1e+06"} 15
    foo_bucket{le="1e+23"} 16
    foo_bucket{le="1.1e+23"} 17
    foo_bucket{le="+Inf"} 17
    foo_count 17
    foo_sum 324789.3
    
  • gaugehistogram, summary, and stateset types follow the same logic – a group of special gauges or counters that compose a single metric.
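To see how PromQL consumes this classic representation, here is a toy re-implementation of histogram_quantile()'s linear interpolation over the foo buckets above. This is a simplified sketch; the real Prometheus implementation handles more edge cases (e.g., the lowest bucket and NaN values):

```python
# Toy sketch of histogram_quantile()'s linear interpolation over classic
# le-bucket samples. Bucket data is copied from the foo example above.
def histogram_quantile(q, buckets):
    """buckets: sorted list of (le, cumulative_count) pairs."""
    total = buckets[-1][1]            # the +Inf bucket holds the total count
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            # Linear interpolation inside the first bucket that covers rank
            return prev_le + (le - prev_le) * (rank - prev_count) / in_bucket
        prev_le, prev_count = le, count

buckets = [(0.0, 0), (1e-05, 0), (0.0001, 5), (0.1, 8), (1.0, 10),
           (10.0, 11), (100000.0, 11), (1e+06, 15), (1e+23, 16),
           (1.1e+23, 17), (float("inf"), 17)]

print(histogram_quantile(0.5, buckets))  # ~0.325
```

The rank 0.5 × 17 = 8.5 falls in the (0.1, 1.0] bucket (cumulative counts 8 → 10), so the estimate interpolates a quarter of the way into it: 0.1 + 0.9 × 0.25 = 0.325. Note that every one of those buckets is a separate series in the classic model, which is exactly the index overhead discussed below.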

The classic model served the Prometheus project well. It significantly simplified the storage implementation, enabling Prometheus to be one of the most optimized open-source time-series databases, with distributed versions based on the same data model available in projects like Cortex, Thanos, and Mimir.

Unfortunately, there are always tradeoffs. This classic model has a few limitations:

  • Efficiency: It tends to yield overhead for composite types because every new piece of data (e.g., new bucket) takes precious index space (it's a new unique series), whereas samples are significantly more compressible (rarely change, time-oriented).
  • Functionality: It poses limitations to the shape and flexibility of the data you store (unless we'd go into some JSON-encoded labels, which have massive downsides).
  • Transactionality: Primitive pieces of composite types (separate counters) are processed independently. While we did a lot of work to ensure write isolation and transactionality for scrapes, transactionality completely breaks apart when data is received or sent via remote write or OTLP protocols, or sent to distributed long-term Prometheus storage solutions. For example, a foo histogram might be partially sent, with its foo_bucket{le="1.1e+23"} 17 counter series delayed or dropped accidentally, which risks triggering false-positive alerts or no alerts, depending on the situation.
  • Reliability: Consumers of the TSDB data have to essentially guess the type semantics. There's nothing stopping users from writing a foo_bucket gauge or foo_total histogram.

A Glimpse of Native Storage for Composite Types

The classic model was challenged by the introduction of native histograms. The TSDB was extended to store composite histogram samples in addition to floats. We tend to call this a native histogram, because the TSDB can now "natively" store a full histogram (with sparse and exponential buckets) as an atomic, composite sample.

At that point, the common wisdom was to stop there. The special advanced histogram that's generally meant to replace the "classic" histograms uses a composite sample, while the rest of the metrics use the classic model. Making other composite types consistent with the new native model felt extremely disruptive to users, with too much work and risk. A common counter-argument was that users would eventually migrate their classic histograms naturally, and that summaries are less useful anyway, given the more powerful bucketing and lower cost of native histograms.

Unfortunately, the migration to native histograms was known to take time, given the slight PromQL change required to use them, and the new bucketing and client changes needed (applications have to define new histograms or edit existing metrics). There will also be old software, in use for a long time, that is never migrated. This leaves Prometheus with no realistic chance of deprecating classic histograms, forcing the ecosystem to keep supporting the classic model, likely for decades.

However, native histograms did push TSDB and the ecosystem into that new composite sample pattern. Some of those changes could be easily adapted to all composite types. Native histograms also gave us a glimpse of the many benefits of that native support. It was tempting to ask ourselves: would it be possible to add native counterparts of the existing composite metrics to replace them, ideally transparently?

Organically, in 2024, for transactionality and efficiency reasons, we introduced the native histogram custom buckets (NHCB) concept, which essentially allows storing classic histograms with explicit buckets natively, reusing native histogram composite sample data structures.

NHCB has proven to be at least 30% more efficient than the classic representation, while offering functional parity with classic histograms. However, two practical challenges emerged that slowed down the adoption:

  1. Expanding (converting from NHCB to a classic histogram) is relatively trivial, but combining (turning a classic histogram into NHCB) is often not feasible. Because we don't want to wait for client-ecosystem adoption, and mindful of legacy, hard-to-change software, we envisioned NHCB being combined on scrape from the classic representation. That has proven somewhat expensive on scrape. Additionally, combination logic is practically impossible when receiving "pushes" (e.g., remote write with classic histograms), as different parts of the same histogram sample (e.g., buckets and count) may arrive via different remote write shards or sequential messages. This combination challenge is also why OpenTelemetry Collector users see extra overhead in prometheusreceiver, as the OpenTelemetry model strictly follows the composite sample model.

  2. Consumption is slightly different, especially in the PromQL query syntax. Our initial decision was to surface NHCB histograms using a native-histogram-like PromQL syntax. For example the following classic histogram:

    foo_bucket{le="0.0"} 0
    # ...
    foo_bucket{le="1.1e+23"} 17
    foo_bucket{le="+Inf"} 17
    foo_count 17
    foo_sum 324789.3
    

    When we convert this to NHCB, you can no longer use foo_bucket as your metric name selector. Since NHCB is now stored as a foo metric, you need to use:

    histogram_quantile(0.9, sum(foo{job="a"}))
    
    # Old syntax: histogram_quantile(0.9, sum(foo_bucket{job="a"}) by (le))
    

    This also has a side effect: it violates our "what you see is what you query" rule for the text formats, at least until OpenMetrics 2.

    On top of that, similar problems occur on other Prometheus outputs (federation, remote read, and remote write).

NOTE: Fun fact: the Prometheus client data model (SDKs) and the PrometheusProto scrape protocol already use the composite sample model!

Transparent Native Representation

Let's get straight to the point. Organically, the Prometheus community seems to align with the following two ideas:

  • We want to eventually move to a fully composite sample model on the storage layer, given all the benefits.
  • Users need to be able to switch (e.g., on scrape) from the classic to the native form in storage without breaking the consumption layer. Essentially, to ease non-trivial migration pains (finding who uses what, double-writing, synchronizing), to avoid tricky dual-mode protocol changes, and to deprecate the classic model ASAP for the sustainability of the Prometheus codebase, we need to enable consumption migration (e.g., PromQL queries) independently of the storage layer.

Let's go through evidence of this direction, which also represents efforts you can contribute to or adopt early!

  1. We are discussing "native" summary and stateset types to fully eliminate the classic model for all composite types. Feel free to join and help with that work!

  2. We are working on OpenMetrics 2.0 to consolidate and improve the pull protocol scene and apply the new learnings. One of the core changes will be the move to composite values in text, which makes the text format trivial to parse for storages that support composite types natively. This solves the combining challenge. Note that, by default, all composite types will still be expanded to the classic format on scrape for now, so there's no breaking change for users. Feel free to join our WG to help or give feedback.

  3. The Prometheus receive and export protocol has been updated: Remote Write 2.0 allows transporting histograms in the "native" form instead of the classic representation (the classic one is still supported). In future versions (e.g., 2.1), we could easily follow a similar pattern and add native summaries and statesets. Contributions are welcome to make Remote Write 2.0 stable!

  4. We are experimenting with consumption compatibility modes that translate composite types, stored as composite samples, back to the classic representation. This is not trivial; there are edge cases, but it might be more feasible (and needed!) than we initially anticipated. See:

    In PromQL it might work as follows, for an NHCB that used to be a classic histogram:

    # New syntax gives our "foo" NHCB:    
    histogram_quantile(0.9, sum(foo{job="a"}))
    # Old syntax still works, expanding "foo" NHCB to classic representation:
    histogram_quantile(0.9, sum(foo_bucket{job="a"}) by (le))
    

    Alternatives, like a special label or annotations, are also discussed.
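The reverse direction, the expansion that such a compatibility mode performs, is comparatively straightforward, matching the earlier observation that expanding is relatively trivial. A minimal sketch (again with hypothetical names, not the real implementation) synthesizes classic series names from an NHCB-like sample:

```python
def expand_nhcb_to_classic(name, bounds, deltas, count, total):
    """Expand an NHCB-like sample back into classic histogram series.

    bounds: per-bucket upper bounds (+Inf as float("inf")).
    deltas: per-bucket (non-cumulative) counts.
    Returns a dict of series selector -> value, mirroring what a
    compatibility mode could serve to old-style PromQL queries.
    """
    series, cumulative = {}, 0
    for le, delta in zip(bounds, deltas):
        cumulative += delta  # classic buckets are cumulative
        label = "+Inf" if le == float("inf") else repr(le)
        series[f'{name}_bucket{{le="{label}"}}'] = cumulative
    series[f"{name}_count"] = count
    series[f"{name}_sum"] = total
    return series


# Round-trip of the "foo" example from earlier in the post:
expanded = expand_nhcb_to_classic(
    "foo", [0.0, 1.1e23, float("inf")], [0, 17, 0], 17, 324789.3
)
```

Because this direction needs only a single self-contained composite sample as input, it can be computed lazily at query time, which is what makes the old `foo_bucket{...} by (le)` syntax recoverable.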

When implemented, it should be possible to fully switch different parts of your metric collection pipeline to native form transparently.

Summary

Moving Prometheus to a native composite-type world is not easy and will take time, especially around coding, testing, and optimizing. Notably, it switches the performance characteristics of the metric load from uniform, predictable sample sizes to a sample size that depends on the type. Another challenge is code architecture: maintaining different sample types has already proven to be very verbose (we need unions, Go!).

However, recent work has revealed a clean and feasible path that yields clear benefits around functionality, transactionality, reliability, and efficiency in the relatively near future, which is pretty exciting!

If you have any questions around these changes, feel free to:

  • DM me on Slack.
  • Visit the #prometheus-dev Slack channel and share your questions.
  • Comment on related issues, create PRs, and review PRs (the most impactful work!).

The Prometheus community will also be at KubeCon EU 2026 in Amsterdam!

I'm hoping we can share stories of other important, orthogonal shifts we see in the community in future posts. No promises (and help welcome!), but there's a lot to cover, such as (random order, not a full list):

  1. Our native start timestamp feature journey, which cleanly unblocks native delta temporality without "hacks" like reusing gauges, a separate layer of metric types, or label annotations (e.g., __temporality__).
  2. Optional schematization of Prometheus metrics, which attempts to solve a ton of stability problems with metric naming and shape, building on top of OpenTelemetry semconv.
  3. Our metadata storage journey that attempts to improve the OpenTelemetry Entities and resource attributes storage and consumption experience.
  4. Our journey to organize and extend Prometheus scrape pull protocols with the recent ownership move of OpenMetrics.
  5. An incredible TSDB Parquet effort, coming from the three LTS project groups (Cortex, Thanos, Mimir) working together, attempting to improve high-cardinality cases.
  6. Fun experiments with PromQL extensions, like PromQL with pipes and variables and some new SQL transpilation ideas.
  7. Governance changes.

See you in open-source!

Categories: CNCF Projects

Friday Squid Blogging: Do Squid Dream?

Schneier on Security - Fri, 02/13/2026 - 17:08

An exploration of the interesting question.

Categories: Software Security

Zero CVEs: The symptom of a larger problem

Red Hat Security - Thu, 02/12/2026 - 19:00
There has been much discussion lately regarding the "Zero CVE" movement. At Red Hat, we welcome this focus, emphasized by our recent announcement of Project Hummingbird to provide more frequently updated container images. Hummingbird represents a shift in how customers receive Red Hat's open source artifacts: Faster without sacrificing code integrity. You can read more about Project Hummingbird here. While this project is relatively new, it's built on the years of work and lessons learned in modernizing our own internal build system. While the industry often focuses on the result (the image), w…
Categories: Software Security

Extend trust across the software supply chain with Red Hat trusted libraries

Red Hat Security - Thu, 02/12/2026 - 19:00
Modern software development runs on open source, and that’s not hyperbole. Python alone pulls in dozens—sometimes hundreds—of third‑party libraries for even the simplest applications. While public repositories have fueled innovation at incredible speed, they’ve also created a new class of risk: Opaque build pipelines, unverifiable provenance, and a growing burden on teams to chase vulnerabilities after the fact. Today marks the tech preview of Red Hat trusted libraries, a new package index designed to bring enterprise-grade trust, transparency, and security posture to application dependencies…
Categories: Software Security

Chasing the holy grail: Why Red Hat’s Hummingbird project aims for "near zero" CVEs

Red Hat Security - Thu, 02/12/2026 - 19:00
In the world of enterprise software security, few metrics are as coveted, or as elusive, as "zero CVEs." Simply put, a zero CVE (Common Vulnerabilities and Exposures) approach aims to deliver software components that are completely free of known security vulnerabilities at the time of shipping. For many organizations, particularly those in highly regulated industries, this is not just a "nice to have," it is a mandate. Initiatives like FedRAMP and various strict security frameworks increasingly demand that software supply chains be clean of known risks before deployment. As the industry has ta…
Categories: Software Security

3D Printer Surveillance

Schneier on Security - Thu, 02/12/2026 - 07:01

New York is contemplating a bill that adds surveillance to 3D printers:

New York’s 2026­2027 executive budget bill (S.9005 / A.10005) includes language that should alarm every maker, educator, and small manufacturer in the state. Buried in Part C is a provision requiring all 3D printers sold or delivered in New York to include “blocking technology.” This is defined as software or firmware that scans every print file through a “firearms blueprint detection algorithm” and refuses to print anything it flags as a potential firearm or firearm component...

Categories: Software Security

From challenge to champion: Elevate your vulnerability management strategy

Red Hat Security - Wed, 02/11/2026 - 19:00
In the world of cybersecurity, vulnerability management is frequently a collaborative effort between vendors, software maintainers, and customers. It's a continuous journey of discovery, prioritization, and remediation that we embark on together. Each challenge that we face provides valuable opportunities to refine our strategies and strengthen our collective security posture. Based on our work with customers, we've identified a few common areas where we can all “level up” our vulnerability management game. Let's explore these patterns and recommendations. Beyond the base score: The art of s…
Categories: Software Security

Spotlight on SIG Architecture: API Governance

Kubernetes Blog - Wed, 02/11/2026 - 19:00

This is the fifth interview of the SIG Architecture Spotlight series covering its different subprojects; this time, we cover SIG Architecture: API Governance.

In this SIG Architecture spotlight we talked with Jordan Liggitt, lead of the API Governance sub-project.

Introduction

FM: Hello Jordan, thank you for your availability. Tell us a bit about yourself, your role and how you got involved in Kubernetes.

JL: My name is Jordan Liggitt. I'm a Christian, husband, father of four, software engineer at Google by day, and amateur musician by stealth. I was born in Texas (and still like to claim it as my point of origin), but I've lived in North Carolina for most of my life.

I've been working on Kubernetes since 2014. At that time, I was working on authentication and authorization at Red Hat, and my very first pull request to Kubernetes attempted to add an OAuth server to the Kubernetes API server. It never exited work-in-progress status. I ended up going with a different approach that layered on top of the core Kubernetes API server in a different project (spoiler alert: this is foreshadowing), and I closed it without merging six months later.

Undeterred by that start, I stayed involved, helped build Kubernetes authentication and authorization capabilities, and got involved in the definition and evolution of the core Kubernetes APIs from early beta APIs, like v1beta3 to v1. I got tagged as an API reviewer in 2016 based on those contributions, and was added as an API approver in 2017.

Today, I help lead the API Governance and code organization subprojects for SIG Architecture, and I am a tech lead for SIG Auth.

FM: And when did you get specifically involved in the API Governance project?

JL: Around 2019.

Goals and scope of API Governance

FM: How would you describe the main goals and areas of intervention of the subproject?

JL: The surface area includes all the various APIs Kubernetes has, and there are APIs that people do not always realize are APIs: command-line flags, configuration files, how binaries are run, how they talk to back-end components like the container runtime, and how they persist data. People often think of "the API" as only the REST API... that is the biggest and most obvious one, and the one with the largest audience, but all of these other surfaces are also APIs. Their audiences are narrower, so there is more flexibility there, but they still require consideration.

The goals are to be stable while still enabling innovation. Stability is easy if you never change anything, but that contradicts the goal of evolution and growth. So we balance "be stable" with "allow change".

FM: Speaking of changes, in terms of ensuring consistency and quality (which is clearly one of the reasons this project exists), what are the specific quality gates in the lifecycle of a Kubernetes change? Does API Governance get involved during the release cycle, prior to it through guidelines, or somewhere in between? At what points do you ensure the intended role is fulfilled?

JL: We have guidelines and conventions, both for APIs in general and for how to change an API. These are living documents that we update as we encounter new scenarios. They are long and dense, so we also support them with involvement at either the design stage or the implementation stage.

Sometimes, due to bandwidth constraints, teams move ahead with design work without feedback from API Review. That’s fine, but it means that when implementation begins, the API review will happen then, and there may be substantial feedback. So we get involved when a new API is created or an existing API is changed, either at design or implementation.

FM: Is this during the Kubernetes Enhancement Proposal (KEP) process? Since KEPs are mandatory for enhancements, I assume part of the work intersects with API Governance?

JL: It can. KEPs vary in how detailed they are. Some include literal API definitions. When they do, we can perform an API review at the design stage. Then implementation becomes a matter of checking fidelity to the design.

Getting involved early is ideal. But some KEPs are conceptual and leave details to the implementation. That’s not wrong; it just means the implementation will be more exploratory. Then API Review gets involved later, possibly recommending structural changes.

There’s a trade-off regardless: detailed design upfront versus iterative discovery during implementation. People and teams work differently, and we’re flexible and happy to consult early or at implementation time.

FM: This reminds me of what Fred Brooks wrote in "The Mythical Man-Month" about conceptual integrity being central to product quality... No matter how you structure the process, there must be a point where someone looks at what is coming and ensures conceptual integrity. Kubernetes uses APIs everywhere -- externally and internally -- so API Governance is critical to maintaining that integrity. How is this captured?

JL: Yes, the conventions document captures patterns we’ve learned over time: what to do in various situations. We also have automated linters and checks to ensure correctness around patterns like spec/status semantics. These automated tools help catch issues even when humans miss them.

As new scenarios arise -- and they do constantly -- we think through how to approach them and fold the results back into our documentation and tools. Sometimes it takes a few attempts before we settle on an approach that works well.

FM: Exactly. Each new interaction improves the guidelines.

JL: Right. And sometimes the first approach turns out to be wrong. It may take two or three iterations before we land on something robust.

The impact of Custom Resource Definitions

FM: Is there any particular change, episode, or domain that stands out as especially noteworthy, complex, or interesting in your experience?

JL: The watershed moment was Custom Resources. Prior to that, every API was handcrafted by us and fully reviewed. There were inconsistencies, but we understood and controlled every type and field.

When Custom Resources arrived, anyone could define anything. The first version did not even require a schema. That made it extremely powerful -- it enabled change immediately -- but it left us playing catch-up on stability and consistency.

When Custom Resources graduated to General Availability (GA), schemas became required, but escape hatches still existed for backward compatibility. Since then, we’ve been working on giving CRD authors validation capabilities comparable to built-ins. Built-in validation rules for CRDs have only just reached GA in the last few releases.

So CRDs opened the "anything is possible" era. Built-in validation rules are the second major milestone: bringing consistency back.

The three major themes have been defining schemas, validating data, and handling pre-existing invalid data. With ratcheting validation (allowing data to improve without breaking existing objects), we can now guide CRD authors toward conventions without breaking the world.
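Ratcheting validation can be sketched in a few lines (a hypothetical illustration, not the actual Kubernetes CRD machinery): a rule is only enforced on fields whose value changed in the update, so pre-existing invalid data keeps round-tripping untouched until someone actually modifies it.

```python
def ratcheting_validate(old, new, rules):
    """Validate an update under ratcheting semantics.

    rules: dict mapping field name -> predicate on the field value.
    A field that fails its rule is still accepted if its value is
    unchanged from the stored object (`old`); only changed fields must
    satisfy the (possibly newly tightened) rules.
    Returns the list of field names that fail validation.
    """
    errors = []
    for field, is_valid in rules.items():
        value = new.get(field)
        if is_valid(value):
            continue
        if old is not None and old.get(field) == value:
            continue  # invalid but unchanged: ratcheting tolerates it
        errors.append(field)
    return errors


# A rule tightened after the object below was already stored:
rules = {"replicas": lambda v: isinstance(v, int) and v >= 0}
stored = {"replicas": -1}  # pre-existing invalid object

assert ratcheting_validate(stored, {"replicas": -1}, rules) == []  # unchanged: tolerated
assert ratcheting_validate(stored, {"replicas": -2}, rules) == ["replicas"]  # changed and invalid
assert ratcheting_validate(stored, {"replicas": 3}, rules) == []  # fixed: valid
```

This is what lets stricter conventions be introduced for CRD authors "without breaking the world": existing objects stay readable and writable, and the data improves monotonically as it is touched.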

API Governance in context

FM: How does API Governance relate to SIG Architecture and API Machinery?

JL: API Machinery provides the actual code and tools that people build APIs on. They don’t review APIs for storage, networking, scheduling, etc.

SIG Architecture sets the overall system direction and works with API Machinery to ensure the system supports that direction. API Governance works with other SIGs building on that foundation to define conventions and patterns, ensuring consistent use of what API Machinery provides.

FM: Thank you. That clarifies the flow. Going back to release cycles: do release phases -- enhancements freeze, code freeze -- change your workload? Or is API Governance mostly continuous?

JL: We get involved in two places: design and implementation. Design involvement increases before enhancements freeze; implementation involvement increases before code freeze. However, many efforts span multiple releases, so there is always some design and implementation happening, even for work targeting future releases. Between those intense periods, we often have time to work on long-term design work.

An anti-pattern we see is teams thinking about a large feature for months and then presenting it three weeks before enhancements freeze, saying, "Here is the design, please review." For big changes with API impact, it’s much better to involve API Governance early.

And there are good times in the cycle for this -- between freezes -- when people have bandwidth. That’s when long-term review work fits best.

Getting involved

FM: Clearly. Now, regarding team dynamics and new contributors: how can someone get involved in API Governance? What should they focus on?

JL: It’s usually best to follow a specific change rather than trying to learn everything at once. Pick a small API change, perhaps one someone else is making or one you want to make, and observe the full process: design, implementation, review.

High-bandwidth review -- live discussion over video -- is often very effective. If you’re making or following a change, ask whether there’s a time to go over the design or PR together. Observing those discussions is extremely instructive.

Start with a small change. Then move to a bigger one. Then maybe a new API. That builds understanding of conventions as they are applied in practice.

FM: Excellent. Any final comments, or anything we missed?

JL: Yes... the reason we care so much about compatibility and stability is for our users. It’s easy for contributors to see those requirements as painful obstacles preventing cleanup or requiring tedious work... but users have integrated with our system, and we made a promise to them: we want them to trust that we won’t break that contract. So even when it requires more work, moves slower, or involves duplication, we choose stability.

We are not trying to be obstructive; we are trying to make life good for users.

A lot of our questions focus on the future: you want to do something now... how will you evolve it later without breaking it? We assume we will know more in the future, and we want the design to leave room for that.

We also assume we will make mistakes. The question then is: how do we leave ourselves avenues to improve while keeping compatibility promises?

FM: Exactly. Jordan, thank you, I think we’ve covered everything. This has been an insightful view into the API Governance project and its role in the wider Kubernetes project.

JL: Thank you.

Categories: CNCF Projects, Kubernetes
