Feed aggregator
Friday Squid Blogging: Squid Cartoon
I like this one.
As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.
‘Starkiller’ Phishing Service Proxies Real Login Pages, MFA
Most phishing websites are little more than static copies of login pages for popular online destinations, and they are often quickly taken down by anti-abuse activists and security firms. But a stealthy new phishing-as-a-service offering lets customers sidestep both of these pitfalls: It uses cleverly disguised links to load the target brand’s real website, and then acts as a relay between the target and the legitimate site — forwarding the victim’s username, password and multi-factor authentication (MFA) code to the legitimate site and returning its responses.
There are countless phishing kits that would-be scammers can use to get started, but successfully wielding them requires some modicum of skill in configuring servers, domain names, certificates, proxy services, and other repetitive tech drudgery. Enter Starkiller, a new phishing service that dynamically loads a live copy of the real login page and records everything the user types, proxying the data from the legitimate site back to the victim.
According to an analysis of Starkiller by the security firm Abnormal AI, the service lets customers select a brand to impersonate (e.g., Apple, Facebook, Google, Microsoft et al.) and generates a deceptive URL that visually mimics the legitimate domain while routing traffic through the attacker’s infrastructure.
For example, a phishing link targeting Microsoft customers appears as “login.microsoft.com@[malicious/shortened URL here].” The “@” sign trick is an oldie but a goodie: everything before the “@” in a URL is treated as username data, and the real landing page is whatever comes after the “@” sign. Here’s what it looks like in the target’s browser:

Image: Abnormal AI. The actual malicious landing page is blurred out in this picture, but we can see it ends in .ru. The service also offers the ability to insert links from different URL-shortening services.
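This “@” parsing behavior is easy to verify with a standard URL parser. In the sketch below, `evil.example` is a placeholder for the attacker-controlled host (the real Starkiller domains are blurred in the report):

```python
from urllib.parse import urlparse

# Everything before "@" in the authority section of a URL is parsed
# as userinfo (username:password), not as the destination host.
# "evil.example" is a hypothetical stand-in for the attacker's domain.
link = "https://login.microsoft.com@evil.example/signin"
parsed = urlparse(link)

print(parsed.username)  # -> login.microsoft.com (cosmetic bait)
print(parsed.hostname)  # -> evil.example (where the browser actually goes)
```

Modern browsers strip or warn about userinfo in HTTP URLs precisely because of this trick, which is one reason the service layers URL shorteners on top.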
Once Starkiller customers select the URL to be phished, the service spins up a Docker container running a headless Chrome browser instance that loads the real login page, Abnormal found.
“The container then acts as a man-in-the-middle reverse proxy, forwarding the end user’s inputs to the legitimate site and returning the site’s responses,” Abnormal researchers Callie Baron and Piotr Wojtyla wrote in a blog post on Thursday. “Every keystroke, form submission, and session token passes through attacker-controlled infrastructure and is logged along the way.”
Starkiller in effect offers cybercriminals real-time session monitoring, allowing them to live-stream the target’s screen as they interact with the phishing page, the researchers said.
“The platform also includes keylogger capture for every keystroke, cookie and session token theft for direct account takeover, geo-tracking of targets, and automated Telegram alerts when new credentials come in,” they wrote. “Campaign analytics round out the operator experience with visit counts, conversion rates, and performance graphs—the same kind of metrics dashboard a legitimate SaaS [software-as-a-service] platform would offer.”
Abnormal said the service also deftly intercepts and relays the victim’s MFA credentials, since the recipient who clicks the link is actually authenticating with the real site through a proxy, and any authentication tokens submitted are then forwarded to the legitimate service in real time.
“The attacker captures the resulting session cookies and tokens, giving them authenticated access to the account,” the researchers wrote. “When attackers relay the entire authentication flow in real time, MFA protections can be effectively neutralized despite functioning exactly as designed.”

The “URL Masker” feature of the Starkiller phishing service features options for configuring the malicious link. Image: Abnormal.
Starkiller is just one of several cybercrime services offered by a threat group calling itself Jinkusu, which maintains an active user forum where customers can discuss techniques, request features, and troubleshoot deployments. One a-la-carte feature harvests email addresses and contact information from compromised sessions; the group advises that this data can be used to build target lists for follow-on phishing campaigns.
This service strikes me as a remarkable evolution in phishing, and its apparent success is likely to be copied by other enterprising cybercriminals (assuming the service performs as well as it claims). After all, phishing users this way avoids the upfront costs and constant hassles associated with juggling multiple phishing domains, and it throws a wrench in traditional phishing detection methods like domain blocklisting and static page analysis.
It also massively lowers the barrier to entry for novice cybercriminals, Abnormal researchers observed.
“Starkiller represents a significant escalation in phishing infrastructure, reflecting a broader trend toward commoditized, enterprise-style cybercrime tooling,” their report concludes. “Combined with URL masking, session hijacking, and MFA bypass, it gives low-skill cybercriminals access to attack capabilities that were previously out of reach.”
Ring Cancels Its Partnership with Flock
It’s a demonstration of how toxic the surveillance-tech company Flock has become that Amazon’s Ring has canceled the partnership between the two companies.
As Hamilton Nolan advises, remove your Ring doorbell.
Malicious AI
Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream Python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and it raises serious concerns about currently deployed AI agents executing blackmail threats.
Multi-CDN: A Critical Decision for a Resilient Architecture
Announcing Kyverno 1.17!
Kyverno 1.17 is a landmark release that marks the stabilization of our next-generation Common Expression Language (CEL) policy engine.
While 1.16 introduced the “CEL-first” vision in beta, 1.17 promotes these capabilities to v1, offering a high-performance, future-proof path for policy as code.
This release focuses on “completing the circle” for CEL policies by introducing namespaced mutation and generation, expanding the available function libraries for complex logic, and enhancing supply chain security with upcoming Cosign v3 support.
A new look for kyverno.io
The first thing you’ll notice with 1.17 is our completely redesigned website. We’ve moved beyond a simple documentation site to create a modern, high-performance portal for platform engineers.
Let’s be honest: the Kyverno website redesign was long overdue. As the project evolved into the industry standard for unified policy as code, our documentation needs to reflect that maturity. We are proud to finally unveil the new experience at https://kyverno.io.
- Modern redesign
  Built on the Starlight framework, the new site is faster, fully responsive, and features a clean, professional aesthetic that makes long-form reading much easier on the eyes.
- Enhanced documentation structure
  We’ve reorganized our docs from the ground up. Information is now tiered by “User Journey”—from a simplified Quick Start for beginners to deep-dive Reference material for advanced policy authors.
- Fully redesigned policy catalog
  Our library of 300+ sample policies has a new interface. It features improved filtering and a dedicated search that allows you to find policies by Category (Best Practices, Security, etc.) or Type (CEL vs. JMESPath) instantly.
- Enhanced search capabilities
  We’ve integrated a more intelligent search engine that indexes both documentation and policy code, ensuring you get the right answer on the first try.
- Brand new blog
  The Kyverno blog has been refreshed to better showcase technical deep dives, community case studies, and release announcements like this one!
Namespaced mutating and generating policies
In 1.16, we introduced namespaced variants for validation, cleanup, and image verification.
Kyverno 1.17 completes this by adding:
- NamespacedMutatingPolicy
- NamespacedGeneratingPolicy
This enables true multi-tenancy. Namespace owners can now define their own mutation and generation logic (e.g., automatically injecting sidecars or creating default ConfigMaps) without requiring cluster-wide permissions or affecting other tenants.
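To make the multi-tenancy idea concrete, a namespaced mutation might look roughly like the sketch below. This is an illustrative sketch only: the apiVersion, field names, and label values are assumptions modeled on the upstream Kubernetes MutatingAdmissionPolicy shape, not verified excerpts from the Kyverno v1 API.

```yaml
# Hypothetical sketch -- apiVersion and field layout are assumptions.
apiVersion: policies.kyverno.io/v1
kind: NamespacedMutatingPolicy
metadata:
  name: add-team-label
  namespace: team-a          # scoped to this namespace only
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
  mutations:
    - patchType: ApplyConfiguration
      applyConfiguration:
        expression: >-
          Object{
            metadata: Object.metadata{
              labels: {"team": "team-a"}
            }
          }
```

Because the policy lives in `team-a`, a namespace owner can apply it with only namespace-scoped RBAC, and it cannot touch workloads in other tenants' namespaces.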
CEL policy types reach v1 (GA)
The headline for 1.17 is the promotion of CEL-based policy types to v1. This signifies that the API is now stable and production-ready.
The promotion includes:
- ValidatingPolicy and NamespacedValidatingPolicy
- MutatingPolicy and NamespacedMutatingPolicy
- GeneratingPolicy and NamespacedGeneratingPolicy
- ImageValidatingPolicy and NamespacedImageValidatingPolicy
- DeletingPolicy and NamespacedDeletingPolicy
- PolicyException
With this graduation, platform teams can confidently migrate from JMESPath-based policies to CEL to take advantage of significantly improved evaluation performance and better alignment with upstream Kubernetes ValidatingAdmissionPolicies / MutatingAdmissionPolicies.
New CEL capabilities and functions
To ensure CEL policies are as powerful as the original Kyverno engine, 1.17 introduces several new function libraries:
- Hash Functions
  Built-in support for md5(value), sha1(value), and sha256(value) hashing.
- Math Functions
  Use math.round(value, precision) to round numbers to a specific decimal or integer precision.
- X509 Decoding
  Policies can now inspect and validate the contents of x509 certificates directly within a CEL expression using x509.decode(pem).
- Random String Generation
  Generate random strings with random() (default pattern) or random(pattern) for custom regex-based patterns.
- Transform Utilities
  Use listObjToMap(list1, list2, keyField, valueField) to merge two object lists into a map.
- JSON Parsing
  Parse JSON strings into structured data with json.unmarshal(jsonString).
- YAML Parsing
  Parse YAML strings into structured data with yaml.parse(yamlString).
- Time-based Logic
  New time.now(), time.truncate(timestamp, duration), and time.toCron(timestamp) functions allow for time-since or “maintenance window” style policies.
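As a rough illustration of how one of these functions might be used in a policy, here is a hedged sketch of a ValidatingPolicy that parses a JSON annotation with json.unmarshal(). The annotation key, field layout, and match syntax are hypothetical, chosen for illustration rather than taken from the official reference.

```yaml
# Hypothetical sketch -- annotation key and field names are assumptions.
apiVersion: policies.kyverno.io/v1
kind: ValidatingPolicy
metadata:
  name: check-settings-annotation
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["configmaps"]
  validations:
    - expression: >-
        json.unmarshal(
          object.metadata.annotations["config.example.com/settings"]
        ).retries <= 5
      message: "retries in the settings annotation must not exceed 5"
```

Previously this kind of structured-string inspection required JMESPath preprocessing; with the new libraries it stays inside a single CEL expression.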
The deprecation of legacy APIs
As Kyverno matures and aligns more closely with upstream Kubernetes standards, we are making the strategic shift to a CEL-first architecture. This means that the legacy Policy and ClusterPolicy types (which served the community for years using JMESPath) are now entering their sunset phase.
The deprecation schedule
Kyverno 1.17 officially marks ClusterPolicy and CleanupPolicy as Deprecated. While they remain functional in this release, the clock has started on their removal to make way for the more performant, standardized CEL-based engines.
| Release | Date (estimated) | Status |
| ------- | ---------------- | ------ |
| v1.17 | Jan 2026 | Marked for deprecation |
| v1.18 | Apr 2026 | Critical fixes only |
| v1.19 | Jul 2026 | Critical fixes only |
| v1.20 | Oct 2026 | Planned for removal |

Why the change?
By standardizing on the Common Expression Language (CEL), Kyverno significantly improves its performance and aligns with the native validation logic used by the Kubernetes API server itself.
For platform teams, this means one less language to learn and a more predictable and scalable policy-as-code experience.
Note for authors
From this point forward, we strongly recommend that every new policy you write be based on the new CEL APIs. Choosing the legacy APIs for new work today simply adds to your migration workload later this year.
Migration tips
We understand that many of you have hundreds of existing policies. To ensure a smooth transition, we have provided comprehensive resources:
- The Migration Guide
  Our new Migration to CEL Guide provides a side-by-side mapping of legacy ClusterPolicy fields to their new equivalents (e.g., mapping validate.pattern to ValidatingPolicy expressions).
- New Policy Types
  You can now begin moving your rules into specialized types like ValidatingPolicy, MutatingPolicy, and GeneratingPolicy. You can see the full breakdown of these new v1 APIs in the Policy Types Overview.
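To give a flavor of the kind of mapping the guide describes, a legacy validate.pattern rule and its CEL counterpart might compare roughly as follows. Both snippets are illustrative sketches under assumed field names, not excerpts from the guide itself.

```yaml
# Legacy (deprecated) ClusterPolicy style, pattern-based -- shown as comments:
#   validate:
#     message: "Pods must carry an 'app' label."
#     pattern:
#       metadata:
#         labels:
#           app: "?*"

# Roughly equivalent CEL form (hypothetical sketch; field names assumed):
apiVersion: policies.kyverno.io/v1
kind: ValidatingPolicy
metadata:
  name: require-app-label
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    - expression: >-
        has(object.metadata.labels) && "app" in object.metadata.labels
      message: "Pods must carry an 'app' label."
```

The wildcard pattern becomes an explicit boolean expression, which is also how the Kubernetes API server's own ValidatingAdmissionPolicy expresses such checks.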
Enhanced supply chain security
Supply chain security remains a core pillar of Kyverno.
- Cosign v3 Support
  1.17 adds support for the latest Cosign features, ensuring your image verification remains compatible with the evolving Sigstore ecosystem.
- Expanded Attestation Parsing
  New capabilities to deserialize YAML and JSON strings within CEL policies make it easier to verify complex metadata and SBOMs.
Observability and reporting upgrades
We have refined how Kyverno communicates policy results:
- Granular Reporting Control
  A new --allowedResults flag allows you to filter which results (e.g., only “Fail”) are stored in reports, significantly reducing etcd pressure in large clusters.
- Enhanced Metrics
  More detailed latency and execution metrics for CEL policies are now included by default to help you monitor the “hidden” cost of policy enforcement.
For developers and integrators
To support the broader ecosystem and make it easier to build integrations, we have decoupled our core components:
- New API Repository
  Our CEL-based APIs now live in a dedicated repository: kyverno/api. This makes it significantly lighter to import Kyverno types into your own Go projects.
- Kyverno SDK
  For developers building custom controllers or tools that interact with Kyverno, the SDK project is now housed at kyverno/sdk.
Getting started and backward compatibility
Upgrading from 1.16 is straightforward. However, since the CEL policy types have moved to v1, we recommend updating your manifests to the new API version. Kyverno will continue to support v1beta1 for a transition period.
helm repo update
helm upgrade --install kyverno kyverno/kyverno -n kyverno --version 3.7.0
Looking ahead: The Kyverno roadmap
As we move past the 1.17 milestone, our focus shifts toward long-term sustainability and the “Kyverno Platform” experience. Our goal is to ensure that Kyverno remains the most user-friendly and performant governance tool in the cloud-native ecosystem.
- Growing the community
  We are doubling down on our commitment to the community. Expect more frequent office hours, improved contributor onboarding, and a renewed focus on making the Kyverno community the most welcoming space in the CNCF.
- A unified tooling experience
  Over the years, we’ve built several powerful sub-projects (like the CLI, Policy Reporter, and Kyverno-Authz). A major goal on our roadmap is to unify these tools into a cohesive experience, reducing fragmentation and making it easier to manage the entire policy lifecycle from a single vantage point.
- Performance and scalability guardrails
  As clusters grow, performance becomes paramount. We are shifting our focus toward rigorous automated performance testing and will provide more granular metrics on throughput and latency. We want to give platform engineers the data they need to understand exactly what Kyverno can handle in high-scale production environments.
- Continuous UX improvement
  The website redesign was just the first step. We will continue to iterate on our user interfaces, documentation, and error messaging to ensure that Kyverno remains “Simplified” by design, not just in name.
Conclusion
Kyverno 1.17 is the most robust version yet, blending the flexibility of our original engine with the performance and standardization of CEL.
But this release is about more than just code—it’s about the total user experience. Whether you’re browsing the new policy catalog or scaling thousands of CEL-based rules, we hope this release makes your Kubernetes journey smoother.
A massive thank you to our contributors for making this release (and the new website!) a reality.
AI Found Twelve New Vulnerabilities in OpenSSL
The title of the post is “What AI Security Research Looks Like When It Works,” and I agree:
In the latest OpenSSL security release on January 27, 2026, twelve new zero-day vulnerabilities (meaning unknown to the maintainers at time of disclosure) were announced. Our AI system is responsible for the original discovery of all twelve, each found and responsibly disclosed to the OpenSSL team during the fall and winter of 2025. Of those, 10 were assigned CVE-2025 identifiers and 2 received CVE-2026 identifiers. Adding the 10 to the three we already found in the ...
Side-Channel Attacks Against LLMs
Here are three papers describing different side-channel attacks against LLMs.
“Remote Timing Attacks on Efficient Language Model Inference“:
Abstract: Scaling up language models has significantly increased their capabilities. But larger models are slower models, and so there is now an extensive body of work (e.g., speculative sampling or parallel decoding) that improves the (average case) efficiency of language model generation. But these techniques introduce data-dependent timing characteristics. We show it is possible to exploit these timing differences to mount a timing attack. By monitoring the (encrypted) network traffic between a victim user and a remote language model, we can learn information about the content of messages by noting when responses are faster or slower. With complete black-box access, on open source systems we show how it is possible to learn the topic of a user’s conversation (e.g., medical advice vs. coding assistance) with 90%+ precision, and on production systems like OpenAI’s ChatGPT and Anthropic’s Claude we can distinguish between specific messages or infer the user’s language. We further show that an active adversary can leverage a boosting attack to recover PII placed in messages (e.g., phone numbers or credit card numbers) for open source systems. We conclude with potential defenses and directions for future work...
Redefining automation governance: From execution to observability at Bradesco
The Promptware Kill Chain
Attacks against modern generative artificial intelligence (AI) large language models (LLMs) pose a real threat. Yet discussions around these attacks and their potential defenses are dangerously myopic. The dominant narrative focuses on “prompt injection,” a set of techniques for embedding instructions into LLM inputs with the intent of triggering malicious activity. The term suggests a simple, singular vulnerability, and this framing obscures a more complex and dangerous reality. Attacks on LLM-based systems have evolved into a distinct class of malware execution mechanisms, which we term “promptware.” In a ...
Upcoming Speaking Engagements
This is a current list of where and when I am scheduled to speak:
- I’m speaking at Ontario Tech University in Oshawa, Ontario, Canada, at 2 PM ET on Thursday, February 26, 2026.
- I’m speaking at the Personal AI Summit in Los Angeles, California, USA, on Thursday, March 5, 2026.
- I’m speaking at Tech Live: Cybersecurity in New York City, USA, on Wednesday, March 11, 2026.
- I’m giving the Ross Anderson Lecture at the University of Cambridge’s Churchill College at 5:30 PM GMT on Thursday, March 19, 2026.
- I’m speaking at RSAC 2026 in San Francisco, California, USA, on Wednesday, March 25, 2026...
Modernizing Prometheus: Native Storage for Composite Types
Over the last year, the Prometheus community has been working hard on several interesting and ambitious changes that previously would have been seen as controversial or not feasible. While there might be little visibility into those from the outside (e.g., it's not an OpenClaw Prometheus plugin, sorry!), Prometheus developers are, organically, steering Prometheus into a certain, coherent future. Piece by piece, we unexpectedly get closer to goals we never dreamed we would achieve as an open-source project!
This post starts (hopefully!) a series of blog posts sharing a few ambitious shifts that might be exciting to new and existing Prometheus users and developers. In this post, I'd love to focus on the idea of native storage for composite types, which tidies up a lot of challenges that have piled up over time. Make sure to check the provided inlined links for how you can adopt some of those changes early or contribute!
CAUTION: Disclaimer: This post is intended as a fun overview, from my own personal point of view as a Prometheus maintainer. Some of the mentioned changes haven't been (yet) officially approved by the Prometheus Team; some of them were not proved in production.
NOTE: This post was written by humans; AI was used only for cosmetic and grammar fixes.
Classic Representation: Primitive Samples
As you might know, the Prometheus data model (so the server, PromQL, and the protocols) supports gauges, counters, histograms, and summaries. OpenMetrics 1.0 extended this with gaugehistogram, info, and stateset types.
Impressively, for a long time Prometheus' TSDB storage implementation had an explicitly clean and simple data model. The TSDB allowed the storage and retrieval of string-labelled primitive samples containing only float64 values and int64 timestamps. It was completely metric-type-agnostic.
The metric types were implied on top of the TSDB, for humans and for best-effort PromQL tooling. For simplicity, let's call this way of storing types the classic model or representation. In this model:
We have primitive types:

- `gauge` is a "default" type with no special rules, just a float sample with labels.
- `counter` should have a `_total` suffix in the name for humans to understand its semantics, e.g. `foo_total 17.0`.
- `info` needs an `_info` suffix in the metric name and always has a value of `1`.
We have composite types. This is where the fun begins. In the classic representation, composite metrics are represented as a set of primitive float samples:

- `histogram` is a group of `counter`s with certain mandatory suffixes and `le` labels:

  ```
  foo_bucket{le="0.0"} 0
  foo_bucket{le="1e-05"} 0
  foo_bucket{le="0.0001"} 5
  foo_bucket{le="0.1"} 8
  foo_bucket{le="1.0"} 10
  foo_bucket{le="10.0"} 11
  foo_bucket{le="100000.0"} 11
  foo_bucket{le="1e+06"} 15
  foo_bucket{le="1e+23"} 16
  foo_bucket{le="1.1e+23"} 17
  foo_bucket{le="+Inf"} 17
  foo_count 17
  foo_sum 324789.3
  ```

- `gaugehistogram`, `summary`, and `stateset` types follow the same logic: a group of special `gauge`s or `counter`s that compose a single metric.
The classic model served the Prometheus project well. It significantly simplified the storage implementation, enabling Prometheus to be one of the most optimized open-source time-series databases, with distributed versions based on the same data model available in projects like Cortex, Thanos, and Mimir.
Unfortunately, there are always tradeoffs. This classic model has a few limitations:
- Efficiency: It tends to yield overhead for composite types because every new piece of data (e.g., new bucket) takes precious index space (it's a new unique series), whereas samples are significantly more compressible (rarely change, time-oriented).
- Functionality: It poses limitations to the shape and flexibility of the data you store (unless we'd go into some JSON-encoded labels, which have massive downsides).
- Transactionality: Primitive pieces of composite types (separate counters) are processed independently. While we did a lot of work to ensure write isolation and transactionality for scrapes, transactionality completely breaks apart when data is received or sent via remote write, OTLP protocols, or distributed long-term storage Prometheus solutions. For example, a `foo` histogram might have been partially sent, with its `foo_bucket{le="1.1e+23"} 17` counter series delayed or accidentally dropped, which risks triggering false-positive alerts or no alerts, depending on the situation.
- Reliability: Consumers of the TSDB data essentially have to guess the type semantics. There's nothing stopping users from writing a `foo_bucket` gauge or a `foo_total` histogram.
A Glimpse of Native Storage for Composite Types
The classic model was challenged by the introduction of native histograms. The TSDB was extended to store composite histogram samples in addition to plain floats. We tend to call these native histograms, because the TSDB can now "natively" store a full histogram (with sparse, exponential buckets) as an atomic, composite sample.
At that point, the common wisdom was to stop there. The special advanced histogram that's generally meant to replace the "classic" histograms uses a composite sample, while the rest of the metrics use the classic model. Making other composite types consistent with the new native model felt extremely disruptive to users, with too much work and risks. A common counter-argument was that users will eventually migrate their classic histograms naturally, and summaries are also less useful, given the more powerful bucketing and lower cost of native histograms.
Unfortunately, the migration to native histograms was known to take time, given the slight PromQL change required to use them and the new bucketing and client changes needed (applications have to define new histograms or edit existing metrics). There will also be old software, in use for a long time, that is never migrated. This ultimately leaves Prometheus with no chance of deprecating classic histograms, and all software solutions would be required to support the classic model, likely for decades.
However, native histograms did push TSDB and the ecosystem into that new composite sample pattern. Some of those changes could be easily adapted to all composite types. Native histograms also gave us a glimpse of the many benefits of that native support. It was tempting to ask ourselves: would it be possible to add native counterparts of the existing composite metrics to replace them, ideally transparently?
Organically, in 2024, for transactionality and efficiency, we introduced the native histogram custom buckets (NHCB) concept, which essentially allows storing classic histograms with explicit buckets natively, reusing the native histogram composite sample data structures.
NHCB has proven to be at least 30% more efficient than the classic representation, while offering functional parity with classic histograms. However, two practical challenges emerged that slowed down the adoption:
- Expanding (converting from NHCB to a classic histogram) is relatively trivial, but combining (turning a classic histogram into NHCB) is often not feasible. We don't want to wait for client ecosystem adoption, and, mindful of legacy, hard-to-change software, we envisioned NHCB being converted (so combined) on scrape from the classic representation. That has proven to be somewhat expensive on scrape. Additionally, combination logic is practically impossible when receiving "pushes" (e.g., remote write with classic histograms), as different parts of the same histogram sample (e.g., buckets and count) can arrive via different remote write shards or sequential messages. This combination challenge is also why OpenTelemetry Collector users see extra overhead in `prometheusreceiver`, as the OpenTelemetry model strictly follows the composite sample model.
- Consumption is slightly different, especially in the PromQL query syntax. Our initial decision was to surface NHCB histograms using a native-histogram-like PromQL syntax. For example, take the following classic histogram:

  ```
  foo_bucket{le="0.0"} 0
  # ...
  foo_bucket{le="1.1e+23"} 17
  foo_bucket{le="+Inf"} 17
  foo_count 17
  foo_sum 324789.3
  ```

  When we convert this to NHCB, you can no longer use `foo_bucket` as your metric name selector. Since NHCB is now stored as a `foo` metric, you need to use:

  ```
  histogram_quantile(0.9, sum(foo{job="a"}))
  # Old syntax: histogram_quantile(0.9, sum(foo_bucket{job="a"}) by (le))
  ```

  This also has another effect: it violates our "what you see is what you query" rule for the text formats, at least until OpenMetrics 2.
On top of that, similar problems occur on other Prometheus outputs (federation, remote read, and remote write).
NOTE: Fun fact: the Prometheus client data model (SDKs) and the `PrometheusProto` scrape protocol already use the composite sample model!
Transparent Native Representation
Let's get straight to the point. Organically, the Prometheus community seems to align with the following two ideas:
- We want to eventually move to a fully composite sample model on the storage layer, given all the benefits.
- Users need to be able to switch (e.g., on scrape) from the classic to the native form in storage without breaking the consumption layer. Essentially, to help with non-trivial migration pains (finding who uses what, double-writing, synchronizing), to avoid tricky dual-mode protocol changes, and to deprecate the classic model ASAP for the sustainability of the Prometheus codebase, we need to ensure eventual consumption migration (e.g., PromQL queries) independently of the storage layer.
Let's go through evidence of this direction, which also represents efforts you can contribute to or adopt early!
- We are discussing "native" summary and stateset types to fully eliminate the classic model for all composite types. Feel free to join and help with that work!
- We are working on OpenMetrics 2.0 to consolidate and improve the pull protocol scene and apply the new learnings. One of the core changes will be the move to composite values in text, which makes the text format trivial to parse for storages that support composite types natively. This solves the combining challenge. Note that, by default, for now, all composite types will still be "expanded" to the classic format on scrape, so there's no breaking change for users. Feel free to join our WG to help or give feedback.
- The Prometheus receive and export protocols have been updated. Remote Write 2.0 allows transporting histograms in the "native" form instead of the classic representation (the classic form is still supported). In future versions (e.g., 2.1), we could easily follow a similar pattern and add native summaries and statesets. Contributions are welcome to make Remote Write 2.0 stable!
- We are experimenting with consumption compatibility modes that translate composite types stored as composite samples back to the classic representation. This is not trivial; there are edge cases, but it might be more feasible (and needed!) than we initially anticipated. See:
  - PromQL compatibility mode for NHCB
  - Expanding on remote write
  - We also need to consider adding expanding for federation, remote read, and other APIs.

  In PromQL it might work as follows, for an NHCB that used to be a classic histogram:

  ```
  # New syntax gives our "foo" NHCB:
  histogram_quantile(0.9, sum(foo{job="a"}))
  # Old syntax still works, expanding "foo" NHCB to classic representation:
  histogram_quantile(0.9, sum(foo_bucket{job="a"}) by (le))
  ```

  Alternatives, like a special label or annotations, are also discussed.
When implemented, it should be possible to fully switch different parts of your metric collection pipeline to native form transparently.
Summary
Moving Prometheus to a native composite type world is not easy and will take time, especially around coding, testing, and optimizing. Notably, it switches the performance characteristics of the metric load from uniform, predictable sample sizes to sample sizes that depend on the type. Another challenge is code architecture: maintaining different sample types has already proven to be very verbose (we need unions, Go!).
However, recent work has revealed a clean and viable path that yields clear benefits around functionality, transactionality, reliability, and efficiency in the relatively near future, which is pretty exciting!
If you have any questions around these changes, feel free to:
- DM me on Slack.
- Visit the `#prometheus-dev` Slack channel and share your questions.
- Comment on related issues, create PRs, and review PRs (the most impactful work!).
The Prometheus community is also at KubeConEU 2026 in Amsterdam! Make sure to:
- Visit our Prometheus KubeCon booth.
- Attend our contributing workshop on Wednesday, March 25, 2026, at 16:00.
- Attend our "Prometheus V3 One Year In: OpenMetrics 2.0 and More!" session on Thursday, March 26, at 13:45.
I'm hoping we can share stories of other important, orthogonal shifts we see in the community in future posts. No promises (and help welcome!), but there's a lot to cover, such as (random order, not a full list):
- Our native start timestamp feature journey, which cleanly unblocks native delta temporality without "hacks" like reusing gauges, a separate layer of metric types, or label annotations (e.g. __temporality__).
- Optional schematization of Prometheus metrics, which attempts to solve a ton of stability problems with metric naming and shape, building on top of OpenTelemetry semconv.
- Our metadata storage journey that attempts to improve the OpenTelemetry Entities and resource attributes storage and consumption experience.
- Our journey to organize and extend Prometheus scrape pull protocols with the recent ownership move of OpenMetrics.
- An incredible TSDB Parquet effort, coming from the three LTS project groups (Cortex, Thanos, Mimir) working together, attempting to improve high-cardinality cases.
- Fun experiments with PromQL extensions, like PromQL with pipes and variables and some new SQL transpilation ideas.
- Governance changes.
See you in open-source!
Friday Squid Blogging: Do Squid Dream?
An exploration of the interesting question.
Zero CVEs: The symptom of a larger problem
Extend trust across the software supply chain with Red Hat trusted libraries
Chasing the holy grail: Why Red Hat’s Hummingbird project aims for "near zero" CVEs
3D Printer Surveillance
New York is contemplating a bill that adds surveillance to 3D printers:
New York’s 2026-2027 executive budget bill (S.9005 / A.10005) includes language that should alarm every maker, educator, and small manufacturer in the state. Buried in Part C is a provision requiring all 3D printers sold or delivered in New York to include “blocking technology.” This is defined as software or firmware that scans every print file through a “firearms blueprint detection algorithm” and refuses to print anything it flags as a potential firearm or firearm component...
From challenge to champion: Elevate your vulnerability management strategy
Spotlight on SIG Architecture: API Governance
This is the fifth interview in a SIG Architecture Spotlight series covering its different subprojects; this time, the focus is SIG Architecture: API Governance.
In this SIG Architecture spotlight we talked with Jordan Liggitt, lead of the API Governance sub-project.
Introduction
FM: Hello Jordan, thank you for your availability. Tell us a bit about yourself, your role and how you got involved in Kubernetes.
JL: My name is Jordan Liggitt. I'm a Christian, husband, father of four, software engineer at Google by day, and amateur musician by stealth. I was born in Texas (and still like to claim it as my point of origin), but I've lived in North Carolina for most of my life.
I've been working on Kubernetes since 2014. At that time, I was working on authentication and authorization at Red Hat, and my very first pull request to Kubernetes attempted to add an OAuth server to the Kubernetes API server. It never exited work-in-progress status. I ended up going with a different approach that layered on top of the core Kubernetes API server in a different project (spoiler alert: this is foreshadowing), and I closed it without merging six months later.
Undeterred by that start, I stayed involved, helped build Kubernetes authentication and authorization capabilities, and got involved in the definition and evolution of the core Kubernetes APIs from early beta APIs, like v1beta3, to v1. I got tagged as an API reviewer in 2016 based on those contributions, and was added as an API approver in 2017.
Today, I help lead the API Governance and code organization subprojects for SIG Architecture, and I am a tech lead for SIG Auth.
FM: And when did you get specifically involved in the API Governance project?
JL: Around 2019.
Goals and scope of API Governance
FM: How would you describe the main goals and areas of intervention of the subproject?
JL: The surface area includes all the various APIs Kubernetes has, and there are APIs that people do not always realize are APIs: command-line flags, configuration files, how binaries are run, how they talk to back-end components like the container runtime, and how they persist data. People often think of "the API" as only the REST API... that is the biggest and most obvious one, and the one with the largest audience, but all of these other surfaces are also APIs. Their audiences are narrower, so there is more flexibility there, but they still require consideration.
The goals are to be stable while still enabling innovation. Stability is easy if you never change anything, but that contradicts the goal of evolution and growth. So we balance "be stable" with "allow change".
FM: Speaking of changes, in terms of ensuring consistency and quality (which is clearly one of the reasons this project exists), what are the specific quality gates in the lifecycle of a Kubernetes change? Does API Governance get involved during the release cycle, prior to it through guidelines, or somewhere in between? At what points do you ensure the intended role is fulfilled?
JL: We have guidelines and conventions, both for APIs in general and for how to change an API. These are living documents that we update as we encounter new scenarios. They are long and dense, so we also support them with involvement at either the design stage or the implementation stage.
Sometimes, due to bandwidth constraints, teams move ahead with design work without feedback from API Review. That’s fine, but it means that when implementation begins, the API review will happen then, and there may be substantial feedback. So we get involved when a new API is created or an existing API is changed, either at design or implementation.
FM: Is this during the Kubernetes Enhancement Proposal (KEP) process? Since KEPs are mandatory for enhancements, I assume part of the work intersects with API Governance?
JL: It can. KEPs vary in how detailed they are. Some include literal API definitions. When they do, we can perform an API review at the design stage. Then implementation becomes a matter of checking fidelity to the design.
Getting involved early is ideal. But some KEPs are conceptual and leave details to the implementation. That’s not wrong; it just means the implementation will be more exploratory. Then API Review gets involved later, possibly recommending structural changes.
There’s a trade-off regardless: detailed design upfront versus iterative discovery during implementation. People and teams work differently, and we’re flexible and happy to consult early or at implementation time.
FM: This reminds me of what Fred Brooks wrote in "The Mythical Man-Month" about conceptual integrity being central to product quality... No matter how you structure the process, there must be a point where someone looks at what is coming and ensures conceptual integrity. Kubernetes uses APIs everywhere -- externally and internally -- so API Governance is critical to maintaining that integrity. How is this captured?
JL: Yes, the conventions document captures patterns we’ve learned over time: what to do in various situations. We also have automated linters and checks to ensure correctness around patterns like spec/status semantics. These automated tools help catch issues even when humans miss them.
As new scenarios arise -- and they do constantly -- we think through how to approach them and fold the results back into our documentation and tools. Sometimes it takes a few attempts before we settle on an approach that works well.
FM: Exactly. Each new interaction improves the guidelines.
JL: Right. And sometimes the first approach turns out to be wrong. It may take two or three iterations before we land on something robust.
The impact of Custom Resource Definitions
FM: Is there any particular change, episode, or domain that stands out as especially noteworthy, complex, or interesting in your experience?
JL: The watershed moment was Custom Resources. Prior to that, every API was handcrafted by us and fully reviewed. There were inconsistencies, but we understood and controlled every type and field.
When Custom Resources arrived, anyone could define anything. The first version did not even require a schema. That made it extremely powerful -- it enabled change immediately -- but it left us playing catch-up on stability and consistency.
When Custom Resources graduated to General Availability (GA), schemas became required, but escape hatches still existed for backward compatibility. Since then, we’ve been working on giving CRD authors validation capabilities comparable to built-ins. Built-in validation rules for CRDs have only just reached GA in the last few releases.
So CRDs opened the "anything is possible" era. Built-in validation rules are the second major milestone: bringing consistency back.
The three major themes have been defining schemas, validating data, and handling pre-existing invalid data. With ratcheting validation (allowing data to improve without breaking existing objects), we can now guide CRD authors toward conventions without breaking the world.
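As a minimal sketch of what those validation capabilities look like for a CRD author today: the x-kubernetes-validations CEL rule mechanism is the real feature being described; the widgets.example.com resource and its fields below are purely illustrative.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
    singular: widget
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: ["minReplicas", "maxReplicas"]
              properties:
                minReplicas:
                  type: integer
                maxReplicas:
                  type: integer
              # A CEL validation rule, comparable to the cross-field
              # checks built-in types enforce in handwritten Go code:
              x-kubernetes-validations:
                - rule: "self.minReplicas <= self.maxReplicas"
                  message: "minReplicas must not exceed maxReplicas"
```

Rules like this run in the API server on create and update, which is what lets CRD authors approach the consistency of built-in API validation without writing an admission webhook.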
API Governance in context
FM: How does API Governance relate to SIG Architecture and API Machinery?
JL: API Machinery provides the actual code and tools that people build APIs on. They don’t review APIs for storage, networking, scheduling, etc.
SIG Architecture sets the overall system direction and works with API Machinery to ensure the system supports that direction. API Governance works with other SIGs building on that foundation to define conventions and patterns, ensuring consistent use of what API Machinery provides.
FM: Thank you. That clarifies the flow. Going back to release cycles: do release phases -- enhancements freeze, code freeze -- change your workload? Or is API Governance mostly continuous?
JL: We get involved in two places: design and implementation. Design involvement increases before enhancements freeze; implementation involvement increases before code freeze. However, many efforts span multiple releases, so there is always some design and implementation happening, even for work targeting future releases. Between those intense periods, we often have time to work on long-term design work.
An anti-pattern we see is teams thinking about a large feature for months and then presenting it three weeks before enhancements freeze, saying, "Here is the design, please review." For big changes with API impact, it’s much better to involve API Governance early.
And there are good times in the cycle for this -- between freezes -- when people have bandwidth. That’s when long-term review work fits best.
Getting involved
FM: Clearly. Now, regarding team dynamics and new contributors: how can someone get involved in API Governance? What should they focus on?
JL: It’s usually best to follow a specific change rather than trying to learn everything at once. Pick a small API change, perhaps one someone else is making or one you want to make, and observe the full process: design, implementation, review.
High-bandwidth review -- live discussion over video -- is often very effective. If you’re making or following a change, ask whether there’s a time to go over the design or PR together. Observing those discussions is extremely instructive.
Start with a small change. Then move to a bigger one. Then maybe a new API. That builds understanding of conventions as they are applied in practice.
FM: Excellent. Any final comments, or anything we missed?
JL: Yes... the reason we care so much about compatibility and stability is our users. It’s easy for contributors to see those requirements as painful obstacles preventing cleanup or requiring tedious work... but users have integrated with our system, and we made a promise to them: we want them to be able to trust that we won’t break that contract. So even when it requires more work, moves slower, or involves duplication, we choose stability.
We are not trying to be obstructive; we are trying to make life good for users.
A lot of our questions focus on the future: you want to do something now... how will you evolve it later without breaking it? We assume we will know more in the future, and we want the design to leave room for that.
We also assume we will make mistakes. The question then is: how do we leave ourselves avenues to improve while keeping compatibility promises?
FM: Exactly. Jordan, thank you, I think we’ve covered everything. This has been an insightful view into the API Governance project and its role in the wider Kubernetes project.
JL: Thank you.
