Prometheus Blog

The Prometheus blog

Ask the Prometheus docs with Kapa.ai

Tue, 04/21/2026 - 20:00

Prometheus documentation now includes a new Kapa.ai integration. This is available as part of the partnership between CNCF and Kapa.ai, which helps CNCF projects make their documentation and knowledge more accessible.

You can now use the Ask AI entry on prometheus.io to ask questions in natural language and get answers grounded in Prometheus documentation. For the Prometheus team, it is also a useful way to understand what people are trying to learn from the docs and where the docs still need work.

The Ask AI option is available directly from the docs search box:

Prometheus docs search field showing the Ask AI option

How this helps users

This makes the docs easier to use in a few different ways. You can ask full questions instead of guessing the exact search keywords, and you can describe a problem in your own words even if you do not know the Prometheus terminology yet.

It can also be helpful if English is not your first language, since you can often ask in your preferred language instead of translating your question into English keywords first. And because the answers are grounded in the docs, you also get links back to the relevant pages to keep exploring.

Try it on prometheus.io

The next time you are reading the Prometheus docs, open search and click Ask AI.

Once you ask a question, Kapa responds with an answer grounded in the Prometheus docs and links back to the relevant documentation:

Prometheus Docs AI answering a question about installing Prometheus with links to the docs

Why we are adding it

For the Prometheus team, this is not only a way to answer questions faster. It is also a feedback loop for improving the docs.

Kapa shows us what people ask and how confidently those questions can be answered from the existing documentation. That helps us identify missing topics, unclear explanations, and places where the right content exists but is still hard to find.

Looking at these questions over time gives us a practical way to spot recurring themes and prioritize documentation improvements:

Kapa question insights showing user questions, confidence levels, and topic tags

If Kapa gives you a useful answer, great. If it does not, that also helps us improve the docs.

Ask something simple. Ask something specific. Ask something you think should already be obvious from the docs.

From now on, asking questions is also a great way of helping the Prometheus community!

NOTE: Conversations using the Kapa integration are recorded and anonymized. For more information, please read https://www.kapa.ai/security

Categories: CNCF Projects

Introducing the UX Research Working Group

Tue, 04/07/2026 - 20:00

Prometheus has always prioritized solving complex technical challenges to deliver a reliable, performant open-source monitoring system. Over time, however, users have expressed a variety of experience-related pain points. Those pain points range from onboarding and configuration to documentation, mental models, and interoperability across the ecosystem.

At PromCon 2025, a user research study was presented that highlighted several of these issues. Although the central area of investigation involved Prometheus and OpenTelemetry workflows, the broader takeaway was clear: Prometheus would benefit from a dedicated, ongoing effort to understand user needs and improve the overall user experience.

Recognizing this, the Prometheus team established a Working Group focused on improving user experience through design and user research. This group is meant to support all areas of Prometheus by bringing structured research, user insights, and usability perspectives into the community's development and decision-making processes.

How we can help Prometheus maintainers

Building something where the user needs are unclear? Maybe you're looking at two competing solutions and you'd like to understand the user tradeoffs alongside the technical ones.

That's where we can be of help.

The UX Working Group will partner with you to conduct user research or provide feedback on your plans for user outreach. That could include:

  • User research reports and summaries
  • User journeys, personas, wireframes, prototypes, and other UX artifacts
  • Recommendations for improving usability, onboarding, interoperability, and documentation
  • Prioritized lists of user pain points
  • Suggestions for community discussions or decision-making topics

To get started, tell us what you're trying to do, and we'll work with you to determine what type and scope of research is most appropriate.

How we can help Prometheus end users

We want to hear from you! Let us know if you're interested in participating in a research study and we'll contact you when we're working on one that's a good fit. Having an issue with the Prometheus user experience? We can help you open an issue and direct it to the appropriate community members.

Interested in helping?

New contributors to the working group are always welcome! Get in touch and let us know what you'd like to work on.

Where to find us

Drop us a message in Slack, join a meeting, or raise an issue in GitHub.

Categories: CNCF Projects

Uncached I/O in Prometheus

Wed, 03/04/2026 - 19:00

Do you find yourself constantly looking up the difference between container_memory_usage_bytes, container_memory_working_set_bytes, and container_memory_rss? Pick the wrong one and your memory limits lie to you, your benchmarks mislead you, and your container gets OOMKilled.

You're not alone. There is even a 9-year-old Kubernetes issue that captures the frustration of users.

The explanation is simple: RAM is not used in just one way. One of the easiest things to miss is the page cache semantics. For some containers, memory taken by page caching can make up most of the reported usage, even though that memory is largely reclaimable, creating surprising differences between those metrics.

NOTE: The feature discussed here currently only supports Linux.

Prometheus writes a lot of data to disk. It is, after all, a database. But not every write benefits from sitting in the page cache. Compaction writes are the clearest example: once a block is written, only a fraction of that data is likely to be queried again soon, and since there is no way to predict which fraction, caching it all offers little return. The use-uncached-io feature flag, enabled with --enable-feature=use-uncached-io, was built to address exactly this.

Bypassing the cache for those writes reduces Prometheus's page cache footprint, making its memory usage more predictable and easier to reason about. It also relieves pressure on that shared cache, lowering the risk of evicting hot data that queries and other reads actually depend on. A potential bonus is reduced CPU overhead from cache allocations and evictions. The hard constraint throughout was to avoid any measurable regression in CPU or disk I/O.

The flag was introduced in Prometheus v3.5.0 and currently only supports Linux. Under the hood, it uses direct I/O, which requires filesystem support and Linux kernel v2.4.10 or newer. The kernel requirement is unlikely to be a problem in practice, as that version shipped nearly 25 years ago.

If direct I/O helps here, why was it not done earlier, and why is it not used everywhere it would help? Because direct I/O comes with strict alignment requirements. Unlike buffered I/O, you cannot simply write any chunk of memory to any position in a file. The file offset, the memory buffer address, and the transfer size must all be aligned to the logical sector size of the underlying storage device, typically 512 or 4096 bytes.
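The three alignment conditions above can be made concrete with a small check. The following is a minimal sketch, not code from Prometheus: the helper names isAligned, directIOCompatible, and alignedBuffer are hypothetical, and the 4096-byte alignment is only an assumed example value. It also shows one common way to obtain an aligned buffer in Go, by over-allocating and slicing.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isAligned reports whether v is a multiple of align.
// align is assumed to be a power of two (for example 512 or 4096).
func isAligned(v, align uintptr) bool {
	return v&(align-1) == 0
}

// directIOCompatible checks the three conditions direct I/O imposes:
// the buffer address, the file offset, and the transfer size must all
// be multiples of align. Hypothetical helper for illustration only.
func directIOCompatible(buf []byte, offset int64, align uintptr) bool {
	if len(buf) == 0 || offset < 0 {
		return false
	}
	addr := uintptr(unsafe.Pointer(&buf[0]))
	return isAligned(addr, align) &&
		isAligned(uintptr(offset), align) &&
		isAligned(uintptr(len(buf)), align)
}

// alignedBuffer returns a slice of the given size whose first byte sits
// at an address that is a multiple of align. It over-allocates by align
// bytes and skips forward to the next aligned address. This relies on
// the current Go runtime not moving heap allocations.
func alignedBuffer(size int, align uintptr) []byte {
	raw := make([]byte, size+int(align))
	addr := uintptr(unsafe.Pointer(&raw[0]))
	shift := int((align - addr&(align-1)) & (align - 1))
	return raw[shift : shift+size : shift+size]
}

func main() {
	const align = 4096 // assumed logical sector size
	buf := alignedBuffer(2*align, align)

	fmt.Println(directIOCompatible(buf, align, align))        // all three aligned
	fmt.Println(directIOCompatible(buf, 100, align))          // offset not aligned
	fmt.Println(directIOCompatible(buf[:1000], 0, align))     // size not aligned
	fmt.Println(directIOCompatible(buf[1:align+1], 0, align)) // address not aligned
}
```

Only the first call reports true; each of the others violates exactly one of the three conditions, which is the kind of request the kernel rejects for a file opened with direct I/O.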

To satisfy those constraints, a bufio.Writer-like writer, directIOWriter, was implemented. On Linux kernels v6.1 or newer, Prometheus retrieves the exact alignment values via statx; on older kernels, conservative defaults are used.
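To illustrate what "bufio.Writer-like" means here, the sketch below buffers arbitrary writes and only ever hands full, fixed-size blocks to the underlying writer. It is a simplified stand-in, not the actual directIOWriter: the names blockWriter and sizeRecorder are hypothetical, the destination is an in-memory buffer rather than a file opened with direct I/O, buffer address alignment is not handled, and zero-padding the final block on Flush is just one assumed way of dealing with a partial tail.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// blockWriter accumulates data and forwards it to dst only in chunks of
// exactly blockSize bytes. Hypothetical type for illustration only.
type blockWriter struct {
	dst io.Writer
	buf []byte // len(buf) is the block size
	n   int    // number of bytes currently buffered
}

func newBlockWriter(dst io.Writer, blockSize int) *blockWriter {
	return &blockWriter{dst: dst, buf: make([]byte, blockSize)}
}

// Write copies p into the internal buffer and emits a block each time
// the buffer fills up.
func (w *blockWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		c := copy(w.buf[w.n:], p)
		w.n += c
		written += c
		p = p[c:]
		if w.n == len(w.buf) {
			if _, err := w.dst.Write(w.buf); err != nil {
				return written, err
			}
			w.n = 0
		}
	}
	return written, nil
}

// Flush zero-pads any buffered tail to a full block, writes it, and
// returns the number of padding bytes so the caller could truncate the
// destination afterwards. This padding strategy is an assumption.
func (w *blockWriter) Flush() (int, error) {
	if w.n == 0 {
		return 0, nil
	}
	pad := len(w.buf) - w.n
	for i := w.n; i < len(w.buf); i++ {
		w.buf[i] = 0
	}
	if _, err := w.dst.Write(w.buf); err != nil {
		return 0, err
	}
	w.n = 0
	return pad, nil
}

// sizeRecorder is a test destination that remembers the size of every
// write it receives.
type sizeRecorder struct {
	bytes.Buffer
	sizes []int
}

func (r *sizeRecorder) Write(p []byte) (int, error) {
	r.sizes = append(r.sizes, len(p))
	return r.Buffer.Write(p)
}

func main() {
	rec := &sizeRecorder{}
	w := newBlockWriter(rec, 4) // tiny block size to keep the demo readable

	if _, err := w.Write([]byte("0123456789")); err != nil {
		panic(err)
	}
	pad, err := w.Flush()
	if err != nil {
		panic(err)
	}
	fmt.Println(rec.sizes, pad, rec.Len())
}
```

With a block size of 4 and 10 bytes of input, the destination sees three writes of 4 bytes each, the last one carrying 2 bytes of padding. A real direct I/O writer additionally has to keep the buffer address and file offset aligned, which this sketch leaves out.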

The directIOWriter currently covers chunk writes during compaction only, but that alone accounts for a substantial portion of Prometheus's I/O. The results are tangible: benchmarks show a 20–50% reduction in page cache usage, as measured by container_memory_cache.

The work is not done yet, and contributions are welcome. Here are a few areas that could help move the feature closer to General Availability:

Covering more write paths

Direct I/O is currently limited to chunk writes during compaction. Index files and WAL writes are natural next candidates, although they would require some additional work.

Building more confidence around directIOWriter

All existing TSDB tests can be run against the directIOWriter using a dedicated build tag: go test --tags=forcedirectio ./tsdb/. More tests covering edge cases for the writer itself would be welcome, and there is even an idea of formally verifying that it never violates alignment requirements.

Experimenting with RWF_DONTCACHE

Introduced in Linux kernel v6.14, RWF_DONTCACHE enables uncached buffered I/O, where data still goes through the page cache but the corresponding pages are dropped afterwards. It would be worth benchmarking whether this delivers similar benefits without direct I/O's alignment constraints.

Support beyond Linux

Support is currently Linux-only. Contributions to extend it to other operating systems are welcome.

For more details, see the proposal and the PR that introduced the feature.

Categories: CNCF Projects
