Re: [PATCH v9 0/5] KVM statistics data fd-based binary interface

On 15/06/21 10:37, Enrico Weigelt, metux IT consult wrote:
> * why is it binary instead of text? is it so very high volume that
>   it really matters?

The main reason to have a binary format is actually not the high volume (though that plays a part too). Rather, we would really like to include the schema so that the statistics are self-describing. This includes things like whether the unit of measure of a statistic is clock cycles, nanoseconds, pages or whatnot; carrying this kind of information in text makes the parsers awkward. trace-cmd is another example where the data consists of a schema followed by binary data.
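To illustrate what "a schema followed by binary data" can look like, here is a minimal Python sketch. The layout (HEADER, DESC, VALUE, the field widths and the UNITS table) is invented for this example and is not the format defined by the patch series; it only shows how a reader can decode values and units from the blob alone.

```python
import struct

# Hypothetical layout, for illustration only (NOT the KVM ABI).
HEADER = struct.Struct("<III")     # num_desc, desc_offset, data_offset
DESC = struct.Struct("<BbHI16s")   # unit, exponent, reserved, value offset, name
VALUE = struct.Struct("<Q")        # each statistic is one unsigned 64-bit value

# Hypothetical unit codes.
UNITS = {0: "count", 1: "pages", 2: "seconds", 3: "cycles"}


def build_blob(stats):
    """Serialize (name, unit, exponent, value) tuples as header + schema + data."""
    desc_offset = HEADER.size
    data_offset = desc_offset + DESC.size * len(stats)
    descs = b"".join(
        DESC.pack(unit, exponent, 0, i * VALUE.size, name.encode())
        for i, (name, unit, exponent, _) in enumerate(stats)
    )
    data = b"".join(VALUE.pack(value) for *_, value in stats)
    return HEADER.pack(len(stats), desc_offset, data_offset) + descs + data


def parse_blob(blob):
    """Decode every statistic using only the schema embedded in the blob."""
    num_desc, desc_offset, data_offset = HEADER.unpack_from(blob, 0)
    result = {}
    for i in range(num_desc):
        unit, exponent, _, offset, raw_name = DESC.unpack_from(
            blob, desc_offset + i * DESC.size)
        (value,) = VALUE.unpack_from(blob, data_offset + offset)
        name = raw_name.rstrip(b"\0").decode()
        result[name] = (value, exponent, UNITS.get(unit, "unknown"))
    return result
```

With this sketch, a statistic measured in nanoseconds would be described as unit "seconds" with exponent -9, so the reader never has to guess the unit from the name.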

A text format could certainly be added if there's a use case, but for developer use debugfs is usually a suitable replacement.

Last year we tried the opposite direction: we built a one-value-per-file filesystem with a common API that any subsystem could use (e.g. providing ethtool stats, /proc/interrupts, etc. in addition to KVM stats). We started with text, similar to sysfs, with the plan of extending it to a binary format later. However, other subsystems expressed very little interest in this, so instead we decided to go with something that is designed around KVM needs.

Still, the binary format that KVM uses is designed not to be KVM-specific. If other subsystems want to publish high-volume, self-describing statistic information, they are welcome to share the binary format and the code. In some cases it may even make sense to have them in sysfs (e.g. /sys/kernel/slab/*/.stats). As Greg said, sysfs is currently one value per file, but perhaps that could be changed if the binary format is an additional way to access the information and not the only one (not that I'm planning to do it).

> * how will possible future extensions of the telemetry packets work?

The format includes a schema, so it's possible to add more statistics in the future. The exact list of statistics varies per architecture and is not part of the userspace API (obvious caveat: https://xkcd.com/1172/).
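As a rough sketch of why a schema makes extensions harmless: a reader that looks statistics up by name keeps working when a newer kernel adds entries or moves offsets. The schema representation (a list of name/offset pairs), the statistic names and the 64-bit little-endian values below are assumptions made for this example only.

```python
import struct


def read_stat(schema, data, name):
    """Look a statistic up by name in the schema; return None if it is absent."""
    for desc_name, offset in schema:
        if desc_name == name:
            (value,) = struct.unpack_from("<Q", data, offset)
            return value
    return None


# "Older kernel": two statistics.
schema_v1 = [("exits", 0), ("halts", 8)]
data_v1 = struct.pack("<QQ", 10, 3)

# "Newer kernel": a statistic was inserted, shifting the offset of "halts".
schema_v2 = [("exits", 0), ("new_stat", 8), ("halts", 16)]
data_v2 = struct.pack("<QQQ", 10, 99, 3)

# The same reader code finds "halts" in both layouts.
halts_old = read_stat(schema_v1, data_v1, "halts")
halts_new = read_stat(schema_v2, data_v2, "halts")
```

A reader that hardcoded offset 8 for "halts" would silently return the wrong value against the second layout, which is the kind of breakage the schema avoids.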

> * aren't there other means to get this fd instead of an ioctl() on the
>   VM fd? something more from the outside (e.g. sysfs/procfs)

Not yet, but if there's a need it can be added. It'd be plausible to publish system-wide statistics via an ioctl on /dev/kvm, for example. We'd have to check how this compares with stuff that is world-readable in procfs and sysfs, but I don't think there are security concerns in exposing that.

There's also pidfd_getfd(2) which can be used to pull a VM file descriptor from another running process. That can be used to avoid the issue of KVM file descriptors being unnamed.
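For reference, a minimal sketch of pulling a descriptor out of another process with pidfd_getfd(2) from Python. Python has no wrapper for this call, so the sketch goes through libc's syscall(); the syscall number 438 is an assumption that holds on architectures using the unified syscall table (x86-64, arm64 and others), and steal_fd is a name made up for this example. It needs Linux 5.6 or later, Python 3.9 or later for os.pidfd_open, and ptrace-attach permission over the target process.

```python
import ctypes
import os

# Assumption: syscall number on architectures with the unified table.
SYS_pidfd_getfd = 438

_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long


def pidfd_getfd(pidfd, target_fd):
    """Duplicate descriptor target_fd of the process that pidfd refers to."""
    fd = _libc.syscall(ctypes.c_long(SYS_pidfd_getfd),
                       ctypes.c_long(pidfd),
                       ctypes.c_long(target_fd),
                       ctypes.c_long(0))  # flags must be 0
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return fd


def steal_fd(pid, target_fd):
    """Hypothetical helper: open a pidfd for pid and copy one of its descriptors."""
    pidfd = os.pidfd_open(pid)
    try:
        return pidfd_getfd(pidfd, target_fd)
    finally:
        os.close(pidfd)
```

The caller still has to know which descriptor number in the target process is the VM file descriptor, for instance by inspecting /proc/PID/fd.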

> * how will that relate to other hypervisors?

Other hypervisors do not run as part of the Linux kernel (at least they are not upstream). These statistics only apply to Linux *hosts*, not guests.

As far as I know, there is no standard that Xen or the proprietary hypervisors use to communicate their telemetry info to monitoring tools, and also no standard binary format used by exporters to talk to monitoring tools. If this format is adopted by other hypervisors or any random software, I will be happy.

> Some notes from the operating perspective:
>
> In typical datacenters we've got various monitoring tools that are able
> to pick up lots of data from different sources (especially files). If
> an operator is e.g. interested in something happening in some file
> (e.g. in /proc or /sys), it's quite trivial - just configure yet another
> probe (maybe some regex for parsing) and done. Automatically fed into his
> $monitoring_solution (e.g. nagios, ELK, Splunk, whatnot)

... but in practice what you do is you have prebuilt exporters that talk to $monitoring_solution. Monitoring individual files is the exception, not the rule. But indeed Libvirt already has I/O and network statistics and there is an exporter for Prometheus, so we should add support for this new method to both QEMU (exporting the file descriptor) and Libvirt as well.

I hope this helps clarify your doubts!

Paolo

> With your approach, it's not that simple: now the operator needs to
> create (and deploy and manage) a separate agent that somehow receives
> that fd from the VMM, reads and parses that specific binary stream
> and finally pushes it into the monitoring infrastructure. Or the VMM
> writes it into some file, where some monitoring agent can pick it up.
> In any case, not actually trivial from ops perspective.



