Re: How to monitor domains in regards steal time and other important metrics (VIR_DOMAIN_STATS_VCPU) ?



On Thu, Dec 21, 2023 at 05:36:15PM +0100, Christian Rohmann via Users wrote:
> Hey libvirt-users,
> first allow me to give a little background.
> We monitor performance metrics of OpenStack Nova VMs using libvirt as
> hypervisor. We used to run the libvirt prometheus exporter written by
> zhangjianweibj [1].
> This exporter, compared to the one from kumina / tinkoff ([2]) makes use of
> the DigitalOcean go-libvirt [3], but that should not make much of a
> difference for my questions.
> Since the development of that exporter seems to have stalled and we wanted
> to rework and contribute new features to it, we created a fork [4].
> After working through the various ideas we had and applying them to the code,
> we proposed that prometheus-community adopt the exporter [5], to ensure it
> is maintained
> and perhaps even to serve as a reference exporter.
> Now to my actual question ...
> Libvirt exposes per VCPU stats for domains via [6]. I'd like to be able to
> export those via the exporter.
> One important metric to me would be the steal time
> (vcpu.<num>.delay), to determine if domains are starting to get cut short or
> even starve
> on CPU time. Apparently those metrics are not / cannot be exposed anymore since
> the switch to cgroups v2? Reading [7] or [8], others seem to have run into
> this.


I just tested upstream libvirt on a system with cgroups v2 and it does
report vcpu.<num>.delay. That stat is not taken from cgroups at all; we
read it from `/proc`.
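
To illustrate the kind of `/proc` data involved, here is a minimal Python
sketch that parses a per-thread schedstat line. The path
/proc/<pid>/task/<tid>/schedstat and the helper names are my assumptions
for illustration, not necessarily the exact code path libvirt uses. The
kernel documents the three fields as time on CPU (ns), time waiting on a
run queue (ns), and number of timeslices.

```python
from pathlib import Path
from typing import NamedTuple


class SchedStat(NamedTuple):
    run_ns: int      # time spent executing on a CPU, in nanoseconds
    delay_ns: int    # time spent runnable but waiting on a run queue, in nanoseconds
    timeslices: int  # number of timeslices run


def parse_schedstat(text: str) -> SchedStat:
    """Parse the three whitespace-separated counters of a schedstat file."""
    fields = text.split()
    if len(fields) < 3:
        raise ValueError(f"unexpected schedstat content: {text!r}")
    return SchedStat(int(fields[0]), int(fields[1]), int(fields[2]))


def read_thread_schedstat(pid: int, tid: int) -> SchedStat:
    """Read schedstat for one thread (hypothetical helper; the path is an assumption)."""
    path = Path(f"/proc/{pid}/task/{tid}/schedstat")
    return parse_schedstat(path.read_text())


def delay_fraction(prev: SchedStat, curr: SchedStat, interval_ns: int) -> float:
    """Fraction of the sampling interval the thread spent waiting for a CPU."""
    if interval_ns <= 0:
        raise ValueError("interval_ns must be positive")
    return (curr.delay_ns - prev.delay_ns) / interval_ns
```

The counters are cumulative, so a monitoring tool would sample twice and
look at the difference over the interval rather than at the raw value.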

The stats you are asking for can be obtained using the libvirt API
virConnectGetAllDomainStats [10].
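
As a rough sketch of how an exporter might consume this, assuming the
libvirt-python bindings (where the call is exposed as
conn.getAllDomainStats() and returns a list of (domain, dict) pairs):
the function names, the connection URI and the use of a read-only
connection below are illustrative assumptions, not something taken from
your exporter.

```python
import re
from typing import Dict, Mapping

# Matches per-vCPU keys such as "vcpu.0.delay"; skips "vcpu.current" etc.
_VCPU_KEY = re.compile(r"^vcpu\.(\d+)\.(.+)$")


def per_vcpu_stats(record: Mapping[str, object]) -> Dict[int, Dict[str, object]]:
    """Group flat "vcpu.<num>.<field>" keys into {vcpu_number: {field: value}}."""
    grouped: Dict[int, Dict[str, object]] = {}
    for key, value in record.items():
        match = _VCPU_KEY.match(key)
        if match:
            grouped.setdefault(int(match.group(1)), {})[match.group(2)] = value
    return grouped


def collect(uri: str = "qemu:///system") -> Dict[str, Dict[int, Dict[str, object]]]:
    """Fetch vCPU stats for all domains (needs libvirt-python and a running libvirtd)."""
    import libvirt  # imported lazily so per_vcpu_stats() works without the bindings

    # Assumption: a read-only connection is enough for querying stats.
    conn = libvirt.openReadOnly(uri)
    try:
        stats = conn.getAllDomainStats(libvirt.VIR_DOMAIN_STATS_VCPU)
        return {dom.name(): per_vcpu_stats(record) for dom, record in stats}
    finally:
        conn.close()
```

With that, collect()["some-domain"][0].get("delay") would be the
vcpu.0.delay value for a (hypothetical) domain named "some-domain", if
the running libvirt reports it.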

The bugs you mentioned are about a different stat, which affects a
different API, virDomainGetCPUStats [11].

> Is this actually still the case, even for more recent kernels? If so, I am
> wondering if there is an issue being tracked to implement this
> functionality?

As far as I know that is still the case: there is no replacement for
cpuacct.usage_percpu in cgroups v2. That should not affect the data
you seem to be consuming from libvirt, though.

> How is the steal time reported to the guest if the hypervisor is unable to
> export this info?
> Then there are other approaches like vmtop by Digital Ocean [9], which does
> use info and metrics available via /proc to determine steal time and other
> vcpu based metrics.
> So it seems the required data is somewhat available from the kernel?
> Last but not least I'd like your opinion on what other key metrics are
> important to monitor on hypervisors and their guests?

I would say it depends on multiple factors like usage of the VMs,
workload inside the VMs, on the management application itself and so on.
There are many metrics that can be tracked like cpu, memory, network,
block, vcpu and so on.

If the workload mainly uses CPU, the users might not care that much about
block usage, and the other way around, so I don't think there is a generic
answer to that question.


[10] <>
[11] <>

> [1]
> [2]
> [3]
> [4]
> [5]
> [6]
> [7]
> [8]
> [9]
> _______________________________________________
> Users mailing list -- users@xxxxxxxxxxxxxxxxx
> To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxx
