Re: [PATCH 2/2] kvm: Add ioctl for gathering debug counters

On 23.01.20 13:08, Paolo Bonzini wrote:
On 21/01/20 16:38, Alexander Graf wrote:
ONE_REG would force us to define constants for each counter, and would
make it hard to retire them.  I don't like this.

Why does it make it hard to retire them? We would just return -EINVAL
on retrieval, like we do for any other non-supported ONE_REG.

It's the same as a file not existing in debugfs/statfs, or an entry
in the array of this patch disappearing.

The devil is in the details.  For example, would you retire uapi/
constants and cause programs to fail compilation?  Or do you keep the
obsolete constants forever?  Also, fixing the mapping from ONE_REG
number to stat would mean a switch statement (or loop of some kind---a
switch statement is basically an unrolled binary search) to access the
stats.  Instead, returning the id in KVM_GET_SUPPORTED_DEBUGFS_STAT would
reduce returning the stats to a simple copy_to_user.

If you look at the way RISC-V implemented ONE_REG, I think we can agree that it's possible with constant identifiers as well :). The only downside is of course that you may potentially end up with an identifier array to map from "ONE_REG id" to "offset in vcpu/vm struct". I fail to see how that's worse than the struct kvm_stats_debugfs_item[] we have today.
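
To make the "identifier array" idea concrete, here is a minimal userspace sketch, not kernel code and not what any architecture actually does. All names (demo_stat_id, demo_vcpu_stat, demo_get_stat) and the id values are made up for illustration. It only shows that a constant id can be resolved through a small id-to-offset table, and that a retired id naturally ends up as -EINVAL:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant ids; these are NOT real uapi constants. */
enum demo_stat_id {
	DEMO_STAT_EXITS = 1,
	DEMO_STAT_REMOTE_TLB_FLUSH = 2,
	DEMO_STAT_INSN_EMULATION_FAIL = 3,
};

/* Stand-in for a per-vcpu stat struct; every counter is 64bit. */
struct demo_vcpu_stat {
	uint64_t exits;
	uint64_t remote_tlb_flush;
	uint64_t insn_emulation_fail;
};

/* One table entry maps a constant id to an offset in the struct. */
struct demo_stat_desc {
	uint32_t id;
	size_t offset;
};

static const struct demo_stat_desc demo_stat_table[] = {
	{ DEMO_STAT_EXITS, offsetof(struct demo_vcpu_stat, exits) },
	{ DEMO_STAT_REMOTE_TLB_FLUSH,
	  offsetof(struct demo_vcpu_stat, remote_tlb_flush) },
	{ DEMO_STAT_INSN_EMULATION_FAIL,
	  offsetof(struct demo_vcpu_stat, insn_emulation_fail) },
};

/*
 * Look up one counter by id. Returns 0 and fills *val on success, or
 * -EINVAL if the id is unknown (never defined, or retired by dropping
 * its table entry).
 */
static int demo_get_stat(const struct demo_vcpu_stat *stat, uint32_t id,
			 uint64_t *val)
{
	size_t n = sizeof(demo_stat_table) / sizeof(demo_stat_table[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (demo_stat_table[i].id == id) {
			const char *base = (const char *)stat;

			memcpy(val, base + demo_stat_table[i].offset,
			       sizeof(*val));
			return 0;
		}
	}
	return -EINVAL;
}
```

Retiring a counter in this sketch means deleting its table entry; whether the constant itself stays in a header is a separate policy question.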

Of course, some of the complexity would be punted to userspace.  But
userspace is much closer to the humans that ultimately look at the
stats, so the question is: does userspace really care about knowing
which stat is which, or do they just care about having a name that will
ultimately be consumed by humans down the pipe?  If the latter (which is
also my gut feeling), that is also a point against ONE_REG.

It's not a problem of exposing the type information - we have that today
by implicitly saying "every counter is 64bit".

The thing I'm worried about is that we keep inventing these special
purpose interfaces that really do nothing but transfer numbers in one
way or another. ONE_REG's purpose was to unify them. Debug counters
really are the same story.

See above: I am not sure they are the same story because their consumers
might be very different from registers.  Registers are generally
consumed by programs (to migrate VMs, for example) and only occasionally
by humans, while stats are meant to be consumed by humans.  We may
disagree on whether this justifies a completely different API...

I don't fully agree on the "human" part here. At the end of the day, you want stats because you want to act on stats. Ideally, you want to do that fully automatically. Let me give you a few examples:

1) insn_emulation_fail triggers

You may want to feed all the failures into a database to check whether there is something wrong in the emulator.

2) (remote_)tlb_flush beyond certain threshold

If you see that you're constantly flushing remote TLBs, there's a good chance that you found a workload that may need tuning in KVM. You want to gather those stats across your full fleet of hosts, so that for the few occasions when you hit it, you can work with the actual VM owners to potentially improve their performance.

3) exits beyond certain threshold

You know roughly how many exits your fleet would usually see, so you can configure an upper threshold on that. Once you have an automated way to be notified when the threshold is exceeded, you can check what that particular guest did to raise so many exits.
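
As a rough illustration of the threshold cases above, here is a minimal userspace sketch of what a fleet-side checker could do with constant ids. Everything in it is hypothetical (the fleet_* names, the numeric ids, the idea of comparing two polled snapshots); it is not an existing tool or KVM interface. A counter id that is missing from either snapshot, for example because it was retired, is simply skipped:

```c
#include <stddef.h>
#include <stdint.h>

/* One polled counter value, keyed by a (hypothetical) constant id. */
struct fleet_sample {
	uint32_t id;
	uint64_t value;
};

/* Alert when a counter grows by more than max_delta between polls. */
struct fleet_rule {
	uint32_t id;
	uint64_t max_delta;
};

/* Find the value for id in a snapshot; returns 1 if found, 0 if not. */
static int fleet_find(const struct fleet_sample *s, size_t n, uint32_t id,
		      uint64_t *out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (s[i].id == id) {
			*out = s[i].value;
			return 1;
		}
	}
	return 0;
}

/*
 * Count how many rules are exceeded between two snapshots. Rules whose
 * id is absent from either snapshot are skipped rather than flagged.
 */
static size_t fleet_count_alerts(const struct fleet_sample *prev, size_t nprev,
				 const struct fleet_sample *cur, size_t ncur,
				 const struct fleet_rule *rules, size_t nrules)
{
	size_t alerts = 0;
	size_t i;

	for (i = 0; i < nrules; i++) {
		uint64_t before, after;

		if (!fleet_find(prev, nprev, rules[i].id, &before) ||
		    !fleet_find(cur, ncur, rules[i].id, &after))
			continue;
		if (after - before > rules[i].max_delta)
			alerts++;
	}
	return alerts;
}
```

The point of the sketch is only that the rules are keyed by stable ids, so the checker never has to parse or guess a human-readable name.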


... and I'm sure there's room for a lot more potential stats that could be useful to gather to determine the health of a KVM environment, such as a "vcpu steal time" one or a "maximum time between two VMENTERS while the guest was in running state".

All of these should eventually feed into something bigger that collects the numbers across your full VM fleet, so that a human can take actions based on them. However, that means the values are no longer consumed directly by a human; they need to feed into machines. And for that, exact, constant identifiers make much more sense :)


Alex








