On Wed, Apr 17, 2024 at 12:35 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > > The hiccup with stats are that they are ABI, e.g. we can't (easily) ditch stats > > > once they're added, and KVM needs to maintain the exact behavior. > > > > Stats are not ABI---why would they be? > > Because they exposed through an ioctl(), and userspace can and will use stats for > functional purposes? Maybe I just had the wrong takeaway from an old thread about > adding a big pile of stats[1], where unfortunately (for me) you weighed in on > whether or not tracepoints are ABI, but not stats. I think we can agree that: - you don't want hundreds of stats (Marc's point) - a large part of the stats are very stable, but just as many (while useful) depend on details which are very much implementation dependent - a subset of stats is pretty close to being ABI (e.g. guest_mode), but others can change meaning depending on processor model, guest configuration and/or guest type (e.g. all of them affect interrupt-related stats due to APICv inhibits). While there are exceptions, the main consumer of stats (but indeed not the only one) is intended to be the user, not a program. This is the same as tracepoints, and it's why the introspection bits exist. (User-friendliness also means that bitmask stats are "ouch"; I guess we could add support for bit-sized boolean stats is things get out of control). For many stats, using them for functional purposes would be wrong/dumb. You have to be ready for the exact behavior to change even if the stats remain the same. If userspace doesn't, it's being dumb. KVM can't be blocked from supporting new features just because they "break" stats, and shouldn't be blocked from adding useful debugging stats just because userspace could be dumb. For example, the point of pf_fast is mostly to compare it with other pf_* stats and see if there's something smelly going on.pf_fast used to affect pretty much only dirty bits; nowadays it also affects accessed bits on !ept_ad machines and it does not affect dirty bits if you have PML. So in the past it was possible to use pf_fast as a proxy for the number of dirty pages, for example during migration. That usage doesn't work anymore. Tough luck. Perhaps you could use the halt-polling stats to toggle DISABLE_EXITS for VMs that consume a lot of time polling. That would be a more plausible use of stats to drive heuristics, but again, the power to look into low-level details of KVM and guest behavior comes with the accompanying responsibility. It is _not_ guaranteed to keep working in the same way as processors come out and optimizations are added to KVM. So you would have to look at intentional stats breakages one by one, and this is again a lot like tracepoints. And many potential breakages would go unnoticed anyway, because there's an infinite supply of bad ideas when it comes to stats and heuristics. > That said, I'm definitely not opposed to stats _not_ being ABI, because that would > give us a ton of flexibility. E.g. we have a non-trivial number of internal stats > that are super useful _for us_, but are rather heavy and might not be desirable > for most environments. If stats aren't considered ABI, then I'm pretty sure we > could land some of the more generally useful ones upstream, but off-by-default > and guarded by a Kconfig. E.g. we have a pile of stats related to mmu_lock that > are very helpful in identifying performance issues, but they aren't things I would > want enabled by default. That would be great indeed. > > Not everything makes a good stat but, if in doubt and it's cheap > > enough to collect it, go ahead and add it. > > Marc raised the (IMO) valid concern that "if it's cheap, add it" will lead to > death by a thousand cuts. E.g. add a few hundred vCPU stats and suddenly vCPUs > consumes an extra KiB or three of memory. > > A few APIC stats obviously aren't going to move the needle much, I'm just pointing > out that not everyone agrees that KVM should be hyper permissive when it comes to > adding new stats. Yeah, that's why I made it conditional to "if in doubt". "Stats are not ABI" is not a free pass to add anything, also because the truth is closer than "Stats are generally not ABI but keeping them stable is a good idea". Many more stats are obviously bad to have upstream, than there are good ones; and when adding stats it makes sense to consider their stability but without making it an absolute criterion for inclusion. So for this patch, I would weigh advantages to be substantial: + APICv inhibits at this point are relatively stable + the performance impact is large enough that APICv/AVIC stats _can_ be useful, both boolean and cumulative ones; so for example I'd add an interrupt_injections stat for unaccelerated injections causing a vmexit or otherwise hitting lapic.c. But absolutely would not go with a raw bitmask because: - the exact set of inhibits is subject to change - super high detail into niche APICv inhibits is unlikely to be useful - many if not most inhibits are trivially derived from VM configuration Paolo