On Wed, Apr 26, 2023, Anselm Busse wrote:
> This patch adds a KVM vCPU stat that reflects the number of #AC
> exceptions caused by a guest. This improves the identification and
> debugging of issues that are possibly caused by guests triggering
> split-locks and allows more insides compared to the current situation
> of having only a warning printed when an #AC exception is raised.

Irrespective of the inaccuracy Xiaoyao pointed out, I don't want to add a
one-off stat for _any_ exception.  I agree with what Marc said[*] when we
(Google / GCP) tried to push our pile o' stats upstream:

 : Because I'm pretty sure that whatever stat we expose, every cloud
 : vendor will want their own variant, so we may just as well put the
 : matter in their own hands.

That doesn't mean I don't want a massive pile of stats about all things KVM,
quite the opposite, but I don't think they belong in upstream where KVM has to
maintain them in perpetuity.  E.g. at some point in the (distant) future,
split-lock #AC will be completely uninteresting because all software will have
been updated/fixed.

FWIW, we looked at using eBPF for our out-of-tree stats and ultimately decided
that carrying patches to add our stats would be significantly easier to
maintain than an eBPF-based approach, e.g. rebasing this patch is trivial.  But
the challenges we anticipated with switching to eBPF were largely specific to
running at scale.  eBPF is a very viable approach for gathering information for
debug, development, individual users, etc.

One idea I had for easing the pain of out-of-tree stats was to clean up KVM
x86's tracepoints, e.g. to give eBPF programs more stable and useful hooks, but
also to allow CSPs like us to play macro games to "inject" stats at key points,
e.g. add infrastructure to #define overload tracepoints to make KVM trampoline
through out-of-tree stats code.
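Purely to illustrate the kind of macro game meant above, here is a rough
userspace sketch.  Every name in it (trace_demo_exception(), oot_stats, etc.)
is hypothetical; none of this is actual KVM code or an existing tracepoint,
and real tracepoints are generated by the TRACE_EVENT machinery rather than
written as plain functions.  The only point is that an out-of-tree header
could redefine a tracepoint name so that existing call sites bounce through a
stats hook without being touched:

```c
#include <stdio.h>

#define AC_VECTOR 17	/* x86 #AC (alignment check) exception vector */

/* Hypothetical out-of-tree stats, i.e. not something upstream maintains. */
struct oot_stats {
	unsigned long ac_exceptions;
};

static struct oot_stats oot_stats;

/* Stand-in for an in-tree tracepoint; here it just logs the vector. */
static void trace_demo_exception(unsigned int vector)
{
	printf("trace: exception vector %u\n", vector);
}

/* Hypothetical out-of-tree hook: count only #AC exceptions. */
static void oot_stats_exception(unsigned int vector)
{
	if (vector == AC_VECTOR)
		oot_stats.ac_exceptions++;
}

/*
 * The "overload": redefine the tracepoint name so every call site below
 * this point trampolines through the stats hook first.  The parenthesized
 * name suppresses macro expansion, so the real function is still called.
 */
#define trace_demo_exception(vector)		\
do {						\
	unsigned int v_ = (vector);		\
						\
	oot_stats_exception(v_);		\
	(trace_demo_exception)(v_);		\
} while (0)

/* Unmodified "call sites": they only ever name the tracepoint. */
unsigned long run_demo(void)
{
	trace_demo_exception(AC_VECTOR);
	trace_demo_exception(14);	/* #PF, traced but not counted */
	trace_demo_exception(AC_VECTOR);

	return oot_stats.ac_exceptions;
}
```

In this sketch the call sites never mention the stats code, which is the
appeal; the flip side is that stats can only be collected where a tracepoint
already exists.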
But we haven't pursued that idea because (a) as above, carrying patches for
out-of-tree stats requires minimal effort and (b) it wouldn't eliminate
"invasive" code because we'd (GCP) inevitably want stats in places where a KVM
tracepoint makes no sense.

So as much as I advocate for pushing code upstream, this is one of the few
areas where I think it's better to carry code out-of-tree.

[*] https://lore.kernel.org/all/875yusv3vm.wl-maz@xxxxxxxxxx