+Google folks

On Wed, Sep 22, 2021, Paolo Bonzini wrote:
> On 22/09/21 13:22, Marc Zyngier wrote:
> > Frankly, this is a job for BPF and the tracing subsystem, not for some
> > hardcoded syndrome accounting.  It would allow extracting meaningful
> > information, prevent bloat, and crucially make it optional.  Even empty
> > trace points like the ones used in the scheduler would be infinitely
> > better than this (load your own module that hooks into these trace
> > points, expose the data you want, any way you want).
>
> I agree.  I had left out for later the similar series you had for x86,
> but I felt the same as Marc; even just counting the number of
> occurrences of each exit reason is a nontrivial amount of memory to
> spend on each vCPU.

That depends on the use case, environment, etc...  E.g. if the VM is
assigned a _minimum_ of 4GiB per vCPU, then burning even tens of
kilobytes of memory per vCPU is trivial, or at least completely
acceptable.

I do 100% agree this should be optional, be it through an ioctl(), a
module/kernel param, a Kconfig, whatever (a sketch of the module param
variant is appended below).  The changelogs are also sorely lacking the
motivation for having dedicated stats; we'll do our best to remedy that
in future work.

Stepping back a bit, this is one piece of the larger issue of how to
modernize KVM for hyperscale usage.  BPF and tracing are great when the
debugger has root access to the machine and can rerun the failing
workload at will.  They're useless for identifying trends across large
numbers of machines, triaging failures after the fact, debugging
performance issues with workloads the debugger doesn't have direct
access to, etc...

Logging is a similar story, e.g. using _ratelimited() printk to aid
debug works well when there is a very limited number of VMs and a human
who can react to arbitrary kernel messages, but it's basically useless
when there are 10s or 100s of VMs and taking action on a kernel message
requires a priori knowledge of the message.

I'm certainly not expecting other people to solve our challenges, and I
fully appreciate that there are many KVM users that don't care at all
about scalability, but I'm hoping we can get the community at large,
and especially maintainers and reviewers, to also consider at-scale use
cases when designing, implementing, reviewing, etc...
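
For reference, the "empty trace points + out-of-tree module" approach
Marc describes might look roughly like the sketch below.  This is a
hedged illustration only: the trace point name (kvm_vcpu_exit_tp), its
arguments, and MAX_EXIT_REASONS are hypothetical, not existing KVM
symbols.

/*
 * In a KVM trace header, instantiated via CREATE_TRACE_POINTS, the
 * same way the scheduler declares its bare pelt trace points:
 */
DECLARE_TRACE(kvm_vcpu_exit_tp,
	      TP_PROTO(struct kvm_vcpu *vcpu, u32 exit_reason),
	      TP_ARGS(vcpu, exit_reason));

/* In KVM proper, export the trace point and fire it on every exit: */
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vcpu_exit_tp);

	trace_kvm_vcpu_exit_tp(vcpu, exit_reason);

/* In a vendor module, attach a probe and account however you want: */
#include <linux/atomic.h>
#include <linux/module.h>
#include <linux/tracepoint.h>

#define MAX_EXIT_REASONS	64	/* made-up bound for this sketch */

static atomic64_t exit_counts[MAX_EXIT_REASONS];

/* Probes for bare trace points take the registered cookie first. */
static void probe_vcpu_exit(void *data, struct kvm_vcpu *vcpu,
			    u32 exit_reason)
{
	if (exit_reason < MAX_EXIT_REASONS)
		atomic64_inc(&exit_counts[exit_reason]);
}

static int __init exit_stats_init(void)
{
	return register_trace_kvm_vcpu_exit_tp(probe_vcpu_exit, NULL);
}

static void __exit exit_stats_exit(void)
{
	unregister_trace_kvm_vcpu_exit_tp(probe_vcpu_exit, NULL);
	tracepoint_synchronize_unregister();
}

module_init(exit_stats_init);
module_exit(exit_stats_exit);
MODULE_LICENSE("GPL");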
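
And a trivial, equally hypothetical sketch of the module param variant
mentioned above; the param name and the vcpu->exit_stats field are made
up, and an ioctl() or Kconfig knob would work just as well:

static bool enable_exit_stats __read_mostly;
module_param(enable_exit_stats, bool, 0444);
MODULE_PARM_DESC(enable_exit_stats,
		 "Allocate dedicated per-vCPU exit counters (default: off)");

	/* On vCPU creation, only pay for the stats if asked to. */
	if (enable_exit_stats)
		vcpu->exit_stats = kcalloc(MAX_EXIT_REASONS, sizeof(u64),
					   GFP_KERNEL_ACCOUNT);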