Hi Yoan,

On Fri, 19 Mar 2021 16:17:04 +0000, Yoan Picchi <yoan.picchi@xxxxxxx> wrote:
>
> Hi all,
>
> As mentioned in the KVM forum talk from 2019
> (https://kvmforum2019.sched.com/event/Tmwf/kvmstat-and-beyond-past-present-and-future-of-performance-monitoring-christian-borntrager-ibm
> page 10), there are few event counters for kvm_stat in the arm64
> version of KVM when you compare it to something like the x86
> version.

Crucially, these differences exist because there is no universal
equivalence between architecture features. A TLB invalidation on x86
doesn't do the same thing as on PPC or arm64.

> Those counters are used in kvm_stat by kernel/driver developers to
> get a rough idea of the impact of their patches on overall
> performance. An example would be making sure a patch doesn't
> increase the number of interrupts too much. These patches aim to
> add more counters so that kvm_stat is more relevant when measuring
> performance impact.

Adding more counters only makes sense if the semantics of these
counters are clearly defined. In this series, you have sprayed a
bunch of x++ in random places, without defining what you were trying
to count.

> I am new to working on kernel-related things and I am learning KVM
> as I go. Some of the counters I added early (memory_slot_unmaped,
> stage2_unmap_vm) no longer seem relevant because, while they do
> count interesting things, those things only happen in very specific
> scenarios. Instead of just deleting them, I prefer to ask whether a
> little-used counter or no counter at all is the better option.

It works the other way around: the onus is on you to explain *why* we
should even consider a counter or another.

May I suggest that you start by picking exactly *one* metric, and
work out exactly what it is supposed to count, what significance it
has at the architectural level, and what it actually brings to the
user? Then try to do a good job of implementing these semantics. You
will learn about the arm64 architecture and KVM in one swift go, one
area at a time.

> I could also use some suggestions on how to test those counters, as
> some, like remote_tlb_flush, mostly fire when fixing up a race
> condition; and on what uncovered events could be worth adding in a
> future patch set.

remote_tlb_flush is a great example of something that really
*doesn't* make much sense on KVM/arm64. We don't deal with remote
TLBs independently of the local ones outside of vcpu migration (which
you don't cover either). I can only urge you to focus on the
architectural meaning of the metric you picked, and see how it maps
across the hypervisor.

Finally, there is the question of the general availability of these
counters. They live in debugfs, which isn't a proper userspace ABI.
There is work going on around making this a more palatable interface,
and I'd rather see where this is going before expanding the number of
counters.

Thanks,

	M.

--
Without deviation from the norm, progress is not possible.
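P.S. To make the point about semantics concrete, here is roughly how
the existing wfi_exit_stat counter is plumbed in. This is a sketch
from memory (exact file layout and surrounding entries may differ
from your tree), but it shows the shape of a well-defined counter:
one precisely defined architectural event, one increment site, one
debugfs entry.

	/* arch/arm64/include/asm/kvm_host.h */
	struct kvm_vcpu_stat {
		...
		u64 wfi_exit_stat;	/* guest WFI trapped to the hypervisor */
		...
	};

	/* arch/arm64/kvm/guest.c: expose the counter to kvm_stat via debugfs */
	struct kvm_stats_debugfs_item debugfs_entries[] = {
		...
		VCPU_STAT("wfi_exit_stat", wfi_exit_stat),
		...
		{ NULL }
	};

	/* arch/arm64/kvm/handle_exit.c: the only increment site, in the
	 * WFx trap handler, where the event is architecturally unambiguous */
	static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
	{
		if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
			vcpu->stat.wfe_exit_stat++;
			...
		} else {
			vcpu->stat.wfi_exit_stat++;
			...
		}
		...
	}

Every counter you propose should be able to tell the same story: which
architectural event it counts, where its single increment site lives,
and why a user would care about the resulting number.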