Hi Aman,

On Wed, 16 Jun 2021 10:17:28 +0100,
Aman Priyadarshi <apeureka@xxxxxxxxx> wrote:
>
> Hi Marc,
>
> On Tue, 2021-06-15 at 18:05 +0100, Marc Zyngier wrote:
> >
> > Can you reproduce the issue with vanilla guest kernels? It'd be
> > interesting to understand what makes it work on the guest side. Can
> > you please bisect it?
> >
>
> yes, I was able to narrow it down to the commit 0cbb058be904 ("arm64: perf:
> Disable PMU while processing counter overflows"), which fixes the problem
> on the guest side.

Which is 3cce50dfec4a5b0414c974190940f47dd32c6dee in mainline. This
doesn't seem to have ever been backported to anything before 4.18. So I
don't know why your 4.15 kernel was behaving correctly, but it could be
that the distro had randomly picked up the correct patch!

You may want to backport it to 4.14.y and let Greg know about that.

>
> I _think_ I understand the problem now. Please correct me if I am wrong.
>
> commit 30d97754b2d1 ("KVM: arm/arm64: Re-create event when setting counter
> value") adds a new code path for perf event when counter value is set,
> therefore kvm would generate more events than before. Without this change,
> we have a lot fewer events, thus reducing the chances of the guest messing
> things up.

Without this fix, we don't communicate the new guest sample period to
the host's perf counter, and depending on what the guest wrote (and the
previous value), it can go one way or the other.

> On the other side, commit 8c3252c06516 ("KVM: arm64: pmu: Reset sample
> period on overflow handling") resets the sample period to the max value,
> thus reducing the number of overflow events to the guest to an optimal
> value (note, the number of interrupts actually handled by the guest would
> remain the same in either case). Fewer overflow interrupts to the guest
> reduce the chance of the guest making up for any left-over overflow event
> that it did not see earlier.

This fix is the natural complement of the previous one.
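For illustration only, the period arithmetic discussed here can be sketched in C. This is a minimal sketch under stated assumptions, not the KVM code: the helper name period_until_overflow() is hypothetical, and it models nothing but the counter width (e.g. 32 bits), not perf's internal state. It computes the number of events left before the guest counter wraps, recomputed from the live counter value rather than reloaded from whatever the guest originally wrote.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from KVM: how many events a counter
 * `width` bits wide, currently holding `count`, must see before it
 * wraps past its maximum value and raises an overflow.
 */
static uint64_t period_until_overflow(uint64_t count, unsigned int width)
{
	/* Assumed range: narrower than 64 bits so that mask + 1 fits. */
	assert(width > 0 && width < 64);

	uint64_t mask = (UINT64_C(1) << width) - 1;

	/* Distance to the wrap point, truncated to the counter width. */
	uint64_t period = (-count) & mask;

	/* A counter sitting at 0 needs the full 2^width events to wrap. */
	return period ? period : mask + 1;
}
```

For instance, a guest writing 0xFFFFFF00 to a 32-bit counter gives a period of 0x100. Once that counter has wrapped and reads, say, 5, recomputing gives 0xFFFFFFFB, so the next overflow lands on the 2^32 boundary; reloading the stale 0x100 period would instead fire another interrupt only 0x100 events later.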
We need to emulate the actual overflow, and prevent perf from doing its
thing on the host (reloading from the previously provided value). So we
reset the period to the value that perf did observe on taking the
physical interrupt.

Together, these two patches provide a more correct PMU emulation. The
guest patch prevents additional overflows from being observed while the
guest is reprogramming its counters and observing a moving target.

Note that the host itself needs that initial fix to correctly emulate
the PMU! ;-)

It is pretty hard to picture exactly *what* happens when you are missing
any of these 3 patches. Both the kernel and KVM were buggy at some
point, and you need all three patches to ensure correct behaviour.

Anyway, thanks for bisecting it, and for working out that this was a
guest issue!

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm