On Wed, 2021-06-16 at 11:31 +0100, Marc Zyngier wrote:
>
> Hi Aman,
>
> On Wed, 16 Jun 2021 10:17:28 +0100,
> Aman Priyadarshi <apeureka@xxxxxxxxx> wrote:
> >
> > Hi Marc,
> >
> > On Tue, 2021-06-15 at 18:05 +0100, Marc Zyngier wrote:
> > >
> > > Can you reproduce the issue with vanilla guest kernels? It'd be
> > > interesting to understand what makes it work on the guest side. Can
> > > you please bisect it?
> > >
> >
> > Yes, I was able to narrow it down to commit 0cbb058be904 ("arm64:
> > perf: Disable PMU while processing counter overflows"), which fixes
> > the problem on the guest side.
>
> Which is 3cce50dfec4a5b0414c974190940f47dd32c6dee in mainline. This
> doesn't seem to have ever been backported before 4.18. So I don't know
> why your 4.15 kernel was correctly behaving, but it could be that the
> distro had randomly picked up the correct patch!

Yes. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1836117

>
> You may want to backport it to 4.14.y and let Greg know about that.
>

Ack.

> >
> > I _think_ I understand the problem now. Please correct me if I am
> > wrong.
> >
> > Commit 30d97754b2d1 ("KVM: arm/arm64: Re-create event when setting
> > counter value") adds a new code path for the perf event when the
> > counter value is set, therefore KVM would generate more events than
> > before. Without this change, we have a lot fewer events, thus
> > reducing the chances of the guest messing things up.
>
> Without this fix, we don't communicate the new guest sample period to
> the host's perf counter, and depending on what the guest wrote (and
> the previous value), it can go one way or the other.
>
> > On the other side, commit 8c3252c06516 ("KVM: arm64: pmu: Reset sample
> > period on overflow handling") resets the sample period to the max
> > value, thus reducing the number of overflow events to the guest to an
> > optimal value (note, the number of interrupts actually handled by the
> > guest would remain the same in either case).
> > Fewer overflow interrupts to the guest reduce the chance of the
> > guest making up for any leftover overflow event that it did not see
> > earlier.
>
> This fix is the natural complement of the previous one. We need to
> emulate the actual overflow, and prevent perf from doing its thing on
> the host (reloading from the previously provided value). So we reset
> the period to the value that perf did observe on taking the physical
> interrupt.
>
> Together, these two patches provide a more correct PMU emulation.
>
> The guest patch prevents additional overflows from being observed
> while the guest is reprogramming its counters and observing a moving
> target. Note that the host itself needs that initial fix to correctly
> emulate the PMU! ;-)
>
> It is pretty hard to picture exactly *what* happens when you are
> missing any of these 3 patches. Both the kernel and KVM were buggy at
> some point, and you need all three patches to ensure something
> correct.
>

Thanks for the explanation!

Regards,
Aman Priyadarshi

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Managing Directors: Christian Schlaeger, Jonathan Weiss
Registered at Amtsgericht Charlottenburg under HRB 149173 B
Registered office: Berlin
VAT ID: DE 289 237 879

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
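
For readers following the thread, the effect of the two KVM fixes discussed above can be pictured with a small numeric sketch. This is not the KVM code: it is a toy model, assuming a 32-bit event counter, and the names period_until_overflow and count_overflows are hypothetical helpers invented for illustration. It shows (a) the distance to overflow implied by the value the guest writes, and (b) how many overflows the guest would be told about if the host kept reloading a stale period after each overflow instead of resetting it to the full wrap distance.

```python
COUNTER_BITS = 32            # assumption: a 32-bit event counter
WRAP = 1 << COUNTER_BITS     # events needed for a full wrap from zero
MASK = WRAP - 1


def period_until_overflow(counter_value):
    """Events left before a counter holding counter_value wraps to zero."""
    period = (-counter_value) & MASK
    # In this sketch a counter at zero needs a full wrap before it overflows.
    return period if period else WRAP


def count_overflows(guest_value, total_events, stale_period=None):
    """Overflows reported to the guest after total_events events.

    stale_period=None models the fixed behaviour: the first period follows
    the value the guest wrote, and after an overflow the period is reset to
    the full wrap distance.
    stale_period=N models the host reloading the previously provided
    period N after every overflow.
    """
    first = period_until_overflow(guest_value)
    if total_events < first:
        return 0
    remaining = total_events - first
    reload = WRAP if stale_period is None else stale_period
    return 1 + remaining // reload


if __name__ == "__main__":
    value = 0xFFFFFF00                       # guest wants an overflow in 256 events
    print(period_until_overflow(value))      # distance to the first overflow
    print(count_overflows(value, 1024))      # period reset after overflow
    print(count_overflows(value, 1024, stale_period=256))  # stale reload
```

In this model the guest programs a counter 256 events away from overflow and then lets 1024 events pass. With the period reset on overflow it sees a single overflow; with the stale 256-event reload it sees several, which is the kind of surplus the thread describes.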