Re: KVM: arm64: pmu: Reset sample period on overflow handling

Hi Aman,

On Wed, 16 Jun 2021 10:17:28 +0100,
Aman Priyadarshi <apeureka@xxxxxxxxx> wrote:
> 
> Hi Marc,
> 
> On Tue, 2021-06-15 at 18:05 +0100, Marc Zyngier wrote:
> > 
> > Can you reproduce the issue with vanilla guest kernels? It'd be
> > interesting to understand what makes it work on the guest side. Can
> > you please bisect it?
> > 
> 
> yes, I was able to narrow it down to the commit 0cbb058be904 ("arm64: perf:
> Disable PMU while processing counter overflows"), which fixes the problem
> on the guest side.

Which is 3cce50dfec4a5b0414c974190940f47dd32c6dee in mainline. It
doesn't seem to have been backported to anything before 4.18, so I
don't know why your 4.15 kernel was behaving correctly. It could be
that the distro happened to pick up the right patch!

You may want to backport it to 4.14.y and let Greg know about that.

> 
> I _think_, I understand the problem now. Please correct me if I am wrong.
> 
> commit 30d97754b2d1 ("KVM: arm/arm64: Re-create event when setting counter
> value") adds a new code path for perf event when counter value is set,
> therefore kvm would generate more events than before. Without this change,
> we have a lot less events, thus reducing the chances of guest messing
> things up.

Without this fix, we don't communicate the new guest sample period to
the host's perf counter. Depending on what the guest wrote (and on the
previous value), the host counter can end up overflowing too early or
too late.

> On the other side, commit 8c3252c06516 ("KVM: arm64: pmu: Reset sample
> period on overflow handling") resets the sample period to the max value,
> thus reducing the number of overflow events to guest to an optimal value
> (note, number of interrupts actually handled by guest would remain same in
> either case). Less number of overflow interrupts to the guest, reduces the
> chance of guest making up for any left over overflow event that it did not
> see earlier.

This fix is the natural complement of the previous one. We need to
emulate the actual overflow, and prevent perf from doing its thing on
the host (reloading from the previously provided value). So we reset
the period to the value that perf did observe on taking the physical
interrupt.

Together, these two patches provide a more correct PMU emulation.
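
As a rough illustration of the arithmetic both KVM patches rely on,
here is a userspace sketch (not the actual KVM code; counter_mask() and
period_to_overflow() are made-up names, and the 32-bit/64-bit split is
an assumption about the counter widths involved). The sample period
handed to perf is simply the number of events left before the guest's
counter value wraps:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mask covering the guest-visible counter width. */
static uint64_t counter_mask(bool is_64bit)
{
	return is_64bit ? UINT64_MAX : 0xffffffffULL;
}

/*
 * Hypothetical helper: number of events until a counter currently
 * holding 'count' wraps around. This is the two's complement negation
 * of the counter value, truncated to the counter width.
 */
static uint64_t period_to_overflow(uint64_t count, bool is_64bit)
{
	return (0 - count) & counter_mask(is_64bit);
}
```

For a 32-bit counter that the guest has set to 0xfffffff0, this gives
a period of 0x10 events. The same computation applies both when the
guest writes a new counter value and when the period is recomputed on
overflow from the value perf observed.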

The guest patch prevents additional overflows from being observed
while the guest is reprogramming its counters and would otherwise be
observing a moving target. Note that the host itself needs that
initial fix to correctly emulate the PMU! ;-)

It is pretty hard to picture exactly *what* happens when you are
missing any of these three patches. Both the kernel and KVM were buggy
at some point, and you need all three patches to get correct
behaviour.

Anyway, thanks for having bisected it, and worked out that this was a
guest issue!

	M.

-- 
Without deviation from the norm, progress is not possible.