On Wed, Jan 31, 2024, Mingwei Zhang wrote:
> On Wed, Jan 31, 2024 at 9:02 AM Dongli Zhang <dongli.zhang@xxxxxxxxxx> wrote:
> > On 1/31/24 07:43, Sean Christopherson wrote:
> > > On Tue, Jan 23, 2024, Mingwei Zhang wrote:
> > >> Fix type length error since pmu->fixed_ctr_ctrl is u64 but the local
> > >> variable old_fixed_ctr_ctrl is u8. Truncating the value leads to
> > >> information loss at runtime. This leads to incorrect value in old_ctrl
> > >> retrieved from each field of old_fixed_ctr_ctrl and causes incorrect code
> > >> execution within the for loop of reprogram_fixed_counters(). So fix this
> > >> type to u64.
> > >
> > > But what is the actual fallout from this? Stating that the bug causes incorrect
> > > code execution isn't helpful, that's akin to saying water is wet.
> > >
> > > If I'm following the code correctly, the only fallout is that KVM may unnecessarily
> > > mark a fixed PMC as in use and reprogram it. I.e. the bug can result in (minor?)
> > > performance issues, but it won't cause functional problems.
> >
> > My this issue cause "Uhhuh. NMI received for unknown reason XX on CPU XX." at VM side?
> >
> > The PMC is still active while the VM side handle_pmi_common() is not going to handle it?
>
> hmm, so the new value is '0', but the old value is non-zero, KVM is
> supposed to zero out (stop) the fix counter), but it skips it. This
> leads to the counter continuously increasing until it overflows, but
> guest PMU thought it had disabled it. That's why you got this warning?

No, that can't happen, and KVM would have a massive bug if that were the case.
The truncation can _only_ cause bits to disappear, it can't magically make bits
appear, i.e. the _only_ way this can cause a problem is for KVM to incorrectly
think a PMC is being disabled.

And FWIW, KVM does do the right thing (well, "right" might be too strong) when
a fixed PMC is disabled.
KVM will pause the counter in reprogram_counter(), and then leave the perf
event paused, as pmc_event_is_allowed() will return %false due to the PMC being
locally disabled.

But in this case, _if_ the counter is actually enabled, KVM will simply
reprogram the PMC. Reprogramming is unnecessary and wasteful, but it's not
broken.

Side topic, looking at this code made me realize just how terrible the names
pmc_in_use and pmc_speculative_in_use() are. "pmc_in_use" sounds like it tracks
which PMCs have perf_events, and at first glance at kvm_pmu_cleanup(), it even
_looks_ like that's the case. But kvm_pmu_cleanup() is _skipping_ PMCs that are
not "in use". And conversely, there is nothing speculative about checking the
local enable bit for a PMC.

I'll send patches to rename pmc_in_use to pmc_accessed, and
pmc_speculative_in_use() to pmc_is_locally_enabled().

As for this one, unless someone spends the time to prove me wrong, it's destined
for 6.9 with a changelog that says the bug is likely benign.
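
For anyone who wants to poke at the truncation argument outside the kernel,
here is a minimal userspace sketch. It is not the KVM code: ctrl_field(),
count_changed_fields() and check_truncation_effects() are hypothetical names,
and the only thing assumed from hardware is the 4-bits-per-fixed-counter layout
of the control value. It illustrates two points from above: a u8 copy can only
lose bits, and rewriting an unchanged value then looks like a change for a
counter whose field sits above bit 7.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each fixed counter owns a 4-bit field in the control value. */
static unsigned int ctrl_field(uint64_t ctrl, int idx)
{
	return (unsigned int)((ctrl >> (idx * 4)) & 0xf);
}

/* Hypothetical helper: count counters whose old and new fields differ. */
int count_changed_fields(uint64_t old_ctrl, uint64_t new_ctrl, int nr_counters)
{
	int i, changed = 0;

	for (i = 0; i < nr_counters; i++) {
		if (ctrl_field(old_ctrl, i) != ctrl_field(new_ctrl, i))
			changed++;
	}
	return changed;
}

/* Hypothetical self-check of the two points described in the lead-in. */
void check_truncation_effects(void)
{
	uint64_t ctrl = 0xbbb;             /* three counters, each field 0xb */
	uint8_t truncated = (uint8_t)ctrl; /* 0xbb, the field for counter 2 is lost */

	/* Truncation never sets a bit that was clear in the original. */
	assert((truncated & ~ctrl) == 0);

	/* Same value rewritten: no field differs against the full old value... */
	assert(count_changed_fields(ctrl, ctrl, 3) == 0);

	/* ...but counter 2 looks changed when the old value was truncated. */
	assert(count_changed_fields(truncated, ctrl, 3) == 1);
}
```

Calling check_truncation_effects() from a small main() should pass silently.
The last assertion is the "unnecessary reprogram" case; the sketch does not
model what perf does with the event afterwards.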