On Mon, Apr 15, 2024 at 10:38 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Mon, Apr 15, 2024, Mingwei Zhang wrote: > > On Mon, Apr 15, 2024 at 3:04 AM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote: > > > On 4/15/2024 2:06 PM, Mingwei Zhang wrote: > > > > On Fri, Apr 12, 2024 at 9:25 PM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote: > > > >>>> It's necessary to clear the EVENTSELx MSRs for both GP and fixed counters. > > > >>>> Considering this case, Guest uses GP counter 2, but Host doesn't use it. So > > > >>>> if the EVENTSEL2 MSR is not cleared here, the GP counter 2 would be enabled > > > >>>> unexpectedly on host later since Host perf always enable all validate bits > > > >>>> in PERF_GLOBAL_CTRL MSR. That would cause issues. > > > >>>> > > > >>>> Yeah, the clearing for PMCx MSR should be unnecessary . > > > >>>> > > > >>> Why is clearing for PMCx MSR unnecessary? Do we want to leaking counter > > > >>> values to the host? NO. Not in cloud usage. > > > >> No, this place is clearing the guest counter value instead of host > > > >> counter value. Host always has method to see guest value in a normal VM > > > >> if he want. I don't see its necessity, it's just a overkill and > > > >> introduce extra overhead to write MSRs. > > > >> > > > > I am curious how the perf subsystem solves the problem? Does perf > > > > subsystem in the host only scrubbing the selector but not the counter > > > > value when doing the context switch? > > > > > > When context switch happens, perf code would schedule out the old events > > > and schedule in the new events. When scheduling out, the ENABLE bit of > > > EVENTSELx MSR would be cleared, and when scheduling in, the EVENTSELx > > > and PMCx MSRs would be overwritten with new event's attr.config and > > > sample_period separately. Of course, these is only for the case when > > > there are new events to be programmed on the PMC. If no new events, the > > > PMCx MSR would keep stall value and won't be cleared. > > > > > > Anyway, I don't see any reason that PMCx MSR must be cleared. > > > > > > > I don't have a strong opinion on the upstream version. But since both > > the mediated vPMU and perf are clients of PMU HW, leaving PMC values > > uncleared when transition out of the vPMU boundary is leaking info > > technically. > > I'm not objecting to ensuring guest PMCs can't be read by any entity that's not > in the guest's TCB, which is what I would consider a true leak. I'm objecting > to blindly clearing all PMCs, and more specifically objecting to *KVM* clearing > PMCs when saving guest state without coordinating with perf in any way. > > I am ok if we start with (or default to) a "safe" implementation that zeroes all > PMCs when switching to host context, but I want KVM and perf to work together to > do the context switches, e.g. so that we don't end up with code where KVM writes > to all PMC MSRs and that perf also immediately writes to all PMC MSRs. I am fully aligned with you on this. > > One my biggest complaints with the current vPMU code is that the roles and > responsibilities between KVM and perf are poorly defined, which leads to suboptimal > and hard to maintain code. > > Case in point, I'm pretty sure leaving guest values in PMCs _would_ leak guest > state to userspace processes that have RDPMC permissions, as the PMCs might not > be dirty from perf's perspective (see perf_clear_dirty_counters()). > > Blindly clearing PMCs in KVM "solves" that problem, but in doing so makes the > overall code brittle because it's not clear whether KVM _needs_ to clear PMCs, > or if KVM is just being paranoid. So once this rolls out, perf and vPMU are clients directly to PMU HW. Faithful cleaning (blind cleaning) has to be the baseline implementation, until both clients agree to a "deal" between them. Currently, there is no such deal, but I believe we could have one via future discussion. Thanks. -Mingwei