On Thu, Apr 08, 2021 at 01:39:49PM +0800, Xu, Like wrote: > Hi Peter, > > Thanks for your detailed comments. > > If you have more comments for other patches, please let me know. > > On 2021/4/7 23:39, Peter Zijlstra wrote: > > On Mon, Mar 29, 2021 at 01:41:29PM +0800, Like Xu wrote: > > > @@ -3869,10 +3876,12 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data) > > > if (arr[1].guest) > > > arr[0].guest |= arr[1].guest; > > > - else > > > + else { > > > arr[1].guest = arr[1].host; > > > + arr[2].guest = arr[2].host; > > > + } > > What's all this gibberish? > > > > The way I read that it says: > > > > if guest has PEBS_ENABLED > > guest GLOBAL_CTRL |= PEBS_ENABLED > > otherwise > > guest PEBS_ENABLED = host PEBS_ENABLED > > guest DS_AREA = host DS_AREA > > > > which is just completely random garbage afaict. Why would you leak host > > msrs into the guest? > > In fact, this is not a leak at all. > > When we do "arr[i].guest = arr[i].host;" assignment in the > intel_guest_get_msrs(), the KVM will check "if (msrs[i].host == > msrs[i].guest)" and if so, it disables the atomic switch for this msr > during vmx transaction in the caller atomic_switch_perf_msrs(). Another marvel of bad coding style that function is :-( Lots of missing {} and indentation fail. This is terrible though, why would we clear the guest MSRs when it changes PEBS_ENABLED. The guest had better clear them itself. Removing guest DS_AREA just because we don't have any bits set in PEBS_ENABLED is wrong and could very break all sorts of drivers. > In that case, the msr value doesn't change and any guest write will be > trapped. If the next check is "msrs[i].host != msrs[i].guest", the > atomic switch will be triggered again. > > Compared to before, this part of the logic has not changed, which helps to > reduce overhead. It's unreadable garbage at best. If you don't want it changed, then don't add it to the arr[] thing in the first place. > > Why would you change guest GLOBAL_CTRL implicitly; > > This is because in the early part of this function, we have operations: > > if (x86_pmu.flags & PMU_FL_PEBS_ALL) > arr[0].guest &= ~cpuc->pebs_enabled; > else > arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK); > > and if guest has PEBS_ENABLED, we need these bits back for PEBS counters: > > arr[0].guest |= arr[1].guest; I don't think that's right, who's to say they were set in the first place? The guest's GLOBAL_CTRL could have had the bits cleared at VMEXIT time. You can't unconditionally add PEBS_ENABLED into GLOBAL_CTRL, that's wrong. > > guest had better wrmsr that himself to control when stuff is enabled. > > When vm_entry, the msr value of GLOBAL_CTRL on the hardware may be > different from trapped value "pmu->global_ctrl" written by the guest. > > If the perf scheduler cross maps guest counter X to the host counter Y, > we have to enable the bit Y in GLOBAL_CTRL before vm_entry rather than X. Sure, but I don't see that happening here.