On 2022-09-15 10:30 p.m., Wang, Wei W wrote: > On Thursday, September 15, 2022 11:43 PM, Liang, Kan wrote: >> On 2022-09-15 10:39 a.m., Wang, Wei W wrote: >>> On Thursday, September 15, 2022 9:55 PM Liang, Kan wrote: >>>> On 2022-09-14 10:46 p.m., Wang, Wei W wrote: >>>>> On Thursday, September 15, 2022 4:26 AM, Liang, Kan wrote: >>>>>> The perf_event_disable() eventually invokes the intel_pt_stop(). >>>>>> We already expose the intel_pt_stop()/cpu_emergency_stop_pt() to >>>>>> other modules. I don't think we have to use the perf_event_disable(). >>>>>> Also, the >>>>>> perf_event_disable() requires extra codes. >>>>>> >>>>>> I went through the discussions. I agree with Sean's suggestion. >>>>>> We should only put the logic in the KVM but all the MSR access >>>>>> details into the PT driver. >>>>> >>>>> Even the driver itself doesn’t drive the save/restore of the MSRs, >>>>> it is drived >>>> by perf. >>>> >>>> It through perf_event, not driven by perf_event. The perf_event >>>> generic code never knows when should invokes each driver to >>>> save/restore information. It should be driven by the other subsystem e.g., >> scheduler. >>> >>> Yes. The cpu scheduler does this via the perf subsystem, though. >>> >>>> >>>> For this case, KVM should drive the save/restore, and the PT driver >>>> eventually does all the MSR access details. >>>> >>>>> 1. If we make KVM a user of perf, we should do this via >>>> perf_event_disable/enable_*. >>>>> 2. If we make KVM an alternative to perf (i.e. have direct control >>>>> over PMU HW), we can do this via driver interfaces like perf. >>>>> Per my experience, we should go for 1. Probably need Peter's >>>>> opinions on >>>> this. >>>>> >>>> >>>> For 1, the perf_event_disable/enable_* are not enough. They don't >>>> save/restore MSRs. >>> >>> perf_event_disable will go through perf to call pt_event_stop which saves >> the related MSRs, right? >> >> I don't think so. The pt_event_stop() doesn't save all the >> MSR_IA32_RTIT_* MSRs. > > Not all the MSRs are required to be saved. In general, pt_event_stop should have > saved all the MSRs required for an event switching. Otherwise the general usages > of PT have been broken. To be more precise, the following MSRs are not saved by > pt_event_stop, but I don’t see they are required to be saved: > > - MSR_IA32_RTIT_CR3_MATCH: I don’t see it is used by perf. > Seems like KVM saved an MSR that's not even used by the host. > > - Address range MSRs (MSR_IA32_RTIT_ADDR0_A etc.): Those are provided by s/w and not updated by h/w. > So they're just set to MSRs when event gets scheduled in. There is no need to save. > OK. I think you need a clean-up patch to fix them first. >> >>> (if so, what large changes did you mean?) >>> >>>> If we go to this way, we have to introduce a new generic interface to >>>> ask each driver to save/restore their MSRs when the guest is >>>> entering/exiting. We'd better combine the new interface with the >>>> existing >>>> perf_guest_get_msrs() of the core driver. >>>> I think that's an ideal solution, but requires big changes in the code. >>>> >>>> 2 is the current KVM implementation. See pt_save_msr()/pt_load_msr(). >>>> I don't think it's a right way. We'd better fix it. >>>> >>>> The suggestion should be 3. The KVM notify the PT driver via the >>>> interface provided by PT. The PT driver save/restore all the registers. >>>> I think it's an acceptable solution with small code changes. >>> >>> This looks like we just relocate the save/restore functions to the PT driver >> and KVM still directly call them - still not going through perf's management. >> Imagine every user operates on the pmu h/w directly like this, things would be >> a mess. >>> >> >> >> The pt_event_stop() and the proposed interface still manipulate the PT event >> pt->handle.event. The event status is updated as well. It's still under control of >> the perf_event. > > Did you mean to handle the PT event in the proposed driver API? Event status is just > one of the things. There are other things if we want to make it complete for this, > e.g. event->oncpu = -1, and eventually seems we will re-implement perf_event_disable_*. > As my understand, perf always check the status first. If it's a stopped or inactivated event, I don't think event->oncpu will be touched. That's why I think the proposed driver API should be acceptable. > Btw, Xiaoyao has made it work with perf_event_disable_local, and don’t have that many changes. > If necessary, we can post the 2nd version out to double check. > I'm not worry about which ways (either perf_event_disable_local() or the proposed PT driver API) are chosen to stop the PT. If the existing perf_event interfaces can meet your requirement, that's perfect. My real concern is the pt_save_msr()/pt_load_msr(). I don't think it's a job for KVM. See atomic_switch_perf_msrs(). It is the perf core driver rather than KVM that tells which MSRs should be saved/restored in VMCS. We should do the same thing for PT. (Actually, I think we already encounter issues with the current KVM-dominated method. KVM saves/restores unnecessary MSRs. Right?) To do so, I think there may be two ways. - Since MSRs have to be switched for both PT and core drivers, it sounds reasonable to provide a new generic interface in the perf_event. The new interface is to tell KVM which MSRs should be saved/restored. Then KVM can decide to save/restore via VMCS or direct MSR access. I suspect this way requires big change, but it will benefit all the drivers which have similar requirements. - The proposed driver API. The MSRs are saved/restored in the PT driver. Thanks, Kan