On Mon, Apr 22, 2024 at 11:45 PM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
>
> On 4/23/2024 2:08 PM, maobibo wrote:
> >
> > On 2024/4/23 12:23 PM, Mingwei Zhang wrote:
> >> On Mon, Apr 22, 2024 at 8:55 PM maobibo <maobibo@xxxxxxxxxxx> wrote:
> >>>
> >>> On 2024/4/23 11:13 AM, Mi, Dapeng wrote:
> >>>>
> >>>> On 4/23/2024 10:53 AM, maobibo wrote:
> >>>>>
> >>>>> On 2024/4/23 10:44 AM, Mi, Dapeng wrote:
> >>>>>>
> >>>>>> On 4/23/2024 9:01 AM, maobibo wrote:
> >>>>>>>
> >>>>>>> On 2024/4/23 1:01 AM, Sean Christopherson wrote:
> >>>>>>>> On Mon, Apr 22, 2024, maobibo wrote:
> >>>>>>>>> On 2024/4/16 6:45 AM, Sean Christopherson wrote:
> >>>>>>>>>> On Mon, Apr 15, 2024, Mingwei Zhang wrote:
> >>>>>>>>>>> On Mon, Apr 15, 2024 at 10:38 AM Sean Christopherson
> >>>>>>>>>>> <seanjc@xxxxxxxxxx> wrote:
> >>>>>>>>>>>> One of my biggest complaints with the current vPMU code is
> >>>>>>>>>>>> that the roles and responsibilities between KVM and perf are
> >>>>>>>>>>>> poorly defined, which leads to suboptimal and hard-to-maintain
> >>>>>>>>>>>> code.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Case in point, I'm pretty sure leaving guest values in PMCs
> >>>>>>>>>>>> _would_ leak guest state to userspace processes that have
> >>>>>>>>>>>> RDPMC permissions, as the PMCs might not be dirty from perf's
> >>>>>>>>>>>> perspective (see perf_clear_dirty_counters()).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Blindly clearing PMCs in KVM "solves" that problem, but in
> >>>>>>>>>>>> doing so makes the overall code brittle because it's not clear
> >>>>>>>>>>>> whether KVM _needs_ to clear PMCs, or if KVM is just being
> >>>>>>>>>>>> paranoid.
> >>>>>>>>>>>
> >>>>>>>>>>> So once this rolls out, perf and vPMU are clients directly to
> >>>>>>>>>>> PMU HW.
> >>>>>>>>>>
> >>>>>>>>>> I don't think this is a statement we want to make, as it opens a
> >>>>>>>>>> discussion that we won't win. Nor do I think it's one we *need*
> >>>>>>>>>> to make. KVM doesn't need to be on equal footing with perf in
> >>>>>>>>>> terms of owning/managing PMU hardware, KVM just needs a few APIs
> >>>>>>>>>> to allow faithfully and accurately virtualizing a guest PMU.
> >>>>>>>>>>
> >>>>>>>>>>> Faithful cleaning (blind cleaning) has to be the baseline
> >>>>>>>>>>> implementation, until both clients agree to a "deal" between
> >>>>>>>>>>> them. Currently, there is no such deal, but I believe we could
> >>>>>>>>>>> have one via future discussion.
> >>>>>>>>>>
> >>>>>>>>>> What I am saying is that there needs to be a "deal" in place
> >>>>>>>>>> before this code is merged. It doesn't need to be anything
> >>>>>>>>>> fancy, e.g. perf can still pave over PMCs it doesn't immediately
> >>>>>>>>>> load, as opposed to using cpu_hw_events.dirty to lazily do the
> >>>>>>>>>> clearing. But perf and KVM need to work together from the get
> >>>>>>>>>> go, i.e. I don't want KVM doing something without regard to what
> >>>>>>>>>> perf does, and vice versa.
> >>>>>>>>>>
> >>>>>>>>> There is a similar issue with the LoongArch vPMU, where the VM
> >>>>>>>>> can access PMU hardware directly and the PMU hardware is shared
> >>>>>>>>> between guest and host. Besides context switch, there are other
> >>>>>>>>> places where the perf core will access PMU hardware, such as the
> >>>>>>>>> tick timer/hrtimer/IPI function call, and KVM can only intercept
> >>>>>>>>> the context switch.
> >>>>>>>>
> >>>>>>>> Two questions:
> >>>>>>>>
> >>>>>>>> 1) Can KVM prevent the guest from accessing the PMU?
> >>>>>>>>
> >>>>>>>> 2) If so, can KVM grant partial access to the PMU, or is it all
> >>>>>>>> or nothing?
> >>>>>>>>
> >>>>>>>> If the answer to both questions is "yes", then it sounds like
> >>>>>>>> LoongArch *requires* mediated/passthrough support in order to
> >>>>>>>> virtualize its PMU.
> >>>>>>>
> >>>>>>> Hi Sean,
> >>>>>>>
> >>>>>>> Thanks for your quick response.
> >>>>>>>
> >>>>>>> Yes, KVM can prevent the guest from accessing the PMU and can grant
> >>>>>>> partial or full access to the PMU. It's just that if one PMU event
> >>>>>>> is granted to the VM, the host cannot access this PMU event again;
> >>>>>>> there must be a PMU event switch if the host wants to.
> >>>>>>
> >>>>>> A PMU event is a software entity which won't be shared. Did you mean
> >>>>>> that if a PMU HW counter is granted to the VM, then the host can't
> >>>>>> access that PMU HW counter, right?
> >>>>> Yes, if the PMU HW counter/control is granted to the VM. The value
> >>>>> comes from the guest and is not meaningful to the host. The host PMU
> >>>>> core does not know that it has been granted to the VM; the host still
> >>>>> thinks that it owns the PMU.
> >>>>
> >>>> That's one issue this patchset tries to solve. The current new
> >>>> mediated x86 vPMU framework doesn't allow the host and guest to own
> >>>> the PMU HW resource simultaneously. Only when there is no
> >>>> !exclude_guest event on the host is the guest allowed to exclusively
> >>>> own the PMU HW resource.
> >>>>
> >>>>> Just like the FPU registers, it is shared by the VM and the host at
> >>>>> different times and is lazily switched. But if an IPI or timer
> >>>>> interrupt uses the FPU registers on the host, there will be the same
> >>>>> issue.
> >>>>
> >>>> I didn't fully get your point. When an IPI or timer interrupt arrives,
> >>>> a VM-exit is triggered to make the CPU trap into the host first, and
> >>>> then the host
> >>> yes, it is.
> >>
> >> This is correct. And this is one of the points that we debated
> >> internally: whether we should do the PMU context switch at the vcpu
> >> loop boundary or at the VM Enter/Exit boundary. A (host-level) timer
> >> interrupt can force a VM Exit, which I think happens every 4 ms or
> >> 1 ms, depending on configuration.
> >>
> >> One of the key reasons we currently propose this is that it is the same
> >> boundary as the legacy PMU, i.e., it would be simple to propose from
> >> the perf subsystem perspective.
> >>
> >> Performance-wise, doing the PMU context switch at the vcpu boundary
> >> would be way better in general. But the downside is that the perf
> >> subsystem loses the ability to profile the majority of KVM code
> >> (functions) when the guest PMU is enabled.
> >>
> >>>
> >>>> interrupt handler is called. Or are you complaining about the
> >>>> execution sequence of switching the guest PMU MSRs and these
> >>>> interrupt handlers?
> >>> In our vPMU implementation, it is OK if the vPMU is switched in the
> >>> VM-exit path; however, there is a problem if the vPMU is switched
> >>> during the vcpu thread sched-out/sched-in path, since the IPI/timer
> >>> IRQ handlers access PMU registers in host mode.
> >>
> >> Oh, the IPI/timer IRQ handler will access PMU registers? I thought only
> >> the host-level NMI handler would access the PMU MSRs, since the PMI is
> >> registered under NMI.
> >>
> >> In that case, you should disable IRQs during the vcpu context switch.
> >> For NMI, we prevent its handler from accessing the PMU registers. In
> >> particular, we use a per-cpu variable to guard that. So the host-level
> >> PMI handler for the perf subsystem will check the variable before
> >> proceeding.
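(To make this concrete, here is a minimal sketch of the guard idea described
above. The names pmu_owned_by_guest, kvm_set_guest_pmu_ownership and
host_pmi_handler are made up for illustration; they are not the identifiers
used in the actual patches.)

    #include <linux/percpu.h>
    #include <linux/ptrace.h>
    #include <asm/nmi.h>

    /* Illustrative per-cpu flag: true while the guest owns the PMU. */
    static DEFINE_PER_CPU(bool, pmu_owned_by_guest);

    /* Hypothetical hook: KVM flips this around the mediated PMU context switch. */
    void kvm_set_guest_pmu_ownership(bool owned)
    {
            this_cpu_write(pmu_owned_by_guest, owned);
    }

    /* Host PMI (NMI) handler bails out early while the guest owns the PMU. */
    static int host_pmi_handler(unsigned int cmd, struct pt_regs *regs)
    {
            if (this_cpu_read(pmu_owned_by_guest))
                    return NMI_DONE;        /* don't touch guest-owned PMU MSRs */

            /* ... normal perf PMI processing, which may read/write PMU MSRs ... */
            return NMI_HANDLED;
    }

KVM would flip the flag on the same path where it loads and saves the guest
PMU MSRs, so an NMI arriving mid-switch would see the flag set and back off.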
> >
> > The perf core will access PMU hardware in the tick timer/hrtimer/IPI
> > function call. For example, perf_event_task_tick() is called from the
> > tick timer, and there are event_function_call(event, __perf_event_xxx,
> > &value) calls in kernel/events/core.c.
> >
> > https://lore.kernel.org/lkml/20240417065236.500011-1-gaosong@xxxxxxxxxxx/T/#m15aeb79fdc9ce72dd5b374edd6acdcf7a9dafcf4
> >
>
> I just went through the functions (not sure if all of them):
> perf_event_task_tick() and the callbacks of event_function_call() check
> event->state first, and if the event is in PERF_EVENT_STATE_INACTIVE,
> the PMU HW MSRs are not really touched. In this new proposal, all host
> events with the exclude_guest attribute would be put into
> PERF_EVENT_STATE_INACTIVE if the guest owns the PMU HW resource. So I
> think it's fine.
>
Is there any event on the host still in PERF_EVENT_STATE_ACTIVE? If so,
hmm, it will reach perf_pmu_disable(event->pmu), which will access the
global ctrl MSR.
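(For reference, perf_pmu_disable() looks roughly like the following. This is
paraphrased from memory rather than quoted verbatim from
kernel/events/core.c, but it shows why any still-active host event that
takes a tick/IPI path ends up writing the global ctrl MSR on x86.)

    #include <linux/perf_event.h>

    /* Paraphrased shape of the perf core helper, not a verbatim copy. */
    void perf_pmu_disable(struct pmu *pmu)
    {
            int *count = this_cpu_ptr(pmu->pmu_disable_count);

            /*
             * The first caller on this CPU disables the whole PMU. On Intel,
             * pmu->pmu_disable() eventually clears MSR_CORE_PERF_GLOBAL_CTRL,
             * i.e. it overwrites whatever value the guest had loaded there.
             */
            if (!(*count)++)
                    pmu->pmu_disable(pmu);
    }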