On 2024-04-26 11:04 p.m., Mingwei Zhang wrote:
> On Fri, Apr 26, 2024 at 12:46 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>>
>> On Fri, Apr 26, 2024, Kan Liang wrote:
>>>> Optimization 4
>>>> allows the host side to immediately profile this part instead of
>>>> waiting for the vcpu to reach the PMU context switch locations. Doing so
>>>> will generate more accurate results.
>>>
>>> If so, I think 4 is a must-have. Otherwise, it wouldn't honor the
>>> definition of exclude_guest. Without 4, it brings some random blind
>>> spots, right?
>>
>> +1, I view it as a hard requirement. It's not an optimization, it's about
>> accuracy and functional correctness.
>
> Well. Does it have to be a _hard_ requirement? no? The irq handler
> triggered by "perf record -a" could just inject a "state". Instead of
> immediately preempting the guest PMU context, the perf subsystem could
> allow KVM to defer the context switch until it reaches the next PMU
> context switch location.

It depends on what the upcoming PMU context switch location is.

If it's the upcoming VM-exit/entry, the deferral should be fine. Because
it's an exclude_guest event, nothing should be counted while a VM is
running.

If it's the upcoming vCPU boundary, no. There may be several
VM-exits/entries before the upcoming vCPU switch. We may lose some results.

>
> This is the same as the kernel preemption logic. Do you want me to
> stop the work immediately? Yes (if you enable preemption), or No, let
> me finish my job and get to the scheduling point.

I don't think it's necessary. Just making sure that the counters are
scheduled at the upcoming VM-exit/entry boundary should be fine.

Thanks,
Kan

>
> Implementing this might be more difficult to debug. That's my real
> concern. If we do not enable preemption, the PMU context switch will
> only happen at the 2 pairs of locations. If we enable preemption, it
> could happen at any time.
>
>>
>> What _is_ an optimization is keeping guest state loaded while KVM is in its
>> run loop, i.e. initial mediated/passthrough PMU support could land upstream with
>> unconditional switches at entry/exit. The performance of KVM would likely be
>> unacceptable for any production use cases, but that would give us motivation to
>> finish the job, and it doesn't result in random, hard to diagnose issues for
>> userspace.
>
> That's true. I agree with that.
>
>>
>>>> Do we want to preempt that? I think it depends. For regular cloud
>>>> usage, we don't. But for any other usages where we want to prioritize
>>>> KVM/VMM profiling over guest vPMU, it is useful.
>>>>
>>>> My current opinion is that optimization 4 is something nice to have.
>>>> But we should allow people to turn it off just like we could choose to
>>>> disable preempt kernel.
>>>
>>> The exclude_guest means everything but the guest. I don't see a reason
>>> why people want to turn it off and get some random blind spots.
>