On Sat, Apr 27, 2024 at 5:59 PM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
>
>
> On 4/27/2024 11:04 AM, Mingwei Zhang wrote:
> > On Fri, Apr 26, 2024 at 12:46 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >> On Fri, Apr 26, 2024, Kan Liang wrote:
> >>>> Optimization 4 allows the host side to immediately profile this
> >>>> part instead of waiting for the vcpu to reach the PMU context
> >>>> switch locations. Doing so will generate more accurate results.
> >>> If so, I think 4 is a must-have. Otherwise, it wouldn't honor the
> >>> definition of exclude_guest. Without 4, it brings some random
> >>> blind spots, right?
> >> +1, I view it as a hard requirement. It's not an optimization, it's
> >> about accuracy and functional correctness.
> > Well, does it have to be a _hard_ requirement? No? The irq handler
> > triggered by "perf record -a" could just inject a "state". Instead
> > of immediately preempting the guest PMU context, the perf subsystem
> > could allow KVM to defer the context switch until it reaches the
> > next PMU context switch location.
> >
> > This is the same as the kernel preemption logic. Do you want me to
> > stop the work immediately? Yes (if you enable preemption), or no,
> > let me finish my job and get to the scheduling point.
> >
> > Implementing this might be more difficult to debug. That's my real
> > concern. If we do not enable preemption, the PMU context switch
> > will only happen at the 2 pairs of locations. If we enable
> > preemption, it could happen at any time.
>
> IMO, I would prefer not to add a switch to enable/disable the
> preemption. I think the current implementation is already complicated
> enough, and it is unnecessary to introduce a new parameter that would
> confuse users. Furthermore, the switch could introduce uncertainty
> and may mislead perf users into reading the perf stats incorrectly.
> As for debugging, it won't make any difference as long as no host
> event is created.
>

That's OK. It is about opinions and brainstorming. Adding a parameter
to disable preemption comes from the cloud usage perspective. The
conflict of opinions is about which one you prioritize: the guest PMU
or the host PMU?

If you stand on the guest vPMU usage perspective, do you want anyone
on the host to fire off a profiling command and generate turbulence?
No. If you stand on the host PMU perspective and you want to profile
VMM/KVM, you definitely want accuracy and no delay at all.

Thanks.
-Mingwei
>
>
>
> >> What _is_ an optimization is keeping guest state loaded while KVM
> >> is in its run loop, i.e. initial mediated/passthrough PMU support
> >> could land upstream with unconditional switches at entry/exit. The
> >> performance of KVM would likely be unacceptable for any production
> >> use cases, but that would give us motivation to finish the job,
> >> and it doesn't result in random, hard-to-diagnose issues for
> >> userspace.
> > That's true. I agree with that.
> >
> >>>> Do we want to preempt that? I think it depends. For regular
> >>>> cloud usage, we don't. But for any other usages where we want to
> >>>> prioritize KVM/VMM profiling over the guest vPMU, it is useful.
> >>>>
> >>>> My current opinion is that optimization 4 is something nice to
> >>>> have. But we should allow people to turn it off, just like we
> >>>> can choose to disable kernel preemption.
> >>> exclude_guest means everything but the guest. I don't see a
> >>> reason why people would want to turn it off and get some random
> >>> blind spots.
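
To make the two policies being debated above concrete, here is a
minimal C sketch. Every name in it (struct vcpu_pmu, perf_request_pmu,
pmu_switch_point, and the stubbed helpers) is hypothetical, not an
actual KVM or perf interface; the only point is the difference between
kicking the vCPU immediately when an exclude_guest host event appears
and deferring the switch to the next PMU context switch location.

    /*
     * Hypothetical sketch only -- none of these names exist in KVM or
     * perf. It illustrates "immediate" vs. "deferred" guest PMU
     * context switching as discussed in this thread.
     */
    #include <stdatomic.h>
    #include <stdbool.h>

    struct vcpu_pmu {
            atomic_bool host_wants_pmu; /* set when an exclude_guest
                                           host event is created */
            bool guest_state_loaded;    /* guest PMU state is live on
                                           the CPU */
    };

    /* Stubs standing in for the real machinery. */
    static void kick_vcpu(struct vcpu_pmu *pmu)      { (void)pmu; }
    static void save_guest_pmu(struct vcpu_pmu *pmu) { (void)pmu; }
    static void load_host_pmu(struct vcpu_pmu *pmu)  { (void)pmu; }

    /*
     * Perf side: an exclude_guest event was created. Immediate mode
     * also kicks the vCPU so the switch happens right away; deferred
     * mode only records the request.
     */
    static void perf_request_pmu(struct vcpu_pmu *pmu, bool immediate)
    {
            atomic_store(&pmu->host_wants_pmu, true);
            if (immediate)
                    kick_vcpu(pmu); /* force a VM exit now */
    }

    /*
     * KVM side: one of the fixed "PMU context switch locations". In
     * deferred mode this is the only place guest state is saved, so
     * host exclude_guest events are blind until the vCPU gets here.
     */
    static void pmu_switch_point(struct vcpu_pmu *pmu)
    {
            if (atomic_exchange(&pmu->host_wants_pmu, false) &&
                pmu->guest_state_loaded) {
                    save_guest_pmu(pmu);
                    load_host_pmu(pmu);
                    pmu->guest_state_loaded = false;
            }
    }

    int main(void)
    {
            struct vcpu_pmu pmu = { .guest_state_loaded = true };

            perf_request_pmu(&pmu, false); /* deferred: set flag only */
            pmu_switch_point(&pmu);        /* switch happens here     */
            return 0;
    }

The whole disagreement in this sub-thread essentially maps onto the
default (and configurability) of that "immediate" flag: true favors
host-side profiling accuracy, false favors an undisturbed guest vPMU
at the cost of a blind spot for exclude_guest events.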