On Fri, Apr 12, 2024, Xiong Y Zhang wrote:
> >> 2. NMI watchdog
> >>    The perf event for the NMI watchdog is a system-wide, CPU-pinned event.
> >>    It is also stopped during VM running, but it doesn't have
> >>    attr.exclude_guest=1, so we add that in this RFC.  This still means the
> >>    NMI watchdog loses its function while a VM is running.
> >>
> >>    Two candidates exist for replacing the NMI watchdog's perf event:
> >>    a. The buddy hardlockup detector [3] may not be reliable enough to
> >>       replace the perf event.
> >>    b. The HPET-based hardlockup detector [4] isn't in the upstream kernel.
> >
> > I think the simplest solution is to allow mediated PMU usage if and only if
> > the NMI watchdog is disabled.  Then whether or not the host replaces the NMI
> > watchdog with something else becomes an orthogonal discussion, i.e. not KVM's
> > problem to solve.
> Makes sense.  KVM should not affect high-priority host work.
> The NMI watchdog is a client of perf and is a system wide perf event, and perf
> can't distinguish whether a system wide perf event is the NMI watchdog or
> something else, so how about we extend this suggestion to all system wide perf
> events?  Mediated PMU would only be allowed when all system wide perf events
> are disabled or non-existent at VM creation.

What other kernel-driven system wide perf events are there?

> But the NMI watchdog is usually enabled, so this will limit mediated PMU usage.

I don't think it is at all unreasonable to require users that want optimal PMU
virtualization to adjust their environment.  And we can and should document the
tradeoffs and alternatives, e.g. so that users that want better PMU results
don't need to re-discover all the "gotchas" on their own.

This would even be one of the rare times where I would be ok with a dmesg log.
E.g. if KVM is loaded with enable_mediated_pmu=true, but there are system wide
perf events, pr_warn() to explain the conflict and direct the user to
documentation explaining how to make their system compatible with mediated PMU
usage.

> >> 3. Dedicated kvm_pmi_vector
> >>    In the emulated vPMU, the host PMI handler notifies KVM to inject a
> >>    virtual PMI into the guest when a physical PMI belongs to a guest
> >>    counter.  If the same mechanism is used in the passthrough vPMU and PMI
> >>    skid causes a physical PMI belonging to the guest to arrive after
> >>    VM-exit, then the host PMI handler can't tell whether the PMI belongs
> >>    to the host or the guest.
> >>    So this RFC uses a dedicated kvm_pmi_vector: only PMIs belonging to the
> >>    guest use this vector, while PMIs belonging to the host still use the
> >>    NMI vector.
> >>
> >>    Without considering PMI skid, especially for AMD, the host NMI vector
> >>    could be used for guest PMIs as well; this method is simpler and doesn't
> >
> > I don't see how multiplexing NMIs between guest and host is simpler.  At
> > best, the complexity is a wash, just in different locations, and I highly
> > doubt it's a wash.  AFAIK, there is no way to precisely know that an NMI
> > came in via the LVTPC.
> When kvm_intel.pt_mode=PT_MODE_HOST_GUEST, the guest PT PMI is an NMI
> multiplexed between guest and host, and we could extend the guest PT PMI
> framework to the mediated PMU, so I think this is simpler.

Heh, what do you mean by "this"?  Using a dedicated IRQ vector, or extending the
PT framework of multiplexing NMIs?
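
As a rough illustration of the suggested load-time check (not code from this
series): the helper perf_has_system_wide_events() below is hypothetical, not an
existing perf API, and the warning text and documentation pointer are only
placeholders.

#include <linux/module.h>
#include <linux/printk.h>

static bool enable_mediated_pmu;
module_param(enable_mediated_pmu, bool, 0444);

/* Hypothetical helper: true if any system wide perf event currently exists. */
bool perf_has_system_wide_events(void);

static void kvm_check_mediated_pmu(void)
{
	if (!enable_mediated_pmu)
		return;

	if (perf_has_system_wide_events()) {
		/* Explain the conflict and point the user at the docs. */
		pr_warn("kvm: mediated PMU requested, but system wide perf events "
			"(e.g. the NMI watchdog) are active; disabling mediated PMU. "
			"See the mediated PMU documentation for how to free up the PMU.\n");
		enable_mediated_pmu = false;
	}
}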
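
And a similarly rough sketch of the dedicated-vector option, modeled on the
existing posted-interrupt sysvec handlers: KVM_GUEST_PMI_VECTOR (and its value),
kvm_handle_guest_pmi(), and the load/put helpers are assumptions for
illustration only, and the IDT/irq wiring is omitted.

#include <asm/apic.h>
#include <asm/idtentry.h>

#define KVM_GUEST_PMI_VECTOR	0xf5	/* illustrative value for a new fixed vector */

/* Guest PMIs arrive here; host PMIs still come in as NMIs via perf. */
DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_guest_pmi_handler)
{
	apic_eoi();
	kvm_handle_guest_pmi();	/* hypothetical: pend a PMI on the running vCPU */
}

static void kvm_load_guest_lvtpc(void)
{
	/* While guest counters are live, deliver PMIs on the dedicated vector. */
	apic_write(APIC_LVTPC, KVM_GUEST_PMI_VECTOR);
}

static void kvm_put_guest_lvtpc(void)
{
	/* Restore NMI delivery for host perf once guest PMU state is put. */
	apic_write(APIC_LVTPC, APIC_DM_NMI);
}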