On Thu, Nov 09, 2023, Jim Mattson wrote:
> On Thu, Nov 9, 2023 at 3:42 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Thu, Nov 09, 2023, Jim Mattson wrote:
> > > A better solution may be to maintain two bitmaps of general purpose
> > > counters that need to be incremented, one for instructions retired and
> > > one for branch instructions retired. Set or clear these bits whenever
> > > the PerfEvtSelN MSRs are written. I think I would keep the PGC bits
> > > separate, on those microarchitectures that support PGC. Then,
> > > kvm_pmu_trigger_event() need only consult the appropriate bitmap (or
> > > the logical and of that bitmap with PGC). In most cases, the value
> > > will be zero, and the function can simply return.
> > >
> > > This would work even for AMD microarchitectures that don't support PGC.
> >
> > Yeah.  There are multiple lower-hanging fruits to be picked though, most of which
> > won't conflict with using dedicated per-event bitmaps, or at worst are trivial
> > to resolve.
> >
> > 1. Don't call into perf to get the eventsel (which generates an indirect call)
> >    on every invocation, let alone every iteration.
> >
> > 2. Avoid getting the CPL when it's irrelevant.
> >
> > 3. Check the eventsel before querying the event filter.
> >
> > 4. Mask out PMCs that aren't globally enabled from the get-go (masking out
> >    PMCs based on eventsel would essentially be the same as per-event bitmaps).
>
> The code below only looks at PGC. Even on CPUs that support PGC, some
> PMU clients still use the enable bits in the PerfEvtSelN. Linux perf,
> for instance, can't seem to make up its mind whether to use PGC or
> PerfEvtSelN.EN.

Ugh, as in perf leaves all GLOBAL_CTRL bits set and toggles only the eventsel
enable bit?  Lame.

> > I'm definitely not opposed to per-event bitmaps, but it'd be nice to avoid them,
> > e.g. if we can eke out 99% of the performance just by doing a few obvious
> > optimizations.
> >
> > This is the end result of what I'm testing and will (hopefully) post shortly:
> >
> > static inline bool pmc_is_eventsel_match(struct kvm_pmc *pmc, u64 eventsel)
> > {
> > 	return !((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB);
> > }
>
> The top nybble of AMD's 3-nybble event select collides with Intel's
> IN_TX and IN_TXCP bits. I think we can assert that the vCPU can't be
> in a transaction if KVM is emulating an instruction, but this probably
> merits a comment.

Argh, more pre-existing crud.  This is silly, the vendor mask is already in
kvm_pmu_ops.EVENTSEL_EVENT.  What's one more patch...

> The function name should also be more descriptive, so that it doesn't get
> misused out of context. :)

Heh, I think I'll just delete the darn thing.  That also makes it easier to
understand the comment about ignoring certain fields.
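
For the record, this is roughly what I have in mind for the open-coded check
in kvm_pmu_trigger_event()'s PMC loop.  Completely untested, and the exact
mask (vendor EVENTSEL_EVENT vs. keeping the raw AMD mask and relying on a
comment, plus the usual ARCH_PERFMON_EVENTSEL_UMASK for the unit mask) may
well change by the time I actually spin the patch:

	/*
	 * Ignore edge detect, pin control, invert, and CMASK, i.e. compare
	 * only the event select and unit mask.  Using the vendor's event
	 * select mask (kvm_pmu_ops.EVENTSEL_EVENT) keeps Intel's IN_TX
	 * (bit 32) and IN_TXCP (bit 33), which overlap the top nybble of
	 * AMD's event select, out of the comparison.
	 */
	if ((pmc->eventsel ^ eventsel) &
	    (kvm_pmu_ops.EVENTSEL_EVENT | ARCH_PERFMON_EVENTSEL_UMASK))
		continue;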