On Wed, Nov 02, 2011 at 12:01:51PM +0200, Avi Kivity wrote:
> On 11/01/2011 02:30 PM, Gleb Natapov wrote:
> > > > +
> > > > +/* mapping between fixed pmc index and arch_events array */
> > > > +int fixed_pmc_events[] = {1, 0, 2};
> > > > +
> > > > +static bool pmc_is_gp(struct kvm_pmc *pmc)
> > > > +{
> > > > +	return pmc->type == KVM_PMC_GP;
> > > > +}
> > > > +
> > > > +static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
> > > > +{
> > > > +	struct kvm_pmu *pmu = &pmc->vcpu->arch.pmu;
> > > > +
> > > > +	return pmc_is_gp(pmc) ? pmu->gp_counter_bitmask :
> > > > +		pmu->fixed_counter_bitmask;
> > > > +}
> > >
> > > Nicer to just push the bitmask (or bitwidth) into the counter itself.
> > >
> > Hmm, is it really nicer to replicate the same information 35 times?
>
> If it were 35 times, you could do pmu->type->bitmask.  But it's just 5
> or 6 times.
>
It is 35. Perf defines X86_PMC_MAX_GENERIC to be 32 and X86_PMC_MAX_FIXED
to be 3. I can do pmu->type->bitmask if you think it is better.

> > > > +
> > > > +static void kvm_perf_overflow_intr(struct perf_event *perf_event,
> > > > +		struct perf_sample_data *data, struct pt_regs *regs)
> > > > +{
> > > > +	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
> > > > +	struct kvm_pmu *pmu = &pmc->vcpu->arch.pmu;
> > > > +	if (!__test_and_set_bit(pmc_to_global_idx(pmc),
> > > > +			(unsigned long *)&pmu->reprogram_pmi)) {
> > > > +		kvm_perf_overflow(perf_event, data, regs);
> > > > +		kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
> > > > +	}
> > > > +}
> > >
> > > Is it safe to use the __ versions here?
> > >
> > It is supposed to run in NMI context on the same CPU that just ran
> > the vcpu, so simultaneous access to the same variable from different
> > CPUs shouldn't be possible. But if your scenario below can happen then
> > that assumption may not hold. The question is whether PMI delivery can
> > be so skewed as to be delivered long after vmexit (which switches perf
> > msr values, btw).
> The compiler/runtime is allowed to implement __test_and_set_bit() as
> multiple instructions, no?  Do we have any similar sequences outside nmi
> context?
>
Yes we do: on handling a PMU event during guest entry, and during event
reprogramming. On x86 the __ version differs from the non-__ version only
by the lock prefix. It would be a pity to use locked functions here,
though. We need local_ functions for bitops.

> > > Do we need to follow kvm_make_request() with kvm_vcpu_kick()?  If there
> > > is a skew between the overflow and the host PMI, the guest might have
> > > executed a HLT.
> >
> > Is kvm_vcpu_kick() safe for NMI context?
>
> No.  There is irq_work_queue() for that.  Would be good to avoid it if
> we know that it's safe to (for example if we have PF_VCPU set).
>
Checking PF_VCPU will not tell us that the vcpu is going to reenter guest
mode again.

> > > > +
> > > > +static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
> > > > +{
> > > > +	unsigned en = en_pmi & 0x3;
> > > > +	bool pmi = en_pmi & 0x8;
> > > > +
> > > > +	stop_counter(pmc);
> > > > +
> > > > +	if (!en || !pmc_enabled(pmc))
> > > > +		return;
> > > > +
> > > > +	reprogram_counter(pmc, PERF_TYPE_HARDWARE,
> > > > +			arch_events[fixed_pmc_events[idx]].event_type,
> > > > +			!(en & 0x2), /* exclude user */
> > > > +			!(en & 0x1), /* exclude kernel */
> > > > +			pmi);
> > >
> > > Are there no #defines for those constants?
> > >
> > Nope. perf_event_intel.c open codes them too.
>
> Okay.
>
> > > The user can cause this to be very small (even zero).  Can this cause an
> > > NMI storm?
> > >
> > If the user sets it to zero then attr.sample_period will always be 0 and
> > perf will think that the event is non-sampling and will use max_period
> > instead. For a small value greater than zero, how is it different from
> > userspace creating an event with a sample_period of 1?
>
> I don't know.  Does the kernel survive it?
Need to test, but I do not see anything in the kernel that prevents
userspace from setting it to any value.

--
			Gleb.