Hi Marc,

On 1/19/20 6:58 PM, Marc Zyngier wrote:
> On Thu, 5 Dec 2019 20:01:42 +0100
> Auger Eric <eric.auger@xxxxxxxxxx> wrote:
>
> Hi Eric,
>
>> Hi Marc,
>>
>> On 12/5/19 3:52 PM, Marc Zyngier wrote:
>>> On 2019-12-05 14:06, Auger Eric wrote:
>>>> Hi Marc,
>>>>
>>>> On 12/5/19 10:43 AM, Marc Zyngier wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On 2019-12-04 20:44, Eric Auger wrote:
>>>>>> At the moment a SW_INCR counter always overflows on a 32-bit
>>>>>> boundary, independently of whether the n+1th counter is
>>>>>> programmed as CHAIN.
>>>>>>
>>>>>> Check whether the SW_INCR counter is a 64b counter and if so,
>>>>>> implement the 64b logic.
>>>>>>
>>>>>> Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
>>>>>> Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx>
>>>>>> ---
>>>>>>  virt/kvm/arm/pmu.c | 16 +++++++++++++++-
>>>>>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
>>>>>> index c3f8b059881e..7ab477db2f75 100644
>>>>>> --- a/virt/kvm/arm/pmu.c
>>>>>> +++ b/virt/kvm/arm/pmu.c
>>>>>> @@ -491,6 +491,8 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
>>>>>>
>>>>>>          enable = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
>>>>>>          for (i = 0; i < ARMV8_PMU_CYCLE_IDX; i++) {
>>>>>> +                bool chained = test_bit(i >> 1, vcpu->arch.pmu.chained);
>>>>>> +
>>>>>
>>>>> I'd rather you use kvm_pmu_pmc_is_chained() rather than open-coding
>>>>> this. But see below:
>>>>>
>>>>>>                  if (!(val & BIT(i)))
>>>>>>                          continue;
>>>>>>                  type = __vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i)
>>>>>> @@ -500,8 +502,20 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
>>>>>>                          reg = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
>>>>>>                          reg = lower_32_bits(reg);
>>>>>>                          __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = reg;
>>>>>> -                        if (!reg)
>>>>>> +                        if (reg) /* no overflow */
>>>>>> +                                continue;
>>>>>> +                        if (chained) {
>>>>>> +                                reg = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i + 1) + 1;
>>>>>> +                                reg = lower_32_bits(reg);
>>>>>> +                                __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i + 1) = reg;
>>>>>> +                                if (reg)
>>>>>> +                                        continue;
>>>>>> +                                /* mark an overflow on high counter */
>>>>>> +                                __vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(i + 1);
>>>>>> +                        } else {
>>>>>> +                                /* mark an overflow */
>>>>>>                                  __vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(i);
>>>>>> +                        }
>>>>>>                  }
>>>>>>          }
>>>>>>  }
>>>>>
>>>>> I think the whole function is a bit of a mess, and could be better
>>>>> structured to treat 64bit counters as a first class citizen.
>>>>>
>>>>> I'm suggesting something along those lines, which tries to
>>>>> streamline things a bit and keep the flow uniform between the
>>>>> two word sizes. IMHO, it helps reasoning about it and gives
>>>>> scope to the ARMv8.5 full 64bit counters... It is of course
>>>>> completely untested.
>>>>
>>>> Looks OK to me as well. One remark though: don't we need to test if the
>>>> n+1th reg is enabled before incrementing it?
>>>
>>> Hmmm. I'm not sure. I think we should make sure that we don't flag
>>> a counter as being chained if the odd counter is disabled, rather
>>> than checking it here.
>>> As long as the odd counter is not chained
>>> *and* enabled, we shouldn't touch it.
>>>
>>> Again, untested:
>>>
>>> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
>>> index cf371f643ade..47366817cd2a 100644
>>> --- a/virt/kvm/arm/pmu.c
>>> +++ b/virt/kvm/arm/pmu.c
>>> @@ -15,6 +15,7 @@
>>>  #include <kvm/arm_vgic.h>
>>>
>>>  static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx);
>>> +static void kvm_pmu_update_pmc_chained(struct kvm_vcpu *vcpu, u64 select_idx);
>>>
>>>  #define PERF_ATTR_CFG1_KVM_PMU_CHAINED 0x1
>>>
>>> @@ -298,6 +299,7 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
>>>                   * For high counters of chained events we must recreate the
>>>                   * perf event with the long (64bit) attribute set.
>>>                   */
>>> +                kvm_pmu_update_pmc_chained(vcpu, i);
>>>                  if (kvm_pmu_pmc_is_chained(pmc) &&
>>>                      kvm_pmu_idx_is_high_counter(i)) {
>>>                          kvm_pmu_create_perf_event(vcpu, i);
>>> @@ -645,7 +647,8 @@ static void kvm_pmu_update_pmc_chained(struct kvm_vcpu *vcpu, u64 select_idx)
>>>          struct kvm_pmu *pmu = &vcpu->arch.pmu;
>>>          struct kvm_pmc *pmc = &pmu->pmc[select_idx];
>>>
>>> -        if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx)) {
>>> +        if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx) &&
>>> +            kvm_pmu_counter_is_enabled(vcpu, pmc->idx)) {
>>
>> In create_perf_event(), has_chain_evtype() is used and a 64b sample
>> period would be chosen even if the counters are disjoint (since the odd
>> one is disabled). We would need to use pmc_is_chained() instead.
>>
>> With perf_events, the check of whether the odd register is enabled is
>> properly done (create_perf_event). Then, as I understand it, whenever there
>> is a change in enable state or type we delete the previous perf event and
>> re-create a new one. The enable state check is just missing for SW_INCR.
>
> Can you please respin this? I'd like to have it queued quickly, if at
> all possible.

Yes, I am going to respin quickly.

Thanks

Eric

>
>>
>> Some other questions:
>> - do we need a perf event to be created even if the counter is not
>>   enabled? For instance, on counter reset, create_perf_event gets called.
>
> It shouldn't be necessary.
>
>> - also, actions are taken for counters which are not implemented (the loop
>>   goes up to ARMV8_PMU_MAX_COUNTERS). Do you think it is valuable to have a
>>   bitmask of supported counters stored before PMU readiness?
>>   I can propose such changes if you think they are valuable.
>
> That would certainly be a performance optimization.
>
> Thanks,
>
> M.
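To make the behaviour under discussion concrete, here is a small, self-contained
user-space model of the chained SW_INCR semantics: increment the even counter,
carry into the odd counter when the pair is chained *and* the odd counter is
enabled, and only flag an overflow when the effective (32-bit or 64-bit) counter
wraps. The names NR_COUNTERS, counter[], enabled, ovsset, chained[],
pair_is_chained() and sw_incr() are simplified stand-ins invented for this
sketch; they are not the kernel's kvm_pmu state or helpers, and this is only an
illustration of the intended semantics, not the respun patch.

/*
 * Minimal model of the chained SW_INCR behaviour discussed in the thread.
 * counter[], enabled, ovsset and chained[] are simplified stand-ins for
 * PMEVCNTR<n>_EL0, PMCNTENSET_EL0, PMOVSSET_EL0 and the chained-pair state;
 * they are not the kernel's data structures.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_COUNTERS 6

static uint32_t counter[NR_COUNTERS];   /* 32-bit event counters */
static uint64_t enabled;                /* model of PMCNTENSET_EL0 */
static uint64_t ovsset;                 /* model of PMOVSSET_EL0 */
static bool chained[NR_COUNTERS / 2];   /* is pair {i, i+1} chained? */

/* A pair only acts as a 64-bit counter if the odd (high) half is enabled. */
static bool pair_is_chained(int i)
{
        return chained[i >> 1] && (enabled & (1ULL << (i + 1)));
}

static void sw_incr(int i)
{
        if (!(enabled & (1ULL << i)))
                return;

        if (++counter[i])               /* no 32-bit wrap: nothing else to do */
                return;

        if (pair_is_chained(i)) {
                /* propagate the carry into the high half of the pair */
                if (!++counter[i + 1])
                        ovsset |= 1ULL << (i + 1);      /* 64-bit overflow */
        } else {
                ovsset |= 1ULL << i;                    /* 32-bit overflow */
        }
}

int main(void)
{
        enabled = 0x3;                  /* counters 0 and 1 enabled */
        chained[0] = true;              /* {0, 1} form one 64-bit counter */
        counter[0] = UINT32_MAX;        /* low half about to wrap */

        sw_incr(0);
        printf("low=%u high=%u ovsset=%#llx\n",
               (unsigned)counter[0], (unsigned)counter[1],
               (unsigned long long)ovsset);
        /* prints: low=0 high=1 ovsset=0 -- carry taken, no overflow flagged */
        return 0;
}

Whether the "is the odd counter enabled" check lives in the increment path or,
as suggested above, in the code that decides a pair is chained
(kvm_pmu_update_pmc_chained()), the observable behaviour modelled here should
be the same.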