Re: [kvm-unit-tests PATCH] arm: pmu-overflow-interrupt: Increase count values

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Tue, Nov 07, 2023 at 03:28:12PM +0100, Eric Auger wrote:
> 
> 
> On 11/7/23 15:05, Alexandru Elisei wrote:
> > On Tue, Nov 07, 2023 at 02:34:05PM +0100, Eric Auger wrote:
> >> Hi Alexandru,
> >>
> >> On 11/7/23 10:52, Alexandru Elisei wrote:
> >>> Hi Eric,
> >>>
> >>> On Fri, Nov 03, 2023 at 11:01:39AM +0100, Eric Auger wrote:
> >>>> On some hardware, some pmu-overflow-interrupt failures can be observed.
> >>>> Although the even counter overflows, the interrupt is not seen as
> >>>> expected. This happens in the subtest after "promote to 64-b" comment.
> >>>> After analysis, the PMU overflow interrupt actually hits, ie.
> >>>> kvm_pmu_perf_overflow() gets called and KVM_REQ_IRQ_PENDING is set,
> >>>> as expected. However the PMCR.E is reset by the handle_exit path, at
> >>>> kvm_pmu_handle_pmcr() before the next guest entry and
> >>>> kvm_pmu_flush_hwstate/kvm_pmu_update_state subsequent call.
> >>>> There, since the enable bit has been reset, kvm_pmu_update_state() does
> >>>> not inject the interrupt into the guest.
> >>>>
> >>>> This does not seem to be a KVM bug but rather an unfortunate
> >>>> scenario where the test disables the PMCR.E too closely to the
> >>>> advent of the overflow interrupt.
> >>> If I understand correctly, the KVM PMU, after receiving the hardware PMUIRQ and
> >>> before injecting the interrupt, checks that the PMU is enabled according to the
> >>> pseudocode for the function CheckForPMUOverflow(). CheckForPMUOverflow() returns
> >>> false because PMCR_EL1.E is 0, so the KVM PMU decides not to inject the
> >>> interrupt.
> >>>
> >>> Is that correct?
> >> Yes that's correct
> >>> Changing the number of SW_INCR events might not be optimal - for example,
> >>> COUNT_INT > 100 might hide an error that otherwise would have been triggered if
> >>> the number of events were 100. Not very likely, but still a possibility.
> >> I also changed the COUNT for SW_INCR events to unify the code. However
> >> this is not strictly necessary to fix the issue I encounter. I can
> >> revert that change if you prefer.
> > I don't understand how that would solve the problem. As I see it, the problem is
> > that PMCR_EL1.E is cleared too fast after the PMU asserts the interrupt on
> > overflow, not the time it takes to get to the overflow condition (i.e, the
> > number of iterations mem_access_loop() does).
> 
> sorry I did not make my point clear. Indeed wrt SW_INCR overflow testing
> I do not intend to fix any issue by this change. I just intended to use
> the same number of iterations as for mem_access. So I will revert that
> change.
> >
> >>> Another approach would be to wait for a set amount of time for the CPU to take
> >>> the interrupt. There's something similar in timer.c::{test_timer_tval(),
> >>> timer_do_wfi()}.
> >> you're right. However this would urge me to have a separate asm code
> >> that loops with wfi after doing the mem_access loop. I am not sure this
> >> is worth the candle here?
> > I think plain C would work, I was thinking something like this:
> >
> > diff --git a/arm/pmu.c b/arm/pmu.c
> > index a91a7b1fd4be..fb2eb5fa2e50 100644
> > --- a/arm/pmu.c
> > +++ b/arm/pmu.c
> > @@ -979,6 +979,23 @@ static void test_overflow_interrupt(bool overflow_at_64bits)
> >         /* interrupts are disabled (PMINTENSET_EL1 == 0) */
> >
> >         mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> Currently PMCR_E is reset by mem_access_loop() (at msr pmcr_el0,
> xzr\n"). so if we want to introduce a delay between the overflow
> interrupt and the PMCR.E disable, we need to either add extra MEM_ACCESS
> or do wfi within mem_access_loop()

Sorry, missed that.

> Or we do something like what you suggest below and reset the PMCR_E
> afterwards with the downside to add extra code execution accounted by
> the PMU. I would prefer to avoid that since the purpose of having the
> asm code was to "master" what we measure.

Just an idea, we could re-enable the PMU in the C function right after the
mem_loop(), which will get trapped and KVM will probably inject the
interrupt because according to CheckForPMUOverflow() the PMUIRQ should be
asserted.  This is just a theory, haven't tested this and haven't looked at
the KVM code.

I still think we should have the wait loop after re-enabling the PMU, just
so the test is architecturally compliant (i.e., interrupts are not
delivered instantaneously after the device asserts the interrupt to the
GIC), but that could be left as an exercise for another patch.

Thanks,
Alex

> > +
> > +       if (!expect_interrupts(0)) {
> > +                for (i = 0; i < 10; i++) {
> > +                       local_irq_disable();
> > +                       if (expect_interrupts(0)) {
> > +                               local_irq_enable();
> > +                               break;
> > +                       }
> > +                       report_info("waiting for interrupt...");
> > +                       wfi();
> > +                       local_irq_enable();
> > +                       if (expect_interrupts(0))
> > +                               break;
> > +                        mdelay(100);
> > +                }
> > +       }
> > +
> >         report(expect_interrupts(0), "no overflow interrupt after preset");
> >
> >         set_pmcr(pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> >
> > Can be cleaned up by moving it to separate function, etc. Has the downside that
> > it may performs extra memory accesses in expect_interrupts(). Your choice.
> >
> > By the way, pmu_stats is not declared volatile, which means that the
> > compiler is free to optimize accesses to the struct by caching previously
> > read values in registers. Have you tried declaring it as volatile, in case
> > that fixes the issues you were seeing?
> In my case it won't fix the issue because the stats match what happens
> but your suggestion makes total sense in general.
> 
> I will add that.
> 
> Eric
> >
> > If you do decide to go with the above suggestion, I strongly suggest
> > pmu_stats is declared as volatile, otherwise the compiler will likely end
> > up not reading from memory on every iteration.
> >
> > Thanks,
> > Alex
> >> Thanks!
> >>
> >> Eric
> >>> Thanks,
> >>> Alex
> >>>
> >>>> Since it looks like a benign and inlikely case, let's resize the number
> >>>> of iterations to prevent the PMCR enable bit from being resetted
> >>>> at the same time as the actual overflow event.
> >>>>
> >>>> COUNT_INT is introduced, arbitrarily set to 1000 iterations and is
> >>>> used in this test.
> >>>>
> >>>> Reported-by: Jan Richter <jarichte@xxxxxxxxxx>
> >>>> Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx>
> >>>> ---
> >>>>  arm/pmu.c | 15 ++++++++-------
> >>>>  1 file changed, 8 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/arm/pmu.c b/arm/pmu.c
> >>>> index a91a7b1f..acd88571 100644
> >>>> --- a/arm/pmu.c
> >>>> +++ b/arm/pmu.c
> >>>> @@ -66,6 +66,7 @@
> >>>>  #define PRE_OVERFLOW_64		0xFFFFFFFFFFFFFFF0ULL
> >>>>  #define COUNT 250
> >>>>  #define MARGIN 100
> >>>> +#define COUNT_INT 1000
> >>>>  /*
> >>>>   * PRE_OVERFLOW2 is set so that 1st @COUNT iterations do not
> >>>>   * produce 32b overflow and 2nd @COUNT iterations do. To accommodate
> >>>> @@ -978,13 +979,13 @@ static void test_overflow_interrupt(bool overflow_at_64bits)
> >>>>  
> >>>>  	/* interrupts are disabled (PMINTENSET_EL1 == 0) */
> >>>>  
> >>>> -	mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> >>>> +	mem_access_loop(addr, COUNT_INT, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> >>>>  	report(expect_interrupts(0), "no overflow interrupt after preset");
> >>>>  
> >>>>  	set_pmcr(pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> >>>>  	isb();
> >>>>  
> >>>> -	for (i = 0; i < 100; i++)
> >>>> +	for (i = 0; i < COUNT_INT; i++)
> >>>>  		write_sysreg(0x2, pmswinc_el0);
> >>>>  
> >>>>  	isb();
> >>>> @@ -1002,15 +1003,15 @@ static void test_overflow_interrupt(bool overflow_at_64bits)
> >>>>  	write_sysreg(ALL_SET_32, pmintenset_el1);
> >>>>  	isb();
> >>>>  
> >>>> -	mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> >>>> +	mem_access_loop(addr, COUNT_INT, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> >>>>  
> >>>>  	set_pmcr(pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> >>>>  	isb();
> >>>>  
> >>>> -	for (i = 0; i < 100; i++)
> >>>> +	for (i = 0; i < COUNT_INT; i++)
> >>>>  		write_sysreg(0x3, pmswinc_el0);
> >>>>  
> >>>> -	mem_access_loop(addr, 200, pmu.pmcr_ro);
> >>>> +	mem_access_loop(addr, COUNT_INT, pmu.pmcr_ro);
> >>>>  	report_info("overflow=0x%lx", read_sysreg(pmovsclr_el0));
> >>>>  	report(expect_interrupts(0x3),
> >>>>  		"overflow interrupts expected on #0 and #1");
> >>>> @@ -1029,7 +1030,7 @@ static void test_overflow_interrupt(bool overflow_at_64bits)
> >>>>  	write_regn_el0(pmevtyper, 1, CHAIN | PMEVTYPER_EXCLUDE_EL0);
> >>>>  	write_regn_el0(pmevcntr, 0, pre_overflow);
> >>>>  	isb();
> >>>> -	mem_access_loop(addr, 200, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> >>>> +	mem_access_loop(addr, COUNT_INT, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> >>>>  	report(expect_interrupts(0x1), "expect overflow interrupt");
> >>>>  
> >>>>  	/* overflow on odd counter */
> >>>> @@ -1037,7 +1038,7 @@ static void test_overflow_interrupt(bool overflow_at_64bits)
> >>>>  	write_regn_el0(pmevcntr, 0, pre_overflow);
> >>>>  	write_regn_el0(pmevcntr, 1, all_set);
> >>>>  	isb();
> >>>> -	mem_access_loop(addr, 400, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> >>>> +	mem_access_loop(addr, COUNT_INT, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> >>>>  	if (overflow_at_64bits) {
> >>>>  		report(expect_interrupts(0x1),
> >>>>  		       "expect overflow interrupt on even counter");
> >>>> -- 
> >>>> 2.41.0
> >>>>
> 




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux