Hi,

On 3/15/23 12:07, Eric Auger wrote:
> On some HW (ThunderX2), some random failures of the
> pmu-chain-promotion test can be observed.
>
> pmu-chain-promotion is composed of several subtests
> which run 2 mem_access loops. The initial value of
> the counter is set so that no overflow is expected on
> the first loop run and overflow is expected on the second.
> However it is observed that sometimes we get an overflow
> on the first run. It looks related to some variability of
> the mem_access count. This variability is observed on all
> HW I have access to, with a different span though. On
> ThunderX2 HW it looks like the margin that is currently taken
> is too small and we regularly hit failure.
>
> Although the first goal of this series is to increase
> the count/margin used in those tests, it also attempts
> to improve the pmu-chain-promotion logs, add some barriers
> in the mem-access loop, and clarify the chain counter
> enable/disable sequence.
>
> A new 'pmu-memaccess-reliability' test is also introduced to
> detect issues with MEM_ACCESS event variability and make
> debugging easier.
>
> Obviously one can wonder if this variability is something normal
> and does not hide any other bug. I hope this series will raise
> additional discussions about this.
>
> https://github.com/eauger/kut/tree/pmu-chain-promotion-fixes-v1

Gentle ping.

Thanks

Eric

>
> Eric Auger (6):
>   arm: pmu: pmu-chain-promotion: Improve debug messages
>   arm: pmu: pmu-chain-promotion: Introduce defines for count and margin
>     values
>   arm: pmu: Add extra DSB barriers in the mem_access loop
>   arm: pmu: Fix chain counter enable/disable sequences
>   arm: pmu: Add pmu-memaccess-reliability test
>   arm: pmu-chain-promotion: Increase the count and margin values
>
>  arm/pmu.c         | 189 +++++++++++++++++++++++++++++++++-------------
>  arm/unittests.cfg |   6 ++
>  2 files changed, 141 insertions(+), 54 deletions(-)
>