Hi,

On Wed, Jun 28, 2023 at 09:44:44AM +0200, Eric Auger wrote:
> Hi Alexandru, Drew,
>
> On 6/19/23 22:03, Eric Auger wrote:
> > On some HW (ThunderXv2), some random failures of
> > pmu-chain-promotion test can be observed.
> >
> > pmu-chain-promotion is composed of several subtests
> > which run 2 mem_access loops. The initial value of
> > the counter is set so that no overflow is expected on
> > the first loop run and overflow is expected on the second.
> > However it is observed that sometimes we get an overflow
> > on the first run. It looks related to some variability of
> > the mem_access count. This variability is observed on all
> > HW I have access to, with different span though. On
> > ThunderX2 HW it looks the margin that is currently taken
> > is too small and we regularly hit failure.
> >
> > Although the first goal of this series is to increase
> > the count/margin used in those tests, it also attempts
> > to improve the pmu-chain-promotion logs, add some barriers
> > in the mem-access loop, clarify the chain counter
> > enable/disable sequence.
> >
> > A new 'pmu-mem-access-reliability' is also introduced to
> > detect issues with MEM_ACCESS event variability and make
> > the debug easier.
> >
> > Obviously one can wonder if this variability is something normal
> > and does not hide any other bug. I hope this series will raise
> > additional discussions about this.
> >
> > https://github.com/eauger/kut/tree/pmu-chain-promotion-fixes-v3
> >
> > History:
> >
> > v2 -> v3:
> > - took into account Alexandru's comments. See individual log
> >   files
> Gentle ping. Does this version match all your expectations?

The series is on my radar, I'll have a look this Friday.

Thanks,
Alex

>
> Thanks
>
> Eric
> >
> > v1 -> v2:
> > - Take into account Alexandru's & Mark's comments. Added some
> >   R-b's and T-b's.
> >
> >
> > Eric Auger (6):
> >   arm: pmu: pmu-chain-promotion: Improve debug messages
> >   arm: pmu: pmu-chain-promotion: Introduce defines for count and margin
> >     values
> >   arm: pmu: Add extra DSB barriers in the mem_access loop
> >   arm: pmu: Fix chain counter enable/disable sequences
> >   arm: pmu: Add pmu-mem-access-reliability test
> >   arm: pmu-chain-promotion: Increase the count and margin values
> >
> >  arm/pmu.c         | 208 ++++++++++++++++++++++++++++++++--------------
> >  arm/unittests.cfg |   6 ++
> >  2 files changed, 153 insertions(+), 61 deletions(-)
> >
>