On some HW (ThunderX2), random failures of the pmu-chain-promotion test
can be observed. pmu-chain-promotion is composed of several subtests,
each of which runs 2 mem_access loops. The initial value of the counter
is set so that no overflow is expected on the first loop run and an
overflow is expected on the second. However, we sometimes observe an
overflow on the first run.

This seems related to some variability of the MEM_ACCESS count. This
variability is observed on all the HW I have access to, though with a
different spread. On ThunderX2 HW, the margin currently used appears to
be too small and we regularly hit failures.

Although the primary goal of this series is to increase the count/margin
used in those tests, it also attempts to improve the pmu-chain-promotion
logs, add some barriers in the mem_access loop and clarify the chain
counter enable/disable sequence.

A new 'pmu-mem-access-reliability' test is also introduced to detect
issues with MEM_ACCESS event variability and make debugging easier.

Obviously one may wonder whether this variability is normal or whether
it hides some other bug. I hope this series will raise additional
discussion about this.

https://github.com/eauger/kut/tree/pmu-chain-promotion-fixes-v3

History:

v2 -> v3:
- Took into account Alexandru's comments. See individual patch logs.

v1 -> v2:
- Took into account Alexandru's & Mark's comments. Added some R-b's and
  T-b's.

Eric Auger (6):
  arm: pmu: pmu-chain-promotion: Improve debug messages
  arm: pmu: pmu-chain-promotion: Introduce defines for count and margin
    values
  arm: pmu: Add extra DSB barriers in the mem_access loop
  arm: pmu: Fix chain counter enable/disable sequences
  arm: pmu: Add pmu-mem-access-reliability test
  arm: pmu-chain-promotion: Increase the count and margin values

 arm/pmu.c         | 208 ++++++++++++++++++++++++++++++++--------------
 arm/unittests.cfg |   6 ++
 2 files changed, 153 insertions(+), 61 deletions(-)

-- 
2.38.1