Hi Alexandru,

On 6/28/23 12:18, Alexandru Elisei wrote:
> Hi,
>
> On Wed, Jun 28, 2023 at 09:44:44AM +0200, Eric Auger wrote:
>> Hi Alexandru, Drew,
>>
>> On 6/19/23 22:03, Eric Auger wrote:
>>> On some HW (ThunderXv2), some random failures of
>>> pmu-chain-promotion test can be observed.
>>>
>>> pmu-chain-promotion is composed of several subtests
>>> which run 2 mem_access loops. The initial value of
>>> the counter is set so that no overflow is expected on
>>> the first loop run and overflow is expected on the second.
>>> However it is observed that sometimes we get an overflow
>>> on the first run. It looks related to some variability of
>>> the mem_acess count. This variability is observed on all
>>> HW I have access to, with different span though. On
>>> ThunderX2 HW it looks the margin that is currently taken
>>> is too small and we regularly hit failure.
>>>
>>> although the first goal of this series is to increase
>>> the count/margin used in those tests, it also attempts
>>> to improve the pmu-chain-promotion logs, add some barriers
>>> in the mem-access loop, clarify the chain counter
>>> enable/disable sequence.
>>>
>>> A new 'pmu-mem-access-reliability' is also introduced to
>>> detect issues with MEM_ACCESS event variability and make
>>> the debug easier.
>>>
>>> Obviously one can wonder if this variability is something normal
>>> and does not hide any other bug. I hope this series will raise
>>> additional discussions about this.
>>>
>>> https://github.com/eauger/kut/tree/pmu-chain-promotion-fixes-v3
>>>
>>> History:
>>>
>>> v2 -> v3:
>>> - took into account Alexandru's comments. See individual log
>>>   files
>> Gentle ping. Does this version match all your expectations?
> The series are on my radar, I'll have a look this Friday.
OK thanks :-)

Eric
>
> Thanks,
> Alex
>
>> Thanks
>>
>> Eric
>>> v1 -> v2:
>>> - Take into account Alexandru's & Mark's comments. Added some
>>>   R-b's and T-b's.
>>>
>>>
>>> Eric Auger (6):
>>>   arm: pmu: pmu-chain-promotion: Improve debug messages
>>>   arm: pmu: pmu-chain-promotion: Introduce defines for count and margin
>>>     values
>>>   arm: pmu: Add extra DSB barriers in the mem_access loop
>>>   arm: pmu: Fix chain counter enable/disable sequences
>>>   arm: pmu: Add pmu-mem-access-reliability test
>>>   arm: pmu-chain-promotion: Increase the count and margin values
>>>
>>>  arm/pmu.c         | 208 ++++++++++++++++++++++++++++++++--------------
>>>  arm/unittests.cfg |   6 ++
>>>  2 files changed, 153 insertions(+), 61 deletions(-)
>>>