On Tue, Apr 04, 2023 at 02:47:49PM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2023 at 08:23:15AM +0200, Eric Auger wrote:
> > Hi,
> >
> > On 3/15/23 12:07, Eric Auger wrote:
> > > On some HW (ThunderX2), some random failures of
> > > pmu-chain-promotion test can be observed.
> > >
> > > pmu-chain-promotion is composed of several subtests
> > > which run 2 mem_access loops. The initial value of
> > > the counter is set so that no overflow is expected on
> > > the first loop run and overflow is expected on the second.
> > > However it is observed that sometimes we get an overflow
> > > on the first run. It looks related to some variability of
> > > the mem_access count. This variability is observed on all
> > > HW I have access to, with different span though. On
> > > ThunderX2 HW it looks like the margin that is currently taken
> > > is too small and we regularly hit failure.
> > >
> > > Although the first goal of this series is to increase
> > > the count/margin used in those tests, it also attempts
> > > to improve the pmu-chain-promotion logs, add some barriers
> > > in the mem-access loop, clarify the chain counter
> > > enable/disable sequence.
> > >
> > > A new 'pmu-memaccess-reliability' is also introduced to
> > > detect issues with MEM_ACCESS event variability and make
> > > the debug easier.
> > >
> > > Obviously one can wonder if this variability is something normal
> > > and does not hide any other bug. I hope this series will raise
> > > additional discussions about this.
> > >
> > > https://github.com/eauger/kut/tree/pmu-chain-promotion-fixes-v1
> >
> > Gentle ping.
>
> I'd be happy to take this, but I was hoping to see some r-b's and/or t-b's
> from some of the others.

Any takers? Ricardo? Alexandru?

Thanks,
drew
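
For anyone reviewing, the margin problem described in the cover letter can be sketched as plain arithmetic. This is only an illustration, not code from the series: the 32-bit counter width, the helper names, and the numbers (250 accesses per loop, margins of 20 and 100, jitter of 25) are all assumptions chosen for the example.

```python
# Hypothetical model of the "no overflow on loop 1, overflow on loop 2" setup.
COUNTER_MAX = 2**32  # assumption: a 32-bit event counter that wraps at 2^32


def initial_value(count, margin):
    """Pre-set value that leaves `count + margin` events before the counter wraps."""
    return COUNTER_MAX - (count + margin)


def overflows(start, events):
    """True if counting `events` events from `start` wraps the counter."""
    return start + events >= COUNTER_MAX


def run_two_loops(start, first, second):
    """Return (overflow seen after loop 1, overflow seen after loop 2)."""
    return overflows(start, first), overflows(start, first + second)


def margin_is_safe(count, margin, jitter):
    """Check both worst cases when each loop counts in [count - jitter, count + jitter]."""
    no_early_overflow = count + jitter < count + margin        # loop 1 alone must not wrap
    overflow_by_second = 2 * (count - jitter) >= count + margin  # two loops must wrap
    return no_early_overflow and overflow_by_second


if __name__ == "__main__":
    start = initial_value(250, 20)
    print(run_two_loops(start, 250, 250))  # nominal counts: expected behaviour
    print(run_two_loops(start, 275, 250))  # loop 1 overcounts: early overflow
    print(margin_is_safe(250, 20, 25), margin_is_safe(250, 100, 25))
```

In this model the margin has to exceed the observed jitter while staying at or below count - 2 * jitter, which is why raising both the count and the margin together gives more room.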