On Fri, Aug 06, 2021 at 10:10:00AM +0800, Chao Gao wrote:
> On Thu, Aug 05, 2021 at 08:37:27AM -0700, Paul E. McKenney wrote:
> >On Thu, Aug 05, 2021 at 01:39:40PM +0800, Chao Gao wrote:
> >> [snip]
> >> >> This patch works well; no false-positive (marking TSC unstable) in a
> >> >> 10hr stress test.
> >> >
> >> >Very good, thank you!  May I add your Tested-by?
> >>
> >> sure.
> >> Tested-by: Chao Gao <chao.gao@xxxxxxxxx>
> >
> >Very good, thank you!  I will apply this on the next rebase.
> >
> >> >I expect that I will need to modify the patch a bit more to check for
> >> >a system where it is -never- able to get a good fine-grained read from
> >> >the clock.
> >>
> >> Agreed.
> >>
> >> >And it might be that your test run ended up in that state.
> >>
> >> Not that case judging from kernel logs. Coarse-grained check happened 6475
> >> times in 43k seconds (by grep "coarse-grained skew check" in kernel logs).
> >> So, still many checks were fine-grained.
> >
> >Whew!  ;-)
> >
> >So about once per 13 clocksource watchdog checks.
> >
> >To Andi's point, do you have enough information in your console log to
> >work out the longest run of coarse-grained clocksource checks?
>
> Yes. 5 consecutive coarse-grained clocksource checks. Note that
> considering the reinitialization after coarse-grained check, in my
> calculation, two coarse-grained checks are considered consecutive if
> they happen within 1s (+/- 0.3s).

Very good, thank you!

So it seems eminently reasonable to have the clocksource watchdog complain
bitterly for more than (say) 100 consecutive coarse-grained checks.  I am
thinking in terms of a separate patch for this purpose.

Thoughts?

							Thanx, Paul
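
To make the two ideas above concrete, here is a rough userspace sketch, not the patch itself. The first function mirrors the log analysis described above: given the timestamps of the "coarse-grained skew check" messages, it finds the longest run, treating two checks as consecutive when they are 1s (+/- 0.3s) apart. The second is one possible shape for a counter that reports when more than 100 coarse-grained checks happen in a row. All names here (longest_coarse_run, coarse_tracker, MAX_COARSE_RUN, and so on) are hypothetical and do not come from the kernel source, and the 100 is just the "(say)" value from the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names and values; none of this is taken from the patch. */
#define COARSE_GAP_SEC		1.0	/* spacing after reinitialization */
#define COARSE_GAP_SLOP_SEC	0.3	/* tolerance used in the log analysis */
#define MAX_COARSE_RUN		100	/* "(say) 100" from the proposal */

/*
 * Longest run of coarse-grained checks, given the timestamps (in seconds,
 * ascending) of the "coarse-grained skew check" log lines.  Two checks
 * count as consecutive when their spacing is within 1s +/- 0.3s.
 */
size_t longest_coarse_run(const double *ts, size_t n)
{
	const double lo = COARSE_GAP_SEC - COARSE_GAP_SLOP_SEC;
	const double hi = COARSE_GAP_SEC + COARSE_GAP_SLOP_SEC;
	size_t best = 0, run = 0;

	assert(ts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		double gap = i ? ts[i] - ts[i - 1] : 0.0;

		if (i > 0 && gap >= lo && gap <= hi)
			run++;		/* extends the current run */
		else
			run = 1;	/* starts a new run */
		if (run > best)
			best = run;
	}
	return best;
}

/* Running state for a watchdog-style counter. */
struct coarse_tracker {
	unsigned int run;	/* consecutive coarse-grained checks so far */
};

/*
 * Record the outcome of one watchdog check.  Returns true once the run of
 * coarse-grained checks exceeds MAX_COARSE_RUN, that is, when a complaint
 * would be due.  Any fine-grained check resets the run.
 */
bool coarse_tracker_update(struct coarse_tracker *t, bool coarse)
{
	if (!coarse) {
		t->run = 0;
		return false;
	}
	t->run++;
	return t->run > MAX_COARSE_RUN;
}
```

With a longest observed run of 5, a threshold of 100 leaves a wide margin, so a counter along these lines should only fire on a system that essentially never gets a fine-grained read.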