On Thu, 5 Oct 2017, Gabriel Beddingfield wrote: > Hi Thomas, > > On Thu, Oct 5, 2017 at 11:01 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > >> > Which SoC/clocksource driver are you talking about? > >> > >> NXP i.MX 6SoloX > >> drivers/clocksource/timer-imx-gpt.c > > > > So that clocksource driver looks correct. Do you have an idea in which > > context this time jump happens? Does it happen when you exercise your high > > frequency suspend/resume dance or is that happening just when you let the > > machine run forever as well? > > We couldn't devise any reproduction steps. We observed it happening at > unexpected times in a fleet of devices -- and we couldn't find any > patterns to clue us in. Ok. Did you talk to NXP about that? Or did you try to exercise reads in a loop to detect the wreckage and maybe a pattern in there? > > The timekeeping_resume() path definitely has an issue: > > > > cycle_now = tk_clock_read(&tk->tkr_mono); > > if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) && > > cycle_now > tk->tkr_mono.cycle_last) { > > > > This works nice for clocksources which wont wrap across suspend/resume but > > not for those which can. That cycle_now -> cycle_last check should take > > cs-mask into account ... > > > > Of course for clocksources which can wrap within realistic suspend times, > > which 36 hours might be accounted for, this would need an extra sanity > > check against a RTC whether wrap time has been exceeded. > > > > I haven't thought it through whether that buggered check fully explains > > what you are observing, but it's wrong nevertheless. John? > > Nah. It looks like the consequence is that you'll either fail to inject > the sleep time or you'll fall back to having the RTC inject the sleep > time. In our case, we never sleep for more than a couple of minutes so > the error would be seconds rather than hours. Fair enough. It's still wrong though and wants to be fixed. Thanks, tglx