Re: Question about ktime_get_mono_fast_ns() non-monotonic behavior

John Stultz <jstultz@xxxxxxxxxx> · Wed, 12 Oct 2022 20:02:07 -0700

On Mon, Sep 26, 2022 at 2:18 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> Hey everyone,
>
> I have a question about ktime_get_mono_fast_ns(), which is used by the
> BPF helper bpf_ktime_get_ns() among other use cases. The comment above
> this function specifies that there are cases where the observed clock
> would not be monotonic.

Sorry for the slow response.

> I had 2 beginner questions:
>
> 1) Is there a (rough) bound as to how much the clock can go backwards?
> My understanding is that it is bounded by (slope update * delta), but
> I don't know what's the bound of either of those (if any).

So, it's been a while since I was deep in this code, and I'd not call
these beginner questions :)
But from memory, your understanding is right.

If I recall, the standard adjustment limit from NTP is usually +/-
512ppm, but additional adjustments (~10% via the tick adjustment) can
be made. There isn't a hard limit in the code, as there's clocksource
mult granularity and other considerations, but the kernel warns when
the adjustment is over 11%.

For the discontinuity issue, we accumulate time with cycle_interval
granularity, which is basically one tick (1/HZ), and so when we adjust
the frequency we only have to compensate the base xtime_nsec to offset
for the freq change against the unaccumulated cycles (which are less
than cycle_interval - see the logic in timekeeping_apply_adjustment()).
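
To make the compensation concrete, here is a rough userspace sketch.
It is not the kernel code: struct toy_base, toy_read() and
toy_apply_adjustment() are made-up names, and the model assumes time is
simply xtime_nsec + offset * mult in shifted nanoseconds. It only shows
why subtracting (unaccumulated cycles * mult change) from xtime_nsec
keeps the clock continuous at the moment the slope changes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the timekeeper's read base. */
struct toy_base {
	uint64_t xtime_nsec;     /* accumulated time, in shifted ns */
	uint32_t mult;           /* cycles -> shifted ns multiplier */
	uint64_t cycle_interval; /* cycles per accumulation interval */
};

/* Shifted ns seen at `offset` unaccumulated cycles past the base. */
static uint64_t toy_read(const struct toy_base *b, uint64_t offset)
{
	return b->xtime_nsec + offset * b->mult;
}

/*
 * Change mult by mult_adj while `offset` cycles are still unaccumulated.
 * Removing offset * mult_adj from xtime_nsec leaves the value at `offset`
 * unchanged, so only the slope after that point differs.
 */
static void toy_apply_adjustment(struct toy_base *b, int32_t mult_adj,
				 uint64_t offset)
{
	/* The unaccumulated part is assumed to be less than one interval. */
	assert(offset < b->cycle_interval);

	b->mult = (uint32_t)((int64_t)b->mult + mult_adj);
	b->xtime_nsec -= (uint64_t)((int64_t)offset * mult_adj);
}
```

With xtime_nsec = 1000000, mult = 4000 and 500 unaccumulated cycles,
toy_read() gives 3000000 both before and after
toy_apply_adjustment(&b, 40, 500); only reads at later offsets differ.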

Then it's just the issue of how far after the update you end up
reading the clocksource (how long of a delay you hit). I think the
assumption is you can't be delayed by more than a tick (as the
stale base could become the active one again), but it's been a while
since I've stewed on this bit.

So I think it's reasonable to say it's bounded by approximately 2 *
NSEC_PER_SEC/HZ +/- 11%.
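
As a back-of-the-envelope check, the small helpers below put numbers on
that estimate. This is only one possible reading of it: the function
names are invented for illustration, the window is taken as two ticks,
and the second helper applies the "slope update * delta" formulation
from the question with the slope change given in whole percent.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Assumed worst-case gap between picking a base and reading the
 * clocksource: about two ticks, in nanoseconds. */
static uint64_t toy_stale_window_ns(unsigned int hz)
{
	assert(hz > 0);
	return 2 * NSEC_PER_SEC / hz;
}

/* "slope update * delta": a slope change of slope_pct percent applied
 * over the whole window above. */
static uint64_t toy_slope_times_delta_ns(unsigned int hz,
					 unsigned int slope_pct)
{
	return toy_stale_window_ns(hz) * slope_pct / 100;
}
```

For HZ=1000 the window is 2000000 ns (2 ms) and an 11% slope change
over it is 220000 ns; for HZ=250 the figures are 8000000 ns and
880000 ns.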

> 2) The comment specifies that for a single cpu, the only way for this
> behavior to happen is when observing the time in the context of an NMI
> that happens during an update.
> For observations across different cpus, are the scenarios where the
> non-monotonic behavior happens also tied to observing time within NMI
> contexts? or is it something that can happen outside of NMI contexts
> as well?

Yes, I believe it can happen outside of NMI contexts as well. The
read is effectively lock-free, so if you are preempted or interrupted
in the middle of the read (before fast_tk_get_delta_ns()), you may end
up using the old tk_fast base with a later clocksource cycle value,
which can cause the same issue.
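
A toy model of that situation, in case it helps. This is a userspace
sketch with invented names (struct toy_fast_base, toy_read_ns(),
toy_update()); it does not model the seqcount latch or any real
clocksource, only the arithmetic of combining a stale base with a later
cycle value after the slope has been increased.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified copy of a fast timekeeper base. */
struct toy_fast_base {
	uint64_t cycle_last;      /* cycle count at the last update */
	uint64_t base_ns_shifted; /* time at cycle_last, in shifted ns */
	uint32_t mult;            /* cycles -> shifted ns multiplier */
	uint32_t shift;
};

/* Time in ns computed from base `b` and a cycle value read "now". */
static uint64_t toy_read_ns(const struct toy_fast_base *b,
			    uint64_t now_cycles)
{
	assert(now_cycles >= b->cycle_last);
	return (b->base_ns_shifted +
		(now_cycles - b->cycle_last) * b->mult) >> b->shift;
}

/*
 * Base published by an update at update_cycles: time is continuous at
 * the update point, only the slope (mult) changes afterwards.
 */
static struct toy_fast_base toy_update(const struct toy_fast_base *old,
				       uint64_t update_cycles,
				       uint32_t new_mult)
{
	struct toy_fast_base next = {
		.cycle_last = update_cycles,
		.base_ns_shifted = old->base_ns_shifted +
			(update_cycles - old->cycle_last) * old->mult,
		.mult = new_mult,
		.shift = old->shift,
	};
	return next;
}
```

Take an old base with mult = 1024 and shift = 10 (1 ns per cycle), and
an update at cycle 1000 that raises mult to 1126 (roughly +10%). A
reader using the new base at cycle 2000 gets 2099 ns, while a delayed
reader still holding the old base and reading at cycle 2050 gets
2050 ns: a later cycle value, but an earlier timestamp.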

thanks
-john