On Mon, Sep 26, 2022 at 2:18 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> Hey everyone,
>
> I have a question about ktime_get_mono_fast_ns(), which is used by the
> BPF helper bpf_ktime_get_ns() among other use cases. The comment above
> this function specifies that there are cases where the observed clock
> would not be monotonic.

Sorry for the slow response.

> I had 2 beginner questions:
>
> 1) Is there a (rough) bound as to how much the clock can go backwards?
> My understanding is that it is bounded by (slope update * delta), but
> I don't know what's the bound of either of those (if any).

So, it's been a while since I was deep in this code, and I'd not call
these beginner questions :)  But from my memory, your understanding is
right.

If I recall, the standard adjustment limit from NTP is usually +/-
512ppm, but additional adjustments (~10% via the tick adjustment) can
be made. There isn't a hard limit in the code, as there's clocksource
mult granularity and other considerations, but the kernel warns when
the adjustment is over 11%.

For the discontinuity issue, we accumulate time with cycle_interval
granularity, which is basically one tick (1/HZ), and so when we adjust
the frequency we only have to compensate the base xtime_nsec to offset
the freq change against the unaccumulated cycles (which are less than
cycle_interval; see the logic in timekeeping_apply_adjustment()).

Then it's just the issue of how far after the update you end up
reading the clocksource (how long of a delay you hit). I think the
assumption is that you can't be delayed by more than a tick (as the
stale base could become the active one again), but it's been a while
since I've stewed on this bit.

So I think it's reasonable to say it's bounded by approximately
2 * NSEC_PER_SEC/HZ +/- 11%.

> 2) The comment specifies that for a single cpu, the only way for this
> behavior to happen is when observing the time in the context of an NMI
> that happens during an update.
> For observations across different cpus, are the scenarios where the
> non-monotonic behavior happens also tied to observing time within NMI
> contexts? Or is it something that can happen outside of NMI contexts
> as well?

Yes, I believe it can happen outside of NMI contexts as well. The read
is effectively lock-free, so if you are preempted or interrupted in the
middle of the read (before fast_tk_get_delta_ns()), you may end up
using the old tk_fast base with a later clocksource cycle value, which
can cause the same issue.

thanks
-john
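Taking the rough estimate for the first question (approximately 2 * NSEC_PER_SEC/HZ +/- 11%) at face value, a quick evaluation for a few common HZ values might look like the sketch below. The helper name rough_backstep_bound_ns() is hypothetical, and the 11% is the warning threshold mentioned above, not a hard limit enforced by the code.

```python
NSEC_PER_SEC = 1_000_000_000


def rough_backstep_bound_ns(hz, adj_pct=11):
    """Hypothetical helper: evaluate 2 * NSEC_PER_SEC / hz, +/- adj_pct percent.

    Returns (low, nominal, high) in nanoseconds using integer arithmetic.
    """
    nominal = 2 * NSEC_PER_SEC // hz
    low = nominal * (100 - adj_pct) // 100
    high = nominal * (100 + adj_pct) // 100
    return low, nominal, high


if __name__ == "__main__":
    for hz in (100, 250, 300, 1000):
        low, nominal, high = rough_backstep_bound_ns(hz)
        print(f"HZ={hz:4d}: ~{nominal / 1e6:.3f} ms "
              f"(range {low / 1e6:.3f} .. {high / 1e6:.3f} ms)")
```

Since the estimate scales with the tick length, a lower HZ (longer tick) gives a proportionally larger rough bound.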
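To make the answer to the second question concrete, here is a small user-space model of the race, purely illustrative. Everything in it is hypothetical: ReadBase, adjust(), SHIFT and the mult values are not kernel code, and adjust() folds the elapsed cycles into the base instead of compensating xtime_nsec the way timekeeping_apply_adjustment() does; only the property that the clock is continuous at the update instant is meant to match. A reader that loaded the old base before an update, but reads the counter after it, keeps extrapolating with the old slope, so a later reader using the new base can observe an earlier time.

```python
SHIFT = 10  # hypothetical fixed-point shift: mult / 2**SHIFT is ns per cycle


class ReadBase:
    """Hypothetical, simplified read base: ns = base_ns + (delta * mult) >> SHIFT."""

    def __init__(self, base_cycles, base_ns, mult):
        self.base_cycles = base_cycles
        self.base_ns = base_ns
        self.mult = mult

    def read_ns(self, cycles):
        # Extrapolate linearly from the base using this base's slope.
        return self.base_ns + (((cycles - self.base_cycles) * self.mult) >> SHIFT)


def adjust(old, now, new_mult):
    """Change the slope at cycle `now`, folding the elapsed cycles into
    base_ns with the old slope so both bases agree exactly at `now`."""
    return ReadBase(now, old.read_ns(now), new_mult)


def demo():
    old = ReadBase(base_cycles=0, base_ns=0, mult=1024)  # 1 ns per cycle
    update_at = 1_000_000
    fresh = adjust(old, update_at, new_mult=922)  # slow the clock by ~10%

    # Reader A loaded `old` before the update, was preempted, and then
    # read the counter 100,000 cycles after the update.
    a = old.read_ns(update_at + 100_000)
    # Reader B reads the counter slightly later but loads the new base.
    b = fresh.read_ns(update_at + 100_010)
    return a, b


if __name__ == "__main__":
    a, b = demo()
    print(a, b, a - b)  # 1100000 1090048 9952
```

In this model the backward step is roughly the delay after the update times the change in slope (100,000 cycles at about 10% is roughly 10 us), which lines up with the (slope update * delta) intuition from the first question.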