Re: Question about ktime_get_mono_fast_ns() non-monotonic behavior

John Stultz <jstultz@xxxxxxxxxx> · Thu, 13 Oct 2022 21:13:22 -0700

On Thu, Oct 13, 2022 at 8:47 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Thu, Oct 13, 2022 at 8:42 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
> >
> > On Thu, Oct 13, 2022 at 8:26 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > > On Thu, Oct 13, 2022 at 7:39 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
> > > > On Mon, Sep 26, 2022 at 2:18 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > > > >
> > > > > I have a question about ktime_get_mono_fast_ns(), which is used by the
> > > > > BPF helper bpf_ktime_get_ns() among other use cases. The comment above
> > > > > this function specifies that there are cases where the observed clock
> > > > > would not be monotonic.
> > > > >
> > > > > I had 2 beginner questions:
> > > >
> > > > Thinking about this a bit more, I have my own "beginner question": Why
> > > > does bpf_ktime_get_ns() need to use the ktime_get_mono_fast_ns()
> > > > accessor instead of ktime_get_ns()?
> > > >
> > > > I don't know enough about the contexts that bpf logic can run, so it's
> > > > not clear to me and it's not obviously commented either.
> > >
> > > I am not the best person to answer this question (the BPF list is
> > > CC'd, it's full of more knowledgeable people).
> > >
> > > My understanding is that because BPF programs can basically be run in
> > > any context (because they can attach to almost all functions /
> > > tracepoints in the kernel), the time accessor needs to be safe in all
> > > contexts.
> >
> > Ah. Ok, the tracepoint connection is indeed likely the case. Thanks
> > for clarifying.
> >
> > > Now that I know that ktime_get_mono_fast_ns() can drift significantly,
> > > I am wondering why we don't just read sched_clock(). Can the
> > > difference between sched_clock() on different cpus be even higher than
> > > the potential drift from ktime_get_mono_fast_ns()?
> >
> > sched_clock() is also lock-free, and so I think it's possible to have
> > inconsistencies.
>
> Right, I am just trying to figure out which is worse,
> ktime_get_mono_fast_ns() or sched_clock(). It appears to me that both
> can be inconsistent, but at least AFAICT sched_clock() can only be
> inconsistent if read across different cpus, right? It should also be
> faster (at least in my experimentation).
>
> I am wondering if there is a bound on the inconsistency we might
> observe from sched_clock() if we read it across different cpus, and if
> there is, how does it compare to ktime_get_mono_fast_ns() in that
> regard.

Again, I think ktime_get_raw_fast_ns() (i.e., CLOCK_MONOTONIC_RAW) is
likely to be closer to sched_clock(), as neither of them is NTP
adjusted. That also likely makes them unusable where timestamps are
compared against userland CLOCK_MONOTONIC timestamps.

So folks might need a new BPF interface for that.
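To make the NTP distinction concrete from userland, here is a small
sketch, assuming Linux and glibc, where clock_gettime(2) exposes both
CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW (the userland counterpart of
the raw timekeeping base). It reads both clocks and reports their
offset. Because CLOCK_MONOTONIC is frequency-adjusted by NTP and
CLOCK_MONOTONIC_RAW is not, that offset would be expected to drift
while NTP is slewing, which is why a raw timestamp can't simply be
compared with a CLOCK_MONOTONIC one. The helper names read_ns() and
mono_minus_raw_ns() are hypothetical, not kernel or libc APIs.

```c
#define _GNU_SOURCE /* for CLOCK_MONOTONIC_RAW on older glibc */
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a POSIX clock and return nanoseconds. */
static uint64_t read_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return 0; /* clock unsupported; 0 means "no reading" */
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: CLOCK_MONOTONIC minus CLOCK_MONOTONIC_RAW.
 * CLOCK_MONOTONIC is subject to NTP frequency adjustment and
 * CLOCK_MONOTONIC_RAW is not, so this value can change over time.
 * The two clocks are not read atomically, so expect small jitter.
 */
static int64_t mono_minus_raw_ns(void)
{
	uint64_t mono = read_ns(CLOCK_MONOTONIC);
	uint64_t raw = read_ns(CLOCK_MONOTONIC_RAW);

	return (int64_t)(mono - raw);
}
```

Calling mono_minus_raw_ns() once, sleeping for a while, and calling it
again gives the NTP-induced divergence over that interval, plus a
little jitter from the non-atomic pair of reads.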

Also, I think folks would want to avoid exporting sched_clock()
timestamps to userland, as they aren't tied to a well-defined clockid
and may behave oddly around suspend/resume, etc.
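For comparison, the clockids userland can see do have defined suspend
semantics: per clock_gettime(2), CLOCK_BOOTTIME is identical to
CLOCK_MONOTONIC except that it also counts time the system spent
suspended. A minimal sketch, assuming Linux, that uses that definition
to estimate total suspended time since boot (the helper names read_ns()
and suspended_ns() are hypothetical):

```c
#define _GNU_SOURCE /* for CLOCK_BOOTTIME on older glibc */
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a POSIX clock and return nanoseconds. */
static uint64_t read_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return 0; /* clock unsupported; 0 means "no reading" */
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: approximate time spent in suspend since boot.
 * CLOCK_BOOTTIME keeps advancing across suspend; CLOCK_MONOTONIC does
 * not. CLOCK_MONOTONIC is read first, so the gap between the two
 * (non-atomic) reads can only add to the result, not subtract from it.
 */
static int64_t suspended_ns(void)
{
	uint64_t mono = read_ns(CLOCK_MONOTONIC);
	uint64_t boot = read_ns(CLOCK_BOOTTIME);

	return (int64_t)(boot - mono);
}
```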

thanks
-john