On Thu, Sep 12, 2024 at 11:02 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > The kernel test robot reported a performance hit in some will-it-scale > tests due to the multigrain timestamp patches. My own testing showed > about a 7% drop in performance on the pipe1_threads test, and the data > showed that coarse_ctime() was slowing down current_time(). So, you provided some useful detail about why coarse_ctime() was slow in your reply earlier, but it would be good to preserve that in the commit message here. > Move the multigrain timestamp floor tracking word into timekeeper.c. Add > two new public interfaces: The first fills a timespec64 with the later > of the coarse-grained clock and the floor time, and the second gets a > fine-grained time and tries to swap it into the floor and fills a > timespec64 with the result. > > The first function returns an opaque cookie that is suitable for passing > to the second, which will use it as the "old" value in the cmpxchg. The cookie usage isn't totally clear to me right off. It feels a bit more subtle then I'd expect. > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index 5391e4167d60..bb039c9d525e 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = { > .base[1] = FAST_TK_INIT, > }; > > +/* > + * This represents the latest fine-grained time that we have handed out as a > + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the > + * realtime clock on an as-needed basis. > + */ > +static __cacheline_aligned_in_smp atomic64_t mg_floor; > + So I do really like this general approach of having an internal floor value combined with special coarse/fine grained assessors that work with the floor, so we're not impacting the normal hotpath logic (basically I was writing up a suggestion to this effect to the thread with Arnd when I realized you had follow up patch in my inbox). > static inline void tk_normalize_xtime(struct timekeeper *tk) > { > while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) { > @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts) > } > EXPORT_SYMBOL(ktime_get_coarse_real_ts64); > > +/** > + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor > + * @ts: timespec64 to be filled > + * > + * Adjust floor to realtime and compare it to the coarse time. Fill > + * @ts with the latest one. Returns opaque cookie suitable to pass > + * to ktime_get_real_ts64_mg. > + */ > +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts) > +{ > + struct timekeeper *tk = &tk_core.timekeeper; > + u64 floor = atomic64_read(&mg_floor); > + ktime_t f_real, offset, coarse; > + unsigned int seq; > + > + WARN_ON(timekeeping_suspended); > + > + do { > + seq = read_seqcount_begin(&tk_core.seq); > + *ts = tk_xtime(tk); > + offset = *offsets[TK_OFFS_REAL]; > + } while (read_seqcount_retry(&tk_core.seq, seq)); > + > + coarse = timespec64_to_ktime(*ts); > + f_real = ktime_add(floor, offset); > + if (ktime_after(f_real, coarse)) > + *ts = ktime_to_timespec64(f_real); > + return floor; > +} > +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg); Generally this looks ok to me. > +/** > + * ktime_get_real_ts64_mg - attempt to update floor value and return result > + * @ts: pointer to the timespec to be set > + * @cookie: opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg() > + * > + * Get a current monotonic fine-grained time value and attempt to swap > + * it into the floor using @cookie as the "old" value. @ts will be > + * filled with the resulting floor value, regardless of the outcome of > + * the swap. > + */ Again this cookie argument usage and the behavior of this function isn't very clear to me. > +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie) > +{ > + struct timekeeper *tk = &tk_core.timekeeper; > + ktime_t offset, mono, old = (ktime_t)cookie; > + unsigned int seq; > + u64 nsecs; > + > + WARN_ON(timekeeping_suspended); > + > + do { > + seq = read_seqcount_begin(&tk_core.seq); > + > + ts->tv_sec = tk->xtime_sec; > + mono = tk->tkr_mono.base; > + nsecs = timekeeping_get_ns(&tk->tkr_mono); > + offset = *offsets[TK_OFFS_REAL]; > + } while (read_seqcount_retry(&tk_core.seq, seq)); > + > + mono = ktime_add_ns(mono, nsecs); > + if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) { > + ts->tv_nsec = 0; > + timespec64_add_ns(ts, nsecs); > + } else { > + *ts = ktime_to_timespec64(ktime_add(old, offset)); > + } > + > +} > +EXPORT_SYMBOL(ktime_get_real_ts64_mg); So initially I was expecting this to look something like (sorry for the whitespace damage here): { do { seq = read_seqcount_begin(&tk_core.seq); ts->tv_sec = tk->xtime_sec; mono = tk->tkr_mono.base; nsecs = timekeeping_get_ns(&tk->tkr_mono); offset = *offsets[TK_OFFS_REAL]; } while (read_seqcount_retry(&tk_core.seq, seq)); mono = ktime_add_ns(mono, nsecs); do { old = atomic64_read(&mg_floor); if (floor >= mono) break; } while(!atomic64_try_cmpxchg(&mg_floor, old, mono); ts->tv_nsec = 0; timespec64_add_ns(ts, nsecs); } Where you read the tk data, atomically update the floor (assuming it's not in the future) and then return the finegrained value, not needing to manage a cookie value. But instead, it seems like if something has happened since the cookie value was saved (another cpu getting a fine grained timestamp), your ktime_get_real_ts64_mg() will fall back to returning the same coarse grained time saved to the cookie, as if no time had past? It seems like that could cause problems: cpu1 cpu2 -------------------------------------------------------------------------- t2a = ktime_get_coarse_real_ts64_mg t1a = ktime_get_coarse_real_ts64_mg() t1b = ktime_get_real_ts64_mg(t1a) t2b = ktime_get_real_ts64_mg(t2a) Where t2b will seem to be before t1b, even though it happened afterwards. thanks -john