Re: [PATCH] timekeeping: move multigrain ctime floor handling into timekeeper

Jeff Layton <jlayton@xxxxxxxxxx> · Thu, 12 Sep 2024 07:34:42 -0400

On Thu, 2024-09-12 at 10:01 +0000, Arnd Bergmann wrote:
> On Wed, Sep 11, 2024, at 20:43, Jeff Layton wrote:
> > 
> > I think we'd have to track this delta as an atomic value and cmpxchg
> > new values into place. The zeroing seems quite tricky to make race-
> > free.
> > 
> > Currently, we fetch the floor value early in the process and if it
> > changes before we can swap a new one into place, we just take whatever
> > the new value is (since it's just as good). Since these are monotonic
> > values, any new value is still newer than the original one, so it's
> > fine. I'm not sure that still works if we're dealing with a delta that
> > is sliding upward and downward.
> > 
> > Maybe it does though. I'll take a stab at this tomorrow and see how it
> > looks.
> 
> Right, the only idea I had for this would be to atomically
> update a 64-bit tuple of the 32-bit sequence count and the
> 32-bit delta value in the timekeeper. That way I think the
> "coarse" reader would still get a correct value when running
> concurrently with both a fine-grained reader updating the count
> and the timer tick setting a new count.
> 
> There are still a couple of problems:
> 
> - this extends the timekeeper logic beyond what the seqlock
>   semantics normally allow, and I can't prove that this actually
>   works in all corner cases.
>
> - if the delta doesn't fit in a 32-bit value, there has to 
>   be another fallback mechanism.
> 

That could be a problem. I was hoping the delta couldn't grow that
large between timer ticks, but I guess it can. The fallback could be
to just grab a new fine-grained timestamp on each call until the timer
ticks.
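
To make the tuple idea a bit more concrete, here is a rough userspace
sketch using C11 atomics in place of the kernel's atomic64 helpers. All
of the names (seq_delta, pack(), tick(), raise_delta(), read_delta())
are made up for illustration, and this says nothing about how it would
interact with the real timekeeper seqcount. It only shows the packing,
the cmpxchg loop, and where the "doesn't fit in 32 bits" fallback would
kick in:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: high 32 bits = tick sequence, low 32 bits = delta (ns). */
static _Atomic uint64_t seq_delta;

static uint64_t pack(uint32_t seq, uint32_t delta)
{
	return ((uint64_t)seq << 32) | delta;
}

/* Timer tick: bump the sequence and zero the delta in a single update. */
static void tick(void)
{
	uint64_t old = atomic_load(&seq_delta);
	uint64_t next;

	do {
		next = pack((uint32_t)(old >> 32) + 1, 0);
	} while (!atomic_compare_exchange_weak(&seq_delta, &old, next));
}

/* Coarse reader: one load yields a consistent (seq, delta) pair. */
static uint32_t read_delta(uint32_t *seq)
{
	uint64_t v = atomic_load(&seq_delta);

	*seq = (uint32_t)(v >> 32);
	return (uint32_t)v;
}

/*
 * Fine-grained reader: try to raise the delta for tick "seq".
 * Returns false only when the delta does not fit in 32 bits, in which
 * case the caller would need the fallback (fetch a new fine-grained
 * timestamp on every call until the next tick).
 */
static bool raise_delta(uint32_t seq, uint64_t delta_ns)
{
	uint64_t old = atomic_load(&seq_delta);

	if (delta_ns > UINT32_MAX)
		return false;

	for (;;) {
		/* The tick moved on, or a larger delta is already published. */
		if ((uint32_t)(old >> 32) != seq || (uint32_t)old >= delta_ns)
			return true;
		if (atomic_compare_exchange_weak(&seq_delta, &old,
						 pack(seq, (uint32_t)delta_ns)))
			return true;
	}
}
```

Note that within one tick the delta only ever moves upward here; the
only downward move is the zeroing in tick(), which also changes the
sequence, so a stale fine-grained reader can't put an old delta back.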

> - This still requires an atomic64_cmpxchg() in the
>   fine-grained ktime_get_real_ts64() replacement, which
>   I think is what inode_set_ctime_current() needs today
>   as well to ensure that the next coarse value is the
>   highest one that has been read so far.
> 

Yes. We really don't want to take the seqlock for write just to update
timestamps. I'd prefer to keep the floor-handling lock-free if
possible.
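
For reference, the lock-free floor update I'm describing boils down to
something like the sketch below. This is a userspace approximation with
C11 atomics rather than the actual kernel code, and floor_ns and
floor_advance() are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global floor, in nanoseconds. */
static _Atomic uint64_t floor_ns;

/*
 * "old" is the floor value sampled early on, "now" is the fine-grained
 * time fetched afterward. Returns the timestamp the caller should use.
 */
static uint64_t floor_advance(uint64_t old, uint64_t now)
{
	if (atomic_compare_exchange_strong(&floor_ns, &old, now))
		return now;

	/*
	 * Lost the race. "old" now holds whatever the winner installed.
	 * The floor only moves forward, so that value is newer than the
	 * one we sampled and is just as good for our purposes.
	 */
	return old;
}
```

There is no retry loop: a failed swap just means someone else already
moved the floor forward, and we take their value.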

> There is another idea that would completely replace
> your design with something /much/ simpler:
> 
>  - add a variant of ktime_get_real_ts64() that just
>    sets a flag in the timekeeper to signify that a
>    fine-grained time has been read since the last
>    timer tick
>  - add a variant of ktime_get_coarse_real_ts64()
>    that returns either tk_xtime() if the flag is
>    clear or calls ktime_get_real_ts64() if it's set
>  - reset the flag in timekeeping_advance() and any other
>    place that updates tk_xtime
> 
> That way you avoid the atomic64_try_cmpxchg() in
> inode_set_ctime_current(), making that case faster,
> and avoid all overhead in coarse_ctime() unless you
> use both types during the same tick.
> 

With the current code we only get a fine-grained timestamp if:

1/ the timestamps have been queried (a la I_CTIME_QUERIED)
2/ the current coarse-grained or floor time would not show a change in
the ctime

If we do what you're suggesting above, as soon as one task sets the
flag, anyone calling current_time() will end up getting a brand new
fine-grained timestamp, even when the current floor time would have
been fine.

That means a lot more calls into ktime_get_real_ts64(), at least until
the timer ticks, and would probably mean a lot of extra journal
transactions, since those timestamps would all be distinct from one
another and would need to go to disk more often.
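
A toy single-threaded model of the difference, in case it helps. None
of this is kernel code: struct clock and every function name below are
invented for the comparison, and the "fine" clock simply advances by
one on each read. It counts how many fine-grained reads happen when one
queried inode and then several unqueried ones are stamped within the
same tick:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct clock {
	uint64_t coarse;	/* value as of the last tick */
	uint64_t fine;		/* advances on every fine-grained read */
	uint64_t floor;		/* floor design only */
	bool fine_read;		/* flag design only */
	unsigned int fine_calls;
};

static uint64_t read_fine(struct clock *c)
{
	c->fine_calls++;
	return ++c->fine;
}

/* Flag design: a fine-grained read sets the flag... */
static uint64_t flag_fine(struct clock *c)
{
	c->fine_read = true;
	return read_fine(c);
}

/* ...and every coarse read goes fine-grained until the tick clears it. */
static uint64_t flag_coarse(struct clock *c)
{
	return c->fine_read ? read_fine(c) : c->coarse;
}

/* Floor design: go fine-grained only if queried and nothing would change. */
static uint64_t floor_stamp(struct clock *c, uint64_t ctime, bool queried)
{
	uint64_t t = c->floor > c->coarse ? c->floor : c->coarse;

	if (queried && t == ctime)
		t = c->floor = read_fine(c);
	return t;
}

/* One queried inode, then n - 1 unqueried ones, all within one tick. */
static unsigned int run_flag(int n)
{
	struct clock c = { .coarse = 1000, .fine = 1000 };

	flag_fine(&c);
	for (int i = 1; i < n; i++)
		flag_coarse(&c);
	return c.fine_calls;
}

static unsigned int run_floor(int n)
{
	struct clock c = { .coarse = 1000, .fine = 1000 };

	floor_stamp(&c, 1000, true);
	for (int i = 1; i < n; i++)
		floor_stamp(&c, 1000, false);
	return c.fine_calls;
}
```

In this model run_flag(n) does n fine-grained reads and hands out n
distinct timestamps, while run_floor(n) does one read and the remaining
inodes all share the floor value.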
-- 
Jeff Layton <jlayton@xxxxxxxxxx>