On Thu, 2024-09-12 at 10:01 +0000, Arnd Bergmann wrote:
> On Wed, Sep 11, 2024, at 20:43, Jeff Layton wrote:
> >
> > I think we'd have to track this delta as an atomic value and cmpxchg
> > new values into place. The zeroing seems quite tricky to make
> > race-free.
> >
> > Currently, we fetch the floor value early in the process and if it
> > changes before we can swap a new one into place, we just take whatever
> > the new value is (since it's just as good). Since these are monotonic
> > values, any new value is still newer than the original one, so it's
> > fine. I'm not sure that still works if we're dealing with a delta that
> > is sliding upward and downward.
> >
> > Maybe it does though. I'll take a stab at this tomorrow and see how it
> > looks.
>
> Right, the only idea I had for this would be to atomically
> update a 64-bit tuple of the 32-bit sequence count and the
> 32-bit delta value in the timekeeper. That way I think the
> "coarse" reader would still get a correct value when running
> concurrently with both a fine-grained reader updating the count
> and the timer tick setting a new count.
>
> There are still a couple of problems:
>
> - this extends the timekeeper logic beyond what the seqlock
>   semantics normally allow, and I can't prove that this actually
>   works in all corner cases.
>
> - if the delta doesn't fit in a 32-bit value, there has to
>   be another fallback mechanism.
>

That could be a problem. I was hoping the delta couldn't grow that large
between timer ticks, but I guess it can. I guess the fallback could be
to just grab new fine-grained timestamps on each call until the timer
ticks.

> - This still requires an atomic64_cmpxchg() in the
>   fine-grained ktime_get_real_ts64() replacement, which
>   I think is what inode_set_ctime_current() needs today
>   as well to ensure that the next coarse value is the
>   highest one that has been read so far.
>

Yes. We really don't want to take the seqlock for write just to update
timestamps.
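For illustration, the floor update described earlier in this thread (sample the floor, try to swap in a new value, and accept whatever is there if the swap loses) could be sketched roughly as below. This is a userspace approximation using C11 atomics, not the kernel's atomic64 API, and `ctime_floor` and `floor_update()` are hypothetical names rather than anything in the tree.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical floor value in nanoseconds; stands in for the ctime floor. */
static _Atomic int64_t ctime_floor;

/*
 * 'old' is the floor sampled earlier by the caller, 'now' is a freshly
 * read fine-grained time. Try to install 'now' as the new floor. If some
 * other task changed the floor in the meantime, do not retry: return the
 * value that task stored, since it is newer than 'old' as well.
 */
static int64_t floor_update(int64_t old, int64_t now)
{
	int64_t expected = old;

	if (atomic_compare_exchange_strong(&ctime_floor, &expected, now))
		return now;

	/* Lost the race: 'expected' now holds the current floor. */
	return expected;
}
```

No lock is taken on either path, and a failed swap costs nothing beyond the compare-exchange itself.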
I'd prefer to keep the floor-handling lock-free if possible.

> There is another idea that would completely replace
> your design with something /much/ simpler:
>
>  - add a variant of ktime_get_real_ts64() that just
>    sets a flag in the timekeeper to signify that a
>    fine-grained time has been read since the last
>    timer tick
>  - add a variant of ktime_get_coarse_real_ts64()
>    that returns either tk_xtime() if the flag is
>    clear or calls ktime_get_real_ts64() if it's set
>  - reset the flag in timekeeping_advance() and any other
>    place that updates tk_xtime
>
> That way you avoid the atomic64_try_cmpxchg() in
> inode_set_ctime_current(), making that case faster,
> and avoid all overhead in coarse_ctime() unless you
> use both types during the same tick.
>

With the current code we get a fine-grained timestamp iff:

1/ the timestamps have been queried (a la I_CTIME_QUERIED)
2/ the current coarse-grained or floor time would not show a change in
   the ctime

If we do what you're suggesting above, as soon as one task sets the
flag, anyone calling current_time() will end up getting a brand new
fine-grained timestamp, even when the current floor time would have
been fine.

That means a lot more calls into ktime_get_real_ts64(), at least until
the timer ticks, and would probably mean a lot of extra journal
transactions, since those timestamps would all be distinct from one
another and would need to go to disk more often.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
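
To make the concern about the flag-based variant concrete, here is a toy userspace model of it. Everything in it is an assumption made for illustration: `struct toy_timekeeper` and the `toy_*()` helpers are hypothetical stand-ins, the "clock" is just a counter that advances on every read, and none of this is the kernel's timekeeping code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the timekeeper state. */
struct toy_timekeeper {
	int64_t coarse_ns;        /* value as of the last tick */
	int64_t fine_ns;          /* what a fine-grained read returns next */
	atomic_bool fine_read;    /* fine-grained time read since last tick? */
	unsigned long fine_calls; /* number of fine-grained reads so far */
};

/* Plain fine-grained read; every call yields a distinct value. */
static int64_t toy_clock_read(struct toy_timekeeper *tk)
{
	tk->fine_calls++;
	return tk->fine_ns++;
}

/* Fine-grained variant that also sets the flag. */
static int64_t toy_get_fine_flagged(struct toy_timekeeper *tk)
{
	atomic_store(&tk->fine_read, true);
	return toy_clock_read(tk);
}

/* Coarse variant: cheap while the flag is clear, fine-grained once set. */
static int64_t toy_get_coarse(struct toy_timekeeper *tk)
{
	if (atomic_load(&tk->fine_read))
		return toy_clock_read(tk);
	return tk->coarse_ns;
}

/* Timer tick: publish a new coarse value and clear the flag. */
static void toy_tick(struct toy_timekeeper *tk)
{
	tk->coarse_ns = tk->fine_ns;
	atomic_store(&tk->fine_read, false);
}
```

In this model, once one caller has used toy_get_fine_flagged(), every later toy_get_coarse() call returns a new, distinct value until toy_tick() runs. That is the pattern that would turn into extra fine-grained clock reads and distinct timestamps within a single tick.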