Re: [PATCH v8 01/11] timekeeping: move multigrain timestamp floor handling into timekeeper

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2024-09-30 at 21:43 +0200, Thomas Gleixner wrote:
> On Thu, Sep 19 2024 at 18:50, Jeff Layton wrote:
> > The fix for this is to establish a floor value for the coarse-grained
> > clock. When stamping a file with a fine-grained timestamp, we update
> > the floor value with the current monotonic time (using cmpxchg). Then
> > later, when a coarse-grained timestamp is requested, check whether the
> > floor is later than the current coarse-grained time. If it is, then the
> > kernel will return the floor value (converted to realtime) instead of
> > the current coarse-grained clock. That allows us to maintain the
> > ordering guarantees.
> > 
> > My original implementation of this tracked the floor value in
> > fs/inode.c (also using cmpxchg), but that caused a performance
> > regression, mostly due to multiple calls into the timekeeper functions
> > with seqcount loops. By adding the floor to the timekeeper we can get
> > that back down to 1 seqcount loop.
> > 
> > Let me know if you have more questions about this, or suggestions about
> > how to do this better. The timekeeping code is not my area of expertise
> > (obviously) so I'm open to doing this a better way if there is one.
> 
> The comments I made about races and the clock_settime() inconsistency
> vs. the change log aside, I don't see room for improvement there.
> 
> What worries me is the atomic_cmpxchg() under contention on large
> machines, but as it is not a cmpxchg() loop it might be not completely
> horrible.
> 


Thanks for taking a look.

The code does try to avoid requesting a fine-grained timestamp unless
there is no other option. In practice, that means that a
write()+stat()+write() has to occur within the same coarse-grained
timer tick. That obviously does happen, but I'm hoping it's not very
frequent on real-world workloads.

The main place where we see a performance hit from this set is the
will-it-scale pipe test, which does 1-byte writes and reads on a pipe
as fast as possible.

The previous patchset tracked the floor inside fs/inode.c, and each
timestamp update on the pipe inode could make multiple trips into the
timekeeper seqcount loops. That added extra barriers and slowed things
down.

This patchset moves the code into the timekeeper, which means only a
single barrier. That gets the cost down, but it's not entirely free.
There is some small cost to dealing with the floor, regardless.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux