Re: [PATCH v8 01/11] timekeeping: move multigrain timestamp floor handling into timekeeper

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2024-09-16 at 12:12 +0200, Thomas Gleixner wrote:
> On Sat, Sep 14 2024 at 13:07, Jeff Layton wrote:
> > For multigrain timestamps, we must keep track of the latest timestamp
> 
> What is a multgrain timestamp? Can you please describe the concept
> behind it? I'm not going to chase random documentation or whatever
> because change logs have to self contained.
> 
> And again 'we' do nothing. Describe the problem in technical terms and
> do not impersonate code.
> 

Hi Thomas!

Sorry for the delay in responding. I'll try to summarize below, but
I'll also note that patch #7 in the v8 series adds a file to
Documentation/ that explains this in a bit more depth:

Currently the kernel always stamps files (mtime, ctime, etc.) using the
coarse-grained clock. This is usually a good thing, since it reduces
the number of metadata updates, but means that you can't reliably use
file timestamps to detect whether there have been changes to the file
since it was last checked. This is particularly a problem for NFSv3
clients, which use timestamps to know when to invalidate their
pagecache for an inode [1].

The idea is to allow the kernel to use fine-grained timestamps (mtime
and ctime) on files when they are under direct observation. When a task
does a ->getattr against an inode for STATX_MTIME or STATX_CTIME, a
flag is set in the inode that tells the kernel to use the fine-grained
clock for the timestamp update iff the current coarse-grained clock
value would not cause a change to the mtime/ctime.

This works, but there is a problem:

It's possible for one inode to get a fine-grained timestamp, and then
another to get a coarse-grained timestamp. If this happens within a
single coarse-grained timer tick, then the files may appear to have
been modified in reverse order, which breaks POSIX guarantees (and
obscure programs like "make").

The fix for this is to establish a floor value for the coarse-grained
clock. When stamping a file with a fine-grained timestamp, we update
the floor value with the current monotonic time (using cmpxchg). Then
later, when a coarse-grained timestamp is requested, check whether the
floor is later than the current coarse-grained time. If it is, then the
kernel will return the floor value (converted to realtime) instead of
the current coarse-grained clock. That allows us to maintain the
ordering guarantees.

My original implementation of this tracked the floor value in
fs/inode.c (also using cmpxchg), but that caused a performance
regression, mostly due to multiple calls into the timekeeper functions
with seqcount loops. By adding the floor to the timekeeper we can get
that back down to 1 seqcount loop.

Let me know if you have more questions about this, or suggestions about
how to do this better. The timekeeping code is not my area of expertise
(obviously) so I'm open to doing this a better way if there is one.

Thanks for the review so far!

[1]: NFSv4 mandates an opaque change attribute (usually using
inode->i_version), but only some filesystems have a proper
implementation of it (XFS being the notable exception). For the others,
we end up using the ctime to generate a change attribute, which means
that NFSv4 has the same problem on those filesystems. i_version also
doesn't help NFSv3 clients and servers.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux