On Mon, 2024-09-16 at 12:12 +0200, Thomas Gleixner wrote:
> On Sat, Sep 14 2024 at 13:07, Jeff Layton wrote:
> > For multigrain timestamps, we must keep track of the latest timestamp
>
> What is a multgrain timestamp? Can you please describe the concept
> behind it? I'm not going to chase random documentation or whatever
> because change logs have to self contained.
>
> And again 'we' do nothing. Describe the problem in technical terms and
> do not impersonate code.
>

Hi Thomas!

Sorry for the delay in responding. I'll try to summarize below, but I'll also note that patch #7 in the v8 series adds a file to Documentation/ that explains this in a bit more depth.

Currently the kernel always stamps files (mtime, ctime, etc.) using the coarse-grained clock. This is usually a good thing, since it reduces the number of metadata updates, but it means that you can't reliably use file timestamps to detect whether a file has changed since it was last checked. This is particularly a problem for NFSv3 clients, which use timestamps to know when to invalidate their pagecache for an inode [1].

The idea is to allow the kernel to use fine-grained timestamps (mtime and ctime) on files when they are under direct observation. When a task does a ->getattr against an inode for STATX_MTIME or STATX_CTIME, a flag is set in the inode that tells the kernel to use the fine-grained clock for the timestamp update iff the current coarse-grained clock value would not cause a change to the mtime/ctime.

This works, but there is a problem: it's possible for one inode to get a fine-grained timestamp, and then for another inode to get a coarse-grained timestamp. If this happens within a single coarse-grained timer tick, the second inode's timestamp can be earlier than the first one's, so the files appear to have been modified in reverse order. That breaks POSIX guarantees (and obscure programs like "make").

The fix for this is to establish a floor value for the coarse-grained clock.
When stamping a file with a fine-grained timestamp, we update the floor value with the current monotonic time (using cmpxchg). Later, when a coarse-grained timestamp is requested, the kernel checks whether the floor is later than the current coarse-grained time. If it is, the kernel returns the floor value (converted to realtime) instead of the current coarse-grained clock value. That allows us to maintain the ordering guarantees.

My original implementation tracked the floor value in fs/inode.c (also using cmpxchg), but that caused a performance regression, mostly due to multiple calls into the timekeeper functions, each with its own seqcount loop. By adding the floor to the timekeeper, we can get that back down to a single seqcount loop.

Let me know if you have more questions about this, or suggestions about how to do this better. The timekeeping code is not my area of expertise (obviously), so I'm open to doing this a better way if there is one.

Thanks for the review so far!

[1]: NFSv4 mandates an opaque change attribute (usually implemented using inode->i_version), but not every filesystem has a proper implementation of it (XFS is the notable one that doesn't). For filesystems without one, we end up using the ctime to generate a change attribute, which means that NFSv4 has the same problem on those filesystems. i_version also doesn't help NFSv3 clients and servers.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>