On Tue, 2023-09-26 at 08:32 +1000, Dave Chinner wrote: > On Mon, Sep 25, 2023 at 06:14:05AM -0400, Jeff Layton wrote: > > On Mon, 2023-09-25 at 08:18 +1000, Dave Chinner wrote: > > > On Sat, Sep 23, 2023 at 05:52:36PM +0300, Amir Goldstein wrote: > > > > On Sat, Sep 23, 2023 at 1:46 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > > > > > On Sat, 2023-09-23 at 10:15 +0300, Amir Goldstein wrote: > > > > > > On Fri, Sep 22, 2023 at 8:15 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > My initial goal was to implement multigrain timestamps on most major > > > > > > > filesystems, so we could present them to userland, and use them for > > > > > > > NFSv3, etc. > > > > > > > > > > > > > > With the current implementation however, we can't guarantee that a file > > > > > > > with a coarse grained timestamp modified after one with a fine grained > > > > > > > timestamp will always appear to have a later value. This could confuse > > > > > > > some programs like make, rsync, find, etc. that depend on strict > > > > > > > ordering requirements for timestamps. > > > > > > > > > > > > > > The goal of this version is more modest: fix XFS' change attribute. > > > > > > > XFS's change attribute is bumped on atime updates in addition to other > > > > > > > deliberate changes. This makes it unsuitable for export via nfsd. > > > > > > > > > > > > > > Jan Kara suggested keeping this functionality internal-only for now and > > > > > > > plumbing the fine grained timestamps through getattr [1]. This set takes > > > > > > > a slightly different approach and has XFS use the fine-grained attr to > > > > > > > fake up STATX_CHANGE_COOKIE in its getattr routine itself. > > > > > > > > > > > > > > While we keep fine-grained timestamps in struct inode, when presenting > > > > > > > the timestamps via getattr, we truncate them at a granularity of number > > > > > > > of ns per jiffy, > > > > > > > > > > > > That's not good, because user explicitly set granular mtime would be > > > > > > truncated too and booting with different kernels (HZ) would change > > > > > > the observed timestamps of files. > > > > > > > > > > > > > > > > Thinking about this some more, I think the first problem is easily > > > > > addressable: > > > > > > > > > > The ctime isn't explicitly settable and with this set, we're already not > > > > > truncating the atime. We haven't used any of the extra bits in the mtime > > > > > yet, so we could just carve out a flag in there that says "this mtime > > > > > was explicitly set and shouldn't be truncated before presentation". > > > > > > > > > > > > > I thought about this option too. > > > > But note that the "mtime was explicitly set" flag needs > > > > to be persisted to disk so you cannot store it in the high nsec bits. > > > > At least XFS won't store those bits if you use them - they have to > > > > be translated to an XFS inode flag and I don't know if changing > > > > XFS on-disk format was on your wish list. > > > > > > Remember: this multi-grain timestamp thing was an idea to solve the > > > NFS change attribute problem without requiring *any* filesystem with > > > sub-jiffie timestamp capability to change their on-disk format to > > > implement a persistent change attribute that matches the new > > > requires of the kernel nfsd. > > > > > > If we now need to change the on-disk format to support > > > some whacky new timestamp semantic to do this, then people have > > > completely lost sight of what problem the multi-grain timestamp idea > > > was supposed to address. > > > > > > > Yep. The main impetus for all of this was to fix XFS's change attribute > > without requiring an on-disk format change. If we have to rev the on- > > disk format, we're probably better off plumbing in a proper i_version > > counter and tossing this idea aside. > > > > That said, I think all we'd need for this scheme is a single flag per > > inode (to indicate that the mtime shouldn't be truncated before > > presentation). If that's possible to do without fully revving the inode > > format, then we could still pursue this. I take it that's probably not > > the case though. > > Older kernels that don't know what the flag means, but that should > be OK for an inode flag. The bigger issue is that none of the > userspace tools (xfs_db, xfs_repair, etc) know about it, so they > would have to be taught about it. And then there's testing it, which > likely means userspace needs visibility of the flag (e.g. FS_XFLAG > for it) and then there's more work.... > > It's really not worth it. > > > I think that Linus's suggestion of the in-memory inode timestamp > always being a 64bit, 100ns granularity value instead of a timespec > that gets truncated at sample time has merit as a general solution. > Changing how we store timestamps in struct inode is a good idea, and reducing the effective granularity to 100ns seems reasonable, but that alone won't fix XFS's i_version counter, or the ordering problems that we hit with the multigrain series that had to be reverted. > We also must not lose sight of the fact that the lazytime mount > option makes atime updates on XFS behave exactly as the nfsd/NFS > client application wants. That is, XFS will do in-memory atime > updates unless the atime update also sets S_VERSION to explicitly > bump the i_version counter if required. That leads to another > potential nfsd specific solution without requiring filesystems to > change on disk formats: the nfsd explicitly asks operations for lazy > atime updates... > Not exactly. The problem with XFS's i_version is that it also bumps it on atime updates. lazytime reduces the number of atime updates to ~1/day. To be exactly what nfsd wants, you'd need to make that 0. I suppose you can work around it with noatime, but that has problems of its own. > And we must also keep in sight the fact that io_uring wants > non-blocking timestamp updates to be possible (for all types of > updates). Hence it looks to me like we have more than one use case > for per-operation/application specific timestamp update semantics. > Perhaps there's a generic solution to this problem (e.g. operation > specific non-blocking, in-memory pure timestamp updates) that does > what everyone needs... -- Jeff Layton <jlayton@xxxxxxxxxx>