On Sat, 2023-09-23 at 10:15 +0300, Amir Goldstein wrote:
> On Fri, Sep 22, 2023 at 8:15 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > My initial goal was to implement multigrain timestamps on most major
> > filesystems, so we could present them to userland, and use them for
> > NFSv3, etc.
> >
> > With the current implementation however, we can't guarantee that a file
> > with a coarse grained timestamp modified after one with a fine grained
> > timestamp will always appear to have a later value. This could confuse
> > some programs like make, rsync, find, etc. that depend on strict
> > ordering requirements for timestamps.
> >
> > The goal of this version is more modest: fix XFS' change attribute.
> > XFS's change attribute is bumped on atime updates in addition to other
> > deliberate changes. This makes it unsuitable for export via nfsd.
> >
> > Jan Kara suggested keeping this functionality internal-only for now and
> > plumbing the fine grained timestamps through getattr [1]. This set takes
> > a slightly different approach and has XFS use the fine-grained attr to
> > fake up STATX_CHANGE_COOKIE in its getattr routine itself.
> >
> > While we keep fine-grained timestamps in struct inode, when presenting
> > the timestamps via getattr, we truncate them at a granularity of number
> > of ns per jiffy,
>
> That's not good, because user explicitly set granular mtime would be
> truncated too and booting with different kernels (HZ) would change
> the observed timestamps of files.
>

That's a very good point.

> > which allows us to smooth over the fuzz that causes
> > ordering problems.
> >
>
> The reported ordering problems (i.e. cp -u) is not even limited to the
> scope of a single fs, right?
>

It isn't. Most of the tools we're concerned with don't generally care
about filesystem boundaries.

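To make the HZ concern above concrete, here is a minimal userspace
sketch of truncating a timestamp's nanosecond field to one jiffy. The
helper name trunc_nsec_to_jiffy() is hypothetical and is not taken from
the patch set; the only assumption is that one jiffy is roughly
NSEC_PER_SEC / HZ nanoseconds.

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper, not from the patch set: round the nanosecond
 * part of a timestamp down to a multiple of one jiffy, where one
 * jiffy is approximated as NSEC_PER_SEC / hz nanoseconds.
 */
long trunc_nsec_to_jiffy(long nsec, long hz)
{
	long ns_per_jiffy;

	assert(hz > 0);
	ns_per_jiffy = NSEC_PER_SEC / hz;

	/* Drop everything below jiffy resolution. */
	return nsec - (nsec % ns_per_jiffy);
}
```

With this sketch, a stored tv_nsec of 987654321 would be reported as
980000000 with HZ=100, 984000000 with HZ=250 and 987000000 with
HZ=1000, so the same on-disk mtime reads differently depending on the
kernel configuration.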
> Thinking out loud - if the QUERIED bit was not per inode timestamp
> but instead in a global fs_multigrain_ts variable, then all the inodes
> of all the mgtime fs would be using globally ordered timestamps
>
> That should eliminate the reported issues with time reorder for
> fine vs coarse grained timestamps.
>
> The risk of extra unneeded "change cookie" updates compared to
> per inode QUERIED bit may exist, but I think it is a rather small overhead
> and maybe worth the tradeoff of having to maintain a real per inode
> "change cookie" in addition to a "globally ordered mgtime"?
>
> If this idea is acceptable, you may still be able to salvage the reverted
> ctime series for 6.7, because the change to use global mgtime should
> be quite trivial?
>

This is basically the idea I was going to look at next once I got some
other stuff settled here. When we apply a fine-grained timestamp to an
inode, we'd advance the coarse-grained clock that filesystems use to
that value.

It could cause some write amplification: if you are streaming writes to
a bunch of files at the same time and someone stats one of them, then
they'd all end up getting an extra inode transaction. That doesn't sound
_too_ bad on its face, but I probably need to implement it and then run
some numbers to see.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
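
For illustration only, the "advance the coarse-grained clock" idea
could be modelled in userspace roughly as below. This is a sketch under
stated assumptions, not the kernel implementation: mg_floor_ns,
coarse_stamp() and fine_stamp() are hypothetical names, and the current
time is passed in as a plain nanosecond count rather than read from a
real clock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global floor shared by all multigrain filesystems (ns). */
static _Atomic int64_t mg_floor_ns;

/*
 * Coarse stamp: the tick-truncated clock, but never earlier than the
 * most recent fine-grained stamp that was handed out.
 */
int64_t coarse_stamp(int64_t now_ns, int64_t ns_per_tick)
{
	int64_t coarse, floor;

	assert(ns_per_tick > 0);
	coarse = now_ns - (now_ns % ns_per_tick);
	floor = atomic_load(&mg_floor_ns);
	return coarse > floor ? coarse : floor;
}

/*
 * Fine stamp: use the full-resolution time and raise the global floor
 * to it, so that later coarse stamps cannot appear to be older.
 */
int64_t fine_stamp(int64_t now_ns)
{
	int64_t old = atomic_load(&mg_floor_ns);

	/* Atomic max: on failure, old is reloaded with the current floor. */
	while (old < now_ns &&
	       !atomic_compare_exchange_weak(&mg_floor_ns, &old, now_ns))
		;
	return old > now_ns ? old : now_ns;
}
```

In this model, with a 4 ms tick, a fine stamp taken at 10500000 ns
followed by a coarse stamp at 11000000 ns yields 10500000 for both,
where plain truncation would have given the later file 8000000. The
cost is the one described above: every file written after the floor
moves picks up the new value, which is where the extra inode
transactions would come from.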