Re: [PATCH v8 0/5] fs: multigrain timestamps for XFS's change_cookie

Jeff Layton <jlayton@xxxxxxxxxx> · Sat, 23 Sep 2023 06:22:54 -0400

On Sat, 2023-09-23 at 10:15 +0300, Amir Goldstein wrote:
> On Fri, Sep 22, 2023 at 8:15 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > 
> > My initial goal was to implement multigrain timestamps on most major
> > filesystems, so we could present them to userland, and use them for
> > NFSv3, etc.
> > 
> > With the current implementation however, we can't guarantee that a file
> > with a coarse grained timestamp modified after one with a fine grained
> > timestamp will always appear to have a later value. This could confuse
> > some programs like make, rsync, find, etc. that depend on strict
> > ordering requirements for timestamps.
> > 
> > The goal of this version is more modest: fix XFS' change attribute.
> > XFS's change attribute is bumped on atime updates in addition to other
> > deliberate changes. This makes it unsuitable for export via nfsd.
> > 
> > Jan Kara suggested keeping this functionality internal-only for now and
> > plumbing the fine grained timestamps through getattr [1]. This set takes
> > a slightly different approach and has XFS use the fine-grained attr to
> > fake up STATX_CHANGE_COOKIE in its getattr routine itself.
> > 
> > While we keep fine-grained timestamps in struct inode, when presenting
> > the timestamps via getattr, we truncate them at a granularity of number
> > of ns per jiffy,
> 
> That's not good, because user explicitly set granular mtime would be
> truncated too and booting with different kernels (HZ) would change
> the observed timestamps of files.
> 

That's a very good point.

> > which allows us to smooth over the fuzz that causes
> > ordering problems.
> > 
> 
> The reported ordering problems (i.e. cp -u) is not even limited to the
> scope of a single fs, right?
> 

It isn't. Most of the tools we're concerned with don't generally care
about filesystem boundaries.

> Thinking out loud - if the QERIED bit was not per inode timestamp
> but instead in a global fs_multigrain_ts variable, then all the inodes
> of all the mgtime fs would be using globally ordered timestamps
>
> That should eliminate the reported issues with time reorder for
> fine vs coarse grained timestamps.
> 
> The risk of extra unneeded "change cookie" updates compared to
> per inode QUERIED bit may exist, but I think it is a rather small overhead
> and maybe worth the tradeoff of having to maintain a real per inode
> "change cookie" in addition to a "globally ordered mgtime"?
> 
> If this idea is acceptable, you may still be able to salvage the reverted
> ctime series for 6.7, because the change to use global mgtime should
> be quite trivial?
> 

This is basically the idea I was going to look at next once I got some
other stuff settled here: Basically, when we apply a fine-grained
timestamp to an inode, we'd advance the coarse-grained clock that
filesystems use to that value.

It could cause some write amplification: if you are streaming writes to
a bunch of files at the same time and someone stats one of them, then
they'd all end up getting an extra inode transaction. That doesn't sound
_too_ bad on its face, but I probably need to implement it and then run
some numbers to see.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>