Re: [PATCH] xfs: fix i_version handling in xfs

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Thu, 18 Aug 2022 04:15:03 +0000

On Thu, 2022-08-18 at 13:37 +1000, Dave Chinner wrote:
> On Thu, Aug 18, 2022 at 01:11:09AM +0000, Trond Myklebust wrote:
> > On Wed, 2022-08-17 at 08:42 +1000, Dave Chinner wrote:
> > > 
> > > In XFS, we've defined the on-disk i_version field to mean
> > > "increments with any persistent inode data or metadata change",
> > > regardless of what the high level applications that use i_version
> > > might actually require.
> > > 
> > > That some network filesystem might only need a subset of the
> > > metadata to be covered by i_version is largely irrelevant - if we
> > > don't cover every persistent inode metadata change with
> > > i_version,
> > > then applications that *need* stuff like atime change
> > > notification
> > > can't be supported.
> > 
> > OK, I'll bite...
> > 
> > What real world application are we talking about here, and why
> > can't it
> > just read both the atime + i_version if it cares?
> 
> The whole point of i_version is that the aplication can skip the
> storage and comparison of individual metadata fields to determine if
> anythign changed. If you're going to store multiple fields and
> compare them all in addition to the change attribute, then what is
> the change attribute actually gaining you?

Information that is not contained in the fields themselves. Such as
metadata about the fact that they were explicitly changed by an
application.

> 
> > The value of the change attribute lies in the fact that it gives
> > you
> > ctime semantics without the time resolution limitation.
> > i.e. if the change attribute has changed, then you know that
> > someone
> > has explicitly modified either the file data or the file metadata
> > (with
> > the emphasis being on the word "explicitly").
> > Implicit changes such as the mtime change due to a write are
> > reflected
> > only because they are necessarily also accompanied by an explicit
> > change to the data contents of the file.
> > Implicit changes, such as the atime changes due to a read are not
> > reflected in the change attribute because there is no explicit
> > change
> > being made by an application.
> 
> That's the *NFSv4 requirements*, not what people were asking XFS to
> support in a persistent change attribute 10-15 years ago. What NFS
> required was just one of the inputs at the time, and what we
> implemented has kept NFSv4 happy for the past decade. I've mentioned
> other requirements elsewhere in the thread

NFS can work with a change attribute that tells it to invalidate its
cache on every read. The only side effect will be that the performance
graph will act as if you were filtering it through a cow's digestive
system...

> The problem we're talking about here is essentially a relatime
> filtering issue; it's triggering an filesystem update because the
> first access after a modification triggers an on-disk atime update
> rahter than just storing it in memory.

No. It's not... You appear to be discarding valuable information about
why the attributes changed. I've been asking you for reasons why, and
you're avoiding the question.

> This is not a filesystem issue - the VFS controls when the on-disk
> updates occur, and that what NFSv4 appears to need changed.
> If NFS doesn't want the filesystem to bump change counters for
> on-disk atime updates, then it should be asking the VFS to keep the
> atime updates in memory. e.g. use lazytime semantics.
> 
> This way, every filesystem will have the same behaviour, regardless
> of how they track/store persistent change count metadata.

Right now, the i_version updates are not exported via any common API,
so any piss poor performance side-effects of the implementation are
pretty much limited to the kernel users (NFS and... ???)

Who do you expect to use this attribute if it were to be exported via
statx() as Jeff is proposing, and why is the XFS behaviour appropriate?
It already differs from the behaviour of both btrfs and NFS, so the
argument that this will magically consolidate behaviour can be ignored.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx