Re: [PATCH] xfs: fix i_version handling in xfs

"NeilBrown" <neilb@xxxxxxx> · Fri, 19 Aug 2022 09:43:32 +1000

On Thu, 18 Aug 2022, Jeff Layton wrote:
> On Thu, 2022-08-18 at 10:34 +1000, NeilBrown wrote:
> > On Wed, 17 Aug 2022, Dave Chinner wrote:
> > > 
> > > In XFS, we've defined the on-disk i_version field to mean
> > > "increments with any persistent inode data or metadata change",
> > > regardless of what the high level applications that use i_version
> > > might actually require.
> > > 
> > > That some network filesystem might only need a subset of the
> > > metadata to be covered by i_version is largely irrelevant - if we
> > > don't cover every persistent inode metadata change with i_version,
> > > then applications that *need* stuff like atime change notification
> > > can't be supported.
> > 
> > So what you are saying is that the i_version provided by XFS does not
> > match the changeid semantics required by NFSv4.  Fair enough.  I guess
> > we shouldn't use the one to implement the other then.
> > 
> > Maybe we should just go back to using ctime.  ctime is *exactly* what
> > NFSv4 wants, as long as its granularity is sufficient to catch every
> > single change.  Presumably XFS doesn't try to ensure this.  How hard
> > would it be to get any ctime update to add at least one nanosecond?
> > This would be enabled by a mount option, or possibly be a direct request
> > from nfsd.
> > 
> 
> I think that would be an unfortunate outcome, but if we can't stop xfs
> from bumping the i_version on atime updates, then we may have no choice
> but to do so. I suppose we could add a fetch_iversion for xfs that takes
> it back to using the ctime.

"unfortunate" for who I wonder..

I think Trond's argument about not needing implicit updates to be
reflected i_version is sound - as the effective i_version can be
constructed from stored i_version plus all attributes, if you ever want
an effective i_version that covers all implicit changes to attributes.
However I doubt Dave will be convinced.

> 
> > <rant>NFSv4 changeid is really one of the more horrible parts of the
> > design</rant>
> > 
> 
> Hah! I was telling Tom Talpey yesterday that I thought that the change
> counter was one of the best ideas in NFSv4 and that we should be trying
> to get all filesystems to implement it correctly.
> 
> The part that does suck about the design is that the original specs
> weren't specific enough about its behavior. I think that's been somewhat
> remedied in more recent RFCs though.

That's enough bate for me to expand my rant...
The problem with changeid is that it imposes on the filesystem.  You
CANNOT provide a compliant NFSv4 services on a filesystem which has 1
second resolution of time stamps and no internal i_version.  When you
are designing a protocol - particularly one where interoperability is a
focus - making it impossible to export some common (at the time)
filesystems correctly is crazy.

And not at all necessary.  There is a much better way.

When reporting the ctime of a file, the server could add a Bool which is
true if that time is "now" to within the resolution of the timestamp.
If you see a "now" ctime, then you cannot use timestamps to validate any
cached data.  i.e.  a "now" ctime is different to every other ctime,
even another "now" ctime with the same timestamp.
If the filesystem has high resolution timestamps, it will never say
"now" and you get the same semantics as changeid.  If timestamps are low
resolution, then you cannot cache a file while it is changing (unless
you get a read delegation), but you can accurately cache a file as long
as it doesn't change - which is all that changeid gives you anyway.

NeilBrown

> -- 
> Jeff Layton <jlayton@xxxxxxxxxx>
>