On Sat, 2022-08-27 at 12:10 -0400, Jeff Layton wrote: > On Sat, 2022-08-27 at 16:03 +0000, Trond Myklebust wrote: > > On Sat, 2022-08-27 at 08:46 -0700, Darrick J. Wong wrote: > > > On Sat, Aug 27, 2022 at 09:14:30AM -0400, Jeff Layton wrote: > > > > On Sat, 2022-08-27 at 11:01 +0300, Amir Goldstein wrote: > > > > > On Sat, Aug 27, 2022 at 10:26 AM Amir Goldstein > > > > > <amir73il@xxxxxxxxx> wrote: > > > > > > > > > > > > On Sat, Aug 27, 2022 at 12:49 AM Jeff Layton > > > > > > <jlayton@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > xfs will update the i_version when updating only the > > > > > > > atime > > > > > > > value, which > > > > > > > is not desirable for any of the current consumers of > > > > > > > i_version. Doing so > > > > > > > leads to unnecessary cache invalidations on NFS and extra > > > > > > > measurement > > > > > > > activity in IMA. > > > > > > > > > > > > > > Add a new XFS_ILOG_NOIVER flag, and use that to indicate > > > > > > > that > > > > > > > the > > > > > > > transaction should not update the i_version. Set that > > > > > > > value > > > > > > > in > > > > > > > xfs_vn_update_time if we're only updating the atime. > > > > > > > > > > > > > > Cc: Dave Chinner <david@xxxxxxxxxxxxx> > > > > > > > Cc: NeilBrown <neilb@xxxxxxx> > > > > > > > Cc: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> > > > > > > > Cc: David Wysochanski <dwysocha@xxxxxxxxxx> > > > > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > > > > > > --- > > > > > > > fs/xfs/libxfs/xfs_log_format.h | 2 +- > > > > > > > fs/xfs/libxfs/xfs_trans_inode.c | 2 +- > > > > > > > fs/xfs/xfs_iops.c | 11 +++++++++-- > > > > > > > 3 files changed, 11 insertions(+), 4 deletions(-) > > > > > > > > > > > > > > Dave has NACK'ed this patch, but I'm sending it as a way > > > > > > > to > > > > > > > illustrate > > > > > > > the problem. I still think this approach should at least > > > > > > > fix > > > > > > > the worst > > > > > > > problems with atime updates being counted. We can look to > > > > > > > carve out > > > > > > > other "spurious" i_version updates as we identify them. > > > > > > > > > > > > > > > > > > > AFAIK, "spurious" is only inode blocks map changes due to > > > > > > writeback > > > > > > of dirty pages. Anybody know about other cases? > > > > > > > > > > > > Regarding inode blocks map changes, first of all, I don't > > > > > > think > > > > > > that there is > > > > > > any practical loss from invalidating NFS client cache on > > > > > > dirty > > > > > > data writeback, > > > > > > because NFS server should be serving cold data most of the > > > > > > time. > > > > > > If there are a few unneeded cache invalidations they would > > > > > > only > > > > > > be temporary. > > > > > > > > > > > > > > > > Unless there is an issue with a writer NFS client that > > > > > invalidates its > > > > > own attribute > > > > > caches on server data writeback? > > > > > > > > > > > > > The client just looks at the file attributes (of which > > > > i_version is > > > > but > > > > one), and if certain attributes have changed (mtime, ctime, > > > > i_version, > > > > etc...) then it invalidates its cache. > > > > > > > > In the case of blocks map changes, could that mean a difference > > > > in > > > > the > > > > observable sparse regions of the file? If so, then a READ_PLUS > > > > before > > > > the change and a READ_PLUS after could give different results. > > > > Since > > > > that difference is observable by the client, I'd think we'd > > > > want to > > > > bump > > > > i_version for that anyway. > > > > > > How /is/ READ_PLUS supposed to detect sparse regions, anyway? I > > > know > > > that's been the subject of recent debate. At least as far as XFS > > > is > > > concerned, a file range can go from hole -> delayed allocation > > > reservation -> unwritten extent -> (actual writeback) -> written > > > extent. > > > The dance became rather more complex when we added COW. If any > > > of > > > that > > > will make a difference for READ_PLUS, then yes, I think you'd > > > want > > > file > > > writeback activities to bump iversion to cause client > > > invalidations, > > > like (I think) Dave said. > > > > > > The fs/iomap/ implementation of SEEK_DATA/SEEK_HOLE reports data > > > for > > > written and delalloc extents; and an unwritten extent will report > > > data > > > for any pagecache it finds. > > > > > > > READ_PLUS should never return anything different than a read() > > system > > call would return for any given area. The way it reports sparse > > regions > > vs. data regions is purely an RPC formatting convenience. > > > > The only point to note about NFS READ and READ_PLUS is that because > > the > > client is forced to send multiple RPC calls if the user is trying > > to > > read a region that is larger than the 'rsize' value, it is possible > > that these READ/READ_PLUS calls may be processed out of order, and > > so > > the result may end up looking different than if you had executed a > > read() call for the full region directly on the server. > > However each individual READ / READ_PLUS reply should look as if > > the > > user had called read() on that rsize-sized section of the file. > > > > > > Yeah, thinking about it some more, simply changing the block > allocation > is not something that should affect the ctime, so we probably don't > want > to bump i_version on it. It's an implicit change, IOW, not an > explicit > one. As you say, it is unfortunate that XFS does this, and it is unfortunate that it then changes the blocks allocated attribute post-fsync(), since all that does cause confusion for certain applications. However I agree 100% that this is an implicit change that is driven by the filesystem and not the user application. Hence it is not an action that needs to be recorded with a change attribute bump. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx