Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field

Jan Kara <jack@xxxxxxx> · Fri, 23 Sep 2022 11:56:53 +0200

On Thu 22-09-22 16:18:02, Jeff Layton wrote:
> On Thu, 2022-09-22 at 06:18 -0400, Jeff Layton wrote:
> > On Thu, 2022-09-22 at 07:41 +1000, Dave Chinner wrote:
> > > e.g. The NFS server can track the i_version values when the NFSD
> > > syncs/commits a given inode. The nfsd can sample i_version it when
> > > calls ->commit_metadata or flushed data on the inode, and then when
> > > it peeks at i_version when gathering post-op attrs (or any other
> > > getattr op) it can decide that there is too much in-memory change
> > > (e.g. 10,000 counts since last sync) and sync the inode.
> > > 
> > > i.e. the NFS server can trivially cap the maximum number of
> > > uncommitted NFS change attr bumps it allows to build up in memory.
> > > At that point, the NFS server has a bound "maximum write count" that
> > > can be used in conjunction with the xattr based crash counter to
> > > determine how the change_attr is bumped by the crash counter.
> > 
> > Well, not "trivially". This is the bit where we have to grow struct
> > inode (or the fs-specific inode), as we'll need to know what the latest
> > on-disk value is for the inode.
> > 
> > I'm leaning toward doing this on the query side. Basically, when nfsd
> > goes to query the i_version, it'll check the delta between the current
> > version and the latest one on disk. If it's bigger than X then we'd just
> > return NFS4ERR_DELAY to the client.
> > 
> > If the delta is >X/2, maybe it can kick off a workqueue job or something
> > that calls write_inode with WB_SYNC_ALL to try to get the thing onto the
> > platter ASAP.
> 
> Still looking at this bit too. Probably we can just kick off a
> WB_SYNC_NONE filemap_fdatawrite at that point and hope for the best?

"Hope" is not a great assurance regarding data integrity ;) Anyway, it
depends on how you imagine the "i_version on disk" is going to be
maintained. It could be maintained by NFSD inside commit_inode_metadata() -
fetch current i_version value before asking filesystem for the sync and by the
time commit_metadata() returns we know that value is on disk. If we detect the
current - on_disk is > X/2, we call commit_inode_metadata() and we are
done. It is not even *that* expensive because usually filesystems optimize
away unnecessary IO when the inode didn't change since last time it got
synced.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR