On Wed, Sep 21, 2022 at 06:33:28AM -0400, Jeff Layton wrote: > On Wed, 2022-09-21 at 10:00 +1000, Dave Chinner wrote: > > > How do we determine what that offset should be? Your last email > > > suggested that there really is no limit to the number of i_version bumps > > > that can happen in memory before one of them makes it to disk. What can > > > we do to address that? > > > > <shrug> > > > > I'm just pointing out problems I see when defining this as behaviour > > for on-disk format purposes. If we define it as part of the on-disk > > format, then we have to be concerned about how it may be used > > outside the scope of just the NFS server application. > > > > However, If NFS keeps this metadata and functionaly entirely > > contained at the application level via xattrs, I really don't care > > what algorithm NFS developers decides to use for their crash > > sequencing. It's not my concern at this point, and that's precisely > > why NFS should be using xattrs for this NFS specific functionality. > > > > I get it: you'd rather not have to deal with what you see as an NFS > problem, but I don't get how what you're proposing solves anything. We > might be able to use that scheme to detect crashes, but that's only part > of the problem (and it's a relatively simple part of the problem to > solve, really). > > Maybe you can clarify it for me: > > Suppose we go with what you're saying and store some information in > xattrs that allows us to detect crashes in some fashion. The server > crashes and comes back up and we detect that there was a crash earlier. > > What does nfsd need to do now to ensure that it doesn't hand out a > duplicate change attribute? As I've already stated, the NFS server can hold the persistent NFS crash counter value in a second xattr that it bumps whenever it detects a crash and hence we take the local filesystem completely out of the equation. How the crash counter is then used by the nfsd to fold it into the NFS protocol change attribute is a nfsd problem, not a local filesystem problem. If you're worried about maximum number of writes outstanding vs i_version bumps that are held in memory, then *bound the maximum number of uncommitted i_version changes that the NFS server will allow to build up in memory*. By moving the crash counter to being a NFS server only function, the NFS server controls the entire algorithm and it doesn't have to care about external 3rd party considerations like local filesystems have to. e.g. The NFS server can track the i_version values when the NFSD syncs/commits a given inode. The nfsd can sample i_version it when calls ->commit_metadata or flushed data on the inode, and then when it peeks at i_version when gathering post-op attrs (or any other getattr op) it can decide that there is too much in-memory change (e.g. 10,000 counts since last sync) and sync the inode. i.e. the NFS server can trivially cap the maximum number of uncommitted NFS change attr bumps it allows to build up in memory. At that point, the NFS server has a bound "maximum write count" that can be used in conjunction with the xattr based crash counter to determine how the change_attr is bumped by the crash counter. -Dave. -- Dave Chinner david@xxxxxxxxxxxxx