Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field

bfields@xxxxxxxxxxxx (J. Bruce Fields) · Mon, 12 Sep 2022 08:54:25 -0400

On Mon, Sep 12, 2022 at 07:42:16AM -0400, Jeff Layton wrote:
> A scheme like that could work. It might be hard to do it without a
> spinlock or something, but maybe that's ok. Thinking more about how we'd
> implement this in the underlying filesystems:
> 
> To do this we'd need 2 64-bit fields in the on-disk and in-memory 
> superblocks for ext4, xfs and btrfs. On the first mount after a crash,
> the filesystem would need to bump s_version_max by the significant
> increment (2^40 bits or whatever). On a "clean" mount, it wouldn't need
> to do that.
> 
> Would there be a way to ensure that the new s_version_max value has made
> it to disk? Bumping it by a large value and hoping for the best might be
> ok for most cases, but there are always outliers, so it might be
> worthwhile to make an i_version increment wait on that if necessary. 

I was imagining that when you recognize you're getting close, you kick
off something which writes s_version_max+2^40 to disk, and then updates
s_version_max to that new value on success of the write.

The code that increments i_version checks to make sure it wouldn't
exceed s_version_max.  If it would, something has gone wrong--a write
has failed or taken a long time--so it waits or errors out or something,
depending on desired filesystem behavior in that case.

No locking required in the normal case?
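
Very roughly, something like the following userspace sketch with C11
atomics. Every name in it is made up for illustration (sb_sketch,
inode_sketch, VERSION_LOW_WATER, disk_write_ok,
write_version_max_to_disk(), and so on); none of it is existing kernel
code. It also does the superblock write inline, where a real filesystem
would presumably kick it off asynchronously.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define VERSION_CHUNK     (UINT64_C(1) << 40) /* the 2^40 increment */
#define VERSION_LOW_WATER (UINT64_C(1) << 20) /* assumed "getting close" margin */

struct sb_sketch {
	_Atomic uint64_t s_version_max;  /* ceiling known to have reached disk */
	atomic_bool bump_in_flight;      /* only one bump at a time */
};

struct inode_sketch {
	_Atomic uint64_t i_version;
};

/* Test knob standing in for whether the superblock write succeeds. */
static bool disk_write_ok = true;

/* Stand-in for the real on-disk superblock update. */
static bool write_version_max_to_disk(uint64_t new_max)
{
	(void)new_max;
	return disk_write_ok;
}

/* Write s_version_max + 2^40 out; raise the in-memory ceiling only on success. */
static void kick_version_max_bump(struct sb_sketch *sb)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&sb->bump_in_flight, &expected, true))
		return; /* somebody else is already bumping */

	uint64_t new_max = atomic_load(&sb->s_version_max) + VERSION_CHUNK;

	if (write_version_max_to_disk(new_max))
		atomic_store(&sb->s_version_max, new_max);
	atomic_store(&sb->bump_in_flight, false);
}

/*
 * Returns true if i_version was incremented. Returns false if the
 * increment would exceed s_version_max; the caller then waits or
 * errors out, depending on the desired filesystem behavior.
 */
static bool inode_inc_version(struct sb_sketch *sb, struct inode_sketch *inode)
{
	uint64_t cur = atomic_load(&inode->i_version);

	for (;;) {
		uint64_t max = atomic_load(&sb->s_version_max);

		if (cur >= max)
			return false; /* a write has failed or is taking a long time */
		if (max - cur <= VERSION_LOW_WATER)
			kick_version_max_bump(sb);
		/* On failure the CAS reloads cur, and we recheck against the ceiling. */
		if (atomic_compare_exchange_weak(&inode->i_version, &cur, cur + 1))
			return true;
	}
}
```

The normal path there is two atomic loads and a compare-and-swap. The
only serialization is the bump_in_flight flag, and that is only touched
once you are within VERSION_LOW_WATER of the ceiling.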

--b.