On Tue, 2022-09-27 at 08:43 +1000, NeilBrown wrote:
> On Fri, 23 Sep 2022, Jeff Layton wrote:
> >
> > Absolutely. That is the downside of this approach, but the priority here
> > has always been to improve nfsd. If we don't get the ability to present
> > this info via statx, then so be it. Later on, I suppose we can move that
> > handling into the kernel in some fashion if we decide it's worthwhile.
> >
> > That said, not having this in statx makes it more difficult to test
> > i_version behavior. Maybe we can add a generic ioctl for that in the
> > interim?
>
> I wonder if we are over-thinking this, trying too hard, making "perfect"
> the enemy of "good".
> While we agree that the current implementation of i_version is
> imperfect, it isn't causing major data corruption all around the world.
> I don't think there are even any known bug reports are there?
> So while we do want to fix it as best we can, we don't need to make that
> the first priority.
>
> I think the first priority should be to document how we want it to work,
> which is what this thread is really all about. The documentation can
> note that some (all) filesystems do not provide perfect semantics across
> unclean restarts, and can list any other anomalies that we are aware of.
> And on that basis we can export the current i_version to user-space via
> statx and start trying to write some test code.
>
> We can then look at moving the i_version/ctime update from *before* the
> write to *after* the write, and any other improvements that can be
> achieved easily in common code. We can then update the man page to say
> "since Linux 6.42, this list of anomalies is no longer present".
>
> Then we can explore some options for handling unclean restart - in a
> context where we can write tests and maybe even demonstrate a concrete
> problem before we start trying to fix it.
>

We can also argue that crash resilience isn't a hard requirement for all
possible applications.
We'll definitely need some sort of mitigation for nfsd so we can claim
that it's MONOTONIC [1], but local applications may not care whether the
value rolls backward after a crash, since they would have presumably
crashed as well and may not be persisting values.

IOW, I think I agree with Dave C. that crash resilience for regular
files is best handled at the application level (with the first
application being knfsd). RFC 7862 requires that the change_attr_type be
homogeneous across the entire filesystem, so we don't have the option of
deciding that on a per-inode basis. If we want to advertise it, we have
to ensure that all inode types conform.

I think for nfsd, a crash counter tracked in userland by nfsdcld,
multiplied by some large number (a reasonable bound on version bumps in
a jiffy), would work well and allow us to go back to advertising the
value as MONOTONIC. That's a bit of a project though and may take a
while.

For presentation via statx, maybe we can create a
STATX_ATTR_VERSION_MONOTONIC bit for stx_attributes for when the
filesystem can provide that sort of guarantee. I may just add that
internally for now anyway, since that would make for nicer layering.

[1]: https://datatracker.ietf.org/doc/html/rfc7862#section-12.2.3
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
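
To make the crash-counter idea for nfsd a little more concrete, here is a
rough userspace sketch of the arithmetic only. The names nfsd_change_attr
and NFSD_CRASH_STRIDE, and the stride value itself, are hypothetical and
exist in neither the kernel nor nfsdcld. The assumption is that the stride
exceeds the number of i_version bumps that could be lost in a single
crash, so the presented value still moves forward after an unclean restart.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stride: assumed upper bound on the i_version bumps that
 * could be lost in one crash. The value is for illustration only.
 */
#define NFSD_CRASH_STRIDE (UINT64_C(1) << 20)

/*
 * Hypothetical helper, not an existing kernel or nfsdcld function.
 * Offsets the on-disk i_version by crash_count * stride so that a value
 * which rolled backward on disk is still presented as larger than
 * anything handed out before the crash.
 */
static uint64_t nfsd_change_attr(uint64_t i_version, uint64_t crash_count)
{
	/* Guard the multiplication; the final addition can still wrap. */
	assert(crash_count <= UINT64_MAX / NFSD_CRASH_STRIDE);

	return i_version + crash_count * NFSD_CRASH_STRIDE;
}
```

With these assumed values, an inode whose i_version was 100 before a crash
and comes back as 90 afterwards would be presented as 90 + 2^20 once the
crash counter goes from 0 to 1, which is greater than the 100 that clients
saw earlier. How wraparound should be handled is left open here.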