On Mon, 30 Oct 2023 at 12:37, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > If XFS can ignore relatime or lazytime persistent updates for given > situations, then *we don't need to make periodic on-disk updates of > atime*. This makes the whole problem of "persistent atime update bumps > i_version" go away because then we *aren't making persistent atime > updates* except when some other persistent modification that bumps > [cm]time occurs. Well, I think this should be split into two independent questions: (a) are relatime or lazytime atime updates persistent if nothing else changes? (b) do atime updates _ever_ update i_version *regardless* of relatime or lazytime? and honestly, I think the best answer to (b) would be that "no, i_version should simply not change for atime updates". And I think that answer is what it is because no user of i_version seems to want it. Now, the reason it's a single question for you is that apparently for XFS, the only thing that matters is "inode was written to disk" and that "di_changecount" value is thus related to the persistence of atime updates, but splitting di_changecount out to be a separate thing from i_version seems to be on the table, so I think those two things really could be independent issues. > But I don't want to do this unconditionally - for systems not > running anything that samples i_version we want relatime/lazytime > to behave as they are supposed to and do periodic persistent updates > as per normal. Principle of least surprise and all that jazz. Well - see above: I think in a perfect world, we'd simply never change i_version at all for any atime updates, and relatime/lazytime simply wouldn't be an issue at all wrt i_version. Wouldn't _that_ be the trule "least surprising" behavior? Considering that nobody wants i_version to change for what are otherwise pure reads (that's kind of the *definition* of atime, after all). Now, the annoyance here is that *both* (a) and (b) then have that impact of "i_version no longer tracks di_changecount". So I don't think the issue here is "i_version" per se. I think in a vacuum, the best option of i_version is pretty obvious. But if you want i_version to track di_changecount, *then* you end up with that situation where the persistence of atime matters, and i_version needs to update whenever a (persistent) atime update happens. > So we really need an indication for inodes that we should enable this > mode for the inode. I have asked if we can have per-operation > context flag to trigger this given the needs for io_uring to have > context flags for timestamp updates to be added. I really think some kind of new and even *more* complex and non-intuitive behavior is the worst of both worlds. Having atime updates be conditionally persistent - on top of already being delayed by lazytime/relatime - and having the persistence magically change depending on whether something wants to get i_version updates - sounds just completely crazy. Particularly as *none* of the people who want i_version updates actually want them for atime at all. So I really think this all boils down to "is xfs really willing to split bi_changecount from i_version"? > I have asked if we can have an inode flag set by the VFS or > application code for this. e.g. a flag set by nfsd whenever it accesses a > given inode. > > I have asked if this inode flag can just be triggered if we ever see > I_VERSION_QUERIED set or statx is used to retrieve a change cookie, > and whether this is a reliable mechanism for setting such a flag. See above: linking this to I_VERSION_QUERIED is horrific. The people who set that bit do *NOT* want atime updates to change i_version under any circumstances. It was always a mistake. This really is all *entirely* an artifact of that "bi_changecount" vs "i_version" being tied together. You did seem to imply that you'd be ok with having "bi_changecount" be split from i_version, ie from an earlier email in this thread: "Now that NFS is using a proper abstraction (i.e. vfs_statx()) to get the change cookie, we really don't need to expose di_changecount in i_version at all - we could simply copy an internal di_changecount value into the statx cookie field in xfs_vn_getattr() and there would be almost no change of behaviour from the perspective of NFS and IMA at all" but while I suspect *that* part is easy and straightforward, the problem then becomes one of "what about the persistence of i_version", and then you'd need a new field for *that* anyway, and would want a new on-disk format regardless. Linus