On Mon, Oct 30, 2023 at 01:11:56PM -1000, Linus Torvalds wrote: > On Mon, 30 Oct 2023 at 12:37, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > > > If XFS can ignore relatime or lazytime persistent updates for given > > situations, then *we don't need to make periodic on-disk updates of > > atime*. This makes the whole problem of "persistent atime update bumps > > i_version" go away because then we *aren't making persistent atime > > updates* except when some other persistent modification that bumps > > [cm]time occurs. > > Well, I think this should be split into two independent questions: > > (a) are relatime or lazytime atime updates persistent if nothing else changes? They only become persistent after 24 hours or, in the case of relatime, immediately persistent if mtime < atime (i.e. read after a modification). Those are the only times that the VFS triggers persistent writeback of atime, and it's the latter case (mtime < atime) that is the specific trigger that exposed the problem with atime bumping i_version in the first place. > (b) do atime updates _ever_ update i_version *regardless* of relatime > or lazytime? > > and honestly, I think the best answer to (b) would be that "no, > i_version should simply not change for atime updates". And I think > that answer is what it is because no user of i_version seems to want > it. As I keep repeating: Repeatedly stating that "atime should not bump i_version" does not address the questions I'm asking *at all*. > Now, the reason it's a single question for you is that apparently for > XFS, the only thing that matters is "inode was written to disk" and > that "di_changecount" value is thus related to the persistence of > atime updates, but splitting di_changecount out to be a separate thing > from i_version seems to be on the table, so I think those two things > really could be independent issues. Wrong way around - we'd have to split i_version out from di_changecount. It's i_version that has changed semantics, not di_changecount, and di_changecount behaviour must remain unchanged. What I really don't want to do is implement a new i_version field in the XFS on-disk format. What this redefinition of i_version semantics has made clear is that i_version is *user defined metadata*, not internal filesystem metadata that is defined by the filesystem on-disk format. User defined persistent metadata *belongs in xattrs*, not in the core filesystem on-disk formats. If the VFS wants to define and manage i_version behaviour, smeantics and persistence independently of the filesystems that manage the persistent storage (as it clearly does!) then we should treat it just like any other VFS defined inode metadata (e.g. per inode objects like security constraints, ACLs, fsverity digests, fscrypt keys, etc). i.e. it should be in a named xattr, not directly implemented in the filesystem on-disk format deinfitions. Then the application can change the meaning of the metadata whenever and however it likes. Then filesystem developers just don't need to care about it at all because the VFS specific persistent metadata is not part of the on-disk format we need to maintain cross-platform forwards and backwards compatibility for. > > But I don't want to do this unconditionally - for systems not > > running anything that samples i_version we want relatime/lazytime > > to behave as they are supposed to and do periodic persistent updates > > as per normal. Principle of least surprise and all that jazz. > > Well - see above: I think in a perfect world, we'd simply never change > i_version at all for any atime updates, and relatime/lazytime simply > wouldn't be an issue at all wrt i_version. Right, that's what I'd like, especially as the new definition of i_version - "only change when [cm]time changes" - means that the VFS i_version is really now just a glorified timestamp. > Wouldn't _that_ be the trule "least surprising" behavior? Considering > that nobody wants i_version to change for what are otherwise pure > reads (that's kind of the *definition* of atime, after all). So, if you don't like the idea of us ignoring relatime/lazytime conditionally, are we allowed to simply ignore them *all the time* and do all timestamp updates in the manner that causes users the least amount of pain? I mean, relatime only exists because atime updates cause users pain. lazytime only exists because relatime doesn't address the pain that timestamp updates cause mixed read/write or pure O_DSYNC overwrite workloads pain. noatime is a pain because it loses all atime updates. There is no "one size is right for everyone", so why not just let filesystems do what is most efficient from an internal IO and persistence POV whilst still maintaining the majority of "expected" behaviours? Keep in mind, though, that this is all moot if we can get rid of i_version entirely.... > Now, the annoyance here is that *both* (a) and (b) then have that > impact of "i_version no longer tracks di_changecount". .... and what is annoying is that that the new i_version just a glorified ctime change counter. What we should be fixing is ctime - integrating this change counting into ctime would allow us to make i_version go away entirely. i.e. We don't need a persistent ctime change counter if the ctime has sufficient resolution or persistent encoding that it does not need an external persistent change counter. That was reasoning behind the multi-grain timestamps. While the mgts implementation was flawed, the reasoning behind it certainly isn't. We should be trying to get rid of i_version by integrating it into ctime updates, not arguing how atime vs i_version should work. > So I don't think the issue here is "i_version" per se. I think in a > vacuum, the best option of i_version is pretty obvious. But if you > want i_version to track di_changecount, *then* you end up with that > situation where the persistence of atime matters, and i_version needs > to update whenever a (persistent) atime update happens. Yet I don't want i_version to track di_changecount. I want to *stop supporting i_version altogether* in XFS. I want i_version as filesystem internal metadata to die completely. I don't want to change the on disk format to add a new i_version field because we'll be straight back in this same siutation when the next i_version bug is found and semantics get changed yet again. Hence if we can encode the necessary change attributes into ctime, we can drop VFS i_version support altogether. Then the "atime bumps i_version" problem also goes away because then we *don't use i_version*. But if we can't get the VFS to do this with ctime, at least we have the abstractions available to us (i.e. timestamp granularity and statx change cookie) to allow XFS to implement this sort of ctime-with-integrated-change-counter internally to the filesystem and be able to drop i_version support.... [....] > This really is all *entirely* an artifact of that "bi_changecount" vs > "i_version" being tied together. You did seem to imply that you'd be > ok with having "bi_changecount" be split from i_version, ie from an > earlier email in this thread: > > "Now that NFS is using a proper abstraction (i.e. vfs_statx()) to get > the change cookie, we really don't need to expose di_changecount in > i_version at all - we could simply copy an internal di_changecount > value into the statx cookie field in xfs_vn_getattr() and there > would be almost no change of behaviour from the perspective of NFS > and IMA at all" .... which is what I was talking about here. i.e. I was not talking about splitting i_version from di_changecount - I was talking about being able to stop supporting the VFS i_version counter entirely and still having NFS and IMA work correctly. Continually bring the argument back to "atime vs i_version" misses the bigger issues around this new i_version definition and implementation, and that the real solution should be to fix ctime updates to make i_version at the VFS level go away forever. -Dave. -- Dave Chinner david@xxxxxxxxxxxxx