On Tue, 2023-10-24 at 10:08 +0300, Amir Goldstein wrote:
> On Tue, Oct 24, 2023 at 6:40 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Mon, Oct 23, 2023 at 02:18:12PM -1000, Linus Torvalds wrote:
> > > On Mon, 23 Oct 2023 at 13:26, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > >
> > > > The problem is the first read request after a modification has been
> > > > made. That is causing relatime to see mtime > atime and triggering
> > > > an atime update. XFS sees this, does an atime update, and in
> > > > committing that persistent inode metadata update, it calls
> > > > inode_maybe_inc_iversion(force = false) to check if an iversion
> > > > update is necessary. The VFS sees I_VERSION_QUERIED, and so it bumps
> > > > i_version and tells XFS to persist it.
> > >
> > > Could we perhaps just have a mode where we don't increment i_version
> > > for just atime updates?
> > >
> > > Maybe we don't even need a mode, and could just decide that atime
> > > updates aren't i_version updates at all?
> >
> > We do that already - in memory atime updates don't bump i_version at
> > all. The issue is the rare persistent atime update requests that
> > still happen - they are the ones that trigger an i_version bump on
> > XFS, and one of the relatime heuristics tickle this specific issue.
> >
> > If we push the problematic persistent atime updates to be in-memory
> > updates only, then the whole problem with i_version goes away....
> >
> > > Yes, yes, it's obviously technically a "inode modification", but does
> > > anybody actually *want* atime updates with no actual other changes to
> > > be version events?
> >
> > Well, yes, there was. That's why we defined i_version in the on disk
> > format this way well over a decade ago. It was part of some deep
> > dark magical HSM beans that allowed the application to combine
> > multiple scans for different inode metadata changes into a single
> > pass. atime changes was one of the things it needed to know about
> > for tiering and space scavenging purposes....
> >
>
> But if this is such an ancient mystical program, why do we have to
> keep this XFS behavior in the present?
> BTW, is this the same HSM whose DMAPI ioctls were deprecated
> a few years back?
>
> I mean, I understand that you do not want to change the behavior of
> i_version update without an opt-in config or mount option - let the distro
> make that choice.
> But calling this an "on-disk format change" is a very long stretch.
>
> Does xfs_repair guarantee that changes of atime, or any inode changes
> for that matter, update i_version? No, it does not.
> So IMO, "atime does not update i_version" is not an "on-disk format change",
> it is a runtime behavior change, just like lazytime is.
>

This would certainly be my preference. I don't want to break any
existing users, though. Perhaps this ought to be a mkfs option?
Existing XFS filesystems could keep the legacy behavior, while
mkfs.xfs could build new filesystems that, by default, behave the
way NFS requires.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
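For illustration only, here is a minimal user-space sketch of the sequence Dave describes, under a deliberately simplified model. relatime_need_update() follows the relatime rule (update atime if mtime or ctime is not older than atime, or if atime is at least a day old), and maybe_inc_iversion() follows the QUERIED-flag logic of inode_maybe_inc_iversion(force = false) without the atomics. struct model_inode, query_iversion(), model_write(), model_read() and the atime_bumps_iversion field (standing in for the proposed mkfs option) are hypothetical names, not kernel or xfsprogs APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bit of the raw counter records "someone has sampled i_version". */
#define MODEL_VERSION_QUERIED   1ULL
#define MODEL_VERSION_INCREMENT 2ULL  /* step over the flag bit */
#define SECONDS_PER_DAY         (24LL * 60 * 60)

/* Hypothetical user-space model of an inode; times are whole seconds. */
struct model_inode {
	long long atime, mtime, ctime;
	uint64_t i_version_raw;     /* (counter << 1) | QUERIED flag */
	bool atime_bumps_iversion;  /* hypothetical per-fs (mkfs-time) policy */
};

/* relatime-style check: update atime if the file changed since it was
 * last read, or if the recorded atime is at least a day old. */
static bool relatime_need_update(const struct model_inode *ip, long long now)
{
	if (ip->mtime >= ip->atime || ip->ctime >= ip->atime)
		return true;
	return now - ip->atime >= SECONDS_PER_DAY;
}

/* A reader (e.g. the NFS server filling in the change attribute) samples
 * i_version; this sets QUERIED so the next change must be visible. */
static uint64_t query_iversion(struct model_inode *ip)
{
	ip->i_version_raw |= MODEL_VERSION_QUERIED;
	return ip->i_version_raw >> 1;
}

/* Non-atomic model of the "maybe increment" logic: bump the counter only
 * if forced or if someone sampled it since the last bump. */
static bool maybe_inc_iversion(struct model_inode *ip, bool force)
{
	if (!force && !(ip->i_version_raw & MODEL_VERSION_QUERIED))
		return false;
	ip->i_version_raw = (ip->i_version_raw & ~MODEL_VERSION_QUERIED)
			    + MODEL_VERSION_INCREMENT;
	return true;
}

/* Data write in this model: mtime/ctime change, and i_version is bumped
 * if anyone has observed it since the last bump. */
static void model_write(struct model_inode *ip, long long now)
{
	ip->mtime = ip->ctime = now;
	maybe_inc_iversion(ip, false);
}

/* Read path: persist an atime update only when relatime asks for it, and
 * let that update bump i_version only if the per-fs policy allows it. */
static void model_read(struct model_inode *ip, long long now)
{
	if (!relatime_need_update(ip, now))
		return;
	ip->atime = now;
	if (ip->atime_bumps_iversion)
		maybe_inc_iversion(ip, false);
}
```

To use it, write, have the client sample i_version, then read once. With atime_bumps_iversion set (the legacy behavior), the relatime-triggered atime update bumps the counter and the client sees a spurious change. With it cleared, the counter stays put, and a second read shortly afterwards does not even trigger an atime update.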