On Fri, Oct 27, 2023 at 06:35:58AM -0400, Jeff Layton wrote:
> On Thu, 2023-10-26 at 13:20 +1100, Dave Chinner wrote:
> > On Wed, Oct 25, 2023 at 08:25:35AM -0400, Jeff Layton wrote:
> > > On Wed, 2023-10-25 at 19:05 +1100, Dave Chinner wrote:
> > > > On Tue, Oct 24, 2023 at 02:40:06PM -0400, Jeff Layton wrote:
> > > In earlier discussions you alluded to some repair and/or analysis tools
> > > that depended on this counter.
> >
> > Yes, and one of those "tools" is *me*.
> >
> > I frequently look at the di_changecount when doing forensic and/or
> > failure analysis on filesystem corpses. SOE analysis, relative
> > modification activity, etc all give insight into what happened to
> > the filesystem to get it into the state it is currently in, and
> > di_changecount provides information no other metadata in the inode
> > contains.
> >
> > > I took a quick look in xfsprogs, but I
> > > didn't see anything there. Is there a library or something that these
> > > tools use to get at this value?
> >
> > xfs_db is the tool I use for this, such as:
> >
> > $ sudo xfs_db -c "sb 0" -c "a rootino" -c "p v3.change_count" /dev/mapper/fast
> > v3.change_count = 35
> > $
> >
> > The root inode in this filesystem has a change count of 35. The root
> > inode has 32 dirents in it, which means that no entries have ever
> > been removed or renamed. This sort of insight into the past history
> > of inode metadata is largely impossible to get any other way, and
> > it's been the difference between understanding failure and having no
> > clue more than once.
> >
> > Most block device parsing applications simply write their own
> > decoder that walks the on-disk format. That's pretty trivial to do,
> > developers can get all the information needed to do this from the
> > on-disk format specification documentation we keep on kernel.org...
> >
>
> Fair enough. I'm not here to tell you that you guys that you need to
> change how di_changecount works.
> If it's too valuable to keep it
> counting atime-only updates, then so be it.
>
> If that's the case however, and given that the multigrain timestamp work
> is effectively dead, then I don't see an alternative to growing the
> on-disk inode. Do you?

Yes, I do see alternatives. That's what I've been trying
(unsuccessfully) to describe and get consensus on. I feel like I'm
being ignored and rail-roaded here, because nobody even acknowledges
that I'm proposing alternatives, while everyone keeps insisting that
the only solution is a change of on-disk format.

So, I'll summarise the situation *yet again* in the hope that this
time I won't get people arguing about atime vs i_version and what
constitutes an on-disk format change, because that goes nowhere and
does nothing to determine which solution might be acceptable.

The basic situation is this:

If XFS can ignore relatime or lazytime persistent updates in certain
situations, then *we don't need to make periodic on-disk updates of
atime*. This makes the whole problem of "persistent atime update bumps
i_version" go away, because then we *aren't making persistent atime
updates* except when some other persistent modification that bumps
[cm]time occurs.

But I don't want to do this unconditionally: for systems not running
anything that samples i_version, we want relatime/lazytime to behave
as they are supposed to and do periodic persistent updates as per
normal. Principle of least surprise and all that jazz.

So we really need a per-inode indication that we should enable this
mode. I have asked whether we can have a per-operation context flag
to trigger this, given that io_uring already needs context flags for
timestamp updates to be added.

I have asked whether we can have an inode flag set by the VFS or by
application code for this. For example, nfsd could set such a flag
whenever it accesses a given inode.
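To make the conditional behaviour described above concrete, here is a minimal C sketch of the decision it implies. It is an illustration under stated assumptions, not XFS code: `struct sketch_inode`, the `change_cookie_sampled` flag and the helper names are all hypothetical, times are whole seconds, and `relatime_due()` is a simplified version of the usual relatime rule (update if mtime or ctime is not older than atime, or atime is at least 24 hours old).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified in-core inode; times are whole seconds. */
struct sketch_inode {
    int64_t atime;
    int64_t mtime;
    int64_t ctime;
    /* Hypothetical flag: set once something (e.g. nfsd) samples the change counter. */
    bool change_cookie_sampled;
};

#define RELATIME_MAX_AGE (24 * 60 * 60) /* seconds */

/* Simplified relatime rule: is an atime update due at all? */
bool relatime_due(const struct sketch_inode *ip, int64_t now)
{
    if (ip->mtime >= ip->atime)
        return true;
    if (ip->ctime >= ip->atime)
        return true;
    return now - ip->atime >= RELATIME_MAX_AGE;
}

enum atime_action {
    ATIME_SKIP,         /* relatime says no update is needed */
    ATIME_IN_CORE_ONLY, /* update in memory only: no logged change, no change count bump */
    ATIME_PERSIST,      /* normal persistent relatime/lazytime update */
};

/* Only inodes flagged as having their change counter sampled skip persistence. */
enum atime_action atime_update_action(const struct sketch_inode *ip, int64_t now)
{
    if (!relatime_due(ip, now))
        return ATIME_SKIP;
    if (ip->change_cookie_sampled)
        return ATIME_IN_CORE_ONLY;
    return ATIME_PERSIST;
}
```

Unflagged inodes keep normal relatime behaviour (least surprise). Flagged inodes still update atime in memory, but the new value only reaches disk alongside some other persistent [cm]time change, so atime alone never bumps the on-disk change count.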
I have asked whether this inode flag can simply be set if we ever see
I_VERSION_QUERIED set or statx is used to retrieve a change cookie,
and whether that is a reliable mechanism for setting such a flag.

I have suggested mechanisms for using masked-off bits of timestamps
to encode sub-timestamp-granularity change counts, keeping them
invisible to userspace, and then not using i_version at all for XFS.
This avoids all the problems that the multi-grain timestamp
infrastructure exposed due to the variable granularity of
user-visible timestamps and ordering across inodes with different
granularities. This is potentially a general solution, too.

So, yeah, there are *lots* of ways we can solve this problem without
needing to change on-disk formats.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
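One possible reading of the suggestion to hide sub-granularity change counts in masked-off timestamp bits is sketched below in C. It is not XFS or VFS code. The layout is assumed: the low 8 bits of the nanosecond field hold a hidden counter, which makes the user-visible granularity 256 ns, and every type and function name is hypothetical. Userspace would see only the masked time. The opaque change cookie uses the full value, so it increases on every recorded change, even within one coarse clock tick.

```c
#include <stdint.h>

/* Assumed layout: the low CHG_BITS of nsec hold a hidden change counter. */
#define CHG_BITS 8u
#define CHG_MASK ((1u << CHG_BITS) - 1u) /* 0xff */
#define NS_PER_SEC 1000000000u           /* divisible by 1 << CHG_BITS */

struct sketch_ts {
    int64_t sec;
    uint32_t nsec; /* 0..999999999; low CHG_BITS are the hidden counter */
};

/* What userspace would see: counter bits masked off. */
struct sketch_ts visible_ts(struct sketch_ts ts)
{
    ts.nsec &= ~CHG_MASK;
    return ts;
}

/* Record a change at coarse clock time 'now'; returns the new stored value. */
struct sketch_ts ts_record_change(struct sketch_ts cur, struct sketch_ts now)
{
    struct sketch_ts vis = visible_ts(cur);

    now = visible_ts(now);

    /* Clock has moved past the stored visible time: restart the counter. */
    if (now.sec > vis.sec || (now.sec == vis.sec && now.nsec > vis.nsec))
        return now;

    /* Same (or earlier) coarse tick: bump the hidden counter if it has room. */
    if ((cur.nsec & CHG_MASK) != CHG_MASK) {
        cur.nsec++;
        return cur;
    }

    /* Counter exhausted: advance the visible time by one granule. */
    vis.nsec += 1u << CHG_BITS;
    if (vis.nsec >= NS_PER_SEC) {
        vis.nsec -= NS_PER_SEC;
        vis.sec++;
    }
    return vis;
}

/* Opaque change cookie: full-precision value, including the hidden counter. */
uint64_t change_cookie(struct sketch_ts ts)
{
    return (uint64_t)ts.sec * NS_PER_SEC + ts.nsec;
}
```

Because the counter only occupies bits below the visible granularity, comparing visible timestamps across inodes behaves as if every inode had the same coarse granularity. That is the ordering property the quoted multi-grain work struggled with. How many bits could actually be spared, and how they would survive on disk, is left open here.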