On Wed, Nov 1, 2023 at 12:16 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Wed 01-11-23 08:57:09, Dave Chinner wrote:
> > 5. When-ever the inode is persisted, the timestamp is copied to the
> > on-disk structure and the current change counter is folded in.
> >
> > This means the on-disk structure always contains the latest
> > change attribute that has been persisted, just like we
> > currently do with i_version now.
> >
> > 6. When-ever we read the inode off disk, we split the change counter
> > from the timestamp and update the appropriate internal structures
> > with this information.
> >
> > This ensures that the VFS and userspace never see the change
> > counter implementation in the inode timestamps.
>
> OK, but is this compatible with the current XFS behavior? AFAICS currently
> XFS sets sb->s_time_gran to 1 so timestamps currently stored on disk will
> have some mostly random garbage in low bits of the ctime. Now if you look
> at such inode with a kernel using this new scheme, stat(2) will report
> ctime with low bits zeroed-out so if the ctime fetched in the old kernel was
> stored in some external database and compared to the newly fetched ctime, it
> will appear that ctime has gone backwards... Maybe we don't care but it is
> a user visible change that can potentially confuse something.

See xfs_inode_has_bigtime() and the auto-upgrade of the inode format in
xfs_inode_item_precommit().

In the case of the BIGTIME inode format, the admin needs to opt in to
BIGTIME format conversion by setting an INCOMPAT_BIGTIME sb feature flag.

I imagine that something similar (inode flag + sb flag) would need to be
done for the versioned timestamp, but I think that in that case the
feature flag could be RO_COMPAT - there is no harm in exposing made-up
nsec lower bits if the fs is mounted read-only on an old kernel, is
there?

The same RO_COMPAT feature flag could also be used to determine
s_time_gran, because IIUC, s_time_gran for timestamp updates is uniform
across all inodes.

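To make the inode flag + sb flag idea concrete, here is a rough userspace
sketch of how folding, splitting and the lazy upgrade could fit together.
It is not XFS code. Every name in it (the structs, DIFLAG_VERSIONED_TS,
SB_RO_COMPAT_VERSIONED_TS, the helpers) is hypothetical, and the split of
the nsec field into a 1000 ns granularity timestamp plus a 0..999 change
counter is only an assumption for illustration; the thread does not fix
the actual layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: counter lives in the sub-microsecond digits. */
#define TS_GRAN_NS                1000u
#define NSEC_PER_SEC              1000000000u

/* Hypothetical flags, loosely modelled on the BIGTIME pair. */
#define DIFLAG_VERSIONED_TS       0x1u  /* per-inode: ctime carries a counter */
#define SB_RO_COMPAT_VERSIONED_TS 0x1u  /* per-fs: admin opted in */

struct sb_info    { uint32_t ro_compat; };
struct disk_inode { uint32_t flags; int64_t ctime_sec; uint32_t ctime_nsec; };
struct mem_inode  { int64_t ctime_sec; uint32_t ctime_nsec; uint32_t change; };

/* Granularity is uniform per filesystem, so it hangs off the sb flag. */
static inline uint32_t fs_time_gran(const struct sb_info *sb)
{
	return (sb->ro_compat & SB_RO_COMPAT_VERSIONED_TS) ? TS_GRAN_NS : 1u;
}

/* Persist: fold the counter into the low digits and mark the inode
 * as converted. An old-format inode is upgraded here, lazily. */
static inline void inode_to_disk(const struct mem_inode *in,
				 struct disk_inode *d)
{
	assert(in->ctime_nsec < NSEC_PER_SEC);
	d->ctime_sec = in->ctime_sec;
	d->ctime_nsec = (in->ctime_nsec / TS_GRAN_NS) * TS_GRAN_NS +
			(in->change % TS_GRAN_NS);
	d->flags |= DIFLAG_VERSIONED_TS;
}

/* Read: split only if the inode was already converted. Old-format
 * inodes keep their full nsec value, so ctime is not truncated. */
static inline void inode_from_disk(const struct disk_inode *d,
				   struct mem_inode *in)
{
	in->ctime_sec = d->ctime_sec;
	if (d->flags & DIFLAG_VERSIONED_TS) {
		in->ctime_nsec = (d->ctime_nsec / TS_GRAN_NS) * TS_GRAN_NS;
		in->change = d->ctime_nsec % TS_GRAN_NS;
	} else {
		in->ctime_nsec = d->ctime_nsec;
		in->change = 0;
	}
}
```

In this sketch an unconverted inode is reported with its original low
digits until it is next written, and that write would carry a newer
ctime anyway, so the reported ctime should not appear to go backwards.
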
I know that Dave said he wants to avoid changing the on-disk format, but
I am hoping that a well-defined, backward-compatible on-disk format
change with lazy upgrade may be acceptable?

Thanks,
Amir.