On Wed, 2023-11-01 at 13:38 +0200, Amir Goldstein wrote:
> On Wed, Nov 1, 2023 at 12:16 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Wed 01-11-23 08:57:09, Dave Chinner wrote:
> > > 5. When-ever the inode is persisted, the timestamp is copied to the
> > > on-disk structure and the current change counter is folded in.
> > >
> > >    This means the on-disk structure always contains the latest
> > >    change attribute that has been persisted, just like we
> > >    currently do with i_version now.
> > >
> > > 6. When-ever we read the inode off disk, we split the change counter
> > > from the timestamp and update the appropriate internal structures
> > > with this information.
> > >
> > >    This ensures that the VFS and userspace never see the change
> > >    counter implementation in the inode timestamps.
> >
> > OK, but is this compatible with the current XFS behavior? AFAICS currently
> > XFS sets sb->s_time_gran to 1 so timestamps currently stored on disk will
> > have some mostly random garbage in low bits of the ctime. Now if you look
> > at such inode with a kernel using this new scheme, stat(2) will report
> > ctime with low bits zeroed-out so if the ctime fetched in the old kernel was
> > stored in some external database and compared to the newly fetched ctime, it
> > will appear that ctime has gone backwards... Maybe we don't care but it is
> > a user visible change that can potentially confuse something.
>
> See xfs_inode_has_bigtime() and auto-upgrade of inode format in
> xfs_inode_item_precommit().
>
> In the case of BIGTIME inode format, admin needs to opt-in to
> BIGTIME format conversion by setting an INCOMPAT_BIGTIME
> sb feature flag.
>
> I imagine that something similar (inode flag + sb flag) would need
> to be done for the versioned-timestamp, but I think that in that case,
> the feature flag could be RO_COMPAT - there is no harm in exposing
> made-up nsec lower bits if fs would be mounted read-only on an old
> kernel, is there?
>
> The same RO_COMPAT feature flag could also be used to determine
> s_time_gran, because IIUC, s_time_gran for timestamp updates
> is uniform across all inodes.
>
> I know that Dave said he wants to avoid changing on-disk format,
> but I am hoping that this well defined and backward compat with
> lazy upgrade, on-disk format change may be acceptable?

With the ctime, we're somewhat saved by the fact that it's not settable
by users, so we don't need to worry as much about returning specific
values there, I think.

With the scheme Dave is proposing, booting to a new kernel vs. an old
kernel might show a different ctime on an inode though. That might be
enough to justify needing a way to opt-in to the change on existing
filesystems.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
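
To make the concern Jan raises concrete, here is a minimal C sketch of
steps 5 and 6 from Dave's quoted proposal. It assumes, purely for
illustration, that the low 8 bits of the on-disk nsec field carry the
change counter; the thread does not state the width. CTR_BITS,
struct split_ts, fold_ctime() and split_ctime() are hypothetical names,
not XFS or VFS code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the low 8 bits of the on-disk nsec field hold the counter. */
#define CTR_BITS 8u
#define CTR_MASK ((1u << CTR_BITS) - 1u)

/* In-memory view after reading an inode off disk (hypothetical layout). */
struct split_ts {
	uint32_t nsec;	/* value stat(2) would report, low bits zeroed */
	uint32_t ctr;	/* change counter recovered from the low bits */
};

/* Step 5: fold the change counter into the timestamp when persisting. */
static uint32_t fold_ctime(uint32_t nsec, uint32_t ctr)
{
	assert(ctr <= CTR_MASK);
	return (nsec & ~CTR_MASK) | ctr;
}

/* Step 6: split the counter back out when reading the inode off disk. */
static struct split_ts split_ctime(uint32_t disk_nsec)
{
	struct split_ts ts = { disk_nsec & ~CTR_MASK, disk_nsec & CTR_MASK };
	return ts;
}
```

Under these assumptions, an nsec value of 123456789 written by an old
kernel (s_time_gran of 1) would be read back by a new kernel as
123456768, with the discarded low bits misread as a counter value of 21.
That is the apparent backwards step in ctime that an opt-in feature flag
would guard against.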