On Sun, May 26, 2024 at 08:20:16AM -0400, Jeff Layton wrote:
> Switch the __i_ctime fields to a single ktime_t. Move the i_generation
> down above i_fsnotify_mask and then move the i_version into the
> resulting 8 byte hole. This shrinks struct inode by 8 bytes total, and
> should improve the cache footprint as the i_version and __i_ctime are
> usually updated together.

So first of all, this patch is a bit confusing because the patch doesn't
change __i_ctime, but rather i_ctime_sec and i_ctime_nsec, and Linus's
tree doesn't have those fields. That's because the base commit in the
patch, a6f48ee9b741, isn't in Linus's tree, and apparently this patch is
dependent on "fs: switch timespec64 fields in inode to discrete
integers"[1].

[1] https://lore.kernel.org/all/20240517-amtime-v1-1-7b804ca4be8f@xxxxxxxxxx/

> The one downside I can see to switching to a ktime_t is that if someone
> has a filesystem with files on it that has ctimes outside the ktime_t
> range (before ~1678 AD or after ~2262 AD), we won't be able to display
> them properly in stat() without some special treatment. I'm operating
> under the assumption that this is not a practical problem.

There are two additional possible problems with this.

The first is that if there is a maliciously fuzzed file system with
ctimes outside of the ktime_t range, this will almost certainly trigger
UBSAN warnings, which will result in Syzkaller security bugs getting
reported to file system developers. This can be fixed by explicitly
clamping the range whenever converting to ktime_t in include/linux/fs.h,
but that leads to another problem.

The second problem is that if the file system converts its on-disk inode
to the in-memory struct inode, and then converts it back from the
in-memory to the on-disk inode format, and the timestamp is outside of
the ktime_t range, this could result in the on-disk inode having its
ctime field getting corrupted.
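For concreteness, the clamping approach could look something like the
userspace sketch below. This is a hypothetical model, not the actual
include/linux/fs.h code: the struct, constants, and the function name
timespec64_to_ktime_clamped just mirror the kernel's naming conventions.
The point is that the seconds value is saturated *before* the
multiplication by NSEC_PER_SEC, since it's that multiply overflowing on
a fuzzed timestamp that would trip UBSAN.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000LL
#define KTIME_MAX	INT64_MAX
#define KTIME_MIN	INT64_MIN
/* Largest/smallest seconds value that survives the multiply below. */
#define KTIME_SEC_MAX	(KTIME_MAX / NSEC_PER_SEC)	/* ~2262 AD */
#define KTIME_SEC_MIN	(KTIME_MIN / NSEC_PER_SEC)	/* ~1678 AD */

/* Userspace stand-in for the kernel's struct timespec64. */
struct timespec64 {
	int64_t	tv_sec;
	long	tv_nsec;
};

/*
 * Hypothetical helper: convert a timespec64 to a ktime_t-style
 * nanosecond count, saturating at the representable range instead of
 * letting tv_sec * NSEC_PER_SEC overflow (signed overflow is what
 * UBSAN would flag on a maliciously fuzzed on-disk timestamp).
 */
static int64_t timespec64_to_ktime_clamped(struct timespec64 ts)
{
	if (ts.tv_sec >= KTIME_SEC_MAX)
		return KTIME_MAX;
	if (ts.tv_sec <= KTIME_SEC_MIN)
		return KTIME_MIN;
	return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}
```

Note that saturating here is exactly what makes the round-trip lossy:
once an out-of-range on-disk ctime has been clamped into the in-memory
inode, writing it back silently rewrites the on-disk value, which is
the second problem described above.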
Now, *most* of the time, whenever the inode needs to be written back to
disk, the ctime field will be changed anyway. However, there are changes
that don't result in userspace-visible changes, but involve internal
file system changes (for example, in the case of an online repair or
defrag, or a COW snapshot), where the file system doesn't set ctime ---
and it's possible that this might result in the ctime field getting
messed up.

We could argue that ctime fields outside of the ktime_t range should
never, ever happen (except for malicious fuzz testing by systems like
syzkaller), and so it could be argued that we don't care(tm), but it's
worth at least a mention in the comments and commit description. Of
course, a file system which *did* care could work around the problem by
having its own copy of ctime in the file-system-specific inode, but this
would come at the cost of the space saving benefits of this commit.

I suspect what I'd do is add a manual check for an out-of-range ctime
on disk, log a warning, and then clamp the ctime to the maximum ktime_t
value, which is what would be returned by stat(2), and then write that
clamped value back to disk if the ctime field doesn't get set to the
current time before it needs to be written back to disk.

Cheers,

					- Ted