On Wed, 2023-10-18 at 12:18 -0700, Linus Torvalds wrote: > On Wed, 18 Oct 2023 at 10:41, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > One way to prevent this is to ensure that when we stamp a file with a > > fine-grained timestamp, that we use that value to establish a floor for > > any later timestamp update. > > I'm very leery of this. > > I don't like how it's using a global time - and a global fine-grained > offset - when different filesystems will very naturally have different > granularities. I also don't like how it's no using that global lock. > > Yes, yes, since the use of this all is then gated by the 'is_mgtime()' > thing, any filesystem with big granularities will presumably never set > FS_MGTIME in the first time, and that hides the worst pointless cases. > But it still feels iffy to me. > Thanks for taking a look! I'm not too crazy about the global lock either, but the global fine grained value ensures that when we have mtime changes that occur across filesystems that they appear to be in the correct order. We could (hypothetically) track an offset per superblock or something, but then you could see out-of-order timestamps in inodes across different filesystems (even of the same type). I think it'd better not to do that if we can get away with it. > Also, the whole current_ctime() logic seems wrong. Later (in 4/9), you do this: > > static struct timespec64 current_ctime(struct inode *inode) > { > if (is_mgtime(inode)) > return current_mgtime(inode); > > and current_mgtime() does > > if (nsec & I_CTIME_QUERIED) { > ktime_get_real_ts64(&now); > return timestamp_truncate(now, inode); > } > > so once the code has set I_CTIME_QUERIED, it will now use the > expensive fine-grained time - even when it makes no sense. > > As far as I can tell, there is *never* a reason to get the > fine-grained time if the old inode ctime is already sufficiently far > away. > > IOW, my gut feel is that all this logic should always not only be > guarded by FS_MGTIME (like you do seem to do), *and* by "has anybody > even queried this time" - it should *also* always be guarded by "if I > get the coarse-grained time, is that sufficient?" > > So I think the current_ctime() logic should be something along the lines of > > static struct timespec64 current_ctime(struct inode *inode) > { > struct timespec64 ts64 = current_time(inode); > unsigned long old_ctime_sec = inode->i_ctime_sec; > unsigned int old_ctime_nsec = inode->i_ctime_nsec; > > if (ts64.tv_sec != old_ctime_sec) > return ts64; > > /* > * No need to check is_mgtime(inode) - the I_CTIME_QUERIED > * flag is only set for mgtime filesystems > */ > if (!(old_ctime_nsec & I_CTIME_QUERIED)) > return ts64; > old_ctime_nsec &= ~I_CTIME_QUERIED; > if (ts64.tv_nsec > old_ctime_nsec + inode->i_sb->s_time_gran) > return ts64; > Does that really do what you expect here? current_time will return a value that has already been through timestamp_truncate. Regardless, I don't think this function makes as big a difference as you might think. > > /* Ok, only *now* do we do a finegrained value */ > ktime_get_real_ts64(&ts64); > return timestamp_truncate(ts64); > } > > or whatever. Make it *very* clear that the finegrained timestamp is > the absolute last option, after we have checked that the regular one > isn't possible. current_mgtime is calling ktime_get_real_ts64, which is an existing interface that does not take the global spinlock and won't advance the global offset. That call should be quite cheap. The reason we can use that call here is because current_ctime and current_mgtime are only called from inode_needs_update_time, which is only called to check whether we need to get write access to the inode. What we do is look at the current clock and see whether the timestamps would perceivably change if we were to do the update right then. If so, we get write access and then call inode_set_ctime_current(). That will fetch its own timestamps and reevaluate what sort of update to do. That's the only place that fetches an expensive fine-grained timestamp that advances the offset. So, I think this set already is only getting the expensive fine-grained timestamps as a last resort. This is probably an indicator that I need to document this code better though. It may also be a good idea to reorganize inode_needs_update_time, current_ctime and current_mgtime for better clarity. Many thanks for the comments! -- Jeff Layton <jlayton@xxxxxxxxxx>