Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2023-10-18 at 12:18 -0700, Linus Torvalds wrote:
> On Wed, 18 Oct 2023 at 10:41, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > 
> > One way to prevent this is to ensure that when we stamp a file with a
> > fine-grained timestamp, that we use that value to establish a floor for
> > any later timestamp update.
> 
> I'm very leery of this.
> 
> I don't like how it's using a global time - and a global fine-grained
> offset - when different filesystems will very naturally have different
> granularities. I also don't like how it's no using that global lock.
> 
> Yes, yes, since the use of this all is then gated by the 'is_mgtime()'
> thing, any filesystem with big granularities will presumably never set
> FS_MGTIME in the first time, and that hides the worst pointless cases.
> But it still feels iffy to me.
> 

Thanks for taking a look!

I'm not too crazy about the global lock either, but the global fine
grained value ensures that when we have mtime changes that occur across
filesystems that they appear to be in the correct order.

We could (hypothetically) track an offset per superblock or something,
but then you could see out-of-order timestamps in inodes across
different filesystems (even of the same type). I think it'd better not
to do that if we can get away with it.


> Also, the whole current_ctime() logic seems wrong. Later (in 4/9), you do this:
> 
>  static struct timespec64 current_ctime(struct inode *inode)
>  {
>         if (is_mgtime(inode))
>                 return current_mgtime(inode);
> 
> and current_mgtime() does
> 
>         if (nsec & I_CTIME_QUERIED) {
>                 ktime_get_real_ts64(&now);
>                 return timestamp_truncate(now, inode);
>         }
> 
> so once the code has set I_CTIME_QUERIED, it will now use the
> expensive fine-grained time - even when it makes no sense.
> 
> As far as I can tell, there is *never* a reason to get the
> fine-grained time if the old inode ctime is already sufficiently far
> away.
> 
> IOW, my gut feel is that all this logic should always not only be
> guarded by FS_MGTIME (like you do seem to do), *and* by "has anybody
> even queried this time" - it should *also* always be guarded by "if I
> get the coarse-grained time, is that sufficient?"
> 
> So I think the current_ctime() logic should be something along the lines of
> 
>     static struct timespec64 current_ctime(struct inode *inode)
>     {
>         struct timespec64 ts64 = current_time(inode);
>         unsigned long old_ctime_sec = inode->i_ctime_sec;
>         unsigned int old_ctime_nsec = inode->i_ctime_nsec;
> 
>         if (ts64.tv_sec != old_ctime_sec)
>                 return ts64;
> 
>         /*
>          * No need to check is_mgtime(inode) - the I_CTIME_QUERIED
>          * flag is only set for mgtime filesystems
>          */
>         if (!(old_ctime_nsec & I_CTIME_QUERIED))
>                 return ts64;
>         old_ctime_nsec &= ~I_CTIME_QUERIED;
>         if (ts64.tv_nsec > old_ctime_nsec + inode->i_sb->s_time_gran)
>                 return ts64;
> 

Does that really do what you expect here? current_time will return a
value that has already been through timestamp_truncate. Regardless, I
don't think this function makes as big a difference as you might think.

>
>         /* Ok, only *now* do we do a finegrained value */
>         ktime_get_real_ts64(&ts64);
>         return timestamp_truncate(ts64);
>     }
> 
> or whatever. Make it *very* clear that the finegrained timestamp is
> the absolute last option, after we have checked that the regular one
> isn't possible.

current_mgtime is calling ktime_get_real_ts64, which is an existing
interface that does not take the global spinlock and won't advance the
global offset. That call should be quite cheap.

The reason we can use that call here is because current_ctime and
current_mgtime are only called from  inode_needs_update_time, which is
only called to check whether we need to get write access to the
inode. What we do is look at the current clock and see whether the
timestamps would perceivably change if we were to do the update right
then.

If so, we get write access and then call inode_set_ctime_current(). That
will fetch its own timestamps and reevaluate what sort of update to do.
That's the only place that fetches an expensive fine-grained timestamp
that advances the offset.

So, I think this set already is only getting the expensive fine-grained
timestamps as a last resort.

This is probably an indicator that I need to document this code better
though. It may also be a good idea to reorganize
inode_needs_update_time, current_ctime and current_mgtime for better
clarity.

Many thanks for the comments!
-- 
Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux