Re: [RFC PATCH] fs: elide the smp_rmb fence in fd_install()

On Thu, Dec 5, 2024 at 9:01 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Thu, Dec 05, 2024 at 08:03:24PM +0100, Mateusz Guzik wrote:
> > On Thu, Dec 5, 2024 at 7:41 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Dec 05, 2024 at 03:43:41PM +0100, Mateusz Guzik wrote:
> > > > On Thu, Dec 5, 2024 at 3:18 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Dec 05, 2024 at 01:03:32PM +0100, Mateusz Guzik wrote:
> > > > > >  void fd_install(unsigned int fd, struct file *file)
> > > > > >  {
> > > > > > -     struct files_struct *files = current->files;
> > > > > > +     struct files_struct *files;
> > > > > >       struct fdtable *fdt;
> > > > > >
> > > > > >       if (WARN_ON_ONCE(unlikely(file->f_mode & FMODE_BACKING)))
> > > > > >               return;
> > > > > >
> > > > > > +     /*
> > > > > > +      * Synchronized with expand_fdtable(), see that routine for an
> > > > > > +      * explanation.
> > > > > > +      */
> > > > > >       rcu_read_lock_sched();
> > > > > > +     files = READ_ONCE(current->files);
> > > > >
> > > > > What are you trying to do with that READ_ONCE()?  current->files
> > > > > itself is *not* changed by any of that code; current->files->fdtab is.
> > > >
> > > > To my understanding, this is the idiomatic way of spelling out
> > > > smp_consume_load (which does not exist in Linux) for the
> > > > resize_in_progress flag.
> > >
> > > In Linux, "smp_consume_load()" is named rcu_dereference().
> >
> > ok
>
> And rcu_dereference(), and for that matter memory_order_consume, only
> orders the load of the pointer and subsequent dereferences of that
> same pointer against dereferences of that same pointer that precede
> the store of that pointer.
>
>         T1                              T2
>         a: p->a = 1;                    d: q = rcu_dereference(gp);
>         b: r1 = p->b;                   e: r2 = q->a;
>         c: rcu_assign_pointer(gp, p);   f: q->b = 42;
>
> Here, if (and only if!) T2's load into q gets the value stored by
> T1, then T2's statements e and f are guaranteed to happen after T1's
> statements a and b.  In your patch, I do not see this pattern for the
> files->resize_in_progress flag.
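For illustration, the T1/T2 pattern above can be approximated in userspace C11. This is a sketch under stated assumptions, not kernel code: a release store stands in for rcu_assign_pointer(), and an acquire load stands in for rcu_dereference() (acquire is stronger than the dependency ordering RCU relies on; GCC and Clang currently treat memory_order_consume as acquire anyway). The names obj, node, gp, t1(), t2() and run_once() are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct obj {
	int a;
	int b;
};

static struct obj node;              /* object that T1 publishes */
static _Atomic(struct obj *) gp;     /* published pointer, NULL until c */
static int r1, r2;                   /* values observed by b and e */

static void *t1(void *arg)
{
	struct obj *p = &node;

	(void)arg;
	p->a = 1;                                            /* a */
	r1 = p->b;                                           /* b */
	atomic_store_explicit(&gp, p, memory_order_release); /* c: ~rcu_assign_pointer() */
	return NULL;
}

static void *t2(void *arg)
{
	struct obj *q;

	(void)arg;
	q = atomic_load_explicit(&gp, memory_order_acquire); /* d: ~rcu_dereference() */
	if (q != NULL) {
		r2 = q->a;                                   /* e */
		q->b = 42;                                   /* f */
	}
	return NULL;
}

/* One round. Returns 1 if the guarantee held: either T2 missed the pointer
 * (r2 stays -1), or e saw a's store (r2 == 1) and b missed f's store (r1 == 0). */
static int run_once(void)
{
	pthread_t th1, th2;

	node.a = 0;
	node.b = 0;
	r1 = -1;
	r2 = -1;
	atomic_store(&gp, NULL);
	if (pthread_create(&th1, NULL, t1, NULL) != 0)
		return 0;
	if (pthread_create(&th2, NULL, t2, NULL) != 0) {
		pthread_join(th1, NULL);
		return 0;
	}
	pthread_join(th1, NULL);
	pthread_join(th2, NULL);
	return r1 == 0 && (r2 == -1 || r2 == 1);
}
```

Calling run_once() in a loop only samples a few interleavings on one machine; it cannot prove the ordering, which is why an exhaustive tool such as herd7 is the better way to check a pattern like this.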
>
> > > > Anyway to elaborate I'm gunning for a setup where the code is
> > > > semantically equivalent to having a lock around the work.
> > >
> > > Except that rcu_read_lock_sched() provides mutual-exclusion guarantees
> > > only with later RCU grace periods, such as those implemented by
> > > synchronize_rcu().
> >
> > To my understanding, the precondition is already handled: the flag is
> > set up front and we wait for everyone to finish (which already takes
> > place in the stock code), and the flag is then checked within the section.
>
> I freely confess that I do not understand the purpose of assigning to
> files->resize_in_progress both before (pre-existing) and within (added)
> expand_fdtable().  If the assignments before and after the call to
> expand_fdtable() and the checks were under that lock, that could work,
> but removing that lockless check might have performance and scalability
> consequences.
>
> > > > Pretend ->resize_lock exists, then:
> > > > fd_install:
> > > > files = current->files;
> > > > read_lock(&files->resize_lock);
> > > > fdt = rcu_dereference_sched(files->fdt);
> > > > rcu_assign_pointer(fdt->fd[fd], file);
> > > > read_unlock(&files->resize_lock);
> > > >
> > > > expand_fdtable:
> > > > write_lock(&files->resize_lock);
> > > > [snip]
> > > > rcu_assign_pointer(files->fdt, new_fdt);
> > > > write_unlock(&files->resize_lock);
> > > >
> > > > Except that rcu_read_lock_sched(), an appropriately fenced
> > > > resize_in_progress flag and synchronize_rcu() do it instead.
> > >
> > > OK, good, you did get the grace-period part of the puzzle.
> > >
> > > However, please keep in mind that synchronize_rcu() has significant
> > > latency by design.  There is a tradeoff between CPU consumption and
> > > latency, and synchronize_rcu() therefore has latencies ranging upwards of
> > > several milliseconds (not microseconds or nanoseconds).  I would be very
> > > surprised if expand_fdtable() users would be happy with such a long delay.
> >
> > The call has been there since 2015, and I only know of one case where
> > someone took issue with it (and it could have been sorted out with a
> > dup2() up front to grow the table to the desired size). Amusingly, I see
> > you patched it in 2018 from synchronize_sched to synchronize_rcu.
> > Bottom line, though, is that I'm not *adding* latency here. :)
>
> Are you saying that the smp_rmb() is unnecessary?  It doesn't seem like
> you are saying that, because otherwise your patch could simply remove
> it without additional code changes.  On the other hand, if it is a key
> component of the synchronization, I don't see how that smp_rmb() can be
> removed while still preserving that synchronization without adding another
> synchronize_rcu() to that function to compensate.
>
> Now, it might be that you are somehow cleverly reusing the pre-existing
> synchronize_rcu(), but I am not immediately seeing how this would work.
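To make the role of the smp_rmb() concrete, here is a minimal C11 sketch of the fast-path check on resize_in_progress. Assumptions, all labelled: the writer side is taken to pair the smp_rmb() with an smp_wmb() between publishing the new table and clearing the flag; both barriers are modelled by C11 release/acquire fences, which are stronger than smp_wmb()/smp_rmb(); the names table, cur, finish_resize() and install_fast_path() are hypothetical; and the synchronize_rcu() wait for readers that checked the flag before it was set is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct table {
	int slots[8];
};

static struct table small_tab, big_tab;
static _Atomic(struct table *) cur = &small_tab;  /* stands in for files->fdt */
static atomic_bool resize_in_progress;            /* initially false */

/* Writer: publish the new table, then clear the flag (grace period omitted). */
static void finish_resize(void)
{
	atomic_store_explicit(&cur, &big_tab, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);    /* ~ smp_wmb() (assumed) */
	atomic_store_explicit(&resize_in_progress, false, memory_order_relaxed);
}

/* Reader fast path: returns the table to use, or NULL to take the slow path. */
static struct table *install_fast_path(void)
{
	if (atomic_load_explicit(&resize_in_progress, memory_order_relaxed))
		return NULL;                               /* slow path: take the lock */
	atomic_thread_fence(memory_order_acquire);    /* ~ smp_rmb() */
	return atomic_load_explicit(&cur, memory_order_relaxed);
}
```

If the reader observes the flag store made by finish_resize(), the fence pair guarantees it also sees the new table pointer. Without the acquire fence, the C11 model allows a stale table to be returned, so removing the smp_rmb() requires some other ordering to take its place.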
>
> And no, I do not recall making that particular change back in the
> day, only that I did change all the calls to synchronize_sched() to
> synchronize_rcu().  Please accept my apologies for my having failed
> to meet your expectations.  And do not be too surprised if others have
> similar expectations of you at some point in the future.  ;-)
>
> > So assuming the above can be ignored, do you confirm the patch works
> > (even if it needs some cosmetic changes)?
> >
> > The entirety of the patch is about removing smp_rmb in fd_install with
> > a small code rearrangement, while relying on the machinery which is
> > already there.
>
> The code to be synchronized is fairly small.  So why don't you
> create a litmus test and ask herd7?  Please see tools/memory-model for
> documentation and other example litmus tests.  This tool does the moral
> equivalent of a full state-space search of the litmus tests, telling you
> whether your "exists" condition is always, sometimes, or never satisfied.
>

I think there is quite a degree of talking past each other in this thread.

I was not aware of herd7. Testing the patch with it sounds like a good way
to settle this, so I'm going to do it and get back to you in a day
or two. Worst case, the patch is a bust; best case, the fence is already
of no use.

-- 
Mateusz Guzik <mjguzik gmail.com>




