On Thu, Dec 05, 2024 at 09:15:14PM +0100, Mateusz Guzik wrote: > On Thu, Dec 5, 2024 at 9:01 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > On Thu, Dec 05, 2024 at 08:03:24PM +0100, Mateusz Guzik wrote: > > > On Thu, Dec 5, 2024 at 7:41 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > > > > > On Thu, Dec 05, 2024 at 03:43:41PM +0100, Mateusz Guzik wrote: > > > > > On Thu, Dec 5, 2024 at 3:18 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > > > > > > > > > On Thu, Dec 05, 2024 at 01:03:32PM +0100, Mateusz Guzik wrote: > > > > > > > void fd_install(unsigned int fd, struct file *file) > > > > > > > { > > > > > > > - struct files_struct *files = current->files; > > > > > > > + struct files_struct *files; > > > > > > > struct fdtable *fdt; > > > > > > > > > > > > > > if (WARN_ON_ONCE(unlikely(file->f_mode & FMODE_BACKING))) > > > > > > > return; > > > > > > > > > > > > > > + /* > > > > > > > + * Synchronized with expand_fdtable(), see that routine for an > > > > > > > + * explanation. > > > > > > > + */ > > > > > > > rcu_read_lock_sched(); > > > > > > > + files = READ_ONCE(current->files); > > > > > > > > > > > > What are you trying to do with that READ_ONCE()? current->files > > > > > > itself is *not* changed by any of that code; current->files->fdtab is. > > > > > > > > > > To my understanding this is the idiomatic way of spelling out the > > > > > non-existent in Linux smp_consume_load, for the resize_in_progress > > > > > flag. > > > > > > > > In Linus, "smp_consume_load()" is named rcu_dereference(). > > > > > > ok > > > > And rcu_dereference(), and for that matter memory_order_consume, only > > orders the load of the pointer against subsequent dereferences of that > > same pointer against dereferences of that same pointer preceding the > > store of that pointer. > > > > T1 T2 > > a: p->a = 1; d: q = rcu_dereference(gp); > > b: r1 = p->b; e: r2 = p->a; > > c: rcu_assign_pointer(gp, p); f: p->b = 42; > > > > Here, if (and only if!) T2's load into q gets the value stored by > > T1, then T1's statements e and f are guaranteed to happen after T2's > > statements a and b. In your patch, I do not see this pattern for the > > files->resize_in_progress flag. > > > > > > > Anyway to elaborate I'm gunning for a setup where the code is > > > > > semantically equivalent to having a lock around the work. > > > > > > > > Except that rcu_read_lock_sched() provides mutual-exclusion guarantees > > > > only with later RCU grace periods, such as those implemented by > > > > synchronize_rcu(). > > > > > > To my understanding the pre-case is already with the flag set upfront > > > and waiting for everyone to finish (which is already taking place in > > > stock code) + looking at it within the section. > > > > I freely confess that I do not understand the purpose of assigning to > > files->resize_in_progress both before (pre-existing) and within (added) > > expand_fdtable(). If the assignments before and after the call to > > expand_fdtable() and the checks were under that lock, that could work, > > but removing that lockless check might have performance and scalability > > consequences. > > > > > > > Pretend ->resize_lock exists, then: > > > > > fd_install: > > > > > files = current->files; > > > > > read_lock(files->resize_lock); > > > > > fdt = rcu_dereference_sched(files->fdt); > > > > > rcu_assign_pointer(fdt->fd[fd], file); > > > > > read_unlock(files->resize_lock); > > > > > > > > > > expand_fdtable: > > > > > write_lock(files->resize_lock); > > > > > [snip] > > > > > rcu_assign_pointer(files->fdt, new_fdt); > > > > > write_unlock(files->resize_lock); > > > > > > > > > > Except rcu_read_lock_sched + appropriately fenced resize_in_progress + > > > > > synchronize_rcu do it. > > > > > > > > OK, good, you did get the grace-period part of the puzzle. > > > > > > > > Howver, please keep in mind that synchronize_rcu() has significant > > > > latency by design. There is a tradeoff between CPU consumption and > > > > latency, and synchronize_rcu() therefore has latencies ranging upwards of > > > > several milliseconds (not microseconds or nanoseconds). I would be very > > > > surprised if expand_fdtable() users would be happy with such a long delay. > > > > > > The call is already there since 2015 and I only know of one case where > > > someone took an issue with it (and it could have been sorted out with > > > dup2 upfront to grow the table to the desired size). Amusingly I see > > > you patched it in 2018 from synchronize_sched to synchronize_rcu. > > > Bottom line though is that I'm not *adding* it. latency here. :) > > > > Are you saying that the smp_rmb() is unnecessary? It doesn't seem like > > you are saying that, because otherwise your patch could simply remove > > it without additional code changes. On the other hand, if it is a key > > component of the synchronization, I don't see how that smp_rmb() can be > > removed while still preserving that synchronization without adding another > > synchronize_rcu() to that function to compensate. > > > > Now, it might be that you are somehow cleverly reusing the pre-existing > > synchronize_rcu(), but I am not immediately seeing how this would work. > > > > And no, I do not recall making that particular change back in the > > day, only that I did change all the calls to synchronize_sched() to > > synchronize_rcu(). Please accept my apologies for my having failed > > to meet your expectations. And do not be too surprised if others have > > similar expectations of you at some point in the future. ;-) > > > > > So assuming the above can be ignored, do you confirm the patch works > > > (even if it needs some cosmetic changes)? > > > > > > The entirety of the patch is about removing smp_rmb in fd_install with > > > small code rearrangement, while relying on the machinery which is > > > already there. > > > > The code to be synchronized is fairly small. So why don't you > > create a litmus test and ask herd7? Please see tools/memory-model for > > documentation and other example litmus tests. This tool does the moral > > equivalent of a full state-space search of the litmus tests, telling you > > whether your "exists" condition is always, sometimes, or never satisfied. > > > > I think there is quite a degree of talking past each other in this thread. > > I was not aware of herd7. Testing the thing with it sounds like a plan > to get out of it, so I'm going to do it and get back to you in a day > or two. Worst case the patch is a bust, best case the fence is already > of no use. Very good! My grouchiness earlier in this thread notwithstanding, I am happy to review your litmus tests. (You will likely need more than one.) And the inevitable unsolicited advice: Make those litmus tests as small as you possibly can. Full-state-space search is extremely powerful, but it does not scale very well. Thanx, Paul