On Wed, 2024-09-04 at 15:01 +1000, NeilBrown wrote: > On Wed, 04 Sep 2024, NeilBrown wrote: > > > > I agree that dropping and reclaiming a lock is an anti-pattern and in > > best avoided in general. I cannot see a better alternative in this > > case. > > It occurred to me what I should spell out the alternate that I DO see so > you have the option of disagreeing with my assessment that it isn't > "better". > > We need RCU to call into nfsd, we need a per-cpu ref on the net (which > we can only get inside nfsd) and NOT RCU to call > nfsd_file_acquire_local(). > > The current code combines these (because they are only used together) > and so the need to drop rcu. > > I thought briefly that it could simply drop rcu and leave it dropped > (__releases(rcu)) but not only do I generally like that LESS than > dropping and reclaiming, I think it would be buggy. While in the nfsd > module code we need to be holding either rcu or a ref on the server else > the code could disappear out from under the CPU. So if we exit without > a ref on the server - which we do if nfsd_file_acquire_local() fails - > then we need to reclaim RCU *before* dropping the ref. So the current > code is slightly buggy. > > We could instead split the combined call into multiple nfs_to > interfaces. > > So nfs_open_local_fh() in nfs_common/nfslocalio.c would be something > like: > > rcu_read_lock(); > net = READ_ONCE(uuid->net); > if (!net || !nfs_to.get_net(net)) { > rcu_read_unlock(); > return ERR_PTR(-ENXIO); > } > rcu_read_unlock(); > localio = nfs_to.nfsd_open_local_fh(....); > if (IS_ERR(localio)) > nfs_to.put_net(net); > return localio; > > So we have 3 interfaces instead of 1, but no hidden unlock/lock. > > As I said, I don't think this is a net win, but reasonable people might > disagree with me. > I considered a few alternate designs here as well, and came to the same conclusion. This interface is ugly, but it's not materially worse than the alternatives. I think we just have to document this well, and deal with the ugliness. Luckily most of the gory details are managed inside nfsd and the nfs_common functions so the caller (nfs) shouldn't have to deal with the complex locking. One thing that might be good if we're sticking with this code, is a __might_sleep() at the top of nfs_open_local_fh function in nfs_common. That should help ensure that no one tries to call it with the rcu_read_lock() held (which is the main danger here). -- Jeff Layton <jlayton@xxxxxxxxxx>