> On Sep 4, 2024, at 9:54 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > On Wed, 2024-09-04 at 15:01 +1000, NeilBrown wrote: >> On Wed, 04 Sep 2024, NeilBrown wrote: >>> >>> I agree that dropping and reclaiming a lock is an anti-pattern and in >>> best avoided in general. I cannot see a better alternative in this >>> case. >> >> It occurred to me what I should spell out the alternate that I DO see so >> you have the option of disagreeing with my assessment that it isn't >> "better". >> >> We need RCU to call into nfsd, we need a per-cpu ref on the net (which >> we can only get inside nfsd) and NOT RCU to call >> nfsd_file_acquire_local(). >> >> The current code combines these (because they are only used together) >> and so the need to drop rcu. >> >> I thought briefly that it could simply drop rcu and leave it dropped >> (__releases(rcu)) but not only do I generally like that LESS than >> dropping and reclaiming, I think it would be buggy. While in the nfsd >> module code we need to be holding either rcu or a ref on the server else >> the code could disappear out from under the CPU. So if we exit without >> a ref on the server - which we do if nfsd_file_acquire_local() fails - >> then we need to reclaim RCU *before* dropping the ref. So the current >> code is slightly buggy. >> >> We could instead split the combined call into multiple nfs_to >> interfaces. >> >> So nfs_open_local_fh() in nfs_common/nfslocalio.c would be something >> like: >> >> rcu_read_lock(); >> net = READ_ONCE(uuid->net); >> if (!net || !nfs_to.get_net(net)) { >> rcu_read_unlock(); >> return ERR_PTR(-ENXIO); >> } >> rcu_read_unlock(); >> localio = nfs_to.nfsd_open_local_fh(....); >> if (IS_ERR(localio)) >> nfs_to.put_net(net); >> return localio; >> >> So we have 3 interfaces instead of 1, but no hidden unlock/lock. >> >> As I said, I don't think this is a net win, but reasonable people might >> disagree with me. >> > > I considered a few alternate designs here as well, and came to the same > conclusion. This interface is ugly, but it's not materially worse than > the alternatives. I think we just have to document this well, and deal > with the ugliness. To be clear, I largely agree with that; but I think documenting it in code rather than writing more comments is a good choice. > Luckily most of the gory details are managed inside nfsd and the > nfs_common functions so the caller (nfs) shouldn't have to deal with > the complex locking. > > One thing that might be good if we're sticking with this code, is a > __might_sleep() at the top of nfs_open_local_fh function in nfs_common. > That should help ensure that no one tries to call it with the > rcu_read_lock() held (which is the main danger here). > -- > Jeff Layton <jlayton@xxxxxxxxxx> -- Chuck Lever