> On Sep 5, 2024, at 10:21 AM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote: > > On Wed, Sep 04, 2024 at 09:47:07AM -0400, Chuck Lever wrote: >> On Wed, Sep 04, 2024 at 03:01:46PM +1000, NeilBrown wrote: >>> On Wed, 04 Sep 2024, NeilBrown wrote: >>>> >>>> I agree that dropping and reclaiming a lock is an anti-pattern and in >>>> best avoided in general. I cannot see a better alternative in this >>>> case. >>> >>> It occurred to me what I should spell out the alternate that I DO see so >>> you have the option of disagreeing with my assessment that it isn't >>> "better". >>> >>> We need RCU to call into nfsd, we need a per-cpu ref on the net (which >>> we can only get inside nfsd) and NOT RCU to call >>> nfsd_file_acquire_local(). >>> >>> The current code combines these (because they are only used together) >>> and so the need to drop rcu. >>> >>> I thought briefly that it could simply drop rcu and leave it dropped >>> (__releases(rcu)) but not only do I generally like that LESS than >>> dropping and reclaiming, I think it would be buggy. While in the nfsd >>> module code we need to be holding either rcu or a ref on the server else >>> the code could disappear out from under the CPU. So if we exit without >>> a ref on the server - which we do if nfsd_file_acquire_local() fails - >>> then we need to reclaim RCU *before* dropping the ref. So the current >>> code is slightly buggy. >>> >>> We could instead split the combined call into multiple nfs_to >>> interfaces. >>> >>> So nfs_open_local_fh() in nfs_common/nfslocalio.c would be something >>> like: >>> >>> rcu_read_lock(); >>> net = READ_ONCE(uuid->net); >>> if (!net || !nfs_to.get_net(net)) { >>> rcu_read_unlock(); >>> return ERR_PTR(-ENXIO); >>> } >>> rcu_read_unlock(); >>> localio = nfs_to.nfsd_open_local_fh(....); >>> if (IS_ERR(localio)) >>> nfs_to.put_net(net); >>> return localio; >>> >>> So we have 3 interfaces instead of 1, but no hidden unlock/lock. >> >> Splitting up the function call occurred to me as well, but I didn't >> come up with a specific bit of surgery. Thanks for the suggestion. >> >> At this point, my concern is that we will lose your cogent >> explanation of why the release/lock is done. Having it in email is >> great, but email is more ephemeral than actually putting it in the >> code. >> >> >>> As I said, I don't think this is a net win, but reasonable people might >>> disagree with me. >> >> The "win" here is that it makes this code self-documenting and >> somewhat less likely to be broken down the road by changes in and >> around this area. Since I'm more forgetful these days I lean towards >> the more obvious kinds of coding solutions. ;-) >> >> Mike, how do you feel about the 3-interface suggestion? > > I dislike expanding from 1 indirect function call to 2 in rapid > succession (3 for the error path, not a problem, just being precise. > But I otherwise like it.. maybe.. heh. > > FYI, I did run with the suggestion to make nfs_to a pointer that just > needs a simple assignment rather than memcpy to initialize. So Neil's > above code becames: > > rcu_read_lock(); > net = rcu_dereference(uuid->net); > if (!net || !nfs_to->nfsd_serv_try_get(net)) { > rcu_read_unlock(); > return ERR_PTR(-ENXIO); > } > rcu_read_unlock(); > /* We have an implied reference to net thanks to nfsd_serv_try_get */ > localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt, > cred, nfs_fh, fmode); > if (IS_ERR(localio)) > nfs_to->nfsd_serv_put(net); > return localio; > > I do think it cleans the code up... full patch is here: > https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=nfs-localio-for-next.v15-with-fixups&id=e85306941878a87070176702de687f2779436061 > > But I'm still on the fence.. someone help push me over! I wasn't expecting it would be less ugly, but it does look harder to screw up down the road when we've forgotten why the API looks this way. > Tangent, but in the related business of "what are next steps?": > > I updated headers with various provided Reviewed-by:s and Acked-by:s, > fixed at least 1 commit header, fixed some sparse issues, various > fixes to nfs_to patch (removed EXPORT_SYMBOL_GPL, switched to using > pointer, updated nfs_to callers). Etc... > > But if I fold those changes in I compromise the provided Reviewed-by > and Acked-by.. so I'm leaning toward posting a v16 that has > these incremental fixes/improvements, see the 3 topmost commits here: > https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=nfs-localio-for-next.v15-with-fixups > > Or if you can review the incremental patches I can fold them in and > preserve the various Reviewed-by and Acked-by... For the three topmost patches in that branch: Reviewed-by: Chuck Lever <chuck.lever@xxxxxxxxxx <mailto:chuck.lever@xxxxxxxxxx>> HTH > You can also see incremental diff from .v15 to .v15-with-fixups with: > git remote update snitzer > git diff snitzer/nfs-localio-for-next.v15 snitzer/nfs-localio-for-next.v15-with-fixups > > Either way, I should post a v16 right? SO question is: should I fold > these incremental changes in to the original or keep them split out? > > I'm good with whatever you guys think. But whatever is decided: this > needs to be the handoff point to focused NFS client review and hopeful > staging for 6.12 inclusion, I've pivoted to working with Trond to > make certain he is good with everything. -- Chuck Lever