On Sat, Sep 07, 2024 at 07:56:47AM +1000, NeilBrown wrote: > On Sat, 07 Sep 2024, Mike Snitzer wrote: > > > > But I'd just like to point out that something like the below patch > > > wouldn't be needed if we kept my "heavy" approach (nfs reference on > > > nfsd modules via nfs_common using symbol_request): > > > https://marc.info/?l=linux-nfs&m=172499445027800&w=2 > > > (that patch has stuff I since cleaned up, e.g. removed typedefs and > > > EXPORT_SYMBOL_GPLs..) > > > > > > I knew we were going to pay for being too cute with how nfs took its > > > reference on nfsd. > > > > > > So here we are, needing fiddly incremental fixes like this to close a > > > really-small-yet-will-be-deadly race: > > > > <snip required delicate rcu re-locking requirements patch> > > > > I prefer this incremental re-implementation of my symbol_request patch > > that eliminates all concerns about the validity of 'nfs_to' calls: > > We could achieve the same effect without using symbol_request() (which > hardly anyone uses) if we did a __module_get (or try_module_get) at the > same place you are calling symbol_request(), and module_put() where you > do symbol_put(). > > This would mean that once NFS LOCALIO had detected a path to the local > server, it would hold the nfsd module until the nfs server were shutdown > and the nfs client noticed. So you wouldn't be able to unmount the nfsd > module immediately after stopping all nfsd servers. s/unmount the nfsd module/remove the nfsd module/ Right, the nfsd module wouldn't be able to be unloaded if LOCALIO client is still active (on host or within some container) with a mount from an export the now shutdown server hosted. With LOCALIO the client has extended the footprint of its kernel code to include portions of the nfsd module (indirectly via nfs_common). That said, we can preserve the fine-grained rcu-based locking dances that I reinforced with the first patch (call it "option 1") I shared in this sub-thread ;) > Maybe that doesn't matter. I think it is important to be able to > completely shut down the NFS server at any time. I think it is > important to be able to completely shutdown a network namespace at any > time. I am less concerned about being able to rmmod the nfsd module > after all obvious users have been disabled. Yes, and even the last case: if all obvious users have been disabled and unmounted then LOCALIO nfs client would have no cause to be holding a reference for nfsd. > So if others think that the improvements in code maintainability are > worth the loss of being able to rmmod nfsd without (potentially) having > to unmount all NFS filesystems, then I won't argue against it. But I > really would want it to be get/put of the module, not of some symbol. I can do that. > .... BTW you probably wanted to use symbol_get(), not symbol_request(). > The latter tries to load the module if it isn't already loaded. Using > symbol_get() does have the benefit that you don't need any locking dance > to prevent the module unloading while we get the ref. So if we really > want to go for less tricky locking that might be a justification - but > you don't need much locking for try_module_get()... Yes that was the entire reason I did it this way (call it "option 2", nicely avoids the bouncing of the rcu locking while putting nfsd_serv ref in nfs_open_local_fh's error path). Will look at switching to try_module_get(). Whichever way we go with fixing nfs_open_local_fh's narrow race with putting the nfsd_serv ref, thankfully this is a pretty minor point that we can quickly fix once Anna and Trond are comfortable with LOCALIO being merged. option 2's coarser nfs reference for nfsd via try_module_get would reduce think-time if/when we make LOCALIO changes in future. But I'd be fine with either fix. Thanks, Mike