On Wed, 12 Jun 2024, Mike Snitzer wrote: > On Wed, Jun 12, 2024 at 02:09:21PM +1000, NeilBrown wrote: > > On Wed, 12 Jun 2024, Mike Snitzer wrote: > > > On Wed, Jun 12, 2024 at 01:17:05PM +1000, NeilBrown wrote: > > > > On Wed, 12 Jun 2024, Mike Snitzer wrote: > > > > > > > > > > SO I looked, and I'm saddened to see Neil's 6.8 commit 1e3577a4521e > > > > > ("SUNRPC: discard sv_refcnt, and svc_get/svc_put"). > > > > > > > > > > [the lack of useful refcounting with the current code kind of blew me > > > > > away.. but nice to see it existed not too long ago.] > > > > > > > > > > Rather than immediately invest the effort to revert commit > > > > > 1e3577a4521e for my apparent needs... I'll send out v2 to allow for > > > > > further review and discussion. > > > > > > > > > > But it really does feel like I _need_ svc_{get,put} and nfsd_{get,put} > > > > > > > > You are taking a reference, and at the right time. But it is to the > > > > wrong thing. > > > > > > Well, that reference is to ensure nfsd (and nfsd_open_local_fh) is > > > available for the duration of a local client connected to it. > > > > > > Really wasn't trying to keep nn->nfsd_serv around with this ;) > > > > > > > You call symbol_request(nfsd_open_local_fh) and so get a reference to > > > > the nfsd module. But you really want a reference to the nfsd service. > > > > > > > > I would suggest that you use symbol_request() to get a function which > > > > you then call and immediately symbol_put().... unless you need to use it > > > > to discard the reference to the service later. > > > > > > Getting the nfsd_open_local_fh symbol once when client handshakes with > > > server is meant to avoid needing to do so for every IO the client > > > issues to the local server. > > > > > > > The function would take nfsd_mutex, check there is an nfsd_serv, sets a > > > > flag or whatever to indicate the serv is being used for local_io, and > > > > maybe returns the nfsd_serv. As long as that flag is set the serv > > > > cannot be destroy. > > > > > > > > Do you need there to be available threads for LOCAL_IO to work? If so > > > > the flag would cause setting the num threads to zero to fail. > > > > If not .... that is weird. It would mean that setting the number of > > > > threads to zero would not destroy the service and I don't think we want > > > > to do that. > > > > > > > > So I think that when LOCAL_IO is in use, setting number of threads to > > > > zero must return EBUSY or similar, even if you don't need the threads. > > > > > > Yes, but I really dislike needing to play games with a tangential > > > characteristic of nfsd_serv (that threads are what hold reference), > > > rather than have the ability to keep the nfsd_serv around in a cleaner > > > way. > > > > > > This localio code doesn't run in nfsd context so it isn't using nfsd's > > > threads. Forcing threads to be held in reserve because localio doesn't > > > want nfsd_serv to go away isn't ideal. > > > > I started reading the rest of the patches and it seems that localio is > > only used for READ, WRTE, COMMIT. Is that correct? Is there > > documentation so that I don't have to ask? > > The header for v2's patch 7 (nfs/nfsd: add "localio" support) starts with: > Add client support for bypassing NFS for localhost reads, writes, and > commits. > > But I should've made it clearer by saying the same in the 0th header. Or maybe even a something to Documentation/ which describes your new side-protocol including how the UUIDs are used and what happens when a match is found.. > > > Obviously there are lots of other NFS requests so you wouldn't be able > > to use localio without nfsd threads running.... > > That's very true. > > > But a normal remote client doesn't pin the nfsd threads or the > > nfsd_serv. If the threads go away, the client blocks until the service > > comes back. Would that be appropriate semantics for localio?? i.e. on > > each nfsd_open_local_fh() call you mutex_trylock and hold that long > > enough to get the 'struct file *'. If it fails because there is no > > serv, you simply fall-back to the same path you use for other requests. > > > > Could that work? > > I can try it, but feels like it'd elevate nfsd_mutex to "contended", > as such it feels heavy. > > > > Does it maybe make sense to introduce a more narrow svc_get/svc_put > > > for this auxillary usecase? > > > > I don't think so. nfsd is a self-contained transactional service. It > > doesn't promise to persist beyond each transaction. > > Current transactions return status and/or data. Adding a new transaction > > that returns a 'struct file *' fits that model reasonable well. > > Sure. But to be clear, I am adding global state to nfs_common that > tracks nfsd_uuids. Those change every time a new nfsd_net is created > for a given server (client will then lookup the uuid to see if local). > I missed the full importance of this on my read-through. It would certainly make sense for the NFS client to get a counted-reference to something managed by nfs_common and created/destroyed by nfsd. It could then easily check if the handle is still valid and repeat the lookup only if the handle has been marked as invalid. We still need a way for the filehande-to-struct-file lookup to proceed without taking nfsd_mutex. Possibly we could use srcu and put a synchronise_srcu() call at the top of nfsd_destroy_serv()... > > But even if we went to the extreme where nfsd instances are bouncing > like crazy, the 'nfsd_uuids' list in nfs_common should work fine. > > Just not seeing what is gained by nfsd being so ephemeral. Maybe your > point is, it should work in that model too?.. I think it would, just > less efficiently due to make-work to re-get resources it needs. Exactly. localio shouldn't prevent the nfsd server from being stopped and restarted, but it needn't work efficiently when that is happening. Thanks, NeilBrown