Re: [for-6.13 PATCH 10/19] nfs_common: move localio_lock to new lock member of nfs_uuid_t

On Tue, Nov 12, 2024 at 11:49:30AM +1100, NeilBrown wrote:
> On Tue, 12 Nov 2024, Mike Snitzer wrote:
> > On Tue, Nov 12, 2024 at 10:23:19AM +1100, NeilBrown wrote:
> > > On Tue, 12 Nov 2024, Mike Snitzer wrote:
> > > > On Tue, Nov 12, 2024 at 07:35:04AM +1100, NeilBrown wrote:
> > > > > 
> > > > > I don't like NFS_CS_LOCAL_IO_CAPABLE.
> > > > > A use case that I imagine (and a customer does something like this) is an
> > > > > HA cluster where the NFS server can move from one node to another.  All
> > > > > the nodes access the filesystem, most over NFS.  If a server migration
> > > > > happens (e.g.  the current server node failed) then the new server node
> > > > > would suddenly become LOCALIO-capable even though it wasn't at
> > > > > mount-time.  I would like it to be able to detect this and start doing
> > > > > localio.
> > > > 
> > > > Server migration while retaining the client being local to the new
> > > > server?  So client migrates too?
> > > 
> > > No.  Client doesn't migrate.  Server migrates and appears on the same
> > > host as the client.  The client can suddenly get better performance.  It
> > > should benefit from that.
> > > 
> > > > 
> > > > If the client migrates then it will negotiate with server using
> > > > LOCALIO protocol.
> > > > 
> > > > Anyway, this HA hypothetical feels contrived.  It is fine that you
> > > > dislike NFS_CS_LOCAL_IO_CAPABLE but I don't understand what you'd like
> > > > as an alternative, or why the simplicity of my approach is lacking.
> > > 
> > > We have customers with exactly this HA config.  This is why I put work
> > > into making sure loop-back NFS (client and server on same node) works
> > > cleanly without memory allocation deadlocks.
> > >   https://lwn.net/Articles/595652/
> > > Getting localio in that config would be even better.
> > > 
> > > Your approach assumes that if LOCALIO isn't detected at mount time, it
> > > will never be available.  I think that is a flawed assumption.
> > 
> > That's fair, I agree your HA scenario is valid.  It was terse as
> > initially presented but I understand now, thanks.
> > 
> > > > > So I don't want NFS_CS_LOCAL_IO_CAPABLE.  I think tracking when the
> > > > > network connection is re-established is sufficient.
> > > > 
> > > > Eh, that type of tracking doesn't really buy me anything if I've lost
> > > > context (that LOCALIO was previously established and should be
> > > > re-established).
> > > > 
> > > > NFS v3 is stateless, hence my hooking off the read and write paths to
> > > > trigger nfs_local_probe_async().  Unlike NFS v4, with its grace period,
> > > > more care is needed to avoid needless calls to nfs_local_probe_async().
> > > 
> > > I think it makes perfect sense to trigger the probe on a successful
> > > read/write with some rate limiting to avoid sending a LOCALIO probe on
> > > EVERY read/write.  Your rate-limiting for NFSv3 is:
> > >    - never probe if the mount-time probe was not successful
> > >    - otherwise probe once every 256 IO requests.
> > > 
> > > I think the first is too restrictive, and the second is unnecessarily
> > > frequent.
> > > I propose:
> > >    - probe once each time the client reconnects with the server
> > > 
> > > This will result in many fewer probes in practice, but any successful
> > > probe will happen at nearly the earliest possible moment.
> > 
> > I'm all for what you're proposing (it's what I wanted from the start).
> > In practice I just don't quite grok the client-reconnect-awareness
> > implementation you're saying is at our fingertips.
> > 
> > > > Your previous email about just tracking network connection change was
> > > > an optimization for avoiding repeat (pointless) probes.  We still
> > > > need to know to do the probe to begin with.  Are you saying you want
> > > > to backfill the equivalent of grace (or pseudo-grace) to NFSv3?
> > > 
> > > You don't "know to do the probe" at mount time.  You simply always do
> > > it.  Similarly whenever localio isn't active it is always appropriate to
> > > probe - with rate limiting.
> > > 
> > > And NFSv3 already has a grace period - in the NLM/STAT protocols.  We
> > > could use STAT to detect when the server has restarted and so it is worth
> > > probing again.  But STAT is not as reliable as we might like and it
> > > would be more complexity with no real gain.
> > 
> > If you have a specific idea for the mechanism we need to create to
> > detect the v3 client reconnects to the server please let me know.
> > Reusing or augmenting an existing thing is fine by me.
> 
> static void nfs3_local_probe(struct nfs_server *server)
> {
>   struct nfs_client *clp = server->nfs_client;
>   nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
> 
>   if (nfs_uuid->connect_cookie != clp->cl_rpcclient->cl_xprt->connect_cookie)
>        nfs_local_probe_async(clp);
> }
> 
> static void nfs_local_probe_async_work(struct work_struct *work)
> {
>   struct nfs_client *clp = container_of(work, struct nfs_client,
>                               cl_local_probe_work);
>   clp->cl_uuid.connect_cookie =
>      clp->cl_rpcclient->cl_xprt->connect_cookie;
>   nfs_local_probe(clp);
> }
> 
> Or maybe assign connect_cookie (which we have to add to uuid) inside
> nfs_local_probe(). 

The problem with per-connection checks is that a change in export
security policy can disable LOCALIO rather persistently: if the check
happens only when a connection is established, the only way to
recover is to remount or force a disconnect.
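
One way to keep both properties would be to combine the
connection-cookie trigger with a much rarer periodic retry, so a
server-side policy change is eventually picked up without a remount.
Here is a small userspace model of that decision logic -- the names
and the retry interval are illustrative only, not from any posted
patch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of a hybrid trigger policy (illustrative only).
 * Probe when the transport's connect_cookie has changed since the
 * last probe, but also fall back to a rare periodic retry so that a
 * server-side policy change can re-enable LOCALIO without a remount
 * or a forced disconnect.
 */
#define LOCALIO_RETRY_INTERVAL 65536	/* assumed; much rarer than 1-in-256 */

struct probe_state {
	unsigned int last_probe_cookie;	/* cookie observed at last probe */
	unsigned long io_count;		/* I/Os completed since last probe */
};

static bool should_probe(struct probe_state *st, unsigned int connect_cookie)
{
	st->io_count++;
	if (connect_cookie != st->last_probe_cookie ||
	    st->io_count >= LOCALIO_RETRY_INTERVAL) {
		st->last_probe_cookie = connect_cookie;
		st->io_count = 0;
		return true;	/* caller would queue nfs_local_probe_async() */
	}
	return false;
}
```

Reconnects still probe at nearly the earliest possible moment, while
the periodic fallback bounds how long a policy change can go
unnoticed.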


> Though you need rcu_dereference() to access cl_xprt and
> rcu_read_lock() protection around that.
> (cl_xprt can change when the NFS client follows a "location" reported by
>  the server to handle migration or mirroring.  Conceivably we should
>  check for either cl_xprt changing or cl_xprt->connect_cookie changing,
>  but if that were an issue it would be easier to initialise
>  ->connect_cookie to a random number)
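
For what it's worth, the RCU-protected read you describe would look
something like this (sketch only, not compile-tested; I'm assuming
the helper name, and that the cookie comparison happens in the
caller):

```c
/* Sketch: RCU-safe read of the transport's connect_cookie. */
static unsigned int nfs_client_connect_cookie(struct nfs_client *clp)
{
	struct rpc_xprt *xprt;
	unsigned int cookie;

	rcu_read_lock();
	xprt = rcu_dereference(clp->cl_rpcclient->cl_xprt);
	cookie = xprt->connect_cookie;
	rcu_read_unlock();

	return cookie;
}
```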
>  
> Note that you don't need local_probe_mutex.  A given work_struct
> (cl_local_probe_work) can only be running once.  If you try to
> queue_work() it while it is running, queue_work() will do nothing.
> 
> You'll want to only INIT_WORK() once - not on every
> nfs_local_probe_async() call.
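
So, if I'm reading you right, the shape would be roughly this
(sketch, not compile-tested; I'm assuming cl_local_probe_work sits in
struct nfs_client and that nfsiod_workqueue is an acceptable home for
it):

```c
/* One-time setup, e.g. in nfs_alloc_client(): */
INIT_WORK(&clp->cl_local_probe_work, nfs_local_probe_async_work);

/* The async trigger then needs no local_probe_mutex: queue_work()
 * does nothing if the work item is already pending. */
static void nfs_local_probe_async(struct nfs_client *clp)
{
	queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
}
```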
> 
> NeilBrown

-- 
Chuck Lever



