On Tue, 12 Nov 2024, Mike Snitzer wrote:
> On Tue, Nov 12, 2024 at 10:23:19AM +1100, NeilBrown wrote:
> > On Tue, 12 Nov 2024, Mike Snitzer wrote:
> > > On Tue, Nov 12, 2024 at 07:35:04AM +1100, NeilBrown wrote:
> > > >
> > > > I don't like NFS_CS_LOCAL_IO_CAPABLE.
> > > > A use case that I imagine (and a customer does something like
> > > > this) is an HA cluster where the NFS server can move from one node
> > > > to another.  All the nodes access the filesystem, most over NFS.
> > > > If a server migration happens (e.g. the current server node
> > > > failed) then the new server node would suddenly become
> > > > LOCALIO-capable even though it wasn't at mount time.  I would like
> > > > it to be able to detect this and start doing localio.
> > >
> > > Server migration while retaining the client being local to the new
> > > server?  So the client migrates too?
> >
> > No.  The client doesn't migrate.  The server migrates and appears on
> > the same host as the client.  The client can suddenly get better
> > performance.  It should benefit from that.
> >
> > > If the client migrates then it will negotiate with the server using
> > > the LOCALIO protocol.
> > >
> > > Anyway, this HA hypothetical feels contrived.  It is fine that you
> > > dislike NFS_CS_LOCAL_IO_CAPABLE but I don't understand what you'd
> > > like as an alternative.  Or why the simplicity of my approach is
> > > lacking.
> >
> > We have customers with exactly this HA config.  This is why I put
> > work into making sure loop-back NFS (client and server on the same
> > node) works cleanly without memory allocation deadlocks.
> >     https://lwn.net/Articles/595652/
> > Getting localio in that config would be even better.
> >
> > Your approach assumes that if LOCALIO isn't detected at mount time,
> > it will never be available.  I think that is a flawed assumption.
>
> That's fair, I agree your HA scenario is valid.  It was terse as
> initially presented but I understand now, thanks.
>
> > > > So I don't want NFS_CS_LOCAL_IO_CAPABLE.  I think tracking when
> > > > the network connection is re-established is sufficient.
> > >
> > > Eh, that type of tracking doesn't really buy me anything if I've
> > > lost context (that LOCALIO was previously established and should be
> > > re-established).
> > >
> > > NFS v3 is stateless, hence my hooking off the read and write paths
> > > to trigger nfs_local_probe_async().  Unlike NFS v4, with its grace,
> > > more care is needed to avoid needless calls to
> > > nfs_local_probe_async().
> >
> > I think it makes perfect sense to trigger the probe on a successful
> > read/write with some rate limiting to avoid sending a LOCALIO probe
> > on EVERY read/write.  Your rate limiting for NFSv3 is:
> >  - never probe if the mount-time probe was not successful
> >  - otherwise probe once every 256 IO requests.
> >
> > I think the first is too restrictive, and the second is unnecessarily
> > frequent.  I propose:
> >  - probe once each time the client reconnects with the server
> >
> > This will result in many fewer probes in practice, but any successful
> > probe will happen at nearly the earliest possible moment.
>
> I'm all for what you're proposing (it's what I wanted from the start).
> In practice I just don't quite grok the client-reconnect awareness
> implementation you're saying is at our fingertips.
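The awareness is mostly there already: the sunrpc transport keeps a
counter, xprt->connect_cookie, which is bumped each time the connection
to the server is re-established.  So "has the client reconnected since
we last probed?" reduces to caching a copy of that counter and comparing
it later.  A minimal sketch of just that test - untested, the helper
name is made up, and note that cl_xprt is RCU-managed:

	/* Sketch: has the transport reconnected since @last_cookie was
	 * sampled?  xprt->connect_cookie changes on every reconnect, so
	 * a stale cached copy means the connection came back.
	 */
	static bool nfs_transport_reconnected(struct nfs_client *clp,
					      unsigned int last_cookie)
	{
		struct rpc_xprt *xprt;
		bool reconnected;

		rcu_read_lock();
		xprt = rcu_dereference(clp->cl_rpcclient->cl_xprt);
		reconnected = (xprt->connect_cookie != last_cookie);
		rcu_read_unlock();

		return reconnected;
	}

The concrete hook-up for LOCALIO is below.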
> > > Your previous email about just tracking network connection change
> > > was an optimization for avoiding repeat (pointless) probes.  We
> > > still need to know to do the probe to begin with.  Are you saying
> > > you want to backfill the equivalent of grace (or pseudo-grace) to
> > > NFSv3?
> >
> > You don't "know to do the probe" at mount time.  You simply always
> > do it.  Similarly, whenever localio isn't active it is always
> > appropriate to probe - with rate limiting.
> >
> > And NFSv3 already has a grace period - in the NLM/STAT protocols.
> > We could use STAT to detect when the server has restarted and so it
> > is worth probing again.  But STAT is not as reliable as we might
> > like and it would be more complexity with no real gain.
>
> If you have a specific idea for the mechanism we need to create to
> detect when the v3 client reconnects to the server please let me
> know.  Reusing or augmenting an existing thing is fine by me.

static void nfs3_local_probe(struct nfs_server *server)
{
	struct nfs_client *clp = server->nfs_client;
	nfs_uuid_t *nfs_uuid = &clp->cl_uuid;

	if (nfs_uuid->connect_cookie !=
	    clp->cl_rpcclient->cl_xprt->connect_cookie)
		nfs_local_probe_async(clp);
}

static void nfs_local_probe_async_work(struct work_struct *work)
{
	struct nfs_client *clp =
		container_of(work, struct nfs_client, cl_local_probe_work);

	clp->cl_uuid.connect_cookie =
		clp->cl_rpcclient->cl_xprt->connect_cookie;
	nfs_local_probe(clp);
}

Or maybe assign connect_cookie (which we have to add to the uuid)
inside nfs_local_probe().  Though you need rcu_dereference() to access
cl_xprt, and rcu_read_lock() protection around that.

(cl_xprt can change when the NFS client follows a "location" reported
by the server to handle migration or mirroring.  Conceivably we should
check for either cl_xprt changing or cl_xprt->connect_cookie changing,
but if that were an issue it would be easier to initialise
->connect_cookie to a random number.)

Note that you don't need local_probe_mutex.  A given work_struct
(cl_local_probe_work) can only be running once.  If you try to
queue_work() it while it is still pending, queue_work() will do
nothing.  You'll want to only INIT_WORK() once - not on every
nfs_local_probe_async() call.
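Something like this, to be concrete - untested, and nfsiod_workqueue is
just a plausible queue to run it on:

	/* Done once, when the nfs_client is created, so that the work
	 * item can safely be queued any number of times afterwards.
	 */
	INIT_WORK(&clp->cl_local_probe_work, nfs_local_probe_async_work);

	void nfs_local_probe_async(struct nfs_client *clp)
	{
		/* A no-op if the probe work is already pending. */
		queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
	}

NeilBrown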