On Tue, Nov 12, 2024 at 11:49:30AM +1100, NeilBrown wrote:
> On Tue, 12 Nov 2024, Mike Snitzer wrote:
> > On Tue, Nov 12, 2024 at 10:23:19AM +1100, NeilBrown wrote:
> > > On Tue, 12 Nov 2024, Mike Snitzer wrote:
> > > > On Tue, Nov 12, 2024 at 07:35:04AM +1100, NeilBrown wrote:
> > > > >
> > > > > I don't like NFS_CS_LOCAL_IO_CAPABLE.
> > > > > A use case that I imagine (and a customer does something like this) is an
> > > > > HA cluster where the NFS server can move from one node to another.  All
> > > > > the nodes access the filesystem, most over NFS.  If a server migration
> > > > > happens (e.g. the current server node failed) then the new server node
> > > > > would suddenly become LOCALIO-capable even though it wasn't at
> > > > > mount time.  I would like it to be able to detect this and start doing
> > > > > localio.
> > > >
> > > > Server migration while retaining the client being local to the new
> > > > server?  So the client migrates too?
> > >
> > > No.  The client doesn't migrate.  The server migrates and appears on the
> > > same host as the client.  The client can suddenly get better performance.
> > > It should benefit from that.
> > > >
> > > > If the client migrates then it will negotiate with the server using
> > > > the LOCALIO protocol.
> > > >
> > > > Anyway, this HA hypothetical feels contrived.  It is fine that you
> > > > dislike NFS_CS_LOCAL_IO_CAPABLE, but I don't understand what you'd
> > > > like as an alternative, or why the simplicity of my approach is lacking.
> > >
> > > We have customers with exactly this HA config.  This is why I put work
> > > into making sure loop-back NFS (client and server on the same node) works
> > > cleanly without memory allocation deadlocks:
> > >   https://lwn.net/Articles/595652/
> > > Getting localio in that config would be even better.
> > >
> > > Your approach assumes that if LOCALIO isn't detected at mount time, it
> > > will never be available.  I think that is a flawed assumption.
> >
> > That's fair, I agree your HA scenario is valid.  It was terse as
> > initially presented, but I understand now, thanks.
> >
> > > > > So I don't want NFS_CS_LOCAL_IO_CAPABLE.  I think tracking when the
> > > > > network connection is re-established is sufficient.
> > > >
> > > > Eh, that type of tracking doesn't really buy me anything if I've lost
> > > > context (that LOCALIO was previously established and should be
> > > > re-established).
> > > >
> > > > NFSv3 is stateless, hence my hooking off the read and write paths to
> > > > trigger nfs_local_probe_async().  Unlike NFSv4, with its grace, more
> > > > care is needed to avoid needless calls to nfs_local_probe_async().
> > >
> > > I think it makes perfect sense to trigger the probe on a successful
> > > read/write, with some rate limiting to avoid sending a LOCALIO probe on
> > > EVERY read/write.  Your rate limiting for NFSv3 is:
> > >  - never probe if the mount-time probe was not successful
> > >  - otherwise probe once every 256 IO requests.
> > >
> > > I think the first is too restrictive, and the second is unnecessarily
> > > frequent.
> > > I propose:
> > >  - probe once each time the client reconnects with the server
> > >
> > > This will result in many fewer probes in practice, but any successful
> > > probe will happen at nearly the earliest possible moment.
> >
> > I'm all for what you're proposing (it's what I wanted from the start).
> > In practice I just don't quite grok the client-reconnect awareness
> > implementation you're saying is at our fingertips.
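For context, the mount-time-gated, once-every-256-IOs rate limiting
described above might look roughly like the sketch below.  This is
illustrative only: it assumes NFS_CS_LOCAL_IO_CAPABLE is a bit in
cl_flags recording a successful mount-time probe, and the counter
field and interval constant are invented names, not taken from the
actual patch.

	#define NFS_LOCALIO_PROBE_INTERVAL	256	/* illustrative */

	/*
	 * Sketch of the per-IO rate limiting described above.
	 * cl_io_probe_count is a hypothetical atomic_t in struct
	 * nfs_client, bumped on each read/write completion.
	 */
	static void nfs3_localio_probe_throttled(struct nfs_client *clp)
	{
		/* Never probe if the mount-time probe was not successful. */
		if (!test_bit(NFS_CS_LOCAL_IO_CAPABLE, &clp->cl_flags))
			return;

		/* Otherwise probe once every 256 IO requests. */
		if (atomic_inc_return(&clp->cl_io_probe_count) %
		    NFS_LOCALIO_PROBE_INTERVAL == 0)
			nfs_local_probe_async(clp);
	}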
> >
> > > > Your previous email about just tracking network connection change was
> > > > an optimization for avoiding repeat (pointless) probes.  We still
> > > > need to know to do the probe to begin with.  Are you saying you want
> > > > to backfill the equivalent of grace (or pseudo-grace) to NFSv3?
> > >
> > > You don't "know to do the probe" at mount time.  You simply always do
> > > it.  Similarly, whenever localio isn't active it is always appropriate
> > > to probe - with rate limiting.
> > >
> > > And NFSv3 already has a grace period - in the NLM/STAT protocols.  We
> > > could use STAT to detect when the server has restarted and so it is
> > > worth probing again.  But STAT is not as reliable as we might like,
> > > and it would be more complexity with no real gain.
> >
> > If you have a specific idea for the mechanism we need to create to
> > detect when the v3 client reconnects to the server, please let me know.
> > Reusing or augmenting an existing thing is fine by me.
>
> static void nfs3_local_probe(struct nfs_server *server)
> {
> 	struct nfs_client *clp = server->nfs_client;
> 	nfs_uuid_t *nfs_uuid = &clp->cl_uuid;
>
> 	if (nfs_uuid->connect_cookie != clp->cl_rpcclient->cl_xprt->connect_cookie)
> 		nfs_local_probe_async(clp);
> }
>
> static void nfs_local_probe_async_work(struct work_struct *work)
> {
> 	struct nfs_client *clp = container_of(work, struct nfs_client,
> 					      cl_local_probe_work);
>
> 	clp->cl_uuid.connect_cookie =
> 		clp->cl_rpcclient->cl_xprt->connect_cookie;
> 	nfs_local_probe(clp);
> }
>
> Or maybe assign connect_cookie (which we have to add to uuid) inside
> nfs_local_probe().

The problem with per-connection checks is that a change in export
security policy could disable LOCALIO rather persistently.  The only
way to recover, if checking is done only when a connection is
established, is to remount or force a disconnect.

> Though you need rcu_dereference() to access cl_xprt and
> rcu_read_lock() protection around that.
> (cl_xprt can change when the NFS client follows a "location" reported
> by the server to handle migration or mirroring.  Conceivably we should
> check for either cl_xprt changing or cl_xprt->connect_cookie changing,
> but if that were an issue it would be easier to initialise
> ->connect_cookie to a random number.)
>
> Note that you don't need local_probe_mutex.  A given work_struct
> (cl_local_probe_work) can only be running once.  If you try to
> queue_work() it while it is running, queue_work() will do nothing.
>
> You'll want to only INIT_WORK() once - not on every
> nfs_local_probe_async() call.
>
> NeilBrown

-- 
Chuck Lever
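To make the RCU caveat above concrete, a reconnect check that takes
rcu_read_lock() around the cl_xprt dereference might look like the
following sketch.  The helper name is illustrative, and
cl_uuid.connect_cookie is the field Neil proposes adding, not
something that exists today.

	/*
	 * Sketch: RCU-safe comparison of the transport's connect_cookie
	 * (which the transport layer bumps on every reconnect) against
	 * the copy cached in cl_uuid, per the note above.
	 */
	static bool nfs_client_reconnected(struct nfs_client *clp)
	{
		struct rpc_xprt *xprt;
		unsigned int cookie;

		rcu_read_lock();
		xprt = rcu_dereference(clp->cl_rpcclient->cl_xprt);
		cookie = xprt->connect_cookie;
		rcu_read_unlock();

		return cookie != clp->cl_uuid.connect_cookie;
	}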