> On Mar 23, 2021, at 7:31 PM, Nagendra Tomar <Nagendra.Tomar@xxxxxxxxxxxxx> wrote:
>> 
>>> I was hoping that such a client side change could be useful to possibly more
>>> users with similar setups, after all file->connection affinity doesn't sound too
>>> arcane and one can think of benefits of one node processing one file. No?
>> 
>> That's where I'm getting hung up (outside the personal preference
>> that we not introduce yet another mount option). While I understand
>> what's going on now (thanks!) I'm not sure this is a common usage
>> scenario for NFSv3. Other opinions welcome here!
>> 
>> Nor does it seem like one that we want to encourage over solutions
>> like pNFS. Generally the Linux community has taken the position
>> that server bugs should be addressed on the server, and this seems
>> like a problem that is introduced by your middlebox and server
>> combination.
> 
> I would like to look at it not as a problem created by our server setup,
> but rather as "one more scenario" which the client can much easily and
> generically handle and hence the patch.
> 
>> The client is working properly and is complying with spec.
> 
> The nconnect roundrobin distribution is just one way of utilizing multiple
> connections, which happens to be limiting for this specific usecase.
> My patch proposes another way of distributing RPCs over the connections,
> which is more suitable for this usecase and maybe others.

Indeed, the nconnect work isn't quite complete, and the client will
need some way to specify how to schedule RPCs over several connections
to the same server.

There seem to be two somewhat orthogonal components to your proposal:

A. The introduction of a mount option to specify an RPC connection
   scheduling mechanism

B. The use of a file handle hash to do that scheduling

For A:

Again, I'd rather avoid adding more mount options, for reasons I've
described most recently over in the d_type/READDIR thread.

There are other options here. Anna has proposed a sysfs API that
exposes each kernel RPC connection for fine-grained control. See this
thread:

https://lore.kernel.org/linux-nfs/20210312211826.360959-1-Anna.Schumaker@xxxxxxxxxx/

Dan Aloni has proposed an additional mechanism that enables user space
to associate an NFS mount point with its underlying RPC connections.

These approaches might be suitable for your purpose, or they might
simply provide a little inspiration to get creative.

For B:

I agree with Tom that leaving this up to client system administrators
is a punt, and usually not a scalable or forward-looking solution. And
I maintain you will be better off with a centralized and easily
configurable mechanism for balancing load, not a fixed algorithm that
you have to introduce to your clients via code changes or repeated
distributed changes to mount options.
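Just to be concrete about what a fixed algorithm like B means, here is
a rough, user space only sketch of my understanding of the proposal:
hash the opaque file handle bytes and use the result to pick one of
the nconnect connections. The FNV-1a hash and every name below
(fh_hash, fh_to_xprt_index, the sample handles, nconnect = 4) are mine,
invented purely for illustration; this is not your patch, and the real
change would live in the client's sunrpc transport selection path.

/*
 * Illustrative sketch only: derive a stable connection index from an
 * opaque NFSv3 file handle so every RPC for a given file rides the
 * same connection.
 */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* 32-bit FNV-1a over the opaque file handle bytes. */
static uint32_t fh_hash(const unsigned char *fh, size_t fh_len)
{
        uint32_t hash = 2166136261u;

        for (size_t i = 0; i < fh_len; i++) {
                hash ^= fh[i];
                hash *= 16777619u;
        }
        return hash;
}

/* Map a file handle to one of the nconnect transports. */
static unsigned int fh_to_xprt_index(const unsigned char *fh,
                                     size_t fh_len,
                                     unsigned int nconnect)
{
        return fh_hash(fh, fh_len) % nconnect;
}

int main(void)
{
        /* Two made-up file handles, just to show the mapping. */
        const unsigned char fh_a[] = { 0x01, 0x23, 0x45, 0x67, 0x89, 0xab };
        const unsigned char fh_b[] = { 0xfe, 0xdc, 0xba, 0x98, 0x76, 0x54 };
        unsigned int nconnect = 4;

        printf("file A -> connection %u\n",
               fh_to_xprt_index(fh_a, sizeof(fh_a), nconnect));
        printf("file B -> connection %u\n",
               fh_to_xprt_index(fh_b, sizeof(fh_b), nconnect));
        return 0;
}

Whatever hash is chosen, the policy ends up baked into every client,
which is exactly the rigidity I'm concerned about.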
There are other ways to utilize your LB. Since this is NFSv3, you
might expose your back-end NFSv3 servers by destination port (aka, a
set of NAT rules).

  MDS NFSv4 server:  clients get to it at the VIP address, port 2049
  DS NFSv3 server A: clients get to it at the VIP address, port i
  DS NFSv3 server B: clients get to it at the VIP address, port j
  DS NFSv3 server C: clients get to it at the VIP address, port k

The LB translates [VIP]:i into [server A]:2049, [VIP]:j into
[server B]:2049, and so on.

I'm not sure if the flexfiles layout carries universal addresses with
port information, though. If it did, that would enable you to expose
all your backend data servers directly to clients via a single VIP,
and yet the LB would still be just a Layer 3 forwarding service and
not application-aware.

--
Chuck Lever