Thanks Trond and Chuck for your inputs!

Sorry, I didn't make the NFS version clear. It's v3, so pNFS and anything
fancy is not an option.

nconnect actually works great with the change to force requests for a
specific file onto a specific connection (a rough sketch of the
filehandle-hash based selection is appended below the quoted mail).
I'll add a mount-time-selectable xprt policy, as Trond suggested, and
will try to keep the change generic so it can be of use to others with
similar "don't spread requests for the same file across multiple nodes"
use cases. I'm sure other clustered NFS servers would also benefit from
restricting a file's requests to a specific connection/node.

Thanks,
Tomar

-----Original Message-----
From: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx>
Sent: 18 March 2021 19:44
To: Nagendra Tomar <Nagendra.Tomar@xxxxxxxxxxxxx>; chuck.lever@xxxxxxxxxx
Cc: linux-nfs@xxxxxxxxxxxxxxx
Subject: [EXTERNAL] Re: [RFC] nconnect xprt stickiness for a file

On Thu, 2021-03-18 at 13:57 +0000, Chuck Lever III wrote:
>
>
> > On Mar 17, 2021, at 9:56 PM, Nagendra Tomar <
> > Nagendra.Tomar@xxxxxxxxxxxxx> wrote:
> >
> > We have a clustered NFS server behind an L4 load balancer with the
> > following characteristics (relevant to this discussion):
> >
> > 1. RPC requests for the same file issued to different cluster nodes
> >    are not efficient; one file to one cluster node is efficient.
> >    This is particularly true for WRITEs.
> > 2. Multiple nconnect xprts land on different cluster nodes because
> >    each connection has a different source port.
> >
> > Because of this, the default nconnect round-robin policy does not
> > work very well: RPCs targeting the same file end up being serviced
> > by different cluster nodes.
> >
> > To solve this, we tweaked the nfs multipath code to always choose
> > the same xprt for the same file. We do that by adding a new integer
> > field to rpc_message, rpc_xprt_hint, which is set by the NFS layer
> > and used by the RPC layer to pick an xprt. The NFS layer sets it to
> > the hash of the target file's filehandle, thus ensuring that
> > requests for the same file always use the same xprt. This works
> > well.
> >
> > I am interested in knowing your thoughts on this: has anyone else
> > come across a similar issue, is there any other way of solving
> > this, etc.
>
> Would a pNFS file layout work? The MDS could direct I/O for
> a particular file to a specific DS.

That's the other option if your customers are using NFSv4.1 or NFSv4.2.
It has the advantage that it would also allow the server to dynamically
load balance the I/O across the available cluster nodes, by recalling
layouts from nodes that are too hot and migrating them to nodes that
have spare capacity.

The file metadata and directory data+metadata will, however, still be
retrieved from the node that the NFS client is mounting from. I don't
know if that might still be a problem for this cluster setup?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
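
To make the idea concrete, here is a minimal user-space sketch of the
filehandle-hash based xprt selection. This is not the actual kernel
patch; the structure names and the hash function are placeholders, and
the real change would store the hash in rpc_xprt_hint on rpc_message and
let the RPC multipath code do the reduction over the live xprt list. The
stickiness property it demonstrates is the same, though.

/*
 * Illustrative user-space sketch only -- NOT the kernel patch.
 * Derive a stable hint from the filehandle bytes, then reduce it
 * modulo the number of nconnect transports, so every RPC for a given
 * file maps to the same xprt (and hence the same cluster node behind
 * the L4 load balancer).
 */
#include <stdio.h>
#include <stdint.h>

#define MAX_FH_SIZE 64                  /* NFSv3 filehandles are <= 64 bytes */

struct fake_fh {
        uint16_t      size;
        unsigned char data[MAX_FH_SIZE];
};

/* FNV-1a over the filehandle bytes; stands in for whatever hash the patch uses */
static uint32_t fh_to_xprt_hint(const struct fake_fh *fh)
{
        uint32_t h = 2166136261u;

        for (uint16_t i = 0; i < fh->size; i++) {
                h ^= fh->data[i];
                h *= 16777619u;
        }
        return h;
}

/* Map the per-file hint onto one of the nconnect transports */
static unsigned int pick_xprt(uint32_t xprt_hint, unsigned int nr_xprts)
{
        return xprt_hint % nr_xprts;
}

int main(void)
{
        unsigned int nconnect = 4;
        struct fake_fh fh_a = { .size = 8, .data = "fileA_fh" };
        struct fake_fh fh_b = { .size = 8, .data = "fileB_fh" };

        /* The same filehandle always maps to the same transport index */
        printf("fileA -> xprt %u\n", pick_xprt(fh_to_xprt_hint(&fh_a), nconnect));
        printf("fileA -> xprt %u\n", pick_xprt(fh_to_xprt_hint(&fh_a), nconnect));
        printf("fileB -> xprt %u\n", pick_xprt(fh_to_xprt_hint(&fh_b), nconnect));
        return 0;
}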