> On Mar 23, 2021, at 7:31 PM, Nagendra Tomar <Nagendra.Tomar@xxxxxxxxxxxxx> wrote:
>> 
>>> I was hoping that such a client side change could be useful to possibly more
>>> users with similar setups, after all file->connection affinity doesn't sound too
>>> arcane and one can think of benefits of one node processing one file. No?
>> 
>> That's where I'm getting hung up (outside the personal preference
>> that we not introduce yet another mount option). While I understand
>> what's going on now (thanks!) I'm not sure this is a common usage
>> scenario for NFSv3. Other opinions welcome here!
>> 
>> Nor does it seem like one that we want to encourage over solutions
>> like pNFS. Generally the Linux community has taken the position
>> that server bugs should be addressed on the server, and this seems
>> like a problem that is introduced by your middlebox and server
>> combination.
> 
> I would like to look at it not as a problem created by our server setup,
> but rather as "one more scenario" which the client can much easily and
> generically handle and hence the patch.
> 
>> The client is working properly and is complying with spec.
> 
> The nconnect roundrobin distribution is just one way of utilizing multiple
> connections, which happens to be limiting for this specific usecase.
> My patch proposes another way of distributing RPCs over the connections,
> which is more suitable for this usecase and maybe others.

Indeed, the nconnect work isn't quite complete, and the client will
need some way to specify how to schedule RPCs over several connections
to the same server.

There seem to be two somewhat orthogonal components to your proposal:

A. The introduction of a mount option to specify an RPC connection
   scheduling mechanism

B. The use of a file handle hash to do that scheduling

For A:

Again, I'd rather avoid adding more mount options, for reasons I've
described most recently over in the d_type/READDIR thread.

There are other options here. Anna has proposed a sysfs API that
exposes each kernel RPC connection for fine-grained control. See this
thread:

https://lore.kernel.org/linux-nfs/20210312211826.360959-1-Anna.Schumaker@xxxxxxxxxx/

Dan Aloni has proposed an additional mechanism that enables user space
to associate an NFS mount point with its underlying RPC connections.

These approaches might be suitable for your purpose, or they might
simply provide a little inspiration to get creative.

For B:

I agree with Tom that leaving this up to client system administrators
is a punt, and usually not a scalable or forward-looking solution. And
I maintain you will be better off with a centralized and easily
configurable mechanism for balancing load, not a fixed algorithm that
you have to introduce to your clients via code changes or repeated
distributed changes to mount options.
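Just to be concrete about what a fixed algorithm like B means, here is
a rough, user space only sketch of my understanding of the proposal:
hash the opaque file handle bytes and use the result to pick one of
the nconnect connections. The FNV-1a hash and every name below
(fh_hash, fh_to_xprt_index, the sample handles, nconnect = 4) are mine,
invented purely for illustration; this is not your patch, and the real
change would live in the client's sunrpc transport selection path.

/*
 * Illustrative sketch only: derive a stable connection index from an
 * opaque NFSv3 file handle so every RPC for a given file rides the
 * same connection.
 */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* 32-bit FNV-1a over the opaque file handle bytes. */
static uint32_t fh_hash(const unsigned char *fh, size_t fh_len)
{
        uint32_t hash = 2166136261u;

        for (size_t i = 0; i < fh_len; i++) {
                hash ^= fh[i];
                hash *= 16777619u;
        }
        return hash;
}

/* Map a file handle to one of the nconnect transports. */
static unsigned int fh_to_xprt_index(const unsigned char *fh,
                                     size_t fh_len,
                                     unsigned int nconnect)
{
        return fh_hash(fh, fh_len) % nconnect;
}

int main(void)
{
        /* Two made-up file handles, just to show the mapping. */
        const unsigned char fh_a[] = { 0x01, 0x23, 0x45, 0x67, 0x89, 0xab };
        const unsigned char fh_b[] = { 0xfe, 0xdc, 0xba, 0x98, 0x76, 0x54 };
        unsigned int nconnect = 4;

        printf("file A -> connection %u\n",
               fh_to_xprt_index(fh_a, sizeof(fh_a), nconnect));
        printf("file B -> connection %u\n",
               fh_to_xprt_index(fh_b, sizeof(fh_b), nconnect));
        return 0;
}

Whatever hash is chosen, the policy ends up baked into every client,
which is exactly the rigidity I'm concerned about.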
There are other ways to utilize your LB. Since this is NFSv3, you
might expose your back-end NFSv3 servers by destination port (aka, a
set of NAT rules).

  MDS NFSv4 server:  clients get to it at the VIP address, port 2049
  DS NFSv3 server A: clients get to it at the VIP address, port i
  DS NFSv3 server B: clients get to it at the VIP address, port j
  DS NFSv3 server C: clients get to it at the VIP address, port k

The LB translates [VIP]:i into [server A]:2049, [VIP]:j into
[server B]:2049, and so on.

I'm not sure if the flexfiles layout carries universal addresses with
port information, though. If it did, that would enable you to expose
all your backend data servers directly to clients via a single VIP,
and yet the LB would still be just a Layer 3 forwarding service and
not application-aware.

--
Chuck Lever