Re: [PATCH 0/5] nfs: Add mount option for forcing RPC requests for one file over one connection

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Tue, 23 Mar 2021 18:25:57 +0000

> On Mar 23, 2021, at 2:01 PM, Nagendra Tomar <Nagendra.Tomar@xxxxxxxxxxxxx> wrote:
> 
>>> On Mar 23, 2021, at 12:29 PM, Nagendra Tomar
>> <Nagendra.Tomar@xxxxxxxxxxxxx> wrote:
>>> 
>>>>> On Mar 23, 2021, at 11:57 AM, Nagendra Tomar
>>>> <Nagendra.Tomar@xxxxxxxxxxxxx> wrote:
>>>>> 
>>>>>>> On Mar 23, 2021, at 1:46 AM, Nagendra Tomar
>>>>>> <Nagendra.Tomar@xxxxxxxxxxxxx> wrote:
>>>>>>> 
>>>>>>> From: Nagendra S Tomar <natomar@xxxxxxxxxxxxx>
>>>>>> 
>>>>>> The flexfiles layout can handle an NFSv4.1 client and NFSv3 data
>>>>>> servers. In fact it was designed for exactly this kind of mix of
>>>>>> NFS versions.
>>>>>> 
>>>>>> No client code change will be necessary -- there are a lot more
>>>>>> clients than servers. The MDS can be made to work smartly in
>>>>>> concert with the load balancer, over time; or it can adopt other
>>>>>> clever strategies.
>>>>>> 
>>>>>> IMHO pNFS is the better long-term strategy here.
>>>>> 
>>>>> The fundamental difference here is that the clustered NFSv3 server
>>>>> is available over a single virtual IP, so IIUC even if we were to use
>>>>> NFSv4.1 with the flexfiles layout, all it can hand over to the client is that single
>>>>> (load-balanced) virtual IP and now when the clients do connect to the
>>>>> NFSv3 DS we still have the same issue. Am I understanding you right?
>>>>> Can you pls elaborate what you mean by "MDS can be made to work
>>>>> smartly in concert with the load balancer"?
>>>> 
>>>> I had thought there were multiple NFSv3 server targets in play.
>>>> 
>>>> If the load balancer is making them look like a single IP address,
>>>> then take it out of the equation: expose all the NFSv3 servers to
>>>> the clients and let the MDS direct operations to each data server.
>>>> 
>>>> AIUI this is the approach (without the use of NFSv3) taken by
>>>> NetApp next generation clusters.
>>> 
>>> Yeah, if we could have clients access all the NFSv3 servers, then I agree pNFS
>>> would be a viable option. Unfortunately that's not an option in this case. The
>>> cluster has 100's of nodes and it's not an on-prem server, but a cloud service,
>>> so the simplicity of the single LB VIP is critical.
>> 
>> The clients mount only the MDS. The MDS provides the DS addresses, they are
>> not exposed to client administrators. If the MDS adopts the load balancer's IP
>> address, then the clients would simply mount that same server address using
>> NFSv4.1.
> 
> I understand/agree with the "client mounts the single MDS IP" part. What I meant 
> by "simplicity of the single LB VIP" is not having to have so many routable 
> IP addresses, since the clients could be on a (very) different network than the 
> storage cluster they are accessing, even though client admins will not deal with
> those addresses themselves, as you mention.

Got it.

>> The other alternative is to make the load balancer sniff the FH from each
>> NFS request and direct it to a consistent NFSv3 DS. I still prefer that
>> over adding a very special-case mount option to the Linux client. Again,
>> you'd be deploying a code change in one place, under your control, instead
>> of on 100's of clients.
> 
> That is one option, but it makes the LB application-aware and potentially less 
> performant. Appreciate your suggestion, though!
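For concreteness, here is a minimal sketch of what "sniff the FH and direct it to a consistent DS" could involve on the load balancer. It is written in Python for brevity and is not taken from any existing load balancer. It assumes the LB has already reassembled one ONC RPC record and stripped the TCP record marker, and it simply gives up on RPCSEC_GSS traffic, whose arguments may be wrapped or encrypted. The backend names are hypothetical, and rendezvous hashing is an illustrative choice: removing one backend only remaps the file handles that were assigned to it.

```python
from __future__ import annotations

import hashlib
import struct

NFS_PROGRAM = 100003   # ONC RPC program number for NFS
NFS_V3 = 3
NFS3_FHSIZE = 64       # maximum nfs_fh3 length in bytes
RPC_CALL = 0           # msg_type value for a CALL
RPCSEC_GSS = 6         # auth flavor whose args this sketch does not unwrap


def _xdr_opaque(buf: bytes, off: int) -> tuple[bytes, int]:
    """Decode an XDR variable-length opaque: 4-byte length, data, pad to 4."""
    (length,) = struct.unpack_from(">I", buf, off)
    off += 4
    data = buf[off:off + length]
    if len(data) != length:
        raise ValueError("truncated XDR opaque")
    return data, off + ((length + 3) & ~3)


def first_fh(call: bytes) -> bytes | None:
    """Return the first nfs_fh3 argument of an NFSv3 CALL, or None.

    `call` is one RPC record with the TCP record marker already removed.
    """
    _xid, mtype, rpcvers, prog, vers, proc = struct.unpack_from(">6I", call, 0)
    if mtype != RPC_CALL or rpcvers != 2 or prog != NFS_PROGRAM or vers != NFS_V3:
        return None
    if proc == 0:                  # NFSPROC3_NULL has no arguments
        return None
    off = 24
    (cred_flavor,) = struct.unpack_from(">I", call, off)
    if cred_flavor == RPCSEC_GSS:
        return None                # args may be wrapped or encrypted; not handled here
    for _ in range(2):             # credential, then verifier: flavor + opaque body
        _body, off = _xdr_opaque(call, off + 4)
    fh, _ = _xdr_opaque(call, off)  # every other v3 procedure's args start with an nfs_fh3
    if len(fh) > NFS3_FHSIZE:
        raise ValueError("file handle too long for NFSv3")
    return fh


def backend_for_fh(fh: bytes, backends: list[str]) -> str:
    """Rendezvous (highest-random-weight) hashing of a file handle to a backend.

    Every request carrying the same FH maps to the same backend, whichever
    client or connection sent it.
    """
    if not backends:
        raise ValueError("no backends configured")
    return max(backends,
               key=lambda b: hashlib.sha256(b.encode() + b"\0" + fh).digest())
```

Even this toy version has to decode the RPC header and credentials on every request, which is the application-awareness cost mentioned above. Also note that for directory operations such as LOOKUP or CREATE the first nfs_fh3 is the parent directory, so those requests would be steered by directory rather than by the file being created.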

You might get part of the way there by having the LB direct
traffic from a particular client to a particular backend NFS
server. The client and its applications are bound to have a
narrow file working set.
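A rough sketch of the per-client variant, assuming the LB keys only on the client's source address and never parses RPC traffic. The backend names are hypothetical, and rendezvous hashing is again an assumption made for illustration.

```python
from __future__ import annotations

import hashlib
import ipaddress


def backend_for_client(client_addr: str, backends: list[str]) -> str:
    """Pin every connection from one client address to one backend.

    Rendezvous (highest-random-weight) hashing: score each backend against
    the client's packed address and take the highest score. Only the source
    address is used (not the port), so this can be done at L4.
    """
    if not backends:
        raise ValueError("no backends configured")
    key = ipaddress.ip_address(client_addr).packed
    return max(backends,
               key=lambda b: hashlib.sha256(b.encode() + b"\0" + key).digest())
```

Because the source port is ignored, all of a client's nconnect connections land on the same backend, so one client's requests for a given file stay together. Two clients sharing a file may still be sent to different backends, and clients behind a common NAT address collapse onto one backend, which is why this only gets part of the way there.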

> I was hoping that such a client side change could be useful to possibly more 
> users with similar setups, after all file->connection affinity doesn't sound too 
> arcane and one can think of benefits of one node processing one file. No?

That's where I'm getting hung up (outside the personal preference
that we not introduce yet another mount option). While I understand
what's going on now (thanks!) I'm not sure this is a common usage
scenario for NFSv3. Other opinions welcome here!

Nor does it seem like one that we want to encourage over solutions
like pNFS. Generally the Linux community has taken the position
that server bugs should be addressed on the server, and this seems
like a problem that is introduced by your middlebox and server
combination. The client is working properly and is complying with
spec.

If the server cluster prefers particular requests to go to particular
targets, a layout is the way to go, IMHO.

(I'm not speaking for the NFS client maintainers, just offering an
opinion and hoping my comments clarify the scenario for others on
the list paying attention to this thread).

--
Chuck Lever