Re: [RFC PATCH] NFS: CB_OFFLOAD should return DELAY when no copy state ID matches

Chuck Lever <chuck.lever@xxxxxxxxxx> · Thu, 13 Feb 2025 14:22:26 -0500

On 2/13/25 2:12 PM, Trond Myklebust wrote:
> On Thu, 2025-02-13 at 18:49 +0000, Trond Myklebust wrote:
>> On Thu, 2025-02-13 at 13:47 -0500, Chuck Lever wrote:
>>> On 2/13/25 1:44 PM, Trond Myklebust wrote:
>>>> On Thu, 2025-02-13 at 12:53 -0500, Olga Kornievskaia wrote:
>>>>> On Thu, Feb 13, 2025 at 11:59 AM Trond Myklebust
>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Thu, 2025-02-13 at 11:15 -0500, cel@xxxxxxxxxx wrote:
>>>>>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>>>>
>>>>>>> The NFSv4.2 protocol requires that a client match a
>>>>>>> CB_OFFLOAD
>>>>>>> callback to a COPY reply containing the same copy state ID.
>>>>>>> However,
>>>>>>> it's possible that the order of the callback and reply
>>>>>>> processing
>>>>>>> on
>>>>>>> the client can cause the CB_OFFLOAD to be received and
>>>>>>> processed
>>>>>>> /before/ the client has dealt with the COPY reply.
>>>>>>>
>>>>>>> Currently, in this case, the Linux NFS client will queue a
>>>>>>> fresh
>>>>>>> struct nfs4_copy_state in the CB_OFFLOAD handler.
>>>>>>> handle_async_copy() then checks for a matching
>>>>>>> nfs4_copy_state
>>>>>>> before settling down to wait for a CB_OFFLOAD reply.
>>>>>>>
>>>>>>> But it would be simpler for the client's callback service
>>>>>>> to
>>>>>>> respond
>>>>>>> to such a CB_OFFLOAD with "I'm not ready yet" and have the
>>>>>>> server
>>>>>>> send the CB_OFFLOAD again later. This avoids the need for
>>>>>>> the
>>>>>>> client's CB_OFFLOAD processing to allocate an extra struct
>>>>>>> nfs4_copy_state -- in most cases that allocation will be
>>>>>>> tossed
>>>>>>> immediately, and it's one less memory allocation that we
>>>>>>> have
>>>>>>> to
>>>>>>> worry about accidentally leaking or accumulating over time.
>>>>>>
>>>>>> Why can't the server just fill an appropriate entry for
>>>>>> csa_referring_call_lists<> in the CB_SEQUENCE operation for
>>>>>> the
>>>>>> CB_OFFLOAD callback? That's the mechanism that is intended to
>>>>>> be
>>>>>> used
>>>>>> to avoid the above kind of race.
>>>>>
>>>>> Let's say the linux server does implement the list but what
>>>>> about
>>>>> other implementations that will not. The client still needs an
>>>>> approach to handle CB_OFFLOAD/COPY reply.
>>>>>>
>>>>
>>>> There are several cases that need to be handled. Off the top of
>>>> my
>>>> head:
>>>>    1. The reply to COPY hasn't yet been processed.
>>>>    2. The COPY is complete, and the state has been forgotten.
>>>>    3. The stateid presented by CB_OFFLOAD is one that was reused
>>>> for a
>>>>       second COPY request after a previous one completed.
>>>>
>>>> The client will want to send different errors for either case
>>>> (NFS4ERR_DELAY in the first and third case, NFS4ERR_BAD_STATEID
>>>> in
>>>> the
>>>> second).
>>>> With csa_referring_call_lists<>, the client can easily
>>>> distinguish
>>>> between the cases and return the right response. Without it, the
>>>> client
>>>> might end up returning NFS4ERR_BAD_STATEID in case 3, or
>>>> NFS4ERR_DELAY
>>>> in case 2, etc...
>>>>
>>>> So in practice, we want all servers to do the right thing if they
>>>> want
>>>> to avoid confusion over state. The client can't fix these races
>>>> on
>>>> its
>>>> own.
>>>>
>>>
>>> We are currently living in a world where all NFSD-based servers do
>>> not
>>> return referring calls. I think we need to understand what the
>>> client
>>> should do in those cases.
>>
>>
>> My answer is: not try to fix that which cannot be fixed.
>>>
> 
> Put differently: it is a lot easier to just implement this properly on
> the server instead of doing a load of contortions on the client.
> 
> All that needs to be done is to store 3 extra pieces of information
> when you create the stateid (the session id, slot id and sequence id
> for the operation that created the stateid, i.e. 24 bytes of
> information). When you update the stateid, you also replace the stored
> extra information with the new session id, slot id and sequence id for
> the operation that caused the stateid's sequence number to be bumped.
> 
> Then in the CB_OFFLOAD callback, when you look up the stateid, you also
> look up the 3 stored values and put them in the CB_SEQUENCE op. That's
> literally all that is needed for the client to be able to figure out
> the order in which it needs to process the operations.

Yes, that part is obvious, and I'm in the middle of doing just that.

But I'm not fixing a /new/ problem.

The issue I'm getting at is what to do on the client to handle the
existing cohort of NFSD servers that do /not/ currently support
referring call lists. That is 100% of the population of today's NFSD
servers.

I think your answer is to leave the client CB_OFFLOAD implementation
as it is: it saves unrecognized copy state IDs, expecting that later
a COPY response will match it. That works well enough, but...

I'm wondering what happens if a bad network actor injects CB_OFFLOAD
calls with bogus copy state IDs. Those nfs4_copy_state structures
won't be freed until the mount is gone, and there doesn't appear to
be a limit on how many can be created.

IMO we need to remove that client logic, eventually. I don't think
it should be left in the client.

-- 
Chuck Lever