On 2/13/25 2:12 PM, Trond Myklebust wrote: > On Thu, 2025-02-13 at 18:49 +0000, Trond Myklebust wrote: >> On Thu, 2025-02-13 at 13:47 -0500, Chuck Lever wrote: >>> On 2/13/25 1:44 PM, Trond Myklebust wrote: >>>> On Thu, 2025-02-13 at 12:53 -0500, Olga Kornievskaia wrote: >>>>> On Thu, Feb 13, 2025 at 11:59 AM Trond Myklebust >>>>> <trondmy@xxxxxxxxxxxxxxx> wrote: >>>>>> >>>>>> On Thu, 2025-02-13 at 11:15 -0500, cel@xxxxxxxxxx wrote: >>>>>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx> >>>>>>> >>>>>>> The NFSv4.2 protocol requires that a client match a >>>>>>> CB_OFFLOAD >>>>>>> callback to a COPY reply containing the same copy state ID. >>>>>>> However, >>>>>>> it's possible that the order of the callback and reply >>>>>>> processing >>>>>>> on >>>>>>> the client can cause the CB_OFFLOAD to be received and >>>>>>> processed >>>>>>> /before/ the client has dealt with the COPY reply. >>>>>>> >>>>>>> Currently, in this case, the Linux NFS client will queue a >>>>>>> fresh >>>>>>> struct nfs4_copy_state in the CB_OFFLOAD handler. >>>>>>> handle_async_copy() then checks for a matching >>>>>>> nfs4_copy_state >>>>>>> before settling down to wait for a CB_OFFLOAD reply. >>>>>>> >>>>>>> But it would be simpler for the client's callback service >>>>>>> to >>>>>>> respond >>>>>>> to such a CB_OFFLOAD with "I'm not ready yet" and have the >>>>>>> server >>>>>>> send the CB_OFFLOAD again later. This avoids the need for >>>>>>> the >>>>>>> client's CB_OFFLOAD processing to allocate an extra struct >>>>>>> nfs4_copy_state -- in most cases that allocation will be >>>>>>> tossed >>>>>>> immediately, and it's one less memory allocation that we >>>>>>> have >>>>>>> to >>>>>>> worry about accidentally leaking or accumulating over time. >>>>>> >>>>>> Why can't the server just fill an appropriate entry for >>>>>> csa_referring_call_lists<> in the CB_SEQUENCE operation for >>>>>> the >>>>>> CB_OFFLOAD callback? That's the mechanism that is intended to >>>>>> be >>>>>> used >>>>>> to avoid the above kind of race. >>>>> >>>>> Let's say the linux server does implement the list but what >>>>> about >>>>> other implementations that will not. The client still needs an >>>>> approach to handle CB_OFFLOAD/COPY reply. >>>>>> >>>> >>>> There are several cases that need to be handled. Off the top of >>>> my >>>> head: >>>> 1. The reply to COPY hasn't yet been processed. >>>> 2. The COPY is complete, and the state has been forgotten. >>>> 3. The stateid presented by CB_OFFLOAD is one that was reused >>>> for a >>>> second COPY request after a previous one completed. >>>> >>>> The client will want to send different errors for either case >>>> (NFS4ERR_DELAY in the first and third case, NFS4ERR_BAD_STATEID >>>> in >>>> the >>>> second). >>>> With csa_referring_call_lists<>, the client can easily >>>> distinguish >>>> between the cases and return the right response. Without it, the >>>> client >>>> might end up returning NFS4ERR_BAD_STATEID in case 3, or >>>> NFS4ERR_DELAY >>>> in case 2, etc... >>>> >>>> So in practice, we want all servers to do the right thing if they >>>> want >>>> to avoid confusion over state. The client can't fix these races >>>> on >>>> its >>>> own. >>>> >>> >>> We are currently living in a world where all NFSD-based servers do >>> not >>> return referring calls. I think we need to understand what the >>> client >>> should do in those cases. >> >> >> My answer is: not try to fix that which cannot be fixed. >>> > > Put differently: it is a lot easier to just implement this properly on > the server instead of doing a load of contortions on the client. > > All that needs to be done is to store 3 extra pieces of information > when you create the stateid (the session id, slot id and sequence id > for the operation that created the stateid, i.e. 24 bytes of > information). When you update the stateid, you also replace the stored > extra information with the new session id, slot id and sequence id for > the operation that caused the stateid's sequence number to be bumped. > > Then in the CB_OFFLOAD callback, when you look up the stateid, you also > look up the 3 stored values and put them in the CB_SEQUENCE op. That's > literally all that is needed for the client to be able to figure out > the order in which it needs to process the operations. Yes, that part is obvious, and I'm in the middle of doing just that. But I'm not fixing a /new/ problem. The issue I'm getting at is what to do on the client to handle the existing cohort of NFSD servers that do /not/ currently support referring call lists. That is 100% of the population of today's NFSD servers. I think your answer is to leave the client CB_OFFLOAD implementation as it is: it saves unrecognized copy state IDs, expecting that later a COPY response will match it. That works well enough, but... I'm wondering what happens if a bad network actor injects CB_OFFLOAD calls with bogus copy state IDs. Those nfs4_copy_state structures won't be freed until the mount is gone, and there doesn't appear to be a limit on how many can be created. IMO we need to remove that client logic, eventually. I don't think it should be left in the client. -- Chuck Lever