On Thu, 2025-02-13 at 18:49 +0000, Trond Myklebust wrote: > On Thu, 2025-02-13 at 13:47 -0500, Chuck Lever wrote: > > On 2/13/25 1:44 PM, Trond Myklebust wrote: > > > On Thu, 2025-02-13 at 12:53 -0500, Olga Kornievskaia wrote: > > > > On Thu, Feb 13, 2025 at 11:59 AM Trond Myklebust > > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > > > > > On Thu, 2025-02-13 at 11:15 -0500, cel@xxxxxxxxxx wrote: > > > > > > From: Chuck Lever <chuck.lever@xxxxxxxxxx> > > > > > > > > > > > > The NFSv4.2 protocol requires that a client match a > > > > > > CB_OFFLOAD > > > > > > callback to a COPY reply containing the same copy state ID. > > > > > > However, > > > > > > it's possible that the order of the callback and reply > > > > > > processing > > > > > > on > > > > > > the client can cause the CB_OFFLOAD to be received and > > > > > > processed > > > > > > /before/ the client has dealt with the COPY reply. > > > > > > > > > > > > Currently, in this case, the Linux NFS client will queue a > > > > > > fresh > > > > > > struct nfs4_copy_state in the CB_OFFLOAD handler. > > > > > > handle_async_copy() then checks for a matching > > > > > > nfs4_copy_state > > > > > > before settling down to wait for a CB_OFFLOAD reply. > > > > > > > > > > > > But it would be simpler for the client's callback service > > > > > > to > > > > > > respond > > > > > > to such a CB_OFFLOAD with "I'm not ready yet" and have the > > > > > > server > > > > > > send the CB_OFFLOAD again later. This avoids the need for > > > > > > the > > > > > > client's CB_OFFLOAD processing to allocate an extra struct > > > > > > nfs4_copy_state -- in most cases that allocation will be > > > > > > tossed > > > > > > immediately, and it's one less memory allocation that we > > > > > > have > > > > > > to > > > > > > worry about accidentally leaking or accumulating over time. > > > > > > > > > > Why can't the server just fill an appropriate entry for > > > > > csa_referring_call_lists<> in the CB_SEQUENCE operation for > > > > > the > > > > > CB_OFFLOAD callback? That's the mechanism that is intended to > > > > > be > > > > > used > > > > > to avoid the above kind of race. > > > > > > > > Let's say the linux server does implement the list but what > > > > about > > > > other implementations that will not. The client still needs an > > > > approach to handle CB_OFFLOAD/COPY reply. > > > > > > > > > > > There are several cases that need to be handled. Off the top of > > > my > > > head: > > > 1. The reply to COPY hasn't yet been processed. > > > 2. The COPY is complete, and the state has been forgotten. > > > 3. The stateid presented by CB_OFFLOAD is one that was reused > > > for a > > > second COPY request after a previous one completed. > > > > > > The client will want to send different errors for either case > > > (NFS4ERR_DELAY in the first and third case, NFS4ERR_BAD_STATEID > > > in > > > the > > > second). > > > With csa_referring_call_lists<>, the client can easily > > > distinguish > > > between the cases and return the right response. Without it, the > > > client > > > might end up returning NFS4ERR_BAD_STATEID in case 3, or > > > NFS4ERR_DELAY > > > in case 2, etc... > > > > > > So in practice, we want all servers to do the right thing if they > > > want > > > to avoid confusion over state. The client can't fix these races > > > on > > > its > > > own. > > > > > > > We are currently living in a world where all NFSD-based servers do > > not > > return referring calls. I think we need to understand what the > > client > > should do in those cases. > > > My answer is: not try to fix that which cannot be fixed. > > Put differently: it is a lot easier to just implement this properly on the server instead of doing a load of contortions on the client. All that needs to be done is to store 3 extra pieces of information when you create the stateid (the session id, slot id and sequence id for the operation that created the stateid, i.e. 24 bytes of information). When you update the stateid, you also replace the stored extra information with the new session id, slot id and sequence id for the operation that caused the stateid's sequence number to be bumped. Then in the CB_OFFLOAD callback, when you look up the stateid, you also look up the 3 stored values and put them in the CB_SEQUENCE op. That's literally all that is needed for the client to be able to figure out the order in which it needs to process the operations. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx