On Fri, Nov 16, 2018 at 02:49:00PM -0500, Olga Kornievskaia wrote: > On Fri, Nov 16, 2018 at 2:30 PM J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote: > > > > On Fri, Nov 16, 2018 at 01:52:29PM -0500, Olga Kornievskaia wrote: > > > On Fri, Nov 16, 2018 at 1:30 PM Olga Kornievskaia <aglo@xxxxxxxxx> wrote: > > > > Then how does the copy knows not to go wait for the callback? Copy > > > > checks the pending_callback list to see if received a callback. If > > > > not, it puts itself on the copy list and goes to sleep. The callback, > > > > checks the copy list and if it finds a copy signals it, if not it puts > > > > itself on the pending_callback list. a lock is held over checking one > > > > list and putting yourself on the other. > > > > OK, apologies, I don't really understand those data structures yet, but > > something seems wrong to me. > > > > Under what circumstances could we recieve a CB_OFFLOAD without having > > started the corresponding copy already? > > It can receive a CB_OFFLOAD before it receives a reply to the COPY. > It's possible and I can trigger it during testing when doing a really > short copy. The copy is done and callback thread sends a reply. > CB_OFFLOAD call and COPY reply can be switched on the server or on the > processing on the client. That race is discussed in https://tools.ietf.org/html/rfc5661#section-2.10.6.3 and is supposed to be dealt with by using referring triples and/or returning DELAY. > > And shouldn't CB_OFFLOAD be returning bad_stateid in the case it doesn't > > recognize the given stateid? > > It could but what should the server do in this case. I would imagine > it wouldn't do anything. There is nothing it can do. So now we have a > copy that send the call and is going to wait on the reply which will > never come as the 1st one came and we rejected it and now copy will > wait forever. > > Please describe what "is wrong" with the current implementation. I > believe it provide a reasonable solution to the race condition. Looks like a server that sends bad stateids in callbacks could cause you to allocate something that will never get freed. --b. > > It looks like the allocation failure is > > the *only* way we'll return an error on CB_OFFLOAD, and that seems > > wrong. > > Yes it is the only error we currently return. Do you see any other > useful errors that a client should return (and would be useful to > handle on the server). I don't see any need for any more > complications. > > > > > > I also wonder if SERVERFAULT is really the best error for a memory > > > > > allocation failure there. > > > > > > > > I guess EIO or ENOMEM might be better. But I don't think this error > > > > gets returned anywhere to the main process. > > > > > > > > > > Wait. It is returning SERVERFAULT because it's the callback server replying > > > back to the server's CB_RECALL call and I believe SERVERFAULT is the > > > appropriate error here. NFS doesn't have ENOMEM error. > > > > We could return DELAY if we think it might be worth the server trying > > the CB_RECALL again. (That's what nfsd usually returns on allocation > > failures. I don't know if that's really ideal.) > > If the client had any smarts to say correct this error that would be > useful to return but this is not the case. I don't believe there is a