On Fri, Nov 16, 2018 at 1:01 PM J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> On Fri, Nov 16, 2018 at 12:56:45PM -0500, J. Bruce Fields wrote:
> > On Fri, Nov 16, 2018 at 11:25:50AM -0500, Olga Kornievskaia wrote:
> > > On Fri, Nov 16, 2018 at 9:27 AM J. Bruce Fields <bfields@xxxxxxxxxxxx>
> > > wrote:
> > >
> > > > From bc0c9079b48d "NFS handle COPY reply CB_OFFLOAD call race":
> > > >
> > > > +       spin_lock(&server->nfs_client->cl_lock);
> > > > +       list_for_each_entry(copy, &server->nfs_client->pending_cb_stateids,
> > > > +                               copies) {
> > > > +               if (memcmp(&res->write_res.stateid, &copy->stateid,
> > > > +                               NFS4_STATEID_SIZE))
> > > > +                       continue;
> > > > +               found_pending = true;
> > > > +               list_del(&copy->copies);
> > > > +               break;
> > > > +       }
> > > > +       if (found_pending) {
> > > > +               spin_unlock(&server->nfs_client->cl_lock);
> > > > +               goto out;
> > > > +       }
> > > >
> > > >         copy = kzalloc(sizeof(struct nfs4_copy_state), GFP_NOFS);
> > > >
> > > > At this point we're still holding cl_lock.
> > > >
> > > > Best might be to allocate "copy" before taking the lock, then free it on
> > > > any paths where we don't end up needing it.
> > > >
> > > Thanks. I'll do that.
> >
> > Thanks.  And, I just noticed--nfs4_callback_offload has the same
> > problem.

nfs4_callback_offload is where I changed it. I see now that
handle_async_copy() in nfs42proc.c also has it.

> By the way, I don't understand the create case in that code--if you get
> a CB_OFFLOAD without already having a matching copy stateid, shouldn't
> you just return an error and forget about it?

Then how does the copy know not to go wait for the callback? The copy
checks the pending_callback list to see if it has already received a
callback. If not, it puts itself on the copy list and goes to sleep.
The callback checks the copy list and, if it finds a matching copy,
signals it; if not, it puts itself on the pending_callback list. A
lock is held over checking one list and putting yourself on the other.

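To make the two-list handshake concrete, here is a rough user-space sketch of it. It is not the kernel code: the names (struct client, copy_reply(), callback_offload(), take_match()) are made up for illustration, a pthread mutex stands in for cl_lock, a "completed" flag stands in for the real wakeup, and singly linked lists stand in for the kernel list heads. It also allocates before taking the lock, as suggested earlier in the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define STATEID_SIZE 16              /* stand-in for NFS4_STATEID_SIZE */

struct copy_state {                  /* hypothetical, not nfs4_copy_state */
    unsigned char stateid[STATEID_SIZE];
    bool completed;                  /* set once the callback has arrived */
    struct copy_state *next;
};

struct client {                      /* hypothetical per-client state */
    pthread_mutex_t lock;            /* plays the role of cl_lock */
    struct copy_state *waiting_copies;  /* copies waiting for a callback */
    struct copy_state *pending_cbs;     /* callbacks that arrived first */
};

/* Unlink and return the entry matching id, or NULL. Caller holds the lock. */
static struct copy_state *take_match(struct copy_state **head,
                                     const unsigned char *id)
{
    for (struct copy_state **p = head; *p; p = &(*p)->next) {
        if (memcmp((*p)->stateid, id, STATEID_SIZE) == 0) {
            struct copy_state *found = *p;
            *p = found->next;
            found->next = NULL;
            return found;
        }
    }
    return NULL;
}

static struct copy_state *new_entry(const unsigned char *id)
{
    struct copy_state *e = calloc(1, sizeof(*e));
    if (e)
        memcpy(e->stateid, id, STATEID_SIZE);
    return e;
}

/* COPY reply side: 0 = callback already seen, 1 = queued and must wait,
 * -1 = allocation failure. On 1, *queued is the entry to wait on. */
static int copy_reply(struct client *clp, const unsigned char *id,
                      struct copy_state **queued)
{
    struct copy_state *fresh = new_entry(id);   /* allocate before locking */
    *queued = NULL;
    if (!fresh)
        return -1;
    pthread_mutex_lock(&clp->lock);
    struct copy_state *pending = take_match(&clp->pending_cbs, id);
    if (!pending) {                  /* no callback yet: park on copy list */
        fresh->next = clp->waiting_copies;
        clp->waiting_copies = fresh;
    }
    pthread_mutex_unlock(&clp->lock);
    if (pending) {                   /* callback won the race */
        free(pending);
        free(fresh);                 /* preallocated entry was not needed */
        return 0;
    }
    *queued = fresh;
    return 1;
}

/* CB_OFFLOAD side: 0 = found and completed a waiting copy, 1 = parked on
 * the pending list for a later COPY reply, -1 = allocation failure. */
static int callback_offload(struct client *clp, const unsigned char *id)
{
    struct copy_state *fresh = new_entry(id);   /* allocate before locking */
    if (!fresh)
        return -1;
    pthread_mutex_lock(&clp->lock);
    struct copy_state *waiter = take_match(&clp->waiting_copies, id);
    if (waiter) {
        waiter->completed = true;    /* real code would wake the sleeper */
    } else {
        fresh->next = clp->pending_cbs;
        clp->pending_cbs = fresh;
    }
    pthread_mutex_unlock(&clp->lock);
    if (waiter) {
        free(fresh);                 /* preallocated entry was not needed */
        return 0;
    }
    return 1;
}
```

Because each side checks the other side's list and parks itself on its own list under the same lock, whichever of the COPY reply and the CB_OFFLOAD arrives second always finds the first. That is why the callback creates an entry instead of returning an error when it sees an unknown stateid.
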
> I also wonder if SERVERFAULT is really the best error for a memory
> allocation failure there.

I guess EIO or ENOMEM might be better. But I don't think this error
gets returned anywhere to the main process.

>
> --b.