On Mon, Aug 12, 2019 at 4:00 PM J. Bruce Fields <bfields@xxxxxxxxxx> wrote:
>
> On Mon, Aug 12, 2019 at 03:16:47PM -0400, Olga Kornievskaia wrote:
> > On Mon, Aug 12, 2019 at 12:19 PM Olga Kornievskaia
> > <olga.kornievskaia@xxxxxxxxx> wrote:
> > > While this passes my testing, in theory this allows for the race that
> > > we get the copy notify size but then offload_cancel arrives and changes
> > > the value. Then refcount_sub_and_test_checked() would have an incorrect
> > > value (can subtract larger than an actual reference count). I have no
> > > solution for that as there is no refcount_sub_and_lock() that will
> > > allow to decrement by a multiple under a lock. Thoughts?
> >
> > I tried not to use the client's cl_lock but instead use a specific
> > lock to protect the copy notifies stateid on the stateid list. But
> > since stateid's reference counter (sc_count) is protected by it, I
> > think by getting rid of the special lock and using cl_lock will solve
> > the problem of coordinating access between the sc_count and the
> > copy_notify stateid list. Are there any problems with using such a big
> > lock?
>
> Probably not. But it can be confusing when a single lock is used for
> several different things. A comment explaining why you need it might
> help.

Holding the client's cl_lock while manipulating the list of copy notify
stateids solves the refcount problem, but it creates a different problem
for the laundromat thread. There, the client list is traversed with
cl_lock already held, so I can't call the routines that free a
copy_notify stateid, because they in turn call nfs4_put_stid(), which
wants to take cl_lock. If I put the copy_notify stateid on the reaplist
instead, I lose the pointer to the client structure that I need in order
to take the lock. It seems the nfs4_cpntf_state structure would then need
to keep a pointer to the client structure, but then I have the problem of
making sure the nfs4_client structure isn't going away, and it becomes an
even bigger mess.

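
To make the two cases concrete, here is a minimal single-threaded
user-space sketch. It is not nfsd code. All names in it (struct client,
struct cpntf, toy_lock_acquire(), put_stid(), reap_under_lock(),
reap_deferred()) are hypothetical stand-ins, the lock is a flag that
returns EDEADLK where a real spinlock would hang, and I am assuming each
copy notify entry holds exactly one reference on a counter modeled as a
plain int.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy lock: a flag that reports re-acquisition instead of spinning. */
struct toy_lock { int held; };

static int toy_lock_acquire(struct toy_lock *l)
{
    if (l->held)
        return EDEADLK;        /* a real spinlock would hang here */
    l->held = 1;
    return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/* Hypothetical stand-ins, not the nfsd structures. */
struct cpntf {
    struct cpntf *next;
    int referenced;            /* nonzero while a copy still uses the entry */
};

struct client {
    struct toy_lock cl_lock;   /* models the client's cl_lock */
    struct cpntf *cpntf_list;  /* copy notify entries of this client */
    int sc_count;              /* models the stateid reference count */
};

/* Like the put path in the thread: it takes cl_lock itself. */
static int put_stid(struct client *clp)
{
    int err = toy_lock_acquire(&clp->cl_lock);
    if (err)
        return err;            /* caller already holds cl_lock */
    clp->sc_count--;
    toy_lock_release(&clp->cl_lock);
    return 0;
}

/* Problem case: drop a reference while the traversal holds cl_lock. */
static int reap_under_lock(struct client *clp)
{
    int err;

    toy_lock_acquire(&clp->cl_lock);
    err = put_stid(clp);       /* nested acquisition of the same lock */
    toy_lock_release(&clp->cl_lock);
    return err;
}

/* Deferred case: unlink under the lock, drop references afterwards. */
static int reap_deferred(struct client *clp)
{
    struct cpntf *reaplist = NULL, **pp, *cps;
    int reaped = 0;

    toy_lock_acquire(&clp->cl_lock);
    pp = &clp->cpntf_list;
    while ((cps = *pp) != NULL) {
        if (cps->referenced) {
            pp = &cps->next;   /* still in use, keep it on the list */
            continue;
        }
        *pp = cps->next;       /* unlink from the client's list */
        cps->next = reaplist;  /* push onto the private reap list */
        reaplist = cps;
    }
    toy_lock_release(&clp->cl_lock);

    /* clp must still be valid here; the sketch does not guarantee it. */
    while ((cps = reaplist) != NULL) {
        reaplist = cps->next;
        if (put_stid(clp) == 0)
            reaped++;
    }
    return reaped;
}
```

In this model reap_deferred() only works because the caller still has
clp in hand after dropping the lock. Storing clp in each entry would
remove that requirement but moves the lifetime question into the entry,
which is the mess described above.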
I think I need to remove the code in the laundromat that looks for
unreferenced copy_notify stateids and just rely on cleaning up when the
stateid is removed (basically what I originally had). Or I need to rely
on the client to always send FREE_STATEID. I don't see other options, do
you?

>
> --b.