> On Oct 31, 2016, at 11:56, Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:
>
> On 2016-10-31 15:30:02 [+0000], Trond Myklebust wrote:
>>> On Oct 31, 2016, at 09:19, Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:
>>> The list_for_each_entry() in nfs4_reclaim_open_state():
>>> It seems that this lock protects the ->so_states list among other
>>> atomic_t & flags members. So at the beginning of the loop we increment
>>> ->count, ensuring that this entry is not removed while we use it, and we
>>> drop the ->so_lock lock during the loop. After the nfs4_reclaim_locks()
>>> invocation we call nfs4_put_open_state() and grab the ->so_lock again. So
>>> if we were the last user of this struct and it gets removed, then the
>>> following list_next_entry() invocation is a use-after-free. Even if we
>>> use list_for_each_entry_safe() there is no guarantee that the following
>>> member is still valid, because it might have been removed by someone
>>> invoking nfs4_put_open_state() on it, right?
>>> So there is that.
>>>
>>> However, to address my initial problem I have a patch here :) It uses
>>> an rw_semaphore, which ensures that there is only one writer at a time
>>> or multiple readers. So it should be basically what is happening now plus
>>> a tiny tiny tiny lock plus lockdep coverage. I tried to trigger this
>>> myself but I don't manage to get into this code path at all, so I might
>>> be doing something wrong.
>>>
>>> Could you please check whether this patch works for you and whether my
>>> list_for_each_entry() observation is correct or not?
>>>
>>> v2…v3: replace the seqlock with an RW semaphore.
>>>
>>
>> NACK. That will deadlock. The reason why we use a seqlock there is precisely because we cannot allow ordinary RPC calls to lock out the recovery thread.
>
> Hmmm. So this is getting invoked if I reboot the server? A restart of
> nfs-kernel-server is the same thing?
> Is the list_for_each_entry() observation I made correct?

Yes, and yes. We can’t rely on the list pointers remaining correct, so we restart the list scan and we use the ops->state_flag_bit to signal whether or not state has been recovered for the entry being scanned.

>
>> If the server reboots, then ordinary RPC calls will fail until the recovery thread has had a chance to re-establish the state.
>
> This means that the ordinary RPC call won't return and fail, but will
> wait with the lock held until the recovery thread has done its thing?

It uses the seqcount_t to figure out whether a recovery occurred while an OPEN or CLOSE was being processed. If so, it schedules a new recovery of the stateids in question.
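
As a rough illustration of the "restart the scan" approach described above (pin the entry, drop the lock for the slow work, then rescan from the head using a per-entry flag instead of trusting the old list cursor), here is a minimal userspace sketch. It is not the actual fs/nfs/nfs4state.c code; all names (state_entry, owner, recover_one, ...) are made up:

/*
 * Sketch only: restart-the-scan list walk with a per-entry
 * "needs recovery" flag and a reference count pinning the entry
 * while the lock is dropped.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct state_entry {
	struct state_entry *next;
	int refcount;		/* plays the role of state->count */
	bool needs_recovery;	/* plays the role of ops->state_flag_bit */
	int id;
};

struct owner {
	pthread_mutex_t lock;	/* plays the role of sp->so_lock */
	struct state_entry *states;
};

static void put_entry(struct state_entry *e)
{
	/* Dropping the last reference frees the entry. */
	if (--e->refcount == 0)
		free(e);
}

static void recover_one(struct state_entry *e)
{
	/* Stand-in for the slow per-state recovery work. */
	printf("recovering state %d\n", e->id);
	e->needs_recovery = false;
}

static void reclaim_open_state(struct owner *sp)
{
	struct state_entry *e;

	pthread_mutex_lock(&sp->lock);
restart:
	for (e = sp->states; e != NULL; e = e->next) {
		if (!e->needs_recovery)
			continue;	/* handled on an earlier pass */
		e->refcount++;		/* pin the entry across the unlock */
		pthread_mutex_unlock(&sp->lock);

		recover_one(e);		/* lock is dropped while we work */

		put_entry(e);		/* may be the final reference */
		pthread_mutex_lock(&sp->lock);
		goto restart;		/* cursor may be stale: rescan from head */
	}
	pthread_mutex_unlock(&sp->lock);
}

int main(void)
{
	struct owner sp = { .lock = PTHREAD_MUTEX_INITIALIZER, .states = NULL };
	struct state_entry *e;
	int i;

	for (i = 0; i < 3; i++) {
		e = calloc(1, sizeof(*e));
		e->refcount = 1;	/* reference held by the list itself */
		e->needs_recovery = true;
		e->id = i;
		e->next = sp.states;
		sp.states = e;
	}
	reclaim_open_state(&sp);
	return 0;
}

Because the scan is restarted after every unlock/relock, the walker never follows a ->next pointer that may have been freed, and the flag guarantees forward progress since already-recovered entries are skipped.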
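
And for the seqlock point: a hedged sketch of the idea (not the kernel's seqcount_t API; the names are invented). The recovery thread bumps a counter to odd while it runs and back to even when done, so it is never blocked by ordinary RPC callers; an OPEN/CLOSE samples the counter before the RPC and rechecks it afterwards, and if the value was odd or has changed, recovery ran concurrently and the caller schedules a new recovery of the stateids in question:

/* Sketch only: lock-free detection of a concurrent recovery pass. */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint reclaim_seq;

/* Recovery side: never blocked by ordinary RPC callers. */
static void recovery_begin(void) { atomic_fetch_add(&reclaim_seq, 1); }	/* -> odd */
static void recovery_end(void)   { atomic_fetch_add(&reclaim_seq, 1); }	/* -> even */

/* RPC side: snapshot before the OPEN/CLOSE, recheck after it completes. */
static unsigned rpc_snapshot(void)
{
	return atomic_load(&reclaim_seq);
}

static bool rpc_must_redo(unsigned snapshot)
{
	/* Odd snapshot: recovery was in progress.  Changed value: it ran since. */
	return (snapshot & 1) || atomic_load(&reclaim_seq) != snapshot;
}

This is why an rw_semaphore deadlocks here: the RPC side must only observe that recovery happened and retry, never hold a reader lock that the recovery thread would have to wait for.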