> On Oct 31, 2016, at 09:19, Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote: > > The raw_write_seqcount_begin() in nfs4_reclaim_open_state() bugs me > because it maps to preempt_disable() in -RT which I can't have at this > point. So I took a look at the code. > It the lockdep part was removed in commit abbec2da13f0 ("NFS: Use > raw_write_seqcount_begin/end int nfs4_reclaim_open_state") because > lockdep complained. The whole seqcount thing was introduced in commit > c137afabe330 ("NFSv4: Allow the state manager to mark an open_owner as > being recovered"). > Trond Myklebust confirms that there i only one writer thread at > nfs4_reclaim_open_state() > > The list_for_each_entry() in nfs4_reclaim_open_state: > It seems that this lock protects the ->so_states list among other > atomic_t & flags members. So at the begin of the loop we inc ->count > ensuring that this field is not removed while we use it. So we drop the > ->so_lock loc during the loop it seems. And after nfs4_reclaim_locks() > invocation we nfs4_put_open_state() and grab the ->so_lock again. So if > we were the last user of this struct and we remove it, then the > following list_next_entry() invocation is a use-after-free. Even if we > use list_for_each_entry_safe() there is no guarantee that the following > member is still valid because it might have been removed by someone > invoking nfs4_put_open_state() on it, right? > So there is this. > > However to address my initial problem I have here a patch :) So it uses > a rw_semaphore which ensures that there is only one writer at a time or > multiple reader. So it should be basically what is happening now plus a > tiny tiny tiny lock plus lockdep coverage. I tried to this myself but I > don't manage to get into this code path at all so I might be doing > something wrong. > > Could you please check if this patch is working for you and whether my > list_for_each_entry() observation is correct or not? > > v2…v3: replace the seqlock with a RW semaphore. > NACK. That will deadlock. The reason why we use a seqlock there is precisely because we cannot allow ordinary RPC calls to lock out the recovery thread. If the server reboots, then ordinary RPC calls will fail until the recovery thread has had a chance to re-establish the state. ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥