> On Oct 31, 2016, at 11:56, Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:
>
> On 2016-10-31 15:30:02 [+0000], Trond Myklebust wrote:
>>> On Oct 31, 2016, at 09:19, Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:
>>> The list_for_each_entry() in nfs4_reclaim_open_state():
>>> It seems that this lock protects the ->so_states list among other
>>> atomic_t & flags members. So at the beginning of the loop we increment
>>> ->count, ensuring that this entry is not removed while we use it, and we
>>> drop the ->so_lock lock during the loop. After the nfs4_reclaim_locks()
>>> invocation we call nfs4_put_open_state() and grab the ->so_lock again. So
>>> if we were the last user of this struct and it gets removed, then the
>>> following list_next_entry() invocation is a use-after-free. Even if we
>>> use list_for_each_entry_safe() there is no guarantee that the following
>>> member is still valid, because it might have been removed by someone
>>> invoking nfs4_put_open_state() on it, right?
>>> So there is that.
>>>
>>> However, to address my initial problem I have a patch here :) It uses
>>> an rw_semaphore, which ensures that there is only one writer at a time
>>> or multiple readers. So it should be basically what is happening now plus
>>> a tiny tiny tiny lock plus lockdep coverage. I tried to trigger this
>>> myself but I don't manage to get into this code path at all, so I might
>>> be doing something wrong.
>>>
>>> Could you please check whether this patch works for you and whether my
>>> list_for_each_entry() observation is correct or not?
>>>
>>> v2…v3: replace the seqlock with an RW semaphore.
>>>
>>
>> NACK. That will deadlock. The reason why we use a seqlock there is precisely because we cannot allow ordinary RPC calls to lock out the recovery thread.
>
> Hmmm. So this is getting invoked if I reboot the server? A restart of
> nfs-kernel-server is the same thing?
> Is the list_for_each_entry() observation I made correct?

Yes, and yes. We can’t rely on the list pointers remaining correct, so we restart the list scan and we use the ops->state_flag_bit to signal whether or not state has been recovered for the entry being scanned.

>
>> If the server reboots, then ordinary RPC calls will fail until the recovery thread has had a chance to re-establish the state.
>
> This means that the ordinary RPC call won't return and fail, but will
> wait with the lock held until the recovery thread has done its thing?

It uses the seqcount_t to figure out whether a recovery occurred while an OPEN or CLOSE was being processed. If so, it schedules a new recovery of the stateids in question.
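
As a rough illustration of the "restart the scan" approach described above (pin the entry, drop the lock for the slow work, then rescan from the head using a per-entry flag instead of trusting the old list cursor), here is a minimal userspace sketch. It is not the actual fs/nfs/nfs4state.c code; all names (state_entry, owner, recover_one, ...) are made up:

/*
 * Sketch only: restart-the-scan list walk with a per-entry
 * "needs recovery" flag and a reference count pinning the entry
 * while the lock is dropped.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct state_entry {
	struct state_entry *next;
	int refcount;		/* plays the role of state->count */
	bool needs_recovery;	/* plays the role of ops->state_flag_bit */
	int id;
};

struct owner {
	pthread_mutex_t lock;	/* plays the role of sp->so_lock */
	struct state_entry *states;
};

static void put_entry(struct state_entry *e)
{
	/* Dropping the last reference frees the entry. */
	if (--e->refcount == 0)
		free(e);
}

static void recover_one(struct state_entry *e)
{
	/* Stand-in for the slow per-state recovery work. */
	printf("recovering state %d\n", e->id);
	e->needs_recovery = false;
}

static void reclaim_open_state(struct owner *sp)
{
	struct state_entry *e;

	pthread_mutex_lock(&sp->lock);
restart:
	for (e = sp->states; e != NULL; e = e->next) {
		if (!e->needs_recovery)
			continue;	/* handled on an earlier pass */
		e->refcount++;		/* pin the entry across the unlock */
		pthread_mutex_unlock(&sp->lock);

		recover_one(e);		/* lock is dropped while we work */

		put_entry(e);		/* may be the final reference */
		pthread_mutex_lock(&sp->lock);
		goto restart;		/* cursor may be stale: rescan from head */
	}
	pthread_mutex_unlock(&sp->lock);
}

int main(void)
{
	struct owner sp = { .lock = PTHREAD_MUTEX_INITIALIZER, .states = NULL };
	struct state_entry *e;
	int i;

	for (i = 0; i < 3; i++) {
		e = calloc(1, sizeof(*e));
		e->refcount = 1;	/* reference held by the list itself */
		e->needs_recovery = true;
		e->id = i;
		e->next = sp.states;
		sp.states = e;
	}
	reclaim_open_state(&sp);
	return 0;
}

Because the scan is restarted after every unlock/relock, the walker never follows a ->next pointer that may have been freed, and the flag guarantees forward progress since already-recovered entries are skipped.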
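
And for the seqlock point: a hedged sketch of the idea (not the kernel's seqcount_t API; the names are invented). The recovery thread bumps a counter to odd while it runs and back to even when done, so it is never blocked by ordinary RPC callers; an OPEN/CLOSE samples the counter before the RPC and rechecks it afterwards, and if the value was odd or has changed, recovery ran concurrently and the caller schedules a new recovery of the stateids in question:

/* Sketch only: lock-free detection of a concurrent recovery pass. */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint reclaim_seq;

/* Recovery side: never blocked by ordinary RPC callers. */
static void recovery_begin(void) { atomic_fetch_add(&reclaim_seq, 1); }	/* -> odd */
static void recovery_end(void)   { atomic_fetch_add(&reclaim_seq, 1); }	/* -> even */

/* RPC side: snapshot before the OPEN/CLOSE, recheck after it completes. */
static unsigned rpc_snapshot(void)
{
	return atomic_load(&reclaim_seq);
}

static bool rpc_must_redo(unsigned snapshot)
{
	/* Odd snapshot: recovery was in progress.  Changed value: it ran since. */
	return (snapshot & 1) || atomic_load(&reclaim_seq) != snapshot;
}

This is why an rw_semaphore deadlocks here: the RPC side must only observe that recovery happened and retry, never hold a reader lock that the recovery thread would have to wait for.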