On 2017/10/16 11:34 PM, Nate Dailey wrote:
> Hi Coly, I'm not 100% sure that's going to work.
>
> What I'm imagining is that close_sync gets the lock, so now
> raid1d/handle_read_error/freeze_array is blocked waiting for it.
>

Hi Nate,

Let me think about this code path. If freeze_array() is blocked by the
mutex in handle_read_error(), it means conf->array_frozen is still 0.
_wait_barrier() only waits if 1) barrier[idx] is > 0, or 2) array_frozen
is 1. So this situation won't block _wait_barrier() or
wait_read_bucket() in all buckets. Then the loop that waits on the
barrier of every bucket won't be blocked if freeze_array() is waiting on
the regular I/O path.

But if freeze_array() happens on the resync I/O code path, a deadlock is
possible. Fortunately it seems freeze_array() is not called on the
resync I/O code path, so the mutex seems to be safe so far. But adding
one more lock always makes people nervous, and makes the code more
complicated... I don't like this :(

> Is it possible that raid1d might need to complete some sync IO from the
> retry_list, that close_sync/wait_all_barriers is waiting for? In that
> case, there would still be a deadlock.
>
> Actually, what if close_sync just did this:
>
> static void close_sync(struct r1conf *conf)
> {
>         int idx;
>
>         for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
>                 _wait_barrier(conf, idx);
>                 _allow_barrier(conf, idx);

And you may remove wait_all_barriers() and allow_all_barriers(); we have
less code, yeah!

>         }
>         ...
>
> Is there any reason all the _wait_barriers have to complete before
> starting on _allow_barrier? If not, then I think this change would work
> (but will think about it more).
>

No reason that _allow_barrier() should wait for all _wait_barrier()
calls to complete here. wait_all_barriers() in close_sync() is there to
make sure all in-flight resync I/O has finished. When it is called, no
new resync I/O will be generated; it is only used to wait for in-flight
resync I/O to complete.

So your modification works; I feel it is safe. And the above code is
much simpler, I like it :-)

Thanks.

Coly Li

> On 10/16/2017 10:43 AM, Coly Li wrote:
>> On 2017/10/16 8:58 PM, Nate Dailey wrote:
>>> Hi Coly, I'm not sure I understand the change you're proposing. Would
>>> it be something like the following?
>>>
>>> spin_lock_irq(&conf->resync_lock);
>>> conf->array_frozen = 1;
>>> raid1_log(conf->mddev, "wait freeze");
>>> while (get_unqueued_pending(conf) != extra) {
>>>         wait_event_lock_irq_cmd_timeout(
>>>                 conf->wait_barrier,
>>>                 get_unqueued_pending(conf) == extra,
>>>                 conf->resync_lock,
>>>                 flush_pending_writes(conf),
>>>                 timeout);
>>> }
>>> spin_unlock_irq(&conf->resync_lock);
>>>
>>> On its own, I don't see how this would make any difference. Until
>>> array_frozen == 0, wait_all_barriers will continue to be blocked, which
>>> in turn will prevent the condition freeze_array is waiting on from ever
>>> becoming true.
>>
>> Hi Nate,
>>
>> You are right, this idea does not help too much; we need to find
>> another way.
>>
>>> Or should something else be done inside the new freeze_array loop that
>>> would allow wait_all_barriers to make progress?
>>
>> It seems wait_all_barriers() is only used in close_sync(), which is to
>> make sure all sync requests hit the platters before raid1_sync_request()
>> returns.
>>
>> How about setting a critical section in close_sync(), protected by
>> another lock.
>> It might be something like this,
>>
>> static void close_sync(struct r1conf *conf)
>> {
>> +       mutex_lock(&close_sync_lock);
>>         wait_all_barriers(conf);
>> +       mutex_unlock(&close_sync_lock);
>>         allow_all_barriers(conf);
>>         [snip]
>> }
>>
>>
>> static void freeze_array(struct r1conf *conf, int extra)
>> {
>> +       mutex_lock(&close_sync_lock);
>>         spin_lock_irq(&conf->resync_lock);
>>         [snip]
>>         spin_unlock_irq(&conf->resync_lock);
>> +       mutex_unlock(&close_sync_lock);
>> }
>>
>> Then conf->array_frozen won't be set while wait_all_barriers() has only
>> partially iterated over the barrier buckets, so the deadlock can be
>> avoided.
>>
>> What do you think of this one?
>>
>> Thanks.
>>
>> Coly Li
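
For reference, a consolidated sketch of the close_sync() simplification
discussed above, using the barrier-bucket helpers named in the thread
(_wait_barrier(), _allow_barrier(), BARRIER_BUCKETS_NR); this is only an
illustration of the idea, not a tested patch:

/*
 * Sketch of the proposed close_sync(): wait for in-flight resync I/O
 * one barrier bucket at a time, releasing each bucket before moving on
 * to the next. Because close_sync() never sits blocked while still
 * holding other buckets, a concurrent freeze_array() (e.g. from
 * handle_read_error() in raid1d) can still make progress, avoiding the
 * deadlock discussed in this thread.
 */
static void close_sync(struct r1conf *conf)
{
        int idx;

        for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
                _wait_barrier(conf, idx);       /* wait for this bucket only */
                _allow_barrier(conf, idx);      /* and release it right away */
        }

        /* wait_all_barriers()/allow_all_barriers() become unused and can be removed */
        /* rest of close_sync() unchanged */
}

Waiting and releasing per bucket, rather than waiting on all buckets
before allowing any, is also what removes the need for the extra
close_sync_lock mutex considered earlier in the thread.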