Re: [PATCH V3 1/2] RAID1: a new I/O barrier implementation to remove resync window

NeilBrown <neilb@xxxxxxxx> · Mon, 20 Feb 2017 13:51:22 +1100

On Mon, Feb 20 2017, NeilBrown wrote:

> On Fri, Feb 17 2017, Coly Li wrote:
>
>> On 2017/2/16 下午3:04, NeilBrown wrote:
>>> I know you are going to change this as Shaohua wantsthe spitting to
>>> happen in a separate function, which I agree with, but there is 
>>> something else wrong here. Calling bio_split/bio_chain repeatedly
>>> in a loop is dangerous. It is OK for simple devices, but when one
>>> request can wait for another request to the same device it can
>>> deadlock. This can happen with raid1.  If a resync request calls
>>> raise_barrier() between one request and the next, then the next has
>>> to wait for the resync request, which has to wait for the first
>>> request. As the first request will be stuck in the queue in 
>>> generic_make_request(), you get a deadlock.
>>
>> For md raid1, queue in generic_make_request(), can I understand it as
>> bio_list_on_stack in this function? And queue in underlying device,
>> can I understand it as the data structures like plug->pending and
>> conf->pending_bio_list ?
>
> Yes, the queue in generic_make_request() is the bio_list_on_stack.  That
> is the only queue I am talking about.  I'm not referring to
> plug->pending or conf->pending_bio_list at all.
>
>>
>> I still don't get the point of deadlock, let me try to explain why I
>> don't see the possible deadlock. If a bio is split, and the first part
>> is processed by make_request_fn(), and then a resync comes and it will
>> raise a barrier, there are 3 possible conditions,
>> - the resync I/O tries to raise barrier on same bucket of the first
>> regular bio. Then the resync task has to wait to the first bio drops
>> its conf->nr_pending[idx]
>
> Not quite.
> First, the resync task (in raise_barrier()) will wait for ->nr_waiting[idx]
> to be zero.  We can assume this happens immediately.
> Then the resync_task will increment ->barrier[idx].
> Only then will it wait for the first bio to drop ->nr_pending[idx].
> The processing of that first bio will have submitted bios to the
> underlying device, and they will be in the bio_list_on_stack queue, and
> will not be processed until raid1_make_request() completes.
>
> The loop in raid1_make_request() will then call make_request_fn() which
> will call wait_barrier(), which will wait for ->barrier[idx] to be
> zero.

Thinking more carefully about this.. the 'idx' that the second bio will
wait for will normally be different, so there won't be a deadlock after
all.

However it is possible for hash_long() to produce the same idx for two
consecutive barrier_units so there is still the possibility of a
deadlock, though it isn't as likely as I thought at first.

NeilBrown

>
> So raid1_make_request is waiting for the resync to progress, and resync
> is waiting for a bio which is on bio_list_on_stack which won't be
> processed until raid1_make_request() completes.
>
> NeilBrown
Attachment:
signature.asc

Description: PGP signature