On Mon, Feb 20 2017, NeilBrown wrote: > On Fri, Feb 17 2017, Coly Li wrote: > >> On 2017/2/16 下午3:04, NeilBrown wrote: >>> I know you are going to change this as Shaohua wantsthe spitting to >>> happen in a separate function, which I agree with, but there is >>> something else wrong here. Calling bio_split/bio_chain repeatedly >>> in a loop is dangerous. It is OK for simple devices, but when one >>> request can wait for another request to the same device it can >>> deadlock. This can happen with raid1. If a resync request calls >>> raise_barrier() between one request and the next, then the next has >>> to wait for the resync request, which has to wait for the first >>> request. As the first request will be stuck in the queue in >>> generic_make_request(), you get a deadlock. >> >> For md raid1, queue in generic_make_request(), can I understand it as >> bio_list_on_stack in this function? And queue in underlying device, >> can I understand it as the data structures like plug->pending and >> conf->pending_bio_list ? > > Yes, the queue in generic_make_request() is the bio_list_on_stack. That > is the only queue I am talking about. I'm not referring to > plug->pending or conf->pending_bio_list at all. > >> >> I still don't get the point of deadlock, let me try to explain why I >> don't see the possible deadlock. If a bio is split, and the first part >> is processed by make_request_fn(), and then a resync comes and it will >> raise a barrier, there are 3 possible conditions, >> - the resync I/O tries to raise barrier on same bucket of the first >> regular bio. Then the resync task has to wait to the first bio drops >> its conf->nr_pending[idx] > > Not quite. > First, the resync task (in raise_barrier()) will wait for ->nr_waiting[idx] > to be zero. We can assume this happens immediately. > Then the resync_task will increment ->barrier[idx]. > Only then will it wait for the first bio to drop ->nr_pending[idx]. > The processing of that first bio will have submitted bios to the > underlying device, and they will be in the bio_list_on_stack queue, and > will not be processed until raid1_make_request() completes. > > The loop in raid1_make_request() will then call make_request_fn() which > will call wait_barrier(), which will wait for ->barrier[idx] to be > zero. Thinking more carefully about this.. the 'idx' that the second bio will wait for will normally be different, so there won't be a deadlock after all. However it is possible for hash_long() to produce the same idx for two consecutive barrier_units so there is still the possibility of a deadlock, though it isn't as likely as I thought at first. NeilBrown > > So raid1_make_request is waiting for the resync to progress, and resync > is waiting for a bio which is on bio_list_on_stack which won't be > processed until raid1_make_request() completes. > > NeilBrown
Attachment:
signature.asc
Description: PGP signature