2017-02-28 22:08 GMT+01:00 Shaohua Li <shli@xxxxxx>: > Neil Brown pointed out a potential deadlock in raid 10 code with > bio_split/chain. The raid1 code could have the same issue, but recent > barrier rework makes it less likely to happen. The deadlock happens in > below sequence: > > 1. generic_make_request(bio), this will set current->bio_list > 2. raid10_make_request will split bio to bio1 and bio2 > 3. __make_request(bio1), wait_barrer, add underlayer disk bio to > current->bio_list > 4. __make_request(bio2), wait_barrer > > If raise_barrier happens between 3 & 4, since wait_barrier runs at 3, > raise_barrier waits for IO completion from 3. And since raise_barrier > sets barrier, 4 waits for raise_barrier. But IO from 3 can't be > dispatched because raid10_make_request() doesn't finished yet. > > The solution is to adjust the IO ordering. Quotes from Neil: > " > It is much safer to: > > if (need to split) { > split = bio_split(bio, ...) > bio_chain(...) > make_request_fn(split); > generic_make_request(bio); > } else > make_request_fn(mddev, bio); > > This way we first process the initial section of the bio (in 'split') > which will queue some requests to the underlying devices. These > requests will be queued in generic_make_request. > Then we queue the remainder of the bio, which will be added to the end > of the generic_make_request queue. > Then we return. > generic_make_request() will pop the lower-level device requests off the > queue and handle them first. Then it will process the remainder > of the original bio once the first section has been fully processed. > " > > Cc: Coly Li <colyli@xxxxxxx> > Cc: 王金浦 <jinpuwang@xxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx (v3.14+, only the raid10 part) > Suggested-by: NeilBrown <neilb@xxxxxxxx> > Signed-off-by: Shaohua Li <shli@xxxxxx> > --- > drivers/md/raid1.c | 25 +++++++++++++++++++++++-- > drivers/md/raid10.c | 18 ++++++++++++++++++ > 2 files changed, 41 insertions(+), 2 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 551d654..3c5933b 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -1584,9 +1584,30 @@ static void raid1_make_request(struct mddev *mddev, struct bio *bio) > split = bio; > } > > - if (bio_data_dir(split) == READ) > + if (bio_data_dir(split) == READ) { > raid1_read_request(mddev, split); > - else > + > + /* > + * If a bio is splitted, the first part of bio will > + * pass barrier but the bio is queued in > + * current->bio_list (see generic_make_request). If > + * there is a raise_barrier() called here, the second > + * part of bio can't pass barrier. But since the first > + * part bio isn't dispatched to underlaying disks yet, > + * the barrier is never released, hence raise_barrier > + * will alays wait. We have a deadlock. > + * Note, this only happens in read path. For write > + * path, the first part of bio is dispatched in a > + * schedule() call (because of blk plug) or offloaded > + * to raid10d. > + * Quitting from the function immediately can change > + * the bio order queued in bio_list and avoid the deadlock. > + */ > + if (split != bio) { > + generic_make_request(bio); > + break; > + } > + } else > raid1_write_request(mddev, split); > } while (split != bio); > } > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c > index c4db6d1..b1b1f98 100644 > --- a/drivers/md/raid10.c > +++ b/drivers/md/raid10.c > @@ -1584,7 +1584,25 @@ static void raid10_make_request(struct mddev *mddev, struct bio *bio) > split = bio; > } > > + /* > + * If a bio is splitted, the first part of bio will pass > + * barrier but the bio is queued in current->bio_list (see > + * generic_make_request). If there is a raise_barrier() called > + * here, the second part of bio can't pass barrier. But since > + * the first part bio isn't dispatched to underlaying disks > + * yet, the barrier is never released, hence raise_barrier will > + * alays wait. We have a deadlock. > + * Note, this only happens in read path. For write path, the > + * first part of bio is dispatched in a schedule() call > + * (because of blk plug) or offloaded to raid10d. > + * Quitting from the function immediately can change the bio > + * order queued in bio_list and avoid the deadlock. > + */ > __make_request(mddev, split); > + if (split != bio && bio_data_dir(bio) == READ) { > + generic_make_request(bio); > + break; > + } > } while (split != bio); > > /* In case raid10d snuck in to freeze_array */ > -- > 2.9.3 > Looks good to me! Reviewed-by: Jack Wang <jinpu.wang@xxxxxxxxxxxxxxxx> Thanks!