Re: [PATCH] md/raid1/10: fix potential deadlock

2017-02-28 22:08 GMT+01:00 Shaohua Li <shli@xxxxxx>:
> Neil Brown pointed out a potential deadlock in the raid10 code with
> bio_split/chain. The raid1 code could have the same issue, but the
> recent barrier rework makes it less likely to happen. The deadlock
> happens in the sequence below:
>
> 1. generic_make_request(bio), this will set current->bio_list
> 2. raid10_make_request will split bio into bio1 and bio2
> 3. __make_request(bio1), wait_barrier, add underlying disk bio to
> current->bio_list
> 4. __make_request(bio2), wait_barrier
>
> If raise_barrier happens between steps 3 and 4: since wait_barrier ran
> at step 3, raise_barrier waits for the IO from step 3 to complete. And
> since raise_barrier sets the barrier, step 4 waits on raise_barrier.
> But the IO from step 3 can't be dispatched because
> raid10_make_request() hasn't finished yet.
>
> The solution is to adjust the IO ordering. Quotes from Neil:
> "
> It is much safer to:
>
>     if (need to split) {
>         split = bio_split(bio, ...)
>         bio_chain(...)
>         make_request_fn(split);
>         generic_make_request(bio);
>     } else
>         make_request_fn(mddev, bio);
>
> This way we first process the initial section of the bio (in 'split')
> which will queue some requests to the underlying devices.  These
> requests will be queued in generic_make_request.
> Then we queue the remainder of the bio, which will be added to the end
> of the generic_make_request queue.
> Then we return.
> generic_make_request() will pop the lower-level device requests off the
> queue and handle them first.  Then it will process the remainder
> of the original bio once the first section has been fully processed.
> "
>
> Cc: Coly Li <colyli@xxxxxxx>
> Cc: 王金浦 <jinpuwang@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx (v3.14+, only the raid10 part)
> Suggested-by: NeilBrown <neilb@xxxxxxxx>
> Signed-off-by: Shaohua Li <shli@xxxxxx>
> ---
>  drivers/md/raid1.c  | 25 +++++++++++++++++++++++--
>  drivers/md/raid10.c | 18 ++++++++++++++++++
>  2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 551d654..3c5933b 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1584,9 +1584,30 @@ static void raid1_make_request(struct mddev *mddev, struct bio *bio)
>                         split = bio;
>                 }
>
> -               if (bio_data_dir(split) == READ)
> +               if (bio_data_dir(split) == READ) {
>                         raid1_read_request(mddev, split);
> -               else
> +
> +                       /*
> +                        * If a bio is split, the first part of the bio will
> +                        * pass the barrier but the bio is queued in
> +                        * current->bio_list (see generic_make_request). If
> +                        * there is a raise_barrier() called here, the second
> +                        * part of the bio can't pass the barrier. But since the
> +                        * first part isn't dispatched to the underlying disks
> +                        * yet, the barrier is never released, hence
> +                        * raise_barrier will always wait. We have a deadlock.
> +                        * Note, this only happens in read path. For write
> +                        * path, the first part of bio is dispatched in a
> +                        * schedule() call (because of blk plug) or offloaded
> +                        * to raid10d.
> +                        * Quitting from the function immediately can change
> +                        * the bio order queued in bio_list and avoid the deadlock.
> +                        */
> +                       if (split != bio) {
> +                               generic_make_request(bio);
> +                               break;
> +                       }
> +               } else
>                         raid1_write_request(mddev, split);
>         } while (split != bio);
>  }
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index c4db6d1..b1b1f98 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1584,7 +1584,25 @@ static void raid10_make_request(struct mddev *mddev, struct bio *bio)
>                         split = bio;
>                 }
>
> +               /*
> +                * If a bio is split, the first part of the bio will pass
> +                * the barrier but the bio is queued in current->bio_list
> +                * (see generic_make_request). If there is a raise_barrier()
> +                * called here, the second part can't pass the barrier. But
> +                * since the first part isn't dispatched to the underlying
> +                * disks yet, the barrier is never released, hence
> +                * raise_barrier will always wait. We have a deadlock.
> +                * Note, this only happens in read path. For write path, the
> +                * first part of bio is dispatched in a schedule() call
> +                * (because of blk plug) or offloaded to raid10d.
> +                * Quitting from the function immediately can change the bio
> +                * order queued in bio_list and avoid the deadlock.
> +                */
>                 __make_request(mddev, split);
> +               if (split != bio && bio_data_dir(bio) == READ) {
> +                       generic_make_request(bio);
> +                       break;
> +               }
>         } while (split != bio);
>
>         /* In case raid10d snuck in to freeze_array */
> --
> 2.9.3
>
Looks good to me!
Reviewed-by: Jack Wang <jinpu.wang@xxxxxxxxxxxxxxxx>

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




