Re: [PATCH 1/1] md/raid10: avoid deadlock on recovery.

Nigel Croxon <ncroxon@xxxxxxxxxx> · Tue, 21 Jul 2020 10:26:35 -0400

> On Mar 3, 2020, at 1:14 PM, Vitaly Mayatskikh <vmayatskikh@xxxxxxxxxxxxxxxx> wrote:
> 
> When a disk fails and the array has a spare drive, the resync thread
> kicks in and starts to refill the spare. However, it may get blocked by
> a retry thread that resubmits failed IO to a mirror, and that retry
> thread can itself get blocked on a barrier raised by the resync thread.
> 
> Signed-off-by: Vitaly Mayatskikh <vmayatskikh@xxxxxxxxxxxxxxxx>
> ---
> drivers/md/raid10.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index ec136e4..f1a8e26 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -980,6 +980,7 @@ static void wait_barrier(struct r10conf *conf)
> {
> 	spin_lock_irq(&conf->resync_lock);
> 	if (conf->barrier) {
> +		struct bio_list *bio_list = current->bio_list;
> 		conf->nr_waiting++;
> 		/* Wait for the barrier to drop.
> 		 * However if there are already pending
> @@ -994,9 +995,16 @@ static void wait_barrier(struct r10conf *conf)
> 		wait_event_lock_irq(conf->wait_barrier,
> 				    !conf->barrier ||
> 				    (atomic_read(&conf->nr_pending) &&
> -				     current->bio_list &&
> -				     (!bio_list_empty(&current->bio_list[0]) ||
> -				      !bio_list_empty(&current->bio_list[1]))),
> +				     bio_list &&
> +				     (!bio_list_empty(&bio_list[0]) ||
> +				      !bio_list_empty(&bio_list[1]))) ||
> +				     /* move on if recovery thread is
> +				      * blocked by us
> +				      */
> +				     (conf->mddev->thread->tsk == current &&
> +				      test_bit(MD_RECOVERY_RUNNING,
> +					       &conf->mddev->recovery) &&
> +				      conf->nr_queued > 0),
> 				    conf->resync_lock);
> 		conf->nr_waiting--;
> 		if (!conf->nr_waiting)
> -- 
> 1.8.3.1
> 
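
A rough userspace sketch of the wait condition, before and after the
patch, may help with review. It is only a model under stated assumptions:
struct wait_state, may_proceed_old(), may_proceed_new() and
blocked_md_thread() are made-up names, not kernel code, and each field
stands in for the kernel expression noted in its comment.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in; each field mirrors one kernel expression. */
struct wait_state {
	bool barrier;           /* conf->barrier is non-zero */
	int  nr_pending;        /* atomic_read(&conf->nr_pending) */
	bool bio_list_busy;     /* current->bio_list set and [0] or [1] non-empty */
	bool is_md_thread;      /* conf->mddev->thread->tsk == current */
	bool recovery_running;  /* MD_RECOVERY_RUNNING set in mddev->recovery */
	int  nr_queued;         /* conf->nr_queued */
};

/* Condition that ends the wait in wait_barrier() before the patch. */
bool may_proceed_old(const struct wait_state *s)
{
	return !s->barrier || (s->nr_pending && s->bio_list_busy);
}

/* Condition after the patch: one extra escape clause for the md thread. */
bool may_proceed_new(const struct wait_state *s)
{
	return may_proceed_old(s) ||
	       (s->is_md_thread && s->recovery_running && s->nr_queued > 0);
}

/*
 * Assumed scenario for illustration: the barrier is raised, the caller is
 * the md thread with no bios on its own list, recovery is running, and at
 * least one request is queued for retry.
 */
struct wait_state blocked_md_thread(void)
{
	struct wait_state s = {
		.barrier = true,
		.nr_pending = 1,
		.bio_list_busy = false,
		.is_md_thread = true,
		.recovery_running = true,
		.nr_queued = 1,
	};
	return s;
}
```

For the state returned by blocked_md_thread(), may_proceed_old() is false
and may_proceed_new() is true, which is the case the patch comment calls
"move on if recovery thread is blocked by us".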

Song, have you had a chance to look at this patch?
We would like to see it pulled into the kernel.

-Nigel