On Tue, Jul 21, 2020 at 7:26 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
>
>
> > On Mar 3, 2020, at 1:14 PM, Vitaly Mayatskikh <vmayatskikh@xxxxxxxxxxxxxxxx> wrote:
> >
> > When disk failure happens and the array has a spare drive, resync thread
> > kicks in and starts to refill the spare. However it may get blocked by
> > a retry thread that resubmits failed IO to a mirror and itself can get
> > blocked on a barrier raised by the resync thread.
> >
> > Signed-off-by: Vitaly Mayatskikh <vmayatskikh@xxxxxxxxxxxxxxxx>
> > ---
> >  drivers/md/raid10.c | 14 +++++++++++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index ec136e4..f1a8e26 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -980,6 +980,7 @@ static void wait_barrier(struct r10conf *conf)
> >  {
> >  	spin_lock_irq(&conf->resync_lock);
> >  	if (conf->barrier) {
> > +		struct bio_list *bio_list = current->bio_list;
> >  		conf->nr_waiting++;
> >  		/* Wait for the barrier to drop.
> >  		 * However if there are already pending
> > @@ -994,9 +995,16 @@ static void wait_barrier(struct r10conf *conf)
> >  		wait_event_lock_irq(conf->wait_barrier,
> >  				    !conf->barrier ||
> >  				    (atomic_read(&conf->nr_pending) &&
> > -				     current->bio_list &&
> > -				     (!bio_list_empty(&current->bio_list[0]) ||
> > -				      !bio_list_empty(&current->bio_list[1]))),
> > +				     bio_list &&
> > +				     (!bio_list_empty(&bio_list[0]) ||
> > +				      !bio_list_empty(&bio_list[1]))) ||
> > +				     /* move on if recovery thread is
> > +				      * blocked by us
> > +				      */
> > +				     (conf->mddev->thread->tsk == current &&
> > +				      test_bit(MD_RECOVERY_RUNNING,
> > +					       &conf->mddev->recovery) &&
> > +				      conf->nr_queued > 0),
> >  				    conf->resync_lock);
> >  		conf->nr_waiting--;
> >  		if (!conf->nr_waiting)
> > --
> > 1.8.3.1
> >
>
> Song, Have you had a chance to look at this patch?
> We would like to have it pulled in to the kernel.

I am sorry I missed this one. This looks good to me.
Nigel, would you like to add your Reviewed-by, or Acked-by, or Tested-by tag?

Thanks,
Song