On Mon, Apr 09, 2018 at 09:50:44AM +0800, Yufen Yu wrote:
> We found a sync thread stuck as follows:
>
>     raid1_sync_request+0x2c9/0xb50
>     md_do_sync+0x983/0xfa0
>     md_thread+0x11c/0x160
>     kthread+0x111/0x130
>     ret_from_fork+0x35/0x40
>     0xffffffffffffffff
>
> At the same time, there is a stuck mdadm thread (mdadm --manage
> /dev/md2 --add /dev/sda). It is trying to stop the sync thread:
>
>     kthread_stop+0x42/0xf0
>     md_unregister_thread+0x3a/0x70
>     md_reap_sync_thread+0x15/0x160
>     action_store+0x142/0x2a0
>     md_attr_store+0x6c/0xb0
>     kernfs_fop_write+0x102/0x180
>     __vfs_write+0x33/0x170
>     vfs_write+0xad/0x1a0
>     SyS_write+0x52/0xc0
>     do_syscall_64+0x6e/0x190
>     entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> Debug tools show that the sync thread is waiting in raise_barrier()
> until raid1d() ends all normal IO bios queued on bio_end_io_list
> (introduced in commit 55ce74d4bfe1). But raid1d() cannot end these
> bios while the MD_CHANGE_PENDING bit is set. It needs to take the
> mddev->reconfig_mutex lock and then clear the bit in
> md_check_recovery(). However, the lock is held by mdadm in
> action_store().
>
> Thus, there is a loop:
> mdadm is waiting for the sync thread to stop, the sync thread is
> waiting for raid1d() to end bios, and raid1d() is waiting for mdadm
> to release the mddev->reconfig_mutex lock so that it can end bios.
>
> Fix this by checking MD_RECOVERY_INTR while waiting in raise_barrier(),
> so that the sync thread can exit while mdadm is stopping it.
>
> Fixes: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.")
> Signed-off-by: Jason Yan <yanaijie@xxxxxxxxxx>
> Signed-off-by: Yufen Yu <yuyufen@xxxxxxxxxx>

Applied, thanks!

> ---
>  drivers/md/raid1.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index fe872dc6712e..df486bd7ddf5 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -854,7 +854,7 @@ static void flush_pending_writes(struct r1conf *conf)
>   *    there is no normal IO happeing.  It must arrange to call
>   *    lower_barrier when the particular background IO completes.
>   */
> -static void raise_barrier(struct r1conf *conf, sector_t sector_nr)
> +static sector_t raise_barrier(struct r1conf *conf, sector_t sector_nr)
>  {
>  	int idx = sector_to_idx(sector_nr);
>
> @@ -885,13 +885,23 @@ static void raise_barrier(struct r1conf *conf, sector_t sector_nr)
>  	 * max resync count which allowed on current I/O barrier bucket.
>  	 */
>  	wait_event_lock_irq(conf->wait_barrier,
> -			    !conf->array_frozen &&
> +			    (!conf->array_frozen &&
>  			     !atomic_read(&conf->nr_pending[idx]) &&
> -			     atomic_read(&conf->barrier[idx]) < RESYNC_DEPTH,
> +			     atomic_read(&conf->barrier[idx]) < RESYNC_DEPTH) ||
> +				test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery),
>  			    conf->resync_lock);
>
> +	if (test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) {
> +		atomic_dec(&conf->barrier[idx]);
> +		spin_unlock_irq(&conf->resync_lock);
> +		wake_up(&conf->wait_barrier);
> +		return -EINTR;
> +	}
> +
>  	atomic_inc(&conf->nr_sync_pending);
>  	spin_unlock_irq(&conf->resync_lock);
> +
> +	return 0;
>  }
>
>  static void lower_barrier(struct r1conf *conf, sector_t sector_nr)
> @@ -2662,9 +2672,12 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
>
>  	bitmap_cond_end_sync(mddev->bitmap, sector_nr,
>  		mddev_is_clustered(mddev) && (sector_nr + 2 * RESYNC_SECTORS > conf->cluster_sync_high));
> -	r1_bio = raid1_alloc_init_r1buf(conf);
>
> -	raise_barrier(conf, sector_nr);
> +
> +	if (raise_barrier(conf, sector_nr))
> +		return 0;
> +
> +	r1_bio = raid1_alloc_init_r1buf(conf);
>
>  	rcu_read_lock();
>  	/*
> --
> 2.13.6
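
For anyone following the thread, here is a minimal user-space sketch of the
wait pattern the patch relies on: the waiter sleeps until either the normal
wake-up condition holds or an interrupt flag is set, and on interruption it
undoes its own barrier increment before returning -EINTR. This is only an
illustration under stated assumptions, not the kernel code. It uses POSIX
threads in place of wait_event_lock_irq() and spinlocks, collapses the
per-bucket counters into single integers, and drops the array_frozen and
RESYNC_DEPTH checks. The names struct barrier_state, barrier_state_init(),
raise_barrier_sketch() and interrupt_sync() are hypothetical. The increment
before the wait is inferred from the atomic_dec() in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct r1conf used while waiting. */
struct barrier_state {
	pthread_mutex_t lock;	/* plays the role of conf->resync_lock */
	pthread_cond_t wait;	/* plays the role of conf->wait_barrier */
	int nr_pending;		/* normal I/O still in flight */
	int barrier;		/* number of raised barriers */
	bool interrupted;	/* plays the role of MD_RECOVERY_INTR */
};

static void barrier_state_init(struct barrier_state *s)
{
	int rc;

	rc = pthread_mutex_init(&s->lock, NULL);
	assert(rc == 0);
	rc = pthread_cond_init(&s->wait, NULL);
	assert(rc == 0);
	(void)rc;
	s->nr_pending = 0;
	s->barrier = 0;
	s->interrupted = false;
}

/*
 * Raise the barrier and wait for pending I/O to drain.
 * Returns 0 when the barrier is held, or -EINTR if the wait was
 * interrupted, in which case the barrier count is restored.
 */
static int raise_barrier_sketch(struct barrier_state *s)
{
	pthread_mutex_lock(&s->lock);

	/* Block new normal I/O first, then wait for the old I/O to finish. */
	s->barrier++;

	/* Sleep until the I/O has drained OR someone asked us to stop. */
	while (s->nr_pending != 0 && !s->interrupted)
		pthread_cond_wait(&s->wait, &s->lock);

	if (s->interrupted) {
		/* Undo our increment so normal I/O is not blocked forever. */
		s->barrier--;
		pthread_mutex_unlock(&s->lock);
		/* Wake anyone who was waiting for the barrier to drop. */
		pthread_cond_broadcast(&s->wait);
		return -EINTR;
	}

	pthread_mutex_unlock(&s->lock);
	return 0;
}

/* Ask a waiter in raise_barrier_sketch() to give up. */
static void interrupt_sync(struct barrier_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->interrupted = true;
	pthread_mutex_unlock(&s->lock);
	pthread_cond_broadcast(&s->wait);
}
```

A caller would treat a non-zero return the way raid1_sync_request() does in
the patch: stop before allocating any per-request resources. Without the
interrupt term in the wait predicate, a waiter with nr_pending stuck above
zero would never return, which is the hang described in the commit message.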