From: Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx> release_stripe_plug adds lru to unplug list, then raid5_unplug iterates unplug list and release the stripe in the list. But sh->lru could exist in another list as well since there is no protection of the race since release_stripe_plug is lock free. For example, the same sh could be handled by raid5_release_stripe which is lock free too, it does two things in case "sh->count == 1". 1. add sh to released_stripes. Or 2. go to slow path if sh is already set with ON_RELEASE_LIST. Either 1 or 2 could trigger do_release_stripe finally, and this function mainly move sh->lru to different lists such as delayed_list, handle_list or temp_inactive_list etc. Then the same node could be in different lists, which causes raid5_unplug sticks with "while (!list_empty(&cb->list))", and since spin_lock_irq(&conf->device_lock) is called before, it causes: 1. hard lock up in [1], [2] and [3] since irq is disabled. 2. raid5_get_active_stripe can't get device_lock and calltrace happens as follows. [<ffffffff81598060>] _raw_spin_lock+0x20/0x30 [<ffffffffa0230b0a>] raid5_get_active_stripe+0x1da/0x5250 [raid456] [<ffffffff8112d165>] ? mempool_alloc_slab+0x15/0x20 [<ffffffffa0231174>] raid5_get_active_stripe+0x844/0x5250 [raid456] [<ffffffff812d5574>] ? generic_make_request+0x24/0x2b0 [<ffffffff810938b0>] ? wait_woken+0x90/0x90 [<ffffffff814a2adc>] md_make_request+0xfc/0x250 [<ffffffff812d5867>] submit_bio+0x67/0x150 So add two more checkings before add sh->lru to cb->list to avoid potential list corruption. 1. the sh should not be handling by do_release_stripe. 2. ensure the sh is not release list. Signed-off-by: Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx> --- drivers/md/raid5.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index d4d3b67ffbba..70ef2367fa64 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5481,7 +5481,9 @@ static void release_stripe_plug(struct mddev *mddev, INIT_LIST_HEAD(cb->temp_inactive_list + i); } - if (!test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state)) + if (!atomic_read(&sh->count) == 0 && + !test_bit(STRIPE_ON_RELEASE_LIST, &sh->state) && + !test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state)) list_add_tail(&sh->lru, &cb->list); else raid5_release_stripe(sh); -- 2.17.1