Re: About raid5 lock up

On 11/21/19 3:14 PM, Guoqing Jiang wrote:

My understanding is that there could be two possible reasons for the lockup:

1. There is a deadlock somewhere inside the raid5 code which should be fixed.
2. Since spin_lock_irq/spin_unlock_irq are called in raid5_get_active_stripe, if the function has to handle massive amounts of IO, is it possible that a hard lockup is triggered because IRQs stay disabled for more
than 10s? If so, maybe we need to touch the NMI watchdog before disabling IRQs (rough sketch below).
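
Something like this, perhaps (just a sketch; touch_nmi_watchdog() is the generic helper from <linux/nmi.h>, and device_lock here is only illustrative -- the exact lock taken depends on the kernel version):

	#include <linux/nmi.h>

	/* Sketch only: reset the hard lockup detector right before a
	 * potentially long IRQ-disabled section, so the ~10s hard lockup
	 * threshold is not hit spuriously while we hold the lock.
	 */
	touch_nmi_watchdog();
	spin_lock_irq(&conf->device_lock);
	/* ... potentially long stripe lookup / wait ... */
	spin_unlock_irq(&conf->device_lock);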

Coly and Dennis have reported raid5 lockup issues with different kernel versions (4.2.8, 4.7.0-rc7 and 4.8-rc5) [1][2]. I am not sure whether they are related, but I guess the problem is not fixed in the latest code.

Any thoughts?

[1]. https://marc.info/?l=linux-raid&m=150348807422853&w=2
[2]. https://marc.info/?l=linux-raid&m=146883211430999&w=2

Hmm, it looks like release_stripe_plug is lock-free, which means it could add sh to the plug list regardless of whether sh is already on other lists. For example, the same sh could concurrently be handled by raid5_release_stripe, which is also lock-free; in the "sh->count == 1" case it does one of two things (roughly sketched after the list):

1. add sh to released_stripes,

or

2. take the slow path if STRIPE_ON_RELEASE_LIST is already set on sh.
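
Roughly, raid5_release_stripe does something like this (paraphrased from memory, not verbatim kernel code; details differ across versions):

	/* Paraphrase of raid5_release_stripe: */
	if (atomic_add_unless(&sh->count, -1, 1))
		return;			/* not the last reference */

	if (!test_and_set_bit(STRIPE_ON_RELEASE_LIST, &sh->state)) {
		/* 1. lock-free fast path: queue sh on released_stripes and
		 *    wake the md thread, which will call do_release_stripe.
		 */
		if (llist_add(&sh->release_list, &conf->released_stripes))
			md_wakeup_thread(conf->mddev->thread);
	} else {
		/* 2. slow path: STRIPE_ON_RELEASE_LIST was already set, so
		 *    drop the count under device_lock and release directly.
		 *    (list is a temp inactive collector declared by the caller.)
		 */
		if (atomic_dec_and_lock(&sh->count, &conf->device_lock)) {
			do_release_stripe(conf, sh, &list);
			spin_unlock(&conf->device_lock);
		}
	}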

Either 1 or 2 eventually ends up in do_release_stripe, and this function mainly moves sh->lru to a different list,
such as delayed_list, handle_list or a temp_inactive_list.
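
Again paraphrased (bitmap and group details omitted); note that nothing here checks whether sh->lru is already on a plug list:

	/* Paraphrase of do_release_stripe, called with device_lock held: */
	if (test_bit(STRIPE_HANDLE, &sh->state)) {
		if (test_bit(STRIPE_DELAYED, &sh->state))
			list_add_tail(&sh->lru, &conf->delayed_list);
		else
			list_add_tail(&sh->lru, &conf->handle_list);
		md_wakeup_thread(conf->mddev->thread);
	} else {
		/* stripe going idle: park sh->lru on the per-hash temp
		 * inactive list passed in by the caller.
		 */
		list_add_tail(&sh->lru, temp_inactive_list);
	}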

So the list is corrupted, since one node can end up on two lists at once. That can make raid5_unplug get stuck in its while loop with device_lock held, which means raid5_get_active_stripe never gets a chance to take device_lock,
and then the call trace below happens.
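
To see why, consider the same sh->lru being list_add_tail'ed twice (self-contained illustration, names made up):

	#include <linux/list.h>

	struct stripe_like {
		struct list_head lru;
	};

	static void double_add(struct stripe_like *sh)
	{
		LIST_HEAD(plug_list);	/* stands in for cb->list          */
		LIST_HEAD(handle_list);	/* stands in for conf->handle_list */

		list_add_tail(&sh->lru, &plug_list);	/* release_stripe_plug */
		list_add_tail(&sh->lru, &handle_list);	/* do_release_stripe   */

		/* plug_list.next still points at sh->lru, but sh->lru.prev and
		 * sh->lru.next now point into handle_list.  Even after
		 * raid5_unplug does list_del_init(&sh->lru), plug_list.next is
		 * never rewritten, so list_empty(&plug_list) stays false and
		 * the while (!list_empty(&cb->list)) loop spins forever with
		 * device_lock held.
		 */
	}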

Call Trace:
[<ffffffff81598060>] _raw_spin_lock+0x20/0x30
[<ffffffffa0230b0a>] raid5_get_active_stripe+0x1da/0x5250 [raid456]
[<ffffffff8112d165>] ? mempool_alloc_slab+0x15/0x20
[<ffffffffa0231174>] raid5_get_active_stripe+0x844/0x5250 [raid456]
[<ffffffff812d5574>] ? generic_make_request+0x24/0x2b0
[<ffffffff810938b0>] ? wait_woken+0x90/0x90
[<ffffffff814a2adc>] md_make_request+0xfc/0x250
[<ffffffff812d5867>] submit_bio+0x67/0x150


If this analysis is correct, we at least need to add more checks, like:

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 09d4caf674e0..caab91ce60ec 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5482,7 +5482,9 @@ static void release_stripe_plug(struct mddev *mddev,
                        INIT_LIST_HEAD(cb->temp_inactive_list + i);
        }

-       if (!test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state))
+       if (atomic_read(&sh->count) != 0 &&
+           !test_bit(STRIPE_ON_RELEASE_LIST, &sh->state) &&
+           !test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state))
                list_add_tail(&sh->lru, &cb->list);
        else
                raid5_release_stripe(sh);
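
For reference, the interleaving the extra checks are meant to cut off would be something like this (a hypothetical timeline, not one I have traced directly):

	/*
	 * CPU0: release_stripe_plug()        CPU1: md thread
	 *
	 * test_and_set_bit(
	 *   STRIPE_ON_UNPLUG_LIST) -> 0
	 *                                    do_release_stripe():
	 *                                      list_add_tail(&sh->lru,
	 *                                                    &conf->handle_list)
	 * list_add_tail(&sh->lru, &cb->list)
	 *
	 * sh->lru now sits on two lists.  Checking sh->count and
	 * STRIPE_ON_RELEASE_LIST before the plug-list add narrows this
	 * window, though it may not close it completely.
	 */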

Thoughts?

Thanks,
Guoqing


