Re: About raid5 lock up

On 11/21/19 3:14 PM, Guoqing Jiang wrote:

My understanding is that there could be two possible reasons for the lockup:

1. There is a deadlock somewhere inside the raid5 code which should be fixed.
2. Since spin_lock_irq/spin_unlock_irq are called in raid5_get_active_stripe, if the function has to handle massive amounts of IO, is it possible that a hard lockup is triggered because IRQs stay disabled for more
than 10s? If so, maybe we need to touch the NMI watchdog before disabling IRQs (rough sketch below).
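
Something like this, perhaps (just a sketch; touch_nmi_watchdog() is the generic helper from <linux/nmi.h>, and device_lock here is only illustrative -- the exact lock taken depends on the kernel version):

	#include <linux/nmi.h>

	/* Sketch only: reset the hard lockup detector right before a
	 * potentially long IRQ-disabled section, so the ~10s hard lockup
	 * threshold is not hit spuriously while we hold the lock.
	 */
	touch_nmi_watchdog();
	spin_lock_irq(&conf->device_lock);
	/* ... potentially long stripe lookup / wait ... */
	spin_unlock_irq(&conf->device_lock);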

Coly and Dennis have reported raid5 lockup issues with different kernel versions (4.2.8, 4.7.0-rc7 and 4.8-rc5) [1][2]. I am not sure whether they are related, but I guess the problem is not fixed in the latest code.

Any thoughts?

[1]. https://marc.info/?l=linux-raid&m=150348807422853&w=2
[2]. https://marc.info/?l=linux-raid&m=146883211430999&w=2

Hmm, it looks like release_stripe_plug is lock-free, which means it could add sh to the plug list regardless of whether sh is already on other lists. For example, the same sh could concurrently be handled by raid5_release_stripe, which is also lock-free; in the "sh->count == 1" case it does one of two things (roughly sketched after the list):

1. add sh to released_stripes,

or

2. take the slow path if STRIPE_ON_RELEASE_LIST is already set on sh.
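
Roughly, raid5_release_stripe does something like this (paraphrased from memory, not verbatim kernel code; details differ across versions):

	/* Paraphrase of raid5_release_stripe: */
	if (atomic_add_unless(&sh->count, -1, 1))
		return;			/* not the last reference */

	if (!test_and_set_bit(STRIPE_ON_RELEASE_LIST, &sh->state)) {
		/* 1. lock-free fast path: queue sh on released_stripes and
		 *    wake the md thread, which will call do_release_stripe.
		 */
		if (llist_add(&sh->release_list, &conf->released_stripes))
			md_wakeup_thread(conf->mddev->thread);
	} else {
		/* 2. slow path: STRIPE_ON_RELEASE_LIST was already set, so
		 *    drop the count under device_lock and release directly.
		 *    (list is a temp inactive collector declared by the caller.)
		 */
		if (atomic_dec_and_lock(&sh->count, &conf->device_lock)) {
			do_release_stripe(conf, sh, &list);
			spin_unlock(&conf->device_lock);
		}
	}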

Either 1 or 2 eventually ends up in do_release_stripe, and this function mainly moves sh->lru to a different list,
such as delayed_list, handle_list or a temp_inactive_list.
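
Again paraphrased (bitmap and group details omitted); note that nothing here checks whether sh->lru is already on a plug list:

	/* Paraphrase of do_release_stripe, called with device_lock held: */
	if (test_bit(STRIPE_HANDLE, &sh->state)) {
		if (test_bit(STRIPE_DELAYED, &sh->state))
			list_add_tail(&sh->lru, &conf->delayed_list);
		else
			list_add_tail(&sh->lru, &conf->handle_list);
		md_wakeup_thread(conf->mddev->thread);
	} else {
		/* stripe going idle: park sh->lru on the per-hash temp
		 * inactive list passed in by the caller.
		 */
		list_add_tail(&sh->lru, temp_inactive_list);
	}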

So the list is corrupted, since one node can end up on two lists at once. That can make raid5_unplug get stuck in its while loop with device_lock held, which means raid5_get_active_stripe never gets a chance to take device_lock,
and then the call trace below happens.
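
To see why, consider the same sh->lru being list_add_tail'ed twice (self-contained illustration, names made up):

	#include <linux/list.h>

	struct stripe_like {
		struct list_head lru;
	};

	static void double_add(struct stripe_like *sh)
	{
		LIST_HEAD(plug_list);	/* stands in for cb->list          */
		LIST_HEAD(handle_list);	/* stands in for conf->handle_list */

		list_add_tail(&sh->lru, &plug_list);	/* release_stripe_plug */
		list_add_tail(&sh->lru, &handle_list);	/* do_release_stripe   */

		/* plug_list.next still points at sh->lru, but sh->lru.prev and
		 * sh->lru.next now point into handle_list.  Even after
		 * raid5_unplug does list_del_init(&sh->lru), plug_list.next is
		 * never rewritten, so list_empty(&plug_list) stays false and
		 * the while (!list_empty(&cb->list)) loop spins forever with
		 * device_lock held.
		 */
	}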

Call Trace:
[<ffffffff81598060>] _raw_spin_lock+0x20/0x30
[<ffffffffa0230b0a>] raid5_get_active_stripe+0x1da/0x5250 [raid456]
[<ffffffff8112d165>] ? mempool_alloc_slab+0x15/0x20
[<ffffffffa0231174>] raid5_get_active_stripe+0x844/0x5250 [raid456]
[<ffffffff812d5574>] ? generic_make_request+0x24/0x2b0
[<ffffffff810938b0>] ? wait_woken+0x90/0x90
[<ffffffff814a2adc>] md_make_request+0xfc/0x250
[<ffffffff812d5867>] submit_bio+0x67/0x150


If this analysis is correct, we at least need to add more checks, like:

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 09d4caf674e0..caab91ce60ec 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5482,7 +5482,9 @@ static void release_stripe_plug(struct mddev *mddev,
                        INIT_LIST_HEAD(cb->temp_inactive_list + i);
        }

-       if (!test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state))
+       if (atomic_read(&sh->count) != 0 &&
+           !test_bit(STRIPE_ON_RELEASE_LIST, &sh->state) &&
+           !test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state))
                list_add_tail(&sh->lru, &cb->list);
        else
                raid5_release_stripe(sh);
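
For reference, the interleaving the extra checks are meant to cut off would be something like this (a hypothetical timeline, not one I have traced directly):

	/*
	 * CPU0: release_stripe_plug()        CPU1: md thread
	 *
	 * test_and_set_bit(
	 *   STRIPE_ON_UNPLUG_LIST) -> 0
	 *                                    do_release_stripe():
	 *                                      list_add_tail(&sh->lru,
	 *                                                    &conf->handle_list)
	 * list_add_tail(&sh->lru, &cb->list)
	 *
	 * sh->lru now sits on two lists.  Checking sh->count and
	 * STRIPE_ON_RELEASE_LIST before the plug-list add narrows this
	 * window, though it may not close it completely.
	 */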

Thoughts?

Thanks,
Guoqing


