Re: BUG: soft lockup in [md4_raid5:21137]

On Tue, Sep 29, 2009 at 2:24 AM, Holger Kiehl <Holger.Kiehl@xxxxxx> wrote:
> On Fri, 18 Sep 2009, Dan Williams wrote:
> __async_schedule+0x10e/0x130
>   Sep 29 09:02:15 apollo kernel: [<ffffffff8108ad7e>] ? async_schedule_domain+0x1c/0x32
>   Sep 29 09:02:15 apollo kernel: [<ffffffff81379852>] ? raid5d+0x3f8/0x44c
>   Sep 29 09:02:15 apollo kernel: [<ffffffff81446b68>] ? _spin_unlock_irqrestore+0x21/0x3c
>   Sep 29 09:02:15 apollo kernel: [<ffffffff81383db1>] ? md_thread+0x100/0x132
>   Sep 29 09:02:15 apollo kernel: [<ffffffff81084113>] ? autoremove_wake_function+0x0/0x5a
>   Sep 29 09:02:15 apollo kernel: [<ffffffff81383cb1>] ? md_thread+0x0/0x132
>   Sep 29 09:02:15 apollo kernel: [<ffffffff81083d04>] ? kthread+0x89/0x91
>   Sep 29 09:02:15 apollo kernel: [<ffffffff8102f36a>] ? child_rip+0xa/0x20
>   Sep 29 09:02:15 apollo kernel: [<ffffffff81083c7b>] ? kthread+0x0/0x91
>   Sep 29 09:02:15 apollo kernel: [<ffffffff8102f360>] ? child_rip+0x0/0x20
>
> The system also becomes very unresponsive. How can I fix this, since it
> looks like your patch does not apply to 2.6.32-rc1? Or is this another bug,
> since I have enabled CONFIG_MULTICORE_RAID456?

This is a new issue with the (experimental) multicore implementation.
If you turn that option off, you will be using the same single-threaded
flow as 2.6.31.  If you want to play with the multicore option a bit
more, the patch below should squelch the soft lockup.  However, I
suspect we will need our own md-specific thread pool, because the
current implementation spends too much effort bouncing stripes between
the async thread pool and raid5d.
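
As a rough userspace analogy only (not kernel code), the idea behind the
fix is to give the scheduler a chance to run something else after each
submission instead of looping without a break.  In this sketch,
struct work_item, submit_item() and submit_all() are hypothetical names,
and sched_yield() merely stands in for cond_resched(); the two calls do
not have identical semantics.

```c
#include <assert.h>
#include <sched.h>   /* sched_yield() */
#include <stddef.h>  /* size_t, NULL */

/* Hypothetical stand-in for a stripe; not the kernel's struct stripe_head. */
struct work_item {
    int handled;
};

/*
 * Hypothetical stand-in for handing one item to a worker pool.
 * Here it only marks the item so the effect is observable.
 */
static void submit_item(struct work_item *item)
{
    item->handled = 1;
}

/*
 * Submit every item, offering the CPU to other runnable tasks after
 * each submission.  sched_yield() is an assumption chosen for the
 * analogy; in the kernel patch the corresponding call is cond_resched().
 * Returns the number of items submitted.
 */
static size_t submit_all(struct work_item *items, size_t count)
{
    size_t i;

    assert(items != NULL || count == 0);

    for (i = 0; i < count; i++) {
        submit_item(&items[i]);
        sched_yield();
    }
    return i;
}
```

Calling submit_all() on an array of items marks each one as handled and
yields between submissions, so a long batch does not monopolize the CPU.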

--
Dan

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 1898eda..733d658 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4357,6 +4357,7 @@ static void __process_stripe(void *param, async_cookie_t cookie)
 static void process_stripe(struct stripe_head *sh, struct list_head *domain)
 {
        async_schedule_domain(__process_stripe, sh, domain);
+       cond_resched();
 }

 static void synchronize_stripe_processing(struct list_head *domain)