Re: BUG: soft lockup in [md4_raid5:21137]

Holger Kiehl <Holger.Kiehl@xxxxxx> · Thu, 1 Oct 2009 13:41:49 +0000 (GMT)

On Tue, 29 Sep 2009, Dan Williams wrote:

On Tue, Sep 29, 2009 at 2:24 AM, Holger Kiehl <Holger.Kiehl@xxxxxx> wrote:
On Fri, 18 Sep 2009, Dan Williams wrote:
__async_schedule+0x10e/0x130
  Sep 29 09:02:15 apollo kernel: [<ffffffff8108ad7e>] ? async_schedule_domain+0x1c/0x32
  Sep 29 09:02:15 apollo kernel: [<ffffffff81379852>] ? raid5d+0x3f8/0x44c
  Sep 29 09:02:15 apollo kernel: [<ffffffff81446b68>] ? _spin_unlock_irqrestore+0x21/0x3c
  Sep 29 09:02:15 apollo kernel: [<ffffffff81383db1>] ? md_thread+0x100/0x132
  Sep 29 09:02:15 apollo kernel: [<ffffffff81084113>] ? autoremove_wake_function+0x0/0x5a
  Sep 29 09:02:15 apollo kernel: [<ffffffff81383cb1>] ? md_thread+0x0/0x132
  Sep 29 09:02:15 apollo kernel: [<ffffffff81083d04>] ? kthread+0x89/0x91
  Sep 29 09:02:15 apollo kernel: [<ffffffff8102f36a>] ? child_rip+0xa/0x20
  Sep 29 09:02:15 apollo kernel: [<ffffffff81083c7b>] ? kthread+0x0/0x91
  Sep 29 09:02:15 apollo kernel: [<ffffffff8102f360>] ? child_rip+0x0/0x20

The system also becomes very unresponsive. How can I fix this, since it
looks like your patch does not apply to 2.6.32-rc1? Or is this a
different bug, since I have enabled CONFIG_MULTICORE_RAID456?

This is a new issue with the (experimental) multicore implementation.
If you turn that off then you will be using the same single threaded
flow as 2.6.31.  If you want to play with the multicore option a bit
more the patch below should squelch the softlockup.  However, I
suspect we will need our own md specific thread pool because the
current implementation spends too much effort bouncing stripes between
the async thread pool and raid5d.
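
To confirm which of the two code paths a given kernel build uses, the
state of CONFIG_MULTICORE_RAID456 can be read from that kernel's config
file. The following is only a minimal sketch: the helper name
kernel_option_state is made up for illustration, and the location of the
config file (commonly /boot/config-<version>, but this varies by
distribution) is an assumption, so the function takes the file contents
as a string instead of a path.

```python
import re


def kernel_option_state(config_text, option):
    """Return the value of a kernel config option, or None if it is unset.

    Hypothetical helper for illustration only. It understands the two
    line forms a kernel .config file uses:
        CONFIG_FOO=y
        # CONFIG_FOO is not set
    """
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_line = "# %s is not set" % option
    for raw in config_text.splitlines():
        line = raw.strip()
        match = set_re.match(line)
        if match:
            # Option is set: typically "y" or "m", or a literal value.
            return match.group(1)
        if line == unset_line:
            # Option is explicitly disabled.
            return None
    # Option is not mentioned at all; treat that as unset.
    return None


# Made-up sample config text, not taken from the reporter's system.
sample = "CONFIG_MD_RAID456=m\n# CONFIG_MULTICORE_RAID456 is not set\n"
multicore = kernel_option_state(sample, "CONFIG_MULTICORE_RAID456")
```

A result of None means the option is off and raid5d runs the single
threaded flow described above; "y" means the experimental multicore
path is compiled in.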

Thanks for the patch. With it the soft lockups go away, but performance
is terrible (a factor of 6 slower with CONFIG_MULTICORE_RAID456).

Holger