On Sun, 01 Aug 2010 14:57:56 +0400
Michael Tokarev <mjt@xxxxxxxxxx> wrote:

> Hello.
>
> It is the second time we have come across this issue
> after switching from 2.6.27 to 2.6.32 about 3
> months ago.
>
> At some point, an md-raid10 array hangs - that
> is, all the processes that try to access it,
> either read or write, hang forever.

Thanks for the report.

This is the same problem that has been reported recently in a thread
with the subject "Raid10 device hangs during resync and heavy I/O."
I have just posted a patch which should address it - I will include
it here as well.

Note that you need to be careful when reading the stack traces. A "?"
means that the function named may not be in the actual call trace - it
may just be an old address that happens to still be on the stack. In
this case, the "mempool_alloc" was stray - nothing was actually
blocking on that.

This is the patch that I have proposed.

Thanks,
NeilBrown

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 42e64e4..d1d6891 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -825,11 +825,29 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 		 */
 		bp = bio_split(bio,
 			       chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
+
+		/* Each of these 'make_request' calls will call 'wait_barrier'.
+		 * If the first succeeds but the second blocks due to the resync
+		 * thread raising the barrier, we will deadlock because the
+		 * IO to the underlying device will be queued in generic_make_request
+		 * and will never complete, so will never reduce nr_pending.
+		 * So increment nr_waiting here so no new raise_barriers will
+		 * succeed, and so the second wait_barrier cannot block.
+		 */
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting++;
+		spin_unlock_irq(&conf->resync_lock);
+
 		if (make_request(mddev, &bp->bio1))
 			generic_make_request(&bp->bio1);
 		if (make_request(mddev, &bp->bio2))
 			generic_make_request(&bp->bio2);
 
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting--;
+		wake_up(&conf->wait_barrier);
+		spin_unlock_irq(&conf->resync_lock);
+
 		bio_pair_release(bp);
 		return 0;
 	bad_map:
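
To see why this is sufficient, it may help to model the barrier logic in
userspace. The sketch below uses a pthread mutex and condition variable
in place of resync_lock and the wait_barrier wait queue; the names mirror
the driver, and it assumes the 2.6.32-era behaviour that raise_barrier()
first waits for nr_waiting to reach zero, then raises the barrier and
waits for nr_pending to drain. It is only an illustration of how the
pieces are assumed to interact, not the kernel code:

/* Illustrative userspace model of the raid10 barrier logic; names mirror
 * the driver but this is only a sketch of the assumed behaviour. */
#include <pthread.h>
#include <stdio.h>

struct conf {
    pthread_mutex_t resync_lock;
    pthread_cond_t  wait_barrier;   /* broadcast on every state change */
    int barrier;                    /* non-zero while resync owns the region */
    int nr_pending;                 /* normal IO currently in flight */
    int nr_waiting;                 /* normal IO waiting (or about to wait) */
};

static void wait_barrier(struct conf *c)
{
    pthread_mutex_lock(&c->resync_lock);
    if (c->barrier) {
        c->nr_waiting++;
        while (c->barrier)
            pthread_cond_wait(&c->wait_barrier, &c->resync_lock);
        c->nr_waiting--;
    }
    c->nr_pending++;
    pthread_cond_broadcast(&c->wait_barrier);
    pthread_mutex_unlock(&c->resync_lock);
}

static void allow_barrier(struct conf *c)   /* one request completed */
{
    pthread_mutex_lock(&c->resync_lock);
    c->nr_pending--;
    pthread_cond_broadcast(&c->wait_barrier);
    pthread_mutex_unlock(&c->resync_lock);
}

static void raise_barrier(struct conf *c)   /* called by resync */
{
    pthread_mutex_lock(&c->resync_lock);
    while (c->nr_waiting)       /* do not start while IO is queued waiting */
        pthread_cond_wait(&c->wait_barrier, &c->resync_lock);
    c->barrier++;               /* block new IO ... */
    while (c->nr_pending)       /* ... and drain what is already in flight */
        pthread_cond_wait(&c->wait_barrier, &c->resync_lock);
    pthread_mutex_unlock(&c->resync_lock);
}

static void lower_barrier(struct conf *c)
{
    pthread_mutex_lock(&c->resync_lock);
    c->barrier--;
    pthread_cond_broadcast(&c->wait_barrier);
    pthread_mutex_unlock(&c->resync_lock);
}

/* The patched split path: nr_waiting is raised around the two nested
 * wait_barrier() calls, so raise_barrier() cannot slip in between them. */
static void make_split_request(struct conf *c)
{
    pthread_mutex_lock(&c->resync_lock);
    c->nr_waiting++;
    pthread_mutex_unlock(&c->resync_lock);

    wait_barrier(c);            /* first half of the split bio */
    wait_barrier(c);            /* second half - cannot deadlock now */

    pthread_mutex_lock(&c->resync_lock);
    c->nr_waiting--;
    pthread_cond_broadcast(&c->wait_barrier);
    pthread_mutex_unlock(&c->resync_lock);
}

static struct conf conf = {
    .resync_lock  = PTHREAD_MUTEX_INITIALIZER,
    .wait_barrier = PTHREAD_COND_INITIALIZER,
};

static void *resync_thread(void *unused)
{
    raise_barrier(&conf);
    printf("resync: barrier raised with no IO in flight\n");
    lower_barrier(&conf);
    return unused;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, resync_thread, NULL);
    make_split_request(&conf);  /* races freely with the resync thread */
    allow_barrier(&conf);       /* both halves eventually complete */
    allow_barrier(&conf);
    pthread_join(t, NULL);
    printf("split request and resync both finished\n");
    return 0;
}

Because raise_barrier() refuses to proceed while nr_waiting is non-zero,
pre-counting the split request as waiting closes the window in which the
resync thread could raise the barrier between the first and second
wait_barrier() calls - which is exactly the window the unpatched code
leaves open.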

>
> Here's a typical set of messages found in kern.log:
> INFO: task oracle:7602 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> oracle D ffff8801a8837148 0 7602 1 0x00000000
> ffffffff813bc480 0000000000000082 0000000000000000 0000000000000001
> ffff8801a8b7fdd8 000000000000e1c8 ffff88003b397fd8 ffff88003f47d840
> ffff88003f47dbe0 000000012416219a ffff88002820e1c8 ffff88003f47dbe0
> Call Trace:
> [<ffffffffa018e8ae>] ? wait_barrier+0xee/0x130 [raid10]
> [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
> [<ffffffffa0191852>] ? make_request+0x82/0x5f0 [raid10]
> [<ffffffffa007cb2c>] ? md_make_request+0xbc/0x130 [md_mod]
> [<ffffffff810c4722>] ? mempool_alloc+0x62/0x140
> [<ffffffff8117d26f>] ? generic_make_request+0x30f/0x410
> [<ffffffff8112eee4>] ? bio_alloc_bioset+0x54/0xf0
> [<ffffffff8112e28b>] ? __bio_add_page+0x12b/0x240
> [<ffffffff8117d3cc>] ? submit_bio+0x5c/0xe0
> [<ffffffff811313da>] ? dio_bio_submit+0x5a/0x90
> [<ffffffff81131d63>] ? __blockdev_direct_IO+0x5a3/0xcd0
> [<ffffffffa01f66ed>] ? xfs_vm_direct_IO+0x11d/0x140 [xfs]
> [<ffffffffa01f6af0>] ? xfs_get_blocks_direct+0x0/0x20 [xfs]
> [<ffffffffa01f6470>] ? xfs_end_io_direct+0x0/0x70 [xfs]
> [<ffffffff810c3738>] ? generic_file_direct_write+0xc8/0x1b0
> [<ffffffffa01fef18>] ? xfs_write+0x458/0x950 [xfs]
> [<ffffffff8106317b>] ? try_to_del_timer_sync+0x9b/0xd0
> [<ffffffff810f9251>] ? cache_alloc_refill+0x221/0x5e0
> [<ffffffffa01fafe0>] ? xfs_file_aio_write+0x0/0x60 [xfs]
> [<ffffffff8113a6ac>] ? aio_rw_vect_retry+0x7c/0x210
> [<ffffffff8113be02>] ? aio_run_iocb+0x82/0x150
> [<ffffffff8113c747>] ? sys_io_submit+0x2b7/0x6b0
> [<ffffffff8100b542>] ? system_call_fastpath+0x16/0x1b
>
> INFO: task oracle:7654 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> oracle D ffff8801a8837148 0 7654 1 0x00000000
> ffff8800614ac7c0 0000000000000086 0000000000000000 0000000000000206
> 0000000000000000 000000000000e1c8 ffff88018c175fd8 ffff88005c9ba040
> ffff88005c9ba3e0 ffffffff810c4722 000000038c175810 ffff88005c9ba3e0
> Call Trace:
> [<ffffffff810c4722>] ? mempool_alloc+0x62/0x140
> [<ffffffffa018e8ae>] ? wait_barrier+0xee/0x130 [raid10]
> [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
> [<ffffffff8112ddd1>] ? __bio_clone+0x21/0x70
> [<ffffffffa0191852>] ? make_request+0x82/0x5f0 [raid10]
> [<ffffffff8112d765>] ? bio_split+0x25/0x2a0
> [<ffffffffa0191ce1>] ? make_request+0x511/0x5f0 [raid10]
> [<ffffffffa007cb2c>] ? md_make_request+0xbc/0x130 [md_mod]
> [<ffffffff8117d26f>] ? generic_make_request+0x30f/0x410
> [<ffffffff8112da4a>] ? bvec_alloc_bs+0x6a/0x120
> [<ffffffff8117d3cc>] ? submit_bio+0x5c/0xe0
> [<ffffffff811313da>] ? dio_bio_submit+0x5a/0x90
> [<ffffffff81131480>] ? dio_send_cur_page+0x70/0xc0
> [<ffffffff8113151e>] ? submit_page_section+0x4e/0x140
> [<ffffffff8113215a>] ? __blockdev_direct_IO+0x99a/0xcd0
> [<ffffffffa01f666e>] ? xfs_vm_direct_IO+0x9e/0x140 [xfs]
> [<ffffffffa01f6af0>] ? xfs_get_blocks_direct+0x0/0x20 [xfs]
> [<ffffffffa01f6470>] ? xfs_end_io_direct+0x0/0x70 [xfs]
> [<ffffffff810c4357>] ? generic_file_aio_read+0x607/0x620
> [<ffffffffa023fae8>] ? rpc_run_task+0x38/0x80 [sunrpc]
> [<ffffffffa01ff83b>] ? xfs_read+0x11b/0x270 [xfs]
> [<ffffffff81103453>] ? do_sync_read+0xe3/0x130
> [<ffffffff8113c32c>] ? sys_io_getevents+0x39c/0x420
> [<ffffffff810706b0>] ? autoremove_wake_function+0x0/0x30
> [<ffffffff8113adc0>] ? timeout_func+0x0/0x10
> [<ffffffff81104138>] ? vfs_read+0xc8/0x180
> [<ffffffff81104291>] ? sys_pread64+0xa1/0xb0
> [<ffffffff8100c2db>] ? device_not_available+0x1b/0x20
> [<ffffffff8100b542>] ? system_call_fastpath+0x16/0x1b
>
> INFO: task md11_resync:11976 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> md11_resync D ffff88017964d140 0 11976 2 0x00000000
> ffff8801af879880 0000000000000046 0000000000000000 0000000000000001
> ffff8801a8b7fdd8 000000000000e1c8 ffff8800577d1fd8 ffff88017964d140
> ffff88017964d4e0 000000012416219a ffff88002828e1c8 ffff88017964d4e0
> Call Trace:
> [<ffffffffa018e696>] ? raise_barrier+0xb6/0x1e0 [raid10]
> [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
> [<ffffffff8103b263>] ? enqueue_task+0x53/0x60
> [<ffffffffa018f525>] ? sync_request+0x715/0xae0 [raid10]
> [<ffffffffa007dc76>] ? md_do_sync+0x606/0xc70 [md_mod]
> [<ffffffff8104ca4a>] ? finish_task_switch+0x3a/0xc0
> [<ffffffffa007ec47>] ? md_thread+0x67/0x140 [md_mod]
> [<ffffffffa007ebe0>] ? md_thread+0x0/0x140 [md_mod]
> [<ffffffff81070376>] ? kthread+0x96/0xb0
> [<ffffffff8100c52a>] ? child_rip+0xa/0x20
> [<ffffffff810702e0>] ? kthread+0x0/0xb0
> [<ffffffff8100c520>] ? child_rip+0x0/0x20
>
> (All 3 processes shown are reported at the same time).
> A few more processes are waiting in wait_barrier like the
> first mentioned above does.
> Note the 3 different places where they are waiting:
>
> o raise_barrier
> o wait_barrier
> o mempool_alloc called from wait_barrier
>
> The whole thing looks suspicious - it smells like a deadlock
> somewhere.
>
> From this point on, the array is completely dead, with many
> processes (like the above) blocked, and with no way to umount the
> filesystem in question. Only a forced reboot of the system
> helps.
>
> This is 2.6.32.15. I see there were a few patches for md
> after that, but it looks like they aren't relevant to this
> issue.
>
> Note that this is not a trivially-triggerable problem. The
> array survived several verify rounds (even during the current
> uptime) without problems. But today the array had quite some
> load during the verify.
>
> Thanks!
>
> /mjt
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html