On Wed, Nov 30, 2016 at 1:08 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> On Mon, Nov 28, 2016 at 09:45:07AM +0100, Jinpu Wang wrote:
>> Hi folks,
>>
>> We hit another hung task on our test machines; this time it is stuck
>> in bitmap_startwrite.
>>
>> We build MD/RAID1 over two block devices exported via IB, with
>> bitmap=internal. KVM guests run on top of the RAID1 on a compute
>> node, while the disks are on remote storage nodes. One storage node
>> crashed/rebooted, and multiple RAID1 arrays on multiple compute
>> nodes ran their KVM tasks into this hung task:
>>
>> [106204.343870] INFO: task kvm:37669 blocked for more than 180 seconds.
>> [106204.344138]       Tainted: G IO 4.4.28-1-pserver #1
>> [106204.344385] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [106204.344798] kvm             D ffff882037723710     0 37669      1 0x00000000
>> [106204.344805] ffff882037723710 ffff882038f08d00 ffff882029770d00 ffff8820377236d8
>> [106204.344809] ffff8820377236d8 ffff882037724000 0000000000308648 0000000000000008
>> [106204.344813] ffff880f9bd9e8c0 ffff882037723768 ffff882037723728 ffffffff81811c60
>> [106204.344818] Call Trace:
>> [106204.344831] [<ffffffff81811c60>] schedule+0x30/0x80
>> [106204.344841] [<ffffffffa09d31a2>] bitmap_startwrite+0x122/0x190 [md_mod]
>> [106204.344848] [<ffffffff813f660b>] ? bio_clone_bioset+0x11b/0x310
>> [106204.344853] [<ffffffff810956b0>] ? wait_woken+0x80/0x80
>> [106204.344859] [<ffffffffa0cc5127>] 0xffffffffa0cc5127
>> [106204.344865] [<ffffffffa09c4863>] md_set_array_sectors+0xac3/0xe20 [md_mod]
>> [106204.344871] [<ffffffff813faa94>] ? generic_make_request_checks+0x234/0x4c0
>> [106204.344875] [<ffffffff813fdb91>] blk_prologue_bio+0x91/0xc0
>> [106204.344879] [<ffffffff813fd54e>] generic_make_request+0xfe/0x1e0
>> [106204.344883] [<ffffffff813fd692>] submit_bio+0x62/0x150
>> [106204.344892] [<ffffffff811d3257>] do_blockdev_direct_IO+0x2317/0x2ba0
>> [106204.344897] [<ffffffff810b9999>] ? __remove_hrtimer+0x89/0xa0
>> [106204.344903] [<ffffffff8173c08f>] ? udp_poll+0x1f/0xb0
>> [106204.344908] [<ffffffff816b71c7>] ? sock_poll+0x57/0x120
>> [106204.344913] [<ffffffff811cdbf0>] ? I_BDEV+0x10/0x10
>> [106204.344918] [<ffffffff811d3b1e>] __blockdev_direct_IO+0x3e/0x40
>> [106204.344922] [<ffffffff811ce287>] blkdev_direct_IO+0x47/0x50
>> [106204.344930] [<ffffffff81132c60>] generic_file_direct_write+0xb0/0x170
>> [106204.344934] [<ffffffff81132ded>] __generic_file_write_iter+0xcd/0x1f0
>> [106204.344943] [<ffffffff81184ff8>] ? kmem_cache_free+0x78/0x190
>> [106204.344948] [<ffffffff811ce4c0>] ? bd_unlink_disk_holder+0xf0/0xf0
>> [106204.344952] [<ffffffff811ce547>] blkdev_write_iter+0x87/0x110
>> [106204.344956] [<ffffffff811ce4c0>] ? bd_unlink_disk_holder+0xf0/0xf0
>> [106204.344962] [<ffffffff811dec56>] aio_run_iocb+0x236/0x2a0
>> [106204.344966] [<ffffffff811dd183>] ? eventfd_ctx_read+0x53/0x200
>> [106204.344973] [<ffffffff811b3bbf>] ? __fget_light+0x1f/0x60
>> [106204.344976] [<ffffffff811b3c0e>] ? __fdget+0xe/0x10
>> [106204.344980] [<ffffffff811dfb5a>] do_io_submit+0x23a/0x4d0
>> [106204.344985] [<ffffffff811dfdfb>] SyS_io_submit+0xb/0x10
>> [106204.344989] [<ffffffff818154d7>] entry_SYSCALL_64_fastpath+0x12/0x6a
>>
>> [106384.345330] INFO: task kvm:37669 blocked for more than 180 seconds.
>> [106384.345621]       Tainted: G IO 4.4.28-1-pserver #1
>> [106384.345866] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [106384.346275] kvm             D ffff882037723710     0 37669      1 0x00000000
>> [106384.346282] ffff882037723710 ffff882038f08d00 ffff882029770d00 ffff8820377236d8
>> [106384.346286] ffff8820377236d8 ffff882037724000 0000000000308648 0000000000000008
>> [106384.346290] ffff880f9bd9e8c0 ffff882037723768 ffff882037723728 ffffffff81811c60
>> [106384.346294] Call Trace:
>> [106384.346308] [<ffffffff81811c60>] schedule+0x30/0x80
>> [106384.346317] [<ffffffffa09d31a2>] bitmap_startwrite+0x122/0x190 [md_mod]
>> [106384.346325] [<ffffffff813f660b>] ? bio_clone_bioset+0x11b/0x310
>> [106384.346330] [<ffffffff810956b0>] ? wait_woken+0x80/0x80
>> [106384.346336] [<ffffffffa0cc5127>] 0xffffffffa0cc5127
>> [106384.346341] [<ffffffffa09c4863>] md_set_array_sectors+0xac3/0xe20 [md_mod]
>> [106384.346347] [<ffffffff813faa94>] ? generic_make_request_checks+0x234/0x4c0
>> [106384.346352] [<ffffffff813fdb91>] blk_prologue_bio+0x91/0xc0
>> [106384.346356] [<ffffffff813fd54e>] generic_make_request+0xfe/0x1e0
>> [106384.346360] [<ffffffff813fd692>] submit_bio+0x62/0x150
>> [106384.346369] [<ffffffff811d3257>] do_blockdev_direct_IO+0x2317/0x2ba0
>>
>> (gdb) l *bitmap_startwrite+0x122
>> 0x121d2 is in bitmap_startwrite (drivers/md/bitmap.c:1396).
>>
>> 1394                if (unlikely(COUNTER(*bmc) == COUNTER_MAX)) {
>> 1395                        DEFINE_WAIT(__wait);
>> 1396                        /* note that it is safe to do the prepare_to_wait
>> 1397                         * after the test as long as we do it before dropping
>> 1398                         * the spinlock.
>> 1399                         */
>> 1400                        prepare_to_wait(&bitmap->overflow_wait, &__wait,
>> 1401                                        TASK_UNINTERRUPTIBLE);
>> 1402                        spin_unlock_irq(&bitmap->counts.lock);
>> 1403                        schedule();
>> 1404                        finish_wait(&bitmap->overflow_wait, &__wait);
>> 1405                        continue;
>> 1406                }
>>
>> So it seems KVM is waiting on the overflow_wait queue, but somehow
>> nobody wakes it up. While the storage node was rebooting, the RAID1
>> saw a lot of I/O errors in that window, so I suspect some
>> error-handling path is broken.
>>
>> I don't have a reproducer yet; I just want to report this to the
>> community in case it is a known bug, or someone already has a patch. :)
>
> Does the kernel report the raid disk as faulty and remove it? Is this
> a real hang? E.g. maybe we are still waiting for an I/O error to be
> reported.

Thanks, Shaohua, for the reply.

I checked the log again; the hung task was there for 10+ hours, so it
is a real hang. I also found something wrong with the test case: it
creates the MD/RAID1 arrays on two drives from two remote storage
servers, reboots one storage server before some of the MDs have
finished recovery, and then reboots the other storage server as well.
That broke both legs at the same time and somehow led to the hung
tasks. Since RAID1 cannot handle two broken legs at once, we will
change our test to a more practical case.

--
Jinpu Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 30 577 008 042
Fax:       +49 30 577 008 299
Email:     jinpu.wang@xxxxxxxxxxxxxxxx
URL:       https://www.profitbricks.de

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss
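
To make the suspected failure mode concrete, below is a minimal
userspace sketch (plain C with pthreads, not kernel code) of the
wait/wake pattern visible in the quoted bitmap_startwrite() excerpt: a
writer sleeps while the per-chunk counter is saturated at COUNTER_MAX
and relies on a later I/O completion to wake overflow_wait. The
function names and the tiny COUNTER_MAX value here are illustrative
stand-ins, and the assumption that the completion side must also run
for failed I/O is exactly the point under discussion: if an
error-handling path ever drops an I/O without running the wake-up
side, the waiter sleeps forever, which matches the 10+ hour hang
reported above.

#include <pthread.h>
#include <stdio.h>

#define COUNTER_MAX 3	/* the kernel's limit is much larger; tiny for the demo */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t overflow_wait = PTHREAD_COND_INITIALIZER;
static int counter;	/* writes in flight against one bitmap chunk */

/* bitmap_startwrite() analogue: block while the chunk counter is saturated. */
static void start_write(void)
{
	pthread_mutex_lock(&lock);
	while (counter == COUNTER_MAX)
		/* plays the role of prepare_to_wait()/schedule() on overflow_wait */
		pthread_cond_wait(&overflow_wait, &lock);
	counter++;
	pthread_mutex_unlock(&lock);
}

/* Completion analogue: must run for every I/O, success or error. If an
 * error path returns without ever getting here, any thread parked in
 * start_write() is never woken -- the shape of the hang reported above. */
static void end_write(void)
{
	pthread_mutex_lock(&lock);
	if (counter == COUNTER_MAX)
		pthread_cond_broadcast(&overflow_wait);
	counter--;
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	counter = COUNTER_MAX;	/* pretend the chunk is saturated with in-flight I/O */
	end_write();		/* one I/O completes and issues the wake-up ... */
	start_write();		/* ... so a new write can get in; without the
				 * end_write() call above, this blocks forever */
	printf("write admitted, in-flight count = %d\n", counter);
	return 0;
}

Build with gcc -pthread; commenting out the end_write() call in main()
reproduces an indefinite TASK_UNINTERRUPTIBLE-style sleep analogous to
the one in the trace.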