On Wed, Aug 22 2018, Jinpu Wang wrote:

> My reply was still too fast. My colleague also triggered the hung task
> by directly running IO on multiple raid5 arrays.
> It's upstream 4.15.7:
>
> [ 617.690530] INFO: task fio:6440 blocked for more than 120 seconds.
> [ 617.690706] Tainted: G O 4.15.7-1-storage #1
> [ 617.690864] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 617.691153] fio D 0 6440 6369 0x00000000
> [ 617.691310] Call Trace:
> [ 617.691469]  ? __schedule+0x2ac/0x7e0
> [ 617.691630]  schedule+0x32/0x80
> [ 617.691811]  raid5_make_request+0x1c3/0xab0 [raid456]
> [ 617.691969]  ? wait_woken+0x90/0x90
> [ 617.692120]  md_handle_request+0xa4/0x110
> [ 617.692270]  md_make_request+0x64/0x160
> [ 617.692421]  generic_make_request+0x10d/0x2d0
> [ 617.692573]  ? submit_bio+0x5c/0x120
> [ 617.692722]  submit_bio+0x5c/0x120
> [ 617.692871]  ? bio_iov_iter_get_pages+0xbf/0xf0
> [ 617.693049]  blkdev_direct_IO+0x394/0x3d0
> [ 617.693202]  ? generic_file_direct_write+0xc9/0x170
> [ 617.693355]  generic_file_direct_write+0xc9/0x170
> [ 617.693507]  __generic_file_write_iter+0xb6/0x1d0
> [ 617.693659]  blkdev_write_iter+0x98/0x110
> [ 617.693809]  ? aio_write+0xeb/0x140
> [ 617.693958]  aio_write+0xeb/0x140
> [ 617.694107]  ? _cond_resched+0x15/0x30
> [ 617.694284]  ? mutex_lock+0xe/0x30
> [ 617.694433]  ? _copy_to_user+0x22/0x30
> [ 617.694581]  ? aio_read_events+0x2ea/0x320
> [ 617.694731]  ? do_io_submit+0x1f3/0x680
> [ 617.694881]  ? do_io_submit+0x1f3/0x680
> [ 617.695032]  ? do_io_submit+0x37b/0x680
> [ 617.695180]  do_io_submit+0x37b/0x680
> [ 617.695330]  ? do_syscall_64+0x5a/0x120
> [ 617.695509]  do_syscall_64+0x5a/0x120
> [ 617.695666]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [ 617.695823] RIP: 0033:0x7f6362428737
> [ 617.695972] RSP: 002b:00007ffe5daeb808 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
> [ 617.696217] RAX: ffffffffffffffda RBX: 00000000016be080 RCX: 00007f6362428737
> [ 617.696376] RDX: 0000000001ed94e8 RSI: 0000000000000067 RDI: 00007f6352a68000
> [ 617.696534] RBP: 00000000000000c8 R08: 0000000000000067 R09: 00000000016c2760
> [ 617.696716] R10: 0000000001804000 R11: 0000000000000246 R12: 00007f63454b3350
> [ 617.696874] R13: 0000000001ed9830 R14: 0000000000000000 R15: 00007f63454c0808
>
> raid5_make_request+0x1c3 is sleeping at the following code path:
>
>	if (test_bit(STRIPE_EXPANDING, &sh->state) ||
>	    !add_stripe_bio(sh, bi, dd_idx, rw, previous)) {
>		/* Stripe is busy expanding or
>		 * add failed due to overlap. Flush everything
>		 * and wait a while
>		 */
>		md_wakeup_thread(mddev->thread);
>		raid5_release_stripe(sh);
>		schedule();
>		do_prepare = true;
>		goto retry;
>	}
>
> It looks like no one is scheduling it back.
> No reshape, just 60+ freshly created raid5 devices. Pretty easy/fast
> to reproduce.

Presumably it is an overlap, so R5_Overlap should be set.
do_prepare is set, so prepare_to_wait() should have been called on
wait_for_overlap.
So maybe some code path isn't checking R5_Overlap and so isn't doing
the wakeup.

NeilBrown

>
> Is this a known bug? Even better if you can point me to the fix.
>
> Thanks,
> Jack
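
For reference, the sleep/wake handshake NeilBrown describes has two
halves. The sketch below is condensed and paraphrased from 4.15-era
raid5.c, not a verbatim excerpt; it shows why a completion path that
skips the R5_Overlap test leaves the submitter asleep forever:

	/* Waiter side, in raid5_make_request(). The wait entry is
	 * armed before the overlap check, so a wakeup racing with
	 * schedule() is not lost.
	 */
	DEFINE_WAIT(w);
	...
retry:
	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
	...
	if (test_bit(STRIPE_EXPANDING, &sh->state) ||
	    !add_stripe_bio(sh, bi, dd_idx, rw, previous)) {
		/* add_stripe_bio() set R5_Overlap on the stripe's dev
		 * before returning 0, recording that a waiter exists. */
		md_wakeup_thread(mddev->thread);
		raid5_release_stripe(sh);
		schedule();	/* raid5_make_request+0x1c3 in the trace */
		do_prepare = true;
		goto retry;
	}

	/* Waker side: each path that completes or fails the conflicting
	 * request is expected to pair the flag with a wakeup, e.g. as
	 * handle_stripe_clean_event() does per device:
	 */
	if (test_and_clear_bit(R5_Overlap, &dev->flags))
		wake_up(&conf->wait_for_overlap);

If any path retires the overlapping request without that
test_and_clear_bit()/wake_up() pair, the fio thread stays blocked in
D state, which matches the hung-task report above.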