On Wed, Aug 22, 2018 at 10:51 PM NeilBrown <neilb@xxxxxxxx> wrote:
>
> On Wed, Aug 22 2018, Jinpu Wang wrote:
>
> > I replied too fast again: my colleague also triggered the hung task,
> > just by running IO directly on multiple raid5 arrays.
> > It's upstream 4.15.7:
> >
> > [ 617.690530] INFO: task fio:6440 blocked for more than 120 seconds.
> > [ 617.690706] Tainted: G O 4.15.7-1-storage #1
> > [ 617.690864] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 617.691153] fio D 0 6440 6369 0x00000000
> > [ 617.691310] Call Trace:
> > [ 617.691469] ? __schedule+0x2ac/0x7e0
> > [ 617.691630] schedule+0x32/0x80
> > [ 617.691811] raid5_make_request+0x1c3/0xab0 [raid456]
> > [ 617.691969] ? wait_woken+0x90/0x90
> > [ 617.692120] md_handle_request+0xa4/0x110
> > [ 617.692270] md_make_request+0x64/0x160
> > [ 617.692421] generic_make_request+0x10d/0x2d0
> > [ 617.692573] ? submit_bio+0x5c/0x120
> > [ 617.692722] submit_bio+0x5c/0x120
> > [ 617.692871] ? bio_iov_iter_get_pages+0xbf/0xf0
> > [ 617.693049] blkdev_direct_IO+0x394/0x3d0
> > [ 617.693202] ? generic_file_direct_write+0xc9/0x170
> > [ 617.693355] generic_file_direct_write+0xc9/0x170
> > [ 617.693507] __generic_file_write_iter+0xb6/0x1d0
> > [ 617.693659] blkdev_write_iter+0x98/0x110
> > [ 617.693809] ? aio_write+0xeb/0x140
> > [ 617.693958] aio_write+0xeb/0x140
> > [ 617.694107] ? _cond_resched+0x15/0x30
> > [ 617.694284] ? mutex_lock+0xe/0x30
> > [ 617.694433] ? _copy_to_user+0x22/0x30
> > [ 617.694581] ? aio_read_events+0x2ea/0x320
> > [ 617.694731] ? do_io_submit+0x1f3/0x680
> > [ 617.694881] ? do_io_submit+0x1f3/0x680
> > [ 617.695032] ? do_io_submit+0x37b/0x680
> > [ 617.695180] do_io_submit+0x37b/0x680
> > [ 617.695330] ? do_syscall_64+0x5a/0x120
> > [ 617.695509] do_syscall_64+0x5a/0x120
> > [ 617.695666] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > [ 617.695823] RIP: 0033:0x7f6362428737
> > [ 617.695972] RSP: 002b:00007ffe5daeb808 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
> > [ 617.696217] RAX: ffffffffffffffda RBX: 00000000016be080 RCX: 00007f6362428737
> > [ 617.696376] RDX: 0000000001ed94e8 RSI: 0000000000000067 RDI: 00007f6352a68000
> > [ 617.696534] RBP: 00000000000000c8 R08: 0000000000000067 R09: 00000000016c2760
> > [ 617.696716] R10: 0000000001804000 R11: 0000000000000246 R12: 00007f63454b3350
> > [ 617.696874] R13: 0000000001ed9830 R14: 0000000000000000 R15: 00007f63454c0808
> >
> > raid5_make_request+0x1c3 is sleeping at the following code path:
> >
> >                 if (test_bit(STRIPE_EXPANDING, &sh->state) ||
> >                     !add_stripe_bio(sh, bi, dd_idx, rw, previous)) {
> >                         /* Stripe is busy expanding or
> >                          * add failed due to overlap. Flush everything
> >                          * and wait a while
> >                          */
> >                         md_wakeup_thread(mddev->thread);
> >                         raid5_release_stripe(sh);
> >                         schedule();
> >                         do_prepare = true;
> >                         goto retry;
> >                 }
> >
> > It looks like no one ever wakes it back up.
> > No reshape involved, just freshly created raid5 devices (60+). Pretty
> > easy/fast to reproduce.
>
> Presumably it is an overlap, so R5_Overlap should be set.
> do_prepare is set, so prepare_to_wait() should have been called on
> wait_for_overlap.
>
> So maybe some code path isn't checking R5_Overlap and so isn't doing
> the wakeup.
>
> NeilBrown
>
> > Is this a known bug? Even better if you can point me to the fix.

Thanks Neil,

We applied 448ec638c6bc ("md/raid5: Assigning NULL to sh->batch_head
before testing bit R5_Overlap of a stripe"). With it we can no longer
trigger the IO hang. Still testing, but it looks promising.

Will report back if we still see the problem.

Regards,
Jack
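
For reference, the wait/wake pairing under discussion looks roughly as
follows. This is a simplified sketch of the v4.15-era drivers/md/raid5.c
logic, not the verbatim source: locking, error handling and several
arguments are omitted, and conf, sh, dev and w stand for the r5conf,
stripe_head, r5dev and wait-queue entry involved.

	/* Waiter side, simplified from raid5_make_request():
	 * park on conf->wait_for_overlap before retrying the stripe. */
	DEFINE_WAIT(w);
retry:
	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
	sh = raid5_get_active_stripe(conf, new_sector, previous, 0, 0);
	if (test_bit(STRIPE_EXPANDING, &sh->state) ||
	    !add_stripe_bio(sh, bi, dd_idx, rw, previous)) {
		/* add_stripe_bio() saw an overlapping bio and set
		 * R5_Overlap on the device before failing, so a later
		 * wake_up(&conf->wait_for_overlap) is expected. */
		md_wakeup_thread(mddev->thread);
		raid5_release_stripe(sh);
		schedule();		/* sleeps until that wake_up */
		goto retry;
	}
	finish_wait(&conf->wait_for_overlap, &w);

	/* Waker side, the pattern the stripe-handling paths use once
	 * the conflicting work on the stripe completes or fails.  Any
	 * path that skips this check leaves the waiter asleep forever,
	 * which is the hang in the trace above. */
	if (test_and_clear_bit(R5_Overlap, &dev->flags))
		wake_up(&conf->wait_for_overlap);

Judging by its title alone, commit 448ec638c6bc makes sure sh->batch_head
is already NULL by the time R5_Overlap is tested, i.e. it repairs one
path where the wake-up half of this handshake was being skipped for
stripes that had been part of a batch.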