On Thu, Nov 11, 2021 at 10:09 PM Vishal Verma <vverma@xxxxxxxxxxxxxxxx> wrote:
>
> Yes, with raid10 the task hung happened when doing write IO using FIO where FIO just gets stuck after like 30s or so and no I/O happens afterwards.
> This was on a test nvme based raid10: (tried with both io_uring and aio, same issue)
>
> [ 1818.677686] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1818.685512] task:fio state:D stack: 0 pid:14314 ppid: 1 flags:0x00020004
> [ 1818.685516] Call Trace:
> [ 1818.685519] __schedule+0x295/0x840
> [ 1818.685525] ? wbt_cleanup_cb+0x20/0x20
> [ 1818.685528] schedule+0x4e/0xb0
> [ 1818.685529] io_schedule+0x3f/0x70
> [ 1818.685531] rq_qos_wait+0xb9/0x130
> [ 1818.685535] ? sysv68_partition+0x280/0x280
> [ 1818.685537] ? wbt_cleanup_cb+0x20/0x20
> [ 1818.685538] wbt_wait+0x92/0xc0
> [ 1818.685539] __rq_qos_throttle+0x25/0x40
> [ 1818.685541] blk_mq_submit_bio+0xc6/0x5d0
> [ 1818.685544] ? submit_bio_checks+0x39e/0x5f0
> [ 1818.685547] __submit_bio+0x1bc/0x1d0
> [ 1818.685549] submit_bio_noacct+0x256/0x2a0
> [ 1818.685550] ? bio_associate_blkg+0x29/0x70
> [ 1818.685553] 0xffffffffc028d38a
> [ 1818.685555] blk_flush_plug+0xc3/0x130
> [ 1818.685558] blk_finish_plug+0x26/0x40
> [ 1818.685560] blkdev_write_iter+0xf8/0x160
> [ 1818.685561] io_write+0x153/0x2e0
> [ 1818.685564] ? blk_mq_put_tags+0x1d/0x20
> [ 1818.685566] ? blk_mq_end_request_batch+0x295/0x2e0
> [ 1818.685568] ? sysvec_apic_timer_interrupt+0x46/0x80
> [ 1818.685570] io_issue_sqe+0x579/0x1990
> [ 1818.685571] ? io_req_prep+0x6a9/0xe60
> [ 1818.685573] ? __fget_files+0x56/0x80
> [ 1818.685576] ? fget+0x2a/0x30
> [ 1818.685577] io_submit_sqes+0x28c/0x930
> [ 1818.685578] ? __io_submit_flush_completions+0xdc/0x150
> [ 1818.685580] ? ctx_flush_and_put+0x4b/0x70
> [ 1818.685581] __x64_sys_io_uring_enter+0x1db/0x8e0
> [ 1818.685583] ? exit_to_user_mode_prepare+0x3e/0x1e0
> [ 1818.685586] ? exit_to_user_mode_prepare+0x3e/0x1e0
> [ 1818.685588] do_syscall_64+0x38/0x90
> [ 1818.685591] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 1818.685593] RIP: 0033:0x7f8a41c1889d
> [ 1818.685594] RSP: 002b:00007ffe390d5af8 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
> [ 1818.685596] RAX: ffffffffffffffda RBX: 00007ffe390d5b20 RCX: 00007f8a41c1889d
> [ 1818.685597] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000006
> [ 1818.685597] RBP: 000055de073b6ef0 R08: 0000000000000000 R09: 0000000000000000
> [ 1818.685598] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f8a38400000
> [ 1818.685599] R13: 0000000000000001 R14: 0000000000875bc1 R15: 0000000000000000
>
> For raid456, running into this as soon as I try to create a raid5 volume:
>
> [ 5338.620661] Buffer I/O error on dev md5, logical block 0, async page read
> [ 5338.627457] Buffer I/O error on dev md5, logical block 0, async page read
> [ 5338.634250] Buffer I/O error on dev md5, logical block 0, async page read
> [ 5338.641043] Buffer I/O error on dev md5, logical block 0, async page read
> [ 5338.647836] Buffer I/O error on dev md5, logical block 0, async page read
> [ 5338.654632] Buffer I/O error on dev md5, logical block 0, async page read
> [ 5338.661424] Dev md5: unable to read RDB block 0
> [ 5338.665957] Buffer I/O error on dev md5, logical block 0, async page read
> [ 5338.672746] Buffer I/O error on dev md5, logical block 0, async page read
> [ 5338.679540] Buffer I/O error on dev md5, logical block 3, async page read

I am sorry that I haven't had time to look into this, and I will be on vacation again starting tomorrow. If you make progress, please share your findings and/or an updated version. I will try to look into this after Thanksgiving.

Song