On 11/18/21 9:07 PM, Song Liu wrote:
On Thu, Nov 11, 2021 at 10:09 PM Vishal Verma <vverma@xxxxxxxxxxxxxxxx> wrote:
Yes, with raid10 the task hang happened during write I/O with fio: fio gets stuck after about 30 seconds and no I/O happens afterwards.
This was on a test NVMe-based raid10 (tried with both io_uring and aio, same issue):
[ 1818.677686] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1818.685512] task:fio state:D stack: 0 pid:14314 ppid: 1 flags:0x00020004
[ 1818.685516] Call Trace:
[ 1818.685519] __schedule+0x295/0x840
[ 1818.685525] ? wbt_cleanup_cb+0x20/0x20
[ 1818.685528] schedule+0x4e/0xb0
[ 1818.685529] io_schedule+0x3f/0x70
[ 1818.685531] rq_qos_wait+0xb9/0x130
[ 1818.685535] ? sysv68_partition+0x280/0x280
[ 1818.685537] ? wbt_cleanup_cb+0x20/0x20
[ 1818.685538] wbt_wait+0x92/0xc0
[ 1818.685539] __rq_qos_throttle+0x25/0x40
[ 1818.685541] blk_mq_submit_bio+0xc6/0x5d0
[ 1818.685544] ? submit_bio_checks+0x39e/0x5f0
[ 1818.685547] __submit_bio+0x1bc/0x1d0
[ 1818.685549] submit_bio_noacct+0x256/0x2a0
[ 1818.685550] ? bio_associate_blkg+0x29/0x70
[ 1818.685553] 0xffffffffc028d38a
[ 1818.685555] blk_flush_plug+0xc3/0x130
[ 1818.685558] blk_finish_plug+0x26/0x40
[ 1818.685560] blkdev_write_iter+0xf8/0x160
[ 1818.685561] io_write+0x153/0x2e0
[ 1818.685564] ? blk_mq_put_tags+0x1d/0x20
[ 1818.685566] ? blk_mq_end_request_batch+0x295/0x2e0
[ 1818.685568] ? sysvec_apic_timer_interrupt+0x46/0x80
[ 1818.685570] io_issue_sqe+0x579/0x1990
[ 1818.685571] ? io_req_prep+0x6a9/0xe60
[ 1818.685573] ? __fget_files+0x56/0x80
[ 1818.685576] ? fget+0x2a/0x30
[ 1818.685577] io_submit_sqes+0x28c/0x930
[ 1818.685578] ? __io_submit_flush_completions+0xdc/0x150
[ 1818.685580] ? ctx_flush_and_put+0x4b/0x70
[ 1818.685581] __x64_sys_io_uring_enter+0x1db/0x8e0
[ 1818.685583] ? exit_to_user_mode_prepare+0x3e/0x1e0
[ 1818.685586] ? exit_to_user_mode_prepare+0x3e/0x1e0
[ 1818.685588] do_syscall_64+0x38/0x90
[ 1818.685591] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1818.685593] RIP: 0033:0x7f8a41c1889d
[ 1818.685594] RSP: 002b:00007ffe390d5af8 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
[ 1818.685596] RAX: ffffffffffffffda RBX: 00007ffe390d5b20 RCX: 00007f8a41c1889d
[ 1818.685597] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000006
[ 1818.685597] RBP: 000055de073b6ef0 R08: 0000000000000000 R09: 0000000000000000
[ 1818.685598] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f8a38400000
[ 1818.685599] R13: 0000000000000001 R14: 0000000000875bc1 R15: 0000000000000000
For raid456, I run into this as soon as I try to create a raid5 volume:
[ 5338.620661] Buffer I/O error on dev md5, logical block 0, async page read
[ 5338.627457] Buffer I/O error on dev md5, logical block 0, async page read
[ 5338.634250] Buffer I/O error on dev md5, logical block 0, async page read
[ 5338.641043] Buffer I/O error on dev md5, logical block 0, async page read
[ 5338.647836] Buffer I/O error on dev md5, logical block 0, async page read
[ 5338.654632] Buffer I/O error on dev md5, logical block 0, async page read
[ 5338.661424] Dev md5: unable to read RDB block 0
[ 5338.665957] Buffer I/O error on dev md5, logical block 0, async page read
[ 5338.672746] Buffer I/O error on dev md5, logical block 0, async page read
[ 5338.679540] Buffer I/O error on dev md5, logical block 3, async page read
I am sorry that I haven't had time to look into this, and I will be on
vacation again from tomorrow. If you make progress, please share your
findings and/or an updated version.
I will try to look into this after Thanksgiving.
Song
Hi Song,
Did you get a chance to look into this? It looks like I am a bit stuck here. Another option I am considering: should we just add a flag for enabling nowait and enable it by default for raid1?
Thanks,
Vishal