On Mon, May 20, 2024 at 10:55 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi, Changhui
>
> On 2024/05/20 8:39, Changhui Zhong wrote:
> > [czhong@vm linux-block]$ git bisect bad
> > 060406c61c7cb4bbd82a02d179decca9c9bb3443 is the first bad commit
> > commit 060406c61c7cb4bbd82a02d179decca9c9bb3443
> > Author: Yu Kuai<yukuai3@xxxxxxxxxx>
> > Date: Thu May 9 20:38:25 2024 +0800
> >
> > block: add plug while submitting IO
> >
> > So that if caller didn't use plug, for example, __blkdev_direct_IO_simple()
> > and __blkdev_direct_IO_async(), block layer can still benefit from caching
> > nsec time in the plug.
> >
> > Signed-off-by: Yu Kuai<yukuai3@xxxxxxxxxx>
> > Link:https://lore.kernel.org/r/20240509123825.3225207-1-yukuai1@xxxxxxxxxxxxxxx
> > Signed-off-by: Jens Axboe<axboe@xxxxxxxxx>
> >
> > block/blk-core.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
>
> Thanks for the test!
>
> I was surprised to see this commit blamed. After taking a look at the
> raid1 barrier code, I found that there are some known problems that were
> fixed in raid10 but are still unfixed in raid1. So I wonder whether this
> patch is just making an existing problem easier to reproduce.
>
> I'll start cooking patches to sync the raid10 fixes to raid1. Meanwhile,
> can you change your script to test raid10 as well? If raid10 is fine,
> I'll give you these patches later to test raid1.
>
> Thanks,
> Kuai
>

Hi, Kuai

I tested raid10 and it triggers this issue too:

[ 332.435340] Create raid10
[ 332.573160] device-mapper: raid: Superblocks created for new raid set
[ 332.595273] md/raid10:mdX: not clean -- starting background reconstruction
[ 332.595277] md/raid10:mdX: active with 4 out of 4 devices
[ 332.597017] mdX: bitmap file is out of date, doing full recovery
[ 332.603712] md: resync of RAID array mdX
[ 492.173892] INFO: task mdX_resync:3092 blocked for more than 122 seconds.
[ 492.180694] Not tainted 6.9.0+ #1
[ 492.184536] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 492.192365] task:mdX_resync state:D stack:0 pid:3092 tgid:3092 ppid:2 flags:0x00004000
[ 492.192368] Call Trace:
[ 492.192370] <TASK>
[ 492.192371] __schedule+0x222/0x670
[ 492.192377] schedule+0x2c/0xb0
[ 492.192381] raise_barrier+0xc3/0x190 [raid10]
[ 492.192387] ? __pfx_autoremove_wake_function+0x10/0x10
[ 492.192392] raid10_sync_request+0x2c3/0x1ae0 [raid10]
[ 492.192397] ? __schedule+0x22a/0x670
[ 492.192398] ? prepare_to_wait_event+0x5f/0x190
[ 492.192401] md_do_sync+0x660/0x1040
[ 492.192405] ? __pfx_autoremove_wake_function+0x10/0x10
[ 492.192408] md_thread+0xad/0x160
[ 492.192410] ? __pfx_md_thread+0x10/0x10
[ 492.192411] kthread+0xdc/0x110
[ 492.192414] ? __pfx_kthread+0x10/0x10
[ 492.192416] ret_from_fork+0x2d/0x50
[ 492.192420] ? __pfx_kthread+0x10/0x10
[ 492.192421] ret_from_fork_asm+0x1a/0x30
[ 492.192424] </TASK>

Thanks,
Changhui