Re: [bug report] INFO: task mdX_resync:42168 blocked for more than 122 seconds

Changhui Zhong <czhong@xxxxxxxxxx> · Mon, 20 May 2024 18:38:59 +0800



On Mon, May 20, 2024 at 10:55 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi, Changhui
>
> On 2024/05/20 8:39, Changhui Zhong wrote:
> > [czhong@vm linux-block]$ git bisect bad
> > 060406c61c7cb4bbd82a02d179decca9c9bb3443 is the first bad commit
> > commit 060406c61c7cb4bbd82a02d179decca9c9bb3443
> > Author: Yu Kuai<yukuai3@xxxxxxxxxx>
> > Date:   Thu May 9 20:38:25 2024 +0800
> >
> >      block: add plug while submitting IO
> >
> >      So that if caller didn't use plug, for example, __blkdev_direct_IO_simple()
> >      and __blkdev_direct_IO_async(), block layer can still benefit from caching
> >      nsec time in the plug.
> >
> >      Signed-off-by: Yu Kuai<yukuai3@xxxxxxxxxx>
> >      Link:https://lore.kernel.org/r/20240509123825.3225207-1-yukuai1@xxxxxxxxxxxxxxx
> >      Signed-off-by: Jens Axboe<axboe@xxxxxxxxx>
> >
> >   block/blk-core.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
>
> Thanks for the test!
>
> I was surprised to see this commit blamed. After taking a look at the
> raid1 barrier code, I found some known problems that were fixed in
> raid10 but remain unfixed in raid1. So I suspect this patch may just be
> making an existing problem easier to reproduce.
>
> I'll start cooking patches to sync the raid10 fixes to raid1. Meanwhile,
> can you change your script to test raid10 as well? If raid10 is fine,
> I'll give you these patches later to test raid1.
>
> Thanks,
> Kuai
>

Hi, Kuai

I tested raid10 and it triggers this issue too:

[  332.435340] Create raid10
[  332.573160] device-mapper: raid: Superblocks created for new raid set
[  332.595273] md/raid10:mdX: not clean -- starting background reconstruction
[  332.595277] md/raid10:mdX: active with 4 out of 4 devices
[  332.597017] mdX: bitmap file is out of date, doing full recovery
[  332.603712] md: resync of RAID array mdX
[  492.173892] INFO: task mdX_resync:3092 blocked for more than 122 seconds.
[  492.180694]       Not tainted 6.9.0+ #1
[  492.184536] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.192365] task:mdX_resync      state:D stack:0     pid:3092  tgid:3092  ppid:2      flags:0x00004000
[  492.192368] Call Trace:
[  492.192370]  <TASK>
[  492.192371]  __schedule+0x222/0x670
[  492.192377]  schedule+0x2c/0xb0
[  492.192381]  raise_barrier+0xc3/0x190 [raid10]
[  492.192387]  ? __pfx_autoremove_wake_function+0x10/0x10
[  492.192392]  raid10_sync_request+0x2c3/0x1ae0 [raid10]
[  492.192397]  ? __schedule+0x22a/0x670
[  492.192398]  ? prepare_to_wait_event+0x5f/0x190
[  492.192401]  md_do_sync+0x660/0x1040
[  492.192405]  ? __pfx_autoremove_wake_function+0x10/0x10
[  492.192408]  md_thread+0xad/0x160
[  492.192410]  ? __pfx_md_thread+0x10/0x10
[  492.192411]  kthread+0xdc/0x110
[  492.192414]  ? __pfx_kthread+0x10/0x10
[  492.192416]  ret_from_fork+0x2d/0x50
[  492.192420]  ? __pfx_kthread+0x10/0x10
[  492.192421]  ret_from_fork_asm+0x1a/0x30
[  492.192424]  </TASK>
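
For anyone comparing hung-task reports across test runs (raid1 vs raid10, with and without the plug commit), a small helper along the following lines may be handy. This is only a rough sketch and not part of the test script used above: the function name parse_hung_tasks and the regular expressions are my own assumptions, written against the log format shown in this mail. It collects the blocked task name, pid, timeout, and the reliable stack frames (those not prefixed with "?").

```python
import re

# Header printed when a task has been blocked for longer than the timeout.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# A stack frame such as "raise_barrier+0xc3/0x190 [raid10]".
# Frames prefixed with "? " are unreliable and are skipped below.
FRAME_RE = re.compile(
    r"\]\s+(?P<unsure>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report found in log_text (hypothetical helper)."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header["comm"],
                "pid": int(header["pid"]),
                "secs": int(header["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and not frame["unsure"]:
            current["frames"].append(frame["sym"])
    return reports
```

Feeding it the trace above should show mdX_resync waiting in raise_barrier, called from raid10_sync_request, which makes it easy to check whether a later run hangs at the same place.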

Thanks,
Changhui