Re: [bug report] INFO: task mdX_resync:42168 blocked for more than 122 seconds

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 16, 2024 at 7:42 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> 在 2024/05/16 19:21, Ming Lei 写道:
> > Cc raid and dm list.
> >
> > On Thu, May 16, 2024 at 06:24:18PM +0800, Changhui Zhong wrote:
> >> Hello,
> >>
> >> when create lvm raid1, the command hang on for a long time.
> >> please help check it and let me know if you need any info/testing for
> >> it, thanks.
>
> Is this a new test, or a new problem?

it is a new problem, I am not hit this issue on 6.9.0-rc4+

> >>
> >> repo:https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
> >> branch:for-next
> >> commit: 59ef8180748269837975c9656b586daa16bb9def
> >>
> >> reproducer:
> >> dd if=/dev/zero bs=1M count=2000 of=file0.img
> >> dd if=/dev/zero bs=1M count=2000 of=file1.img
> >> dd if=/dev/zero bs=1M count=2000 of=file2.img
> >> dd if=/dev/zero bs=1M count=2000 of=file4.img
> >> losetup -fP --show file0.img
> >> losetup -fP --show file1.img
> >> losetup -fP --show file2.img
> >> losetup -fP --show file3.img
>
> above dd creat file4, here is file3.

yeah,this is my spelling mistake, I created 4 files, file0/1/2/3

>
> >> pvcreate -y  /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
> >> vgcreate  black_bird  /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
> >> lvcreate --type raid1 -m 3 -n non_synced_primary_raid_3legs_1   -L 1G
> >> black_bird        /dev/loop0:0-300     /dev/loop1:0-300
> >> /dev/loop2:0-300  /dev/loop3:0-300
>
> I don't understand what /dev/loopx:0-300 means, and I remove them, fix
> the above file4 typo, test on a xfs filesystem, and I can't reporduce
> the problem.
>

I want to specify the space from disk blocks 0 to 300 of the loop
device to create raid1,not all space of loop device,
follow reproducer setps I can reproduced it 100%

> >>
> >>
> >> console log:
> >> May 21 21:57:41 dell-per640-04 journal: Create raid1
> >> May 21 21:57:41 dell-per640-04 kernel: device-mapper: raid:
> >> Superblocks created for new raid set
> >> May 21 21:57:42 dell-per640-04 kernel: md/raid1:mdX: not clean --
> >> starting background reconstruction
> >> May 21 21:57:42 dell-per640-04 kernel: md/raid1:mdX: active with 4 out
> >> of 4 mirrors
> >> May 21 21:57:42 dell-per640-04 kernel: mdX: bitmap file is out of
> >> date, doing full recovery
> >> May 21 21:57:42 dell-per640-04 kernel: md: resync of RAID array mdX
> >> May 21 21:57:42 dell-per640-04 systemd[1]: Started Device-mapper event daemon.
> >> May 21 21:57:42 dell-per640-04 dmeventd[42170]: dmeventd ready for processing.
> >> May 21 21:57:42 dell-per640-04 dmeventd[42170]: Monitoring RAID device
> >> black_bird-non_synced_primary_raid_3legs_1 for events.
> >> May 21 21:57:45 dell-per640-04 restraintd[1446]: *** Current Time: Tue
> >> May 21 21:57:45 2024  Localwatchdog at: Tue May 21 22:56:45 2024
> >> May 21 21:58:45 dell-per640-04 restraintd[1446]: *** Current Time: Tue
> >> May 21 21:58:45 2024  Localwatchdog at: Tue May 21 22:56:45 2024
> >> May 21 21:59:45 dell-per640-04 restraintd[1446]: *** Current Time: Tue
> >> May 21 21:59:45 2024  Localwatchdog at: Tue May 21 22:56:45 2024
> >> May 21 21:59:53 dell-per640-04 kernel: INFO: task mdX_resync:42168
> >> blocked for more than 122 seconds.
> >> May 21 21:59:53 dell-per640-04 kernel:      Not tainted 6.9.0+ #1
> >> May 21 21:59:53 dell-per640-04 kernel: "echo 0 >
> >> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> May 21 21:59:53 dell-per640-04 kernel: task:mdX_resync      state:D
> >> stack:0     pid:42168 tgid:42168 ppid:2      flags:0x00004000
> >> May 21 21:59:53 dell-per640-04 kernel: Call Trace:
> >> May 21 21:59:53 dell-per640-04 kernel: <TASK>
> >> May 21 21:59:53 dell-per640-04 kernel: __schedule+0x222/0x670
> >> May 21 21:59:53 dell-per640-04 kernel: ? blk_mq_flush_plug_list+0x5/0x20
> >> May 21 21:59:53 dell-per640-04 kernel: schedule+0x2c/0xb0
> >> May 21 21:59:53 dell-per640-04 kernel: raise_barrier+0x107/0x200 [raid1]
>
> Unless this is a deadlock, raise_barrier() should be waiting for normal
> IO that is issued to underlying disk to return. If you can reporduce the
> problem, can you check IO from underlying loop disks?
>
> cat /sys/block/loopx/inflight

when this issue was triggered, the log I collected:

[root@storageqe-103 ~]# cat /sys/block/loop0/inflight
       0        0
[root@storageqe-103 ~]# cat /sys/block/loop1/inflight
       0        0
[root@storageqe-103 ~]# cat /sys/block/loop2/inflight
       0        0
[root@storageqe-103 ~]# cat /sys/block/loop3/inflight
       0        0
[root@storageqe-103 ~]#


and the command "lvs" hang on also,

[root@storageqe-103 ~]# lvs
^C  Interrupted...
  Giving up waiting for lock.
  Can't get lock for black_bird.
  Cannot process volume group black_bird
  LV   VG                 Attr       LSize    Pool Origin Data%  Meta%
 Move Log Cpy%Sync Convert
  home rhel_storageqe-103 -wi-ao---- <368.43g
  root rhel_storageqe-103 -wi-ao----   70.00g
  swap rhel_storageqe-103 -wi-ao----    7.70g
[root@storageqe-103 ~]#

[ 1352.761630] INFO: task mdX_resync:1547 blocked for more than 1105 seconds.
[ 1352.769336]       Not tainted 6.9.0+ #1
[ 1352.773629] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 1352.782372] task:mdX_resync      state:D stack:0     pid:1547
tgid:1547  ppid:2      flags:0x00004000
[ 1352.782380] Call Trace:
[ 1352.782382]  <TASK>
[ 1352.782386]  __schedule+0x222/0x670
[ 1352.782396]  schedule+0x2c/0xb0
[ 1352.782402]  raise_barrier+0x107/0x200 [raid1]
[ 1352.782415]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 1352.782423]  raid1_sync_request+0x12d/0xa50 [raid1]
[ 1352.782435]  ? prepare_to_wait_event+0x5f/0x190
[ 1352.782442]  md_do_sync+0x660/0x1040
[ 1352.782449]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 1352.782457]  md_thread+0xad/0x160
[ 1352.782462]  ? __pfx_md_thread+0x10/0x10
[ 1352.782465]  kthread+0xdc/0x110
[ 1352.782470]  ? __pfx_kthread+0x10/0x10
[ 1352.782474]  ret_from_fork+0x2d/0x50
[ 1352.782481]  ? __pfx_kthread+0x10/0x10
[ 1352.782485]  ret_from_fork_asm+0x1a/0x30
[ 1352.782491]  </TASK>

Thanks,
Changhui

>
> Thanks,
> Kuai
>
> >> May 21 21:59:53 dell-per640-04 kernel: ?
> >> __pfx_autoremove_wake_function+0x10/0x10
> >> May 21 21:59:53 dell-per640-04 kernel: raid1_sync_request+0x12d/0xa50 [raid1]
> >> May 21 21:59:53 dell-per640-04 kernel: ?
> >> __pfx_raid1_sync_request+0x10/0x10 [raid1]
> >> May 21 21:59:53 dell-per640-04 kernel: md_do_sync+0x660/0x1040
> >> May 21 21:59:53 dell-per640-04 kernel: ?
> >> __pfx_autoremove_wake_function+0x10/0x10
> >> May 21 21:59:53 dell-per640-04 kernel: md_thread+0xad/0x160
> >> May 21 21:59:53 dell-per640-04 kernel: ? __pfx_md_thread+0x10/0x10
> >> May 21 21:59:53 dell-per640-04 kernel: kthread+0xdc/0x110
> >> May 21 21:59:53 dell-per640-04 kernel: ? __pfx_kthread+0x10/0x10
> >> May 21 21:59:53 dell-per640-04 kernel: ret_from_fork+0x2d/0x50
> >> May 21 21:59:53 dell-per640-04 kernel: ? __pfx_kthread+0x10/0x10
> >> May 21 21:59:53 dell-per640-04 kernel: ret_from_fork_asm+0x1a/0x30
> >> May 21 21:59:53 dell-per640-04 kernel: </TASK>
> >>
> >>
> >> --
> >> Best Regards,
> >>       Changhui
> >>
> >
>
>






[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux