Re: mdadm grow raid 5 to 6 failure (crash)


 



Hi,

On 2023/05/08 13:54, David Gilmour wrote:
I'm not sure what I'm looking for here but here is the output of the
inflight file immediately after the mdadm assemble hangs. Does this
indicate something accessing the array?

#cat /sys/block/md127/inflight
        1        0


Yes, something is accessing the array. Did you try to grep for all tasks
that are in "D" state?

ps -elf | grep " D "

Is there any task stuck in raid5_make_request?

cat /proc/$pid/stack
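
The two commands above can be combined into one pass. The following is only a sketch: the helper name filter_d_state is made up for illustration, it assumes a procps-style ps that accepts "-eo pid=,stat=", and reading /proc/<pid>/stack normally requires root.

```shell
#!/bin/sh
# Print the kernel stack of every task in uninterruptible sleep ("D" state).

# Hypothetical helper: reads "<pid> <stat>" lines on stdin and prints the
# pids whose state starts with "D" (this also matches "D+", "Ds", ...).
filter_d_state() {
    awk '$2 ~ /^D/ { print $1 }'
}

for pid in $(ps -eo pid=,stat= | filter_d_state); do
    echo "== pid $pid ($(cat "/proc/$pid/comm" 2>/dev/null)) =="
    # The task may have exited, or we may lack permission; do not abort.
    cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
done
```

Any task whose stack contains raid5_make_request is a candidate for the stuck I/O.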

Also attached is an strace of my mdadm command that hung in case that
reveals something relevant:
strace mdadm --assemble --verbose
--backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
/dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force 2>&1 |
tee mdadm_strace_output.txt

I don't think this will be helpful; mdadm is unlikely to be the task that
is accessing the array.

Thanks,
Kuai





On Sun, May 7, 2023 at 7:23 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

On 2023/05/06 21:19, David Gilmour wrote:
From what I can tell it does look very similar. I stopped the
systemd-udevd service and renamed it to systemd-udevd.bak. My system
still hung on the assemble command. I'm not savvy enough to decode the
details here but does the "mddev_suspend.part.0+0xdf/0x150" line in
the process stack output suggest the same i/o block the other post
indicates?

× systemd-udevd.service - Rule-based Manager for Device Events and Files
       Loaded: loaded (/usr/lib/systemd/system/systemd-udevd.service; static)
       Active: failed (Result: exit-code) since Sat 2023-05-06 06:59:11 MDT; 1min 27s ago
     Duration: 1d 20h 16min 29.633s
TriggeredBy: × systemd-udevd-kernel.socket
               × systemd-udevd-control.socket
         Docs: man:systemd-udevd.service(8)
               man:udev(7)
      Process: 27440 ExecStart=/usr/lib/systemd/systemd-udevd (code=exited, status=203/EXEC)
     Main PID: 27440 (code=exited, status=203/EXEC)
          CPU: 5ms

----------------------
#mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
--invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
/dev/sdb /dev/sdf --force
mdadm: looking for devices for /dev/md127
mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdh is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sdc is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sdb is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdf is identified as a member of /dev/md127, slot 5.
mdadm: /dev/md127 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on /root/mdadm5-6_backup_md127
mdadm: Failed to find backup of critical section
mdadm: continuing without restoring backup
mdadm: added /dev/sdh to /dev/md127 as 1
mdadm: added /dev/sdg to /dev/md127 as 2
mdadm: added /dev/sdc to /dev/md127 as 3
mdadm: added /dev/sdb to /dev/md127 as 4
mdadm: added /dev/sdf to /dev/md127 as 5 (possibly out of date)
mdadm: added /dev/sda to /dev/md127 as 0

#hangs indefinitely at this point in the output

------------------------------------------


root       27454  0.0  0.0   3812  2656 pts/1    D+   07:00   0:00
mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
--invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
/dev/sdb /dev/sdf --force
root       27457  0.0  0.0      0     0 ?        S    07:00   0:00 [md127_raid6]

#cat /proc/27454/stack
[<0>] mddev_suspend.part.0+0xdf/0x150
[<0>] suspend_lo_store+0xc5/0xf0
[<0>] md_attr_store+0x83/0xf0
[<0>] kernfs_fop_write_iter+0x124/0x1b0
[<0>] new_sync_write+0xff/0x190
[<0>] vfs_write+0x1ef/0x280
[<0>] ksys_write+0x5f/0xe0
[<0>] do_syscall_64+0x5c/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

#cat /proc/27457/stack
[<0>] md_thread+0x122/0x160
[<0>] kthread+0xe0/0x100
[<0>] ret_from_fork+0x22/0x30


Is there any thread stuck in raid5_make_request, something like the trace below?

Apr 23 19:17:22 atom kernel: task:systemd-udevd   state:D stack:    0 pid: 8121 ppid:   706 flags:0x00000006
Apr 23 19:17:22 atom kernel: Call Trace:
Apr 23 19:17:22 atom kernel:  <TASK>
Apr 23 19:17:22 atom kernel:  __schedule+0x20a/0x550
Apr 23 19:17:22 atom kernel:  schedule+0x5a/0xc0
Apr 23 19:17:22 atom kernel:  schedule_timeout+0x11f/0x160
Apr 23 19:17:22 atom kernel:  ? make_stripe_request+0x284/0x490 [raid456]
Apr 23 19:17:22 atom kernel:  wait_woken+0x50/0x70
Apr 23 19:17:22 atom kernel:  raid5_make_request+0x2cb/0x3e0 [raid456]
Apr 23 19:17:22 atom kernel:  ? sched_show_numa+0xf0/0xf0
Apr 23 19:17:22 atom kernel:  md_handle_request+0x132/0x1e0
Apr 23 19:17:22 atom kernel:  ? do_mpage_readpage+0x282/0x6b0
Apr 23 19:17:22 atom kernel:  __submit_bio+0x86/0x130
Apr 23 19:17:22 atom kernel:  __submit_bio_noacct+0x81/0x1f0
Apr 23 19:17:22 atom kernel:  mpage_readahead+0x15c/0x1d0
Apr 23 19:17:22 atom kernel:  ? blkdev_write_begin+0x20/0x20
Apr 23 19:17:22 atom kernel:  read_pages+0x58/0x2f0
Apr 23 19:17:22 atom kernel:  page_cache_ra_unbounded+0x137/0x180
Apr 23 19:17:22 atom kernel:  force_page_cache_ra+0xc5/0xf0
Apr 23 19:17:22 atom kernel:  filemap_get_pages+0xe4/0x350
Apr 23 19:17:22 atom kernel:  filemap_read+0xbe/0x3c0
Apr 23 19:17:22 atom kernel:  ? make_kgid+0x13/0x20
Apr 23 19:17:22 atom kernel:  ? deactivate_locked_super+0x90/0xa0
Apr 23 19:17:22 atom kernel:  blkdev_read_iter+0xaf/0x170
Apr 23 19:17:22 atom kernel:  new_sync_read+0xf9/0x180
Apr 23 19:17:22 atom kernel:  vfs_read+0x13c/0x190
Apr 23 19:17:22 atom kernel:  ksys_read+0x5f/0xe0
Apr 23 19:17:22 atom kernel:  do_syscall_64+0x59/0x90

By the way, a nonzero count in /sys/block/mdxx/inflight can confirm this as well.
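
As a rough sketch of that check: the inflight file holds two counters, reads in flight followed by writes in flight. The function name parse_inflight and the default device md127 are only illustrative choices here, not part of any tool.

```shell
#!/bin/sh
# Report the read/write in-flight counters of a block device from sysfs.

# Hypothetical helper: reads one line such as "        1        0" on stdin,
# prints both counters, and returns success if any I/O is still in flight.
parse_inflight() {
    read -r reads writes
    echo "reads=$reads writes=$writes"
    [ $((reads + writes)) -gt 0 ]
}

dev=${1:-md127}                  # device name is an assumption; pass your own
f="/sys/block/$dev/inflight"
if [ -r "$f" ]; then
    if parse_inflight < "$f"; then
        echo "$dev: I/O is still in flight"
    else
        echo "$dev: no I/O in flight"
    fi
fi
```

If the counters stay nonzero while the assemble is hung, some request has been submitted to the array and has not completed.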

If this is the case, can you find out who is accessing the array?
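
One possible way to look for the accessor from user space is to scan /proc for processes holding the device node open. This is a sketch only: find_openers is a made-up helper name, /dev/md127 is assumed as the default, and the scan will not show I/O issued from inside the kernel or through a mounted filesystem. Run it as root so that other processes' fd directories are readable.

```shell
#!/bin/sh
# List pids that have a given path (e.g. a block device node) open.

# Hypothetical helper: prints each pid with an fd whose link target is $1.
find_openers() {
    target=$1
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}     # strip the leading "/proc/"
            echo "${pid%%/*}"    # keep only the pid component
        fi
    done | sort -un
}

find_openers "${1:-/dev/md127}"
```

Tools such as lsof or fuser give similar information where they are installed.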

Thanks,
Kuai




