Re: linux mdadm assembly error: md: cannot handle concurrent replacement and reshape. (reboot while reshaping)


Hello Kuai,

meanwhile I managed to stop the devices and recreate the array, but it seems with little success for my data:
After the recreation, xfs_repair could not even find a secondary superblock :(

regards,

Peter

mdadm  --create --verbose /dev/md0 -c 256K --level=5 --raid-devices=6  /dev/sde /dev/sdc /dev/sdb /dev/sda /dev/sdi /dev/sdj --assume-clean
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: /dev/sde appears to be part of a raid array:
      level=raid6 devices=7 ctime=Mon Mar  6 18:17:30 2023
mdadm: partition table exists on /dev/sde but will be lost or
      meaningless after creating array
mdadm: /dev/sdc appears to be part of a raid array:
      level=raid6 devices=7 ctime=Mon Mar  6 18:17:30 2023
mdadm: partition table exists on /dev/sdc but will be lost or
      meaningless after creating array
mdadm: /dev/sdb appears to be part of a raid array:
      level=raid6 devices=7 ctime=Mon Mar  6 18:17:30 2023
mdadm: partition table exists on /dev/sdb but will be lost or
      meaningless after creating array
mdadm: /dev/sda appears to be part of a raid array:
      level=raid6 devices=7 ctime=Mon Mar  6 18:17:30 2023
mdadm: partition table exists on /dev/sda but will be lost or
      meaningless after creating array
mdadm: /dev/sdi appears to be part of a raid array:
      level=raid6 devices=7 ctime=Mon Mar  6 18:17:30 2023
mdadm: partition table exists on /dev/sdi but will be lost or
      meaningless after creating array
mdadm: /dev/sdj appears to be part of a raid array:
      level=raid6 devices=7 ctime=Mon Mar  6 18:17:30 2023
mdadm: partition table exists on /dev/sdj but will be lost or
      meaningless after creating array
mdadm: size set to 976630272K
mdadm: automatically enabling write-intent bitmap on large array
Continue creating array? yes
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
srv11:~# mdadm --detail /dev/md0
/dev/md0:
          Version : 1.2
    Creation Time : Thu May  4 12:38:27 2023
       Raid Level : raid5
       Array Size : 4883151360 (4656.94 GiB 5000.35 GB)
    Used Dev Size : 976630272 (931.39 GiB 1000.07 GB)
     Raid Devices : 6
    Total Devices : 6
      Persistence : Superblock is persistent

    Intent Bitmap : Internal

      Update Time : Thu May  4 12:38:30 2023
            State : clean
   Active Devices : 6
  Working Devices : 6
   Failed Devices : 0
    Spare Devices : 0

           Layout : left-symmetric
       Chunk Size : 256K

Consistency Policy : bitmap

             Name : srv11:0  (local to host srv11)
             UUID : eda34f0b:3453c7d6:35b9fdf4:37784433
           Events : 1

   Number   Major   Minor   RaidDevice State
      0       8       64        0      active sync   /dev/sde
      1       8       32        1      active sync   /dev/sdc
      2       8       16        2      active sync   /dev/sdb
      3       8        0        3      active sync   /dev/sda
      4       8      128        4      active sync   /dev/sdi
      5       8      144        5      active sync   /dev/sdj
srv11:~# xfs_repair -n /dev/md0
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!

attempting to find secondary superblock...
..............................................
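(For reference, the geometry the recreated array was written with can be read back
from the member superblocks; a hypothetical spot-check, field names as printed by
mdadm --examine:

mdadm --examine /dev/sde /dev/sdc /dev/sdb /dev/sda /dev/sdi /dev/sdj \
    | egrep 'Raid Level|Chunk Size|Data Offset|Layout|Device Role'

Whether this matches the original array's geometry is what decides if a tool like
xfs_repair can find anything at all.)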


On 04.05.23 at 11:08, Yu Kuai wrote:
Hi,

On 2023/05/04 16:36, Peter Neuwirth wrote:
Thank you, Kuai!
So my gut instinct was not that bad. Now that I could reassemble my raid set (it tried to continue the rebuild; I stopped it),
I have a /dev/md0, but it seems that no sensible data is stored on it. Not even a partition table could be found.

From your investigations, what would you say: is there hope that I could rescue some of the data from the raid set with a tool
like testdisk, once I "recreate" my old GPT partition table? Or is it likely that the restarted reshape/grow process made
mincemeat of my whole raid data?
It seemed interesting to me that the first grow/reshape process did not even touch the two added discs (now shown as
spares); their partition tables were untouched. The process seems to have dealt only with my legacy raid 5 set of
six discs and to have moved it towards a transient raid5/6 layout, therefore operating at least on disc (3) of the legacy
set, which is now missing..
I'm not sure how much time it is sensible to spend on this data;
your advice could be very helpful.

During my test I was able to recreate md0 and mount it, but this is for
reference only...

Test procedure:
# create a small 6-disk raid5, wait for the initial sync, put an xfs on it
mdadm --create --run --verbose /dev/md0 -c 256K --level=5 --raid-devices=6  /dev/sd[abcdef] --size=100M
mdadm -W /dev/md0
mkfs.xfs -f /dev/md0
# throttle resync/reshape so it can be interrupted mid-way
echo 1024 > /sys/block/md0/md/sync_speed_max

# add two disks and start the raid5 -> raid6 conversion (reshape)
mdadm --add /dev/md0 /dev/sdg /dev/sdh
sudo mdadm --grow /dev/md0 --level=6
sleep 2

# pause the reshape, as in Peter's report
echo frozen > /sys/block/md0/md/sync_action

# lift the throttle and restart via "reshape" -- the step that restarts the
# whole reshape from the beginning instead of the recorded position
echo system > /sys/block/md0/md/sync_speed_max
echo reshape > /sys/block/md0/md/sync_action
mdadm -W /dev/md0

xfs_repair -n /dev/md0

The above test reproduces the md0 corruption, and this is just
because the layout was changed. If I recreate md0 with the original disks
and --assume-clean, xfs_repair won't complain and the mount will succeed:

[root@fedora ~]# mdadm --create --run --verbose /dev/md0 -c 256K --level=5 --raid-devices=6  /dev/sd[abcdef] --size=100M --assume-clean
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: /dev/sda appears to contain an ext2fs file system
       size=10485760K  mtime=Mon Apr  3 06:18:17 2023
mdadm: /dev/sda appears to be part of a raid array:
       level=raid5 devices=6 ctime=Thu May  4 09:00:08 2023
mdadm: /dev/sdb appears to be part of a raid array:
       level=raid5 devices=6 ctime=Thu May  4 09:00:08 2023
mdadm: /dev/sdc appears to be part of a raid array:
       level=raid5 devices=6 ctime=Thu May  4 09:00:08 2023
mdadm: /dev/sdd appears to be part of a raid array:
       level=raid5 devices=6 ctime=Thu May  4 09:00:08 2023
mdadm: /dev/sde appears to be part of a raid array:
       level=raid5 devices=6 ctime=Thu May  4 09:00:08 2023
mdadm: /dev/sdf appears to be part of a raid array:
       level=raid5 devices=6 ctime=Thu May  4 09:00:08 2023
mdadm: largest drive (/dev/sda) exceeds size (102400K) by more than 1%
mdadm: creation continuing despite oddities due to --run
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
[root@fedora ~]# xfs_repair -n /dev/md0
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - 09:05:33: zeroing log - 4608 of 4608 blocks done
        - scan filesystem freespace and inode maps...
        - 09:05:33: scanning filesystem freespace - 8 of 8 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - 09:05:33: scanning agi unlinked lists - 8 of 8 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 7
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - 09:05:33: process known inodes and inode discovery - 64 of 64 inodes done
        - process newly discovered inodes...
        - 09:05:33: process newly discovered inodes - 8 of 8 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - 09:05:33: setting up duplicate extent list - 8 of 8 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 6
        - agno = 4
        - agno = 3
        - agno = 7
        - agno = 2
        - agno = 5
        - 09:05:33: check for inodes claiming duplicate blocks - 64 of 64 inodes done
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
        - 09:05:33: verify and correct link counts - 8 of 8 allocation groups done
No modify flag set, skipping filesystem flush and exiting.

Thanks,
Kuai

regards

Peter


On 04.05.23 at 10:16, Yu Kuai wrote:
Hi,

On 2023/04/28 5:09, Peter Neuwirth wrote:
Hello linux-raid group.

I have an issue with my linux raid setup and I hope somebody here
could help me get my raid active again without data loss.

I have a Debian 11 system with one raid array (6x 1TB hdd drives, raid level 5)
that had been running fine until today, when I added two more 1TB hdd drives
and also changed the raid level to 6.

Note: For completeness:

My raid setup from months ago was:

mdadm --create --verbose /dev/md0 -c 256K --level=5 --raid-devices=6  /dev/sdd /dev/sdc /dev/sdb /dev/sda /dev/sdg /dev/sdf

mkfs.xfs -d su=254k,sw=6 -l version=2,su=256k -s size=4k /dev/md0

mdadm --detail --scan | tee -a /etc/mdadm/mdadm.conf

update-initramfs -u

echo '/dev/md0 /mnt/data ext4 defaults,nofail,discard 0 0' | sudo tee -a /etc/fstab


Today I did:

mdadm --add /dev/md0 /dev/sdg /dev/sdh

sudo mdadm --grow /dev/md0 --level=6


This started a reshape process, which I could observe with
watch -n 1 cat /proc/mdstat
and md0 was still usable all day. For file-access speed reasons I paused the grow and insertion
process today at about 50% by issuing

echo "frozen" > /sys/block/md0/md/sync_action


After the file access was done, I restarted the
process with

echo reshape > /sys/block/md0/md/sync_action

After looking into this problem, I figured out that this is how the problem
(corrupted data) was triggered in the first place, while the problem that the
kernel logs as "md: cannot handle concurrent replacement and reshape"
is not fatal.

"echo reshape" restarts the whole process, while the recorded reshape
position should be used instead. This is a serious kernel bug; I'll try to fix
it soon.

By the way, "echo idle" should avoid this problem.
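
For reference, a minimal sketch of the pause/resume sequence this implies; the
sysfs paths are the same ones used above, and the recorded reshape position can
be checked first (field names as printed by current mdadm):

# recorded reshape position, as stored in a member superblock and in sysfs
mdadm --examine /dev/sda | grep -i reshape
cat /sys/block/md0/md/reshape_position

# pause the reshape (array stays usable, position stays recorded)
echo frozen > /sys/block/md0/md/sync_action

# ... do the latency-sensitive work ...

# resume from the recorded position instead of restarting the whole reshape
echo idle > /sys/block/md0/md/sync_action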

Thanks,
Kuai

but I saw in mdstat that it started from scratch.
After about 5 min I noticed that the /dev/md0 mount was gone, with
an input/output error in syslog, and I rebooted the computer to see whether the
kernel would reassemble md0 correctly. Maybe this was a problem,
because md0 was still reshaping, I do not know..


