Re: linux mdadm assembly error: md: cannot handle concurrent replacement and reshape. (reboot while reshaping)

Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> · Thu, 4 May 2023 10:10:46 +0800

Hi,

On 2023/05/04 9:57, Yu Kuai wrote:
Hi,

On 2023/05/02 19:30, Peter Neuwirth wrote:
Hello Kuai,

thank you for your suggestion!
You are right: reading the source of the error message in drivers/md/raid5.c,
I saw that growing and replacement at the same time is too much to handle.
So I did what you suggested and started the RAID 5 (which was in the middle
of a RAID 6 conversion that also added two more drives) with only the
5 members, which should run as a degraded RAID 5.

mdadm --assemble --run /dev/md0 /dev/sdd /dev/sdc /dev/sdb /dev/sdi /dev/sdj

This worked and the array was assembled:

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : active (auto-read-only) raid6 sdd[0] sdi[6] sdj[4] sdb[2] sdc[1]
      4883151360 blocks super 1.2 level 6, 256k chunk, algorithm 18 [7/5] [UUU_UU_]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
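For reference, the "[7/5] [UUU_UU_]" field above means 7 raid devices, of which 5 are up, with one character per RaidDevice slot ("U" = up, "_" = missing). A minimal POSIX sh sketch that turns that field into slot numbers; the helper name missing_slots is hypothetical and not part of mdadm:

```shell
#!/bin/sh
# Hypothetical helper (not an mdadm feature): given the per-slot status
# field from /proc/mdstat, e.g. "[UUU_UU_]", print the RaidDevice
# numbers whose character is "_" (missing), separated by spaces.
missing_slots() {
    s=${1#"["}          # drop the leading "["
    s=${s%"]"}          # drop the trailing "]"
    slot=0
    result=""
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}     # first character of $s
        s=${s#?}            # remove that character
        if [ "$c" = "_" ]; then
            result="$result $slot"
        fi
        slot=$((slot + 1))
    done
    echo "${result#" "}"
}

missing_slots "[UUU_UU_]"   # prints "3 6"
```

Slots 3 and 6 come out as missing, which agrees with the two "removed" rows in the mdadm --detail output.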

mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Mon Mar  6 18:17:30 2023
        Raid Level : raid6
        Array Size : 4883151360 (4656.94 GiB 5000.35 GB)
     Used Dev Size : 976630272 (931.39 GiB 1000.07 GB)
      Raid Devices : 7
     Total Devices : 5
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Fri Apr 28 04:21:03 2023
             State : clean, degraded
    Active Devices : 5
   Working Devices : 5
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric-6
        Chunk Size : 256K

Consistency Policy : bitmap

        New Layout : left-symmetric

              Name : solidsrv11:0  (local to host solidsrv11)
              UUID : 1a87479e:7513dd65:37c61ca1:43184f65
            Events : 6336

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd
       1       8       32        1      active sync   /dev/sdc
       2       8       16        2      active sync   /dev/sdb
       -       0        0        3      removed
       4       8      144        4      active sync   /dev/sdj
       6       8      128        5      active sync   /dev/sdi
       -       0        0        6      removed
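As a sanity check on the figures above: an md RAID6 set with n raid devices keeps data on n - 2 of them, so Array Size should equal (Raid Devices - 2) x Used Dev Size, both printed in KiB by mdadm --detail. A minimal sketch; raid6_size_kib is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: usable RAID6 capacity in KiB is
# (raid_devices - 2) * used_dev_kib, since two members' worth of space
# holds P and Q parity. Degraded members do not change this value.
raid6_size_kib() {
    raid_devices=$1
    used_dev_kib=$2
    echo $(( (raid_devices - 2) * used_dev_kib ))
}

raid6_size_kib 7 976630272   # prints 4883151360, matching "Array Size"
```

So the reported size, (7 - 2) x 976630272 KiB = 4883151360 KiB (4656.94 GiB), is consistent with a 7-device RAID6 geometry.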

But when I try to mount the XFS filesystem, it fails:

mount: /mnt/image: mount(2) system call failed: Structure needs cleaning.

When I try to repair the XFS filesystem, it tells me that no superblock
was found.

Sorry to hear that. It seems the data is already corrupted, and this
really is a kernel issue: somehow replacement (resync?) and reshape got
mixed up. I suspect that rebooting while a reshape is in progress and a
replacement exists can trigger this...

I have no idea yet, but I'll try to reproduce this problem and fix
it.

Hi,

I can reproduce this by following the steps you described:

mdadm --assemble --run /dev/md0 /dev/sd[abcdefgh] fails:
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error

The kernel complains:
[  186.133231] md: cannot handle concurrent replacement and reshape.
[  186.179587] md/raid:md0: failed to run raid set.
[  186.180851] md: pers->run() failed ...

mdadm -D shows:
 Number   Major   Minor   RaidDevice State
    -       0        0        0      removed
    -       0        0        1      removed
    -       0        0        2      removed
    -       0        0        3      removed
    -       0        0        4      removed
    -       0        0        5      removed
    -       0        0        6      removed

    -       8       64        4      sync   /dev/sde
    -       8       32        2      sync   /dev/sdc
    -       8        0        0      sync   /dev/sda
    -       8      112        6      spare rebuilding   /dev/sdh
    -       8       80        5      sync   /dev/sdf
    -       8       48        3      sync   /dev/sdd
    -       8       16        1      sync   /dev/sdb
    -       8       96        0      spare rebuilding   /dev/sdg
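In the table above, two devices claim RaidDevice 0: /dev/sda (sync) and /dev/sdg (spare rebuilding). That appears to be the replacement the kernel refuses to combine with a reshape, while /dev/sdh in slot 6 is presumably the new member being added by the reshape. The following POSIX sh/awk sketch flags any slot claimed by more than one device in mdadm -D output. It is a heuristic that assumes the column layout shown above (RaidDevice in column 4, device path last), and shared_slots is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical heuristic (not an mdadm feature): read the device table
# printed by "mdadm -D" on stdin and print every RaidDevice slot that is
# claimed by more than one device, as "slot: dev1 dev2 ...".
shared_slots() {
    awk '
        # Only rows ending in a device path; "removed" rows and the
        # header row are skipped. Column 4 is the RaidDevice number.
        $NF ~ /^\/dev\// { count[$4]++; devs[$4] = devs[$4] " " $NF }
        END {
            for (slot in count)
                if (count[slot] > 1)
                    print slot ":" devs[slot]
        }
    ' | sort -n
}

shared_slots <<'EOF'
    -       8       64        4      sync   /dev/sde
    -       8       32        2      sync   /dev/sdc
    -       8        0        0      sync   /dev/sda
    -       8      112        6      spare rebuilding   /dev/sdh
    -       8       80        5      sync   /dev/sdf
    -       8       48        3      sync   /dev/sdd
    -       8       16        1      sync   /dev/sdb
    -       8       96        0      spare rebuilding   /dev/sdg
EOF
# prints "0: /dev/sda /dev/sdg"
```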

I'll try to come up with a solution.

Thanks,
Kuai