On Tue, Nov 16, 2021 at 2:39 AM Markus Hochholdinger
<Markus@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Xiao,
>
> for 1.0 the super block is at the end of the device, out of "man mdadm":
>   -e, --metadata=
>       [..]
>       The different sub-versions store the superblock at different
>       locations on the device, either at the end (for 1.0), at the
>       start (for 1.1) or 4K from the start (for 1.2). "1" is
>       equivalent to "1.2" (the commonly preferred 1.x format).
>       "default" is equivalent to "1.2".
> (Because of other reasons, we have intentionally chosen the superblock at the
> end of the device.)
>
> We change the device size of raid1 arrays, which are inside a VM, on a regular
> basis. And afterwards we grow the raid1 while the raid1 is online. Therefore,
> the superblock has to be moved.

I see. Thanks for giving the test case.

> This is very neat, because we can grow the raid1 and the filesystem size in a
> very short time frame and don't have to rebuild the raid1 twice (remove one
> device, resize and add with full rebuild because the old superblock is
> somewhere in between, then the same for the other device) before we can grow
> the raid1 and the filesystem.
> If this explanation is not enough why we need this feature, I can explain in
> more detail why someone would do the software raid1 within a VM if you like.

That's enough. But could you say more about why you create a raid1 inside a
VM? I'd like to learn about more scenarios in which raid devices are used.

>
> As I understand, if the superblock isn't moved and we have grown the array and
> the filesystem on it, the superblock will now be updated in between the
> filesystem and may corrupt the filesystem and data.
>
> Funny thing, after both devices are resized, the raid1 is still online and the
> grow does work. But afterwards, one can't do anything with the raid1, you get
> errors about the superblock, e.g. mdadm -D .. works, but mdadm -E .. for both
> devices doesn't.
> You can remove one device from the raid1, but you can't add
> it anymore, mdadm --add .. says: "mdadm: cannot load array metadata from
> /dev/md0". And you can't re-assemble the raid1 after it is stopped.

mdadm -D reads its information from files under /sys/block/md, while
mdadm -E reads the superblock from disk. That is why one works and the
other doesn't. The kernel doesn't update the superblock offset, so it still
reads the superblock from the old position, whereas userspace calculates
the superblock position from the current disk size. The two no longer
agree, so things are in a mess now.

>
> I can reproduce this: With kernel version <= 5.8.18 the above works as
> expected. Since kernel version 5.9.x it doesn't anymore.
> I tested this patch with kernel 5.15.1 and 5.10.46 and the above works again.
>
>
> Here is a minimal setup to test this (but in real life we use it in a VM with
> virtual disks which can be resized online):
> # truncate -s 1G /var/tmp/rdev1
> # truncate -s 1G /var/tmp/rdev2
> # losetup -f /var/tmp/rdev1
> # losetup -f /var/tmp/rdev2
> # losetup -j /var/tmp/rdev1
> /dev/loop0: [2304]:786663 (/var/tmp/rdev1)
> # losetup -j /var/tmp/rdev2
> /dev/loop1: [2304]:788462 (/var/tmp/rdev2)
> # mdadm --create --assume-clean /dev/md9 --metadata=1.0 --level=1 --raid-disks=2 /dev/loop0 /dev/loop1
> mdadm: array /dev/md9 started.
> # mdadm -E /dev/loop0
> /dev/loop0:
>           Magic : a92b4efc
>         Version : 1.0
>     Feature Map : 0x0
> [..]
> # # grow the first loop device by 100MB
> # dd if=/dev/zero bs=1M count=100 >> /var/tmp/rdev1
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB, 100 MiB) copied, 0.0960313 s, 1.1 GB/s
> # losetup -c /dev/loop0
>
> ### with kernel <= 5.8.18 ###
> # mdadm -E /dev/loop0
> mdadm: No md superblock detected on /dev/loop0.
> # echo 0 > /sys/block/md9/md/rd0/size
> # mdadm -E /dev/loop0
> /dev/loop0:
>           Magic : a92b4efc
>         Version : 1.0
>     Feature Map : 0x0
> [..]
> #
>
> ### with kernel >= 5.9.x ###
> # mdadm -E /dev/loop0
> mdadm: No md superblock detected on /dev/loop0.
> # echo 0 > /sys/block/md9/md/rd0/size
> # mdadm -E /dev/loop0
> mdadm: No md superblock detected on /dev/loop0.
> #

Thanks again for those detailed steps.

Xiao
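
P.S. To illustrate the userspace side, here is a rough sketch of how the
expected position of a 1.0 superblock follows from the device size. It
assumes the placement rule is "at least 8 KiB before the end of the device,
rounded down to a 4 KiB boundary", which is my reading of the mdadm and md
sources and should be checked against them. The function name
sb_offset_v1_0 is made up for this example and is not part of mdadm.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): estimate the sector at which a
# 1.0 superblock is expected on a device of the given size.
# Assumption: the superblock sits at least 8 KiB (16 sectors) before the end
# of the device, rounded down to a 4 KiB (8 sector) boundary.
# All sizes are in 512-byte sectors.
sb_offset_v1_0() {
    dev_sectors=$1
    echo $(( (dev_sectors - 16) & ~7 ))
}

# 1 GiB backing file, as created by "truncate -s 1G" in the test case
old=$(sb_offset_v1_0 $(( 1073741824 / 512 )))
# the same file after appending 100 MiB with dd
new=$(sb_offset_v1_0 $(( (1073741824 + 104857600) / 512 )))

echo "expected superblock sector before grow: $old"
echo "expected superblock sector after grow:  $new"
```

With the sizes from the test case this gives sector 2097136 before the grow
and 2301936 after it, so the position userspace looks at moves by exactly
the 100 MiB (204800 sectors) that were appended. On a real device the size
in 512-byte sectors can be read with "blockdev --getsz /dev/loop0".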