I can also reproduce this problem on my QEMU VM, with three 10G disks.
However, there is no problem when I change the mkfs.xfs 'agcount' option
(the default value is 16 on my system). For example, if I set agcount=15,
mounting the XFS filesystem succeeds, like:
mkfs.xfs -d agcount=15 -f /dev/md0
mount /dev/md0 /mnt/test
Hi Yufen
I did the test with agcount=15; this problem still exists in my environment.
Test1:
[root@ibm-p8-11 ~]# mdadm -CR /dev/md0 -l5 -n3 /dev/sd[b-d]1 --size=20G
[root@ibm-p8-11 ~]# mkfs.xfs /dev/md0 -f
meta-data=/dev/md0 isize=512 agcount=16, agsize=655232 blks
...
[root@ibm-p8-11 ~]# mount /dev/md0 /mnt/test
mount: /mnt/test: mount(2) system call failed: Structure needs cleaning.
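Note: the initial resync should still be in progress when mkfs runs. That
can be checked with, for example:
cat /proc/mdstat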
Test2:
[root@ibm-p8-11 ~]# mkfs.xfs /dev/md0 -f -d agcount=15
Warning: AG size is a multiple of stripe width. This can cause performance
problems by aligning all AGs on the same disk. To avoid this, run mkfs with
an AG size that is one stripe unit smaller or larger, for example 699008.
meta-data=/dev/md0 isize=512 agcount=15, agsize=699136 blks
...
[root@ibm-p8-11 ~]# mount /dev/md0 /mnt/test
mount: /mnt/test: mount(2) system call failed: Structure needs cleaning.
In addition, I tried writing 128MB of data to /dev/md0 and then reading it
back during md resync; the two copies have the same md5sum, like:
dd if=randfile of=/dev/md0 bs=1M count=128 oflag=direct seek=10240
dd if=/dev/md0 of=out.randfile bs=1M count=128 oflag=direct skip=10240
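The comparison itself was just an md5sum of both copies (this assumes
randfile is exactly 128MB, so the whole files can be compared directly):
md5sum randfile out.randfile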
BTW, I found that mkfs.xfs has some options related to RAID devices, such as
sunit, su, swidth, and sw. I guess this problem may be caused by data
alignment, but I have no idea how it happens. More time may be needed.
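As an illustration only, forcing the stripe alignment explicitly would look
like the line below; the su/sw values assume the default 512KiB md chunk
size and the two data disks of a 3-disk RAID5, so they would need to match
the real geometry:
mkfs.xfs -f -d su=512k,sw=2 /dev/md0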
The problem doesn't happen if mkfs runs without a resync in progress. Is
there a possibility that resync and mkfs write to the same page?
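As a sketch, one way to get the no-resync case is to wait for the initial
resync to finish before running mkfs, e.g.:
mdadm --wait /dev/md0
mkfs.xfs -f /dev/md0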
Regards
Xiao