Hi,
On 2023/04/28 5:09, Peter Neuwirth wrote:
Hello linux-raid group.
I have an issue with my linux raid setup and I hope somebody here
could help me get my raid active again without data loss.
I have a Debian 11 system with one RAID array (6x 1 TB HDDs, RAID level 5)
that was running fine until today, when I added two more 1 TB HDDs
and also changed the RAID level to 6.
Note, for completeness:
I set up the RAID a month ago with
mdadm --create --verbose /dev/md0 -c 256K --level=5 --raid-devices=6 /dev/sdd /dev/sdc /dev/sdb /dev/sda /dev/sdg /dev/sdf
mkfs.xfs -d su=254k,sw=6 -l version=2,su=256k -s size=4k /dev/md0
mdadm --detail --scan | tee -a /etc/mdadm/mdadm.conf
update-initramfs -u
echo '/dev/md0 /mnt/data ext4 defaults,nofail,discard 0 0' | sudo tee -a /etc/fstab
Today I did:
mdadm --add /dev/md0 /dev/sdg /dev/sdh
sudo mdadm --grow /dev/md0 --level=6
This started a reshape, which I could observe with
watch -n 1 cat /proc/mdstat
and md0 remained usable the whole day.
Because I needed fast file access, I paused the grow process today
at about 50% by issuing
echo "frozen" > /sys/block/md0/md/sync_action
Once the file access was done, I restarted the process with
echo reshape > /sys/block/md0/md/sync_action
After looking into this problem, I figured out that this is how the
problem (corrupted data) was triggered in the first place, while the
kernel log message "md: cannot handle concurrent replacement and reshape"
is not fatal.
"echo reshape" restarts the whole process from the beginning, whereas
the recorded reshape position should be used. This is a serious kernel
bug; I'll try to fix it soon.
By the way, "echo idle" should avoid this problem.
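A minimal sketch of the pause/resume sequence this advice implies, assuming the array is md0 and the commands run as root. sync_action is the sysfs attribute used above; reshape_position is the md sysfs attribute that reports how far the reshape has progressed. MD_SYSFS and the pause/resume arguments are hypothetical conveniences for illustration, not part of mdadm or the kernel interface.

```shell
#!/bin/sh
# Sketch: pause and resume an md reshape through sysfs.
# Assumption: the array is md0. MD_SYSFS is a hypothetical override so the
# script can be pointed at another directory (e.g. for testing).
MD_SYSFS="${MD_SYSFS:-/sys/block/md0/md}"

pause_reshape() {
    # "frozen" interrupts the running reshape and blocks md from restarting it.
    echo frozen > "$MD_SYSFS/sync_action"
    # reshape_position records how far the reshape got (a sector number or "none").
    echo "paused at reshape_position: $(cat "$MD_SYSFS/reshape_position")"
}

resume_reshape() {
    # Write "idle", not "reshape": per the reply above, "idle" clears the
    # frozen state and lets md continue from the recorded reshape_position,
    # while "reshape" restarts the whole process (the bug described above).
    echo idle > "$MD_SYSFS/sync_action"
}

# Hypothetical command-line interface: "pause" or "resume"; anything else does nothing.
case "${1:-}" in
    pause)  pause_reshape ;;
    resume) resume_reshape ;;
esac
```

For example, run it with "pause" before heavy file access and with "resume" afterwards, then check /proc/mdstat to confirm the reshape continues from roughly where it stopped rather than from 0%.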
Thanks,
Kuai
but I saw in mdstat that it started from scratch.
After about 5 minutes I noticed that the /dev/md0 mount was gone, with
an input/output error in syslog, and I rebooted the computer to see whether
the kernel would reassemble md0 correctly. Maybe this was a problem,
because md0 was still reshaping; I do not know.