Hi,
On 2023/05/04 16:36, Peter Neuwirth wrote:
Thank you, Kuai!
So my gut instinct was not that bad. Now that I could reassemble my raid
set (it tried to continue the rebuild; I stopped it),
I have a /dev/md0, but no sensible data seems to be stored on it.
Not even a partition table could be found.
From your investigation, what would you say: is there hope that I could
rescue some of the data from the raid set with a tool
like testdisk, once I "recreate" my old GPT partition table? Or is it
likely that the restarted reshape/grow process made
mincemeat out of my whole raid data?
It seemed interesting to me that the first grow/reshape process did not
even seem to touch the two added disks, which are shown as
spares now; their partition tables had not been modified. The process
seems to deal only with my legacy raid 5 set of
six disks and appeared to move it to a transient raid5/6 layout,
therefore operating at least on disk (3) of the legacy
set, which is now missing..
I'm not sure how much time it makes sense to spend on this data;
your advice would be very helpful.
During my test I was able to recreate md0 and mount it, but take this
for reference only...
Test procedure:

# create a small 6-disk raid5 array and put an xfs filesystem on it
mdadm --create --run --verbose /dev/md0 -c 256K --level=5 \
--raid-devices=6 /dev/sd[abcdef] --size=100M
mdadm -W /dev/md0
mkfs.xfs -f /dev/md0
# throttle sync speed so the reshape below can be frozen while still
# in progress
echo 1024 > /sys/block/md0/md/sync_speed_max
# add two disks and start the raid5 -> raid6 grow, as in the report
mdadm --add /dev/md0 /dev/sdg /dev/sdh
sudo mdadm --grow /dev/md0 --level=6
sleep 2
# pause the reshape, restore the speed limit, then restart it with
# "echo reshape" -- the step that triggers the corruption
echo frozen > /sys/block/md0/md/sync_action
echo system > /sys/block/md0/md/sync_speed_max
echo reshape > /sys/block/md0/md/sync_action
# wait for the reshape to finish, then check the filesystem read-only
mdadm -W /dev/md0
xfs_repair -n /dev/md0
The above test reproduces that md0 is corrupted, and this is just
because the layout is changed. If I recreate md0 from the original
disks with --assume-clean, xfs_repair won't complain and mounting will
succeed:
[root@fedora ~]# mdadm --create --run --verbose /dev/md0 -c 256K
--level=5 --raid-devices=6 /dev/sd[abcdef] --size=100M --assume-clean
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: /dev/sda appears to contain an ext2fs file system
size=10485760K mtime=Mon Apr 3 06:18:17 2023
mdadm: /dev/sda appears to be part of a raid array:
level=raid5 devices=6 ctime=Thu May 4 09:00:08 2023
mdadm: /dev/sdb appears to be part of a raid array:
level=raid5 devices=6 ctime=Thu May 4 09:00:08 2023
mdadm: /dev/sdc appears to be part of a raid array:
level=raid5 devices=6 ctime=Thu May 4 09:00:08 2023
mdadm: /dev/sdd appears to be part of a raid array:
level=raid5 devices=6 ctime=Thu May 4 09:00:08 2023
mdadm: /dev/sde appears to be part of a raid array:
level=raid5 devices=6 ctime=Thu May 4 09:00:08 2023
mdadm: /dev/sdf appears to be part of a raid array:
level=raid5 devices=6 ctime=Thu May 4 09:00:08 2023
mdadm: largest drive (/dev/sda) exceeds size (102400K) by more than 1%
mdadm: creation continuing despite oddities due to --run
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
[root@fedora ~]# xfs_repair -n /dev/md0
Phase 1 - find and verify superblock...
- reporting progress in intervals of 15 minutes
Phase 2 - using internal log
- zero log...
- 09:05:33: zeroing log - 4608 of 4608 blocks done
- scan filesystem freespace and inode maps...
- 09:05:33: scanning filesystem freespace - 8 of 8 allocation
groups done
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- 09:05:33: scanning agi unlinked lists - 8 of 8 allocation
groups done
- process known inodes and perform inode discovery...
- agno = 7
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- 09:05:33: process known inodes and inode discovery - 64 of 64
inodes done
- process newly discovered inodes...
- 09:05:33: process newly discovered inodes - 8 of 8 allocation
groups done
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- 09:05:33: setting up duplicate extent list - 8 of 8
allocation groups done
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 6
- agno = 4
- agno = 3
- agno = 7
- agno = 2
- agno = 5
- 09:05:33: check for inodes claiming duplicate blocks - 64 of
64 inodes done
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
- 09:05:33: verify and correct link counts - 8 of 8 allocation
groups done
No modify flag set, skipping filesystem flush and exiting.
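To complete the check, a read-only mount works as well; a minimal
sketch, with /mnt as a placeholder mount point:

# mount read-only just to confirm the filesystem is accessible again
mount -o ro /dev/md0 /mnt
ls /mnt
umount /mnt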
Thanks,
Kuai
regards
Peter
On 04.05.23 at 10:16, Yu Kuai wrote:
Hi,
On 2023/04/28 5:09, Peter Neuwirth wrote:
Hello linux-raid group,
I have an issue with my linux raid setup, and I hope somebody here
can help me get my raid active again without data loss.
I have a Debian 11 system with one raid array (6x 1TB hdd drives,
raid level 5)
that had been running fine until today, when I added two more 1TB hdd
drives and also changed the raid level to 6.
Note, for completeness: my raid setup from months ago was
mdadm --create --verbose /dev/md0 -c 256K --level=5 --raid-devices=6 \
/dev/sdd /dev/sdc /dev/sdb /dev/sda /dev/sdg /dev/sdf
mkfs.xfs -d su=254k,sw=6 -l version=2,su=256k -s size=4k /dev/md0
mdadm --detail --scan | tee -a /etc/mdadm/mdadm.conf
update-initramfs -u
echo '/dev/md0 /mnt/data ext4 defaults,nofail,discard 0 0' | sudo tee \
-a /etc/fstab
Today I did:
mdadm --add /dev/md0 /dev/sdg /dev/sdh
sudo mdadm --grow /dev/md0 --level=6
This started a grow/reshape process that I could observe with
watch -n 1 cat /proc/mdstat
and md0 was still usable all day.
Because I needed fast file access, I paused the grow/reshape
process today at about 50% by issuing
echo "frozen" > /sys/block/md0/md/sync_action
After the file access was done, I restarted the
process with
echo reshape > /sys/block/md0/md/sync_action
After looking into this problem, I figured out that this is how the
problem (corrupted data) was triggered in the first place, while the
kernel log message "md: cannot handle concurrent replacement and
reshape" is not fatal.
"echo reshape" restarts the whole process from the beginning, while
the recorded reshape position should be used instead. This is a
serious kernel bug; I'll try to fix it soon.
By the way, "echo idle" should avoid this problem.
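For reference, a minimal sketch of the pause/resume sequence this
implies (same /dev/md0 as above; reading reshape_position is only a
sanity check):

# pause an in-progress reshape
echo frozen > /sys/block/md0/md/sync_action
# resume it with "idle" instead of "reshape", so the reshape continues
# from the recorded reshape position instead of restarting from scratch
echo idle > /sys/block/md0/md/sync_action
# sanity check: the position should keep advancing from where it left off
cat /proc/mdstat
cat /sys/block/md0/md/reshape_position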
Thanks,
Kuai
but I saw in mdstat that it started from scratch.
After about 5 min I noticed that the /dev/md0 mount was gone, with an
input/output error in syslog, and I rebooted the computer to see
whether the kernel would reassemble md0 correctly. Maybe that was a
problem, because md0 was still reshaping, I do not know...