On 04/17/2014 06:22 AM, L.M.J wrote:
For the third time, I had to change a failed drive in my home Linux RAID5
box. The previous times went fine; this time, I don't know what I did wrong,
but I broke my RAID5. Well, at least, it won't start.
/dev/sdb was the failed drive
/dev/sdc and /dev/sdd are OK.
I tried to reassemble the RAID with this command after I replaced sdb and
created a new partition:
~# mdadm -Cv /dev/md0 --assume-clean --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sdb1
Well, I guess I made a mistake here; I should have done this instead:
~# mdadm -Cv /dev/md0 --assume-clean --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 missing
Maybe this wiped out my data...
This is not an LVM problem, but an mdadm usage problem.
You told mdadm to create a new, empty md device! (-C means create a new
array!) You should have just started the old degraded md array, removed
the failed drive, and added the new drive.
But I don't think your data is gone yet... (because of assume-clean).
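For the record, a rough sketch of what that replacement could have looked
like (device names taken from your mail, and assuming the new /dev/sdb1
partition already existed; details may differ on your setup):

~# mdadm --assemble --run /dev/md0 /dev/sdc1 /dev/sdd1
   (start the existing array degraded on the two good members)
~# mdadm --manage /dev/md0 --add /dev/sdb1
   (add the fresh partition; md rebuilds it from the surviving data and parity)
~# cat /proc/mdstat
   (watch the resync finish before doing anything else)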
Let's go further, then: pvdisplay, pvscan and vgdisplay all return empty
information :-(
Google helped me, and I did this:
~# dd if=/dev/md0 bs=512 count=255 skip=1 of=/tmp/md0.txt
[..]
physical_volumes {
    pv0 {
        id = "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW"
        device = "/dev/md0"
        status = ["ALLOCATABLE"]
        flags = []
        dev_size = 7814047360
        pe_start = 384
        pe_count = 953863
    }
}
logical_volumes {
    lvdata {
        id = "JiwAjc-qkvI-58Ru-RO8n-r63Z-ll3E-SJazO7"
        status = ["READ", "WRITE", "VISIBLE"]
        flags = []
        segment_count = 1
[..]
Since I can see LVM information, I guess I haven't lost everything yet...
nothing is lost ... yet
What you needed to do was REMOVE the blank drive before anything got
written to the RAID5! You didn't add it as a missing drive to be restored,
as you noted.
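A sketch of that removal, in case it helps next time (same device names as
above, and only valid as long as nothing has been written yet):

~# mdadm --manage /dev/md0 --fail /dev/sdb1
~# mdadm --manage /dev/md0 --remove /dev/sdb1
   (with the blank disk out of the array, md reconstructs reads from the two
   good members instead of returning whatever happens to be on the new drive)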
I tried a last-ditch command:
~# pvcreate --uuid "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW" --restorefile /etc/lvm/archive/lvm-raid_00302.vg /dev/md0
*Now* you are writing to the md and destroying your data!
Then,
~# vgcfgrestore lvm-raid
Overwriting your LVM metadata. But maybe not the end of the world YET...
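If you ever reach for vgcfgrestore again, it is worth looking before
writing; assuming your LVM build accepts the common -t/--test flag,
something like:

~# vgcfgrestore --list lvm-raid
   (list the archived metadata versions under /etc/lvm/archive)
~# vgcfgrestore --test -f /etc/lvm/archive/lvm-raid_00302.vg lvm-raid
   (--test goes through the motions without updating the on-disk metadata)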
~# lvs -a -o +devices
LV     VG       Attr   LSize   Origin Snap%  Move Log Copy%  Convert Devices
lvdata lvm-raid -wi-a- 450,00g                                        /dev/md0(148480)
lvmp   lvm-raid -wi-a-  80,00g                                        /dev/md0(263680)
Then:
~# lvchange -ay /dev/lvm-raid/lv*
I was quite happy until now.
The problem appears when I try to mount those two LVs (lvdata & lvmp) as ext4 partitions:
~# mount /home/foo/RAID_mp/
~# mount | grep -i mp
/dev/mapper/lvm--raid-lvmp on /home/foo/RAID_mp type ext4 (rw)
~# df -h /home/foo/RAID_mp
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/lvm--raid-lvmp   79G   61G   19G  77% /home/foo/RAID_mp
Here is the big problem:
~# ls -la /home/foo/RAID_mp
total 0
Worse on the other LV:
~# mount /home/foo/RAID_data
mount: wrong fs type, bad option, bad superblock on /dev/mapper/lvm--raid-lvdata,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
Yes, you told md that the drive with random/blank data was good data!
If ONLY you had mounted those filesystems
READ ONLY while checking things out, you would still be ok. But now,
you have overwritten stuff!
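For reference, a read-only mount would have looked something like this
(noload is an ext3/ext4 option that also skips journal replay, which is
itself a write):

~# mount -o ro,noload /dev/mapper/lvm--raid-lvmp /home/foo/RAID_mp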
I bet I've recovered the LVM structure but the data is wiped out, don't you think?
~# fsck -n -v /dev/mapper/lvm--raid-lvdata
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
fsck.ext4: Group descriptors look bad... trying backup blocks...
fsck.ext4: Bad magic number in super-block when using the backup blocks
fsck.ext4: going back to original superblock
fsck.ext4: Device or resource busy while trying to open /dev/mapper/lvm--raid-lvdata
Filesystem mounted or opened exclusively by another program?
Any help is welcome if you have any idea how to rescue me, please!
Fortunately, your fsck was read-only. At this point, you need to
crash/halt your system without a clean shutdown (to avoid further writes to
the mounted filesystems).
Then REMOVE the new drive. Start up again, and add the new drive properly.
You should check stuff out READ ONLY. You will need fsck (READ ONLY at
first), and at least some data has been destroyed.
If the data is really important, you need to copy the two old drives
somewhere before you do ANYTHING else. Buy two more drives! That will
let you recover from any more mistakes typing Create instead of Assemble
or Manage. (Note that --assume-clean warns you that you really need to
know what you are doing!)
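One way to take that copy, assuming the two extra drives show up as
/dev/sde and /dev/sdf (made-up names here) and are at least as big as the
originals:

~# ddrescue -f /dev/sdc /dev/sde /root/sdc.map
~# ddrescue -f /dev/sdd /dev/sdf /root/sdd.map
   (GNU ddrescue keeps going past read errors and records progress in the
   map file; plain dd with conv=noerror,sync would also do if ddrescue
   isn't installed)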
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/