On 04/17/2014 06:22 AM, L.M.J wrote:
For the third time, I had to change a failed drive in my home Linux RAID5
box. The previous times went fine; this time, I don't know what I did wrong,
but I broke my RAID5. Well, at least, it won't start.
/dev/sdb was the failed drive
/dev/sdc and /dev/sdd are OK.
I tried to reassemble the RAID with this command after I replaced sdb and
created a new partition:
~# mdadm -Cv /dev/md0 --assume-clean --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sdb1
Well, I guess I made a mistake here; I should have done this instead:
~# mdadm -Cv /dev/md0 --assume-clean --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 missing
Maybe this wiped out my data...
This is not an LVM problem, but an mdadm usage problem.
You told mdadm to create a new, empty md device! (-C means create a new
array!) You should have just started the old degraded md array, removed
the failed drive, and added the new drive.
But I don't think your data is gone yet... (because of assume-clean).
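For the record, a rough sketch of what that replacement could have looked
like (device names taken from your mail, and assuming the new /dev/sdb1
partition already existed; details may differ on your setup):

~# mdadm --assemble --run /dev/md0 /dev/sdc1 /dev/sdd1
   (start the existing array degraded on the two good members)
~# mdadm --manage /dev/md0 --add /dev/sdb1
   (add the fresh partition; md rebuilds it from the surviving data and parity)
~# cat /proc/mdstat
   (watch the resync finish before doing anything else)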
Let's go further, then: pvdisplay, pvscan and vgdisplay all return empty
information :-(
Google helped me, and I did this:
~# dd if=/dev/md0 bs=512 count=255 skip=1 of=/tmp/md0.txt
[..]
physical_volumes {
    pv0 {
        id = "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW"
        device = "/dev/md0"
        status = ["ALLOCATABLE"]
        flags = []
        dev_size = 7814047360
        pe_start = 384
        pe_count = 953863
    }
}
logical_volumes {
    lvdata {
        id = "JiwAjc-qkvI-58Ru-RO8n-r63Z-ll3E-SJazO7"
        status = ["READ", "WRITE", "VISIBLE"]
        flags = []
        segment_count = 1
[..]
Since I can see LVM information, I guess I haven't lost everything yet...
nothing is lost ... yet
What you needed to do was REMOVE the blank drive before anything got
written to the RAID5! You didn't add it as a missing drive to be restored,
as you noted.
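A sketch of that removal, in case it helps next time (same device names as
above, and only valid as long as nothing has been written yet):

~# mdadm --manage /dev/md0 --fail /dev/sdb1
~# mdadm --manage /dev/md0 --remove /dev/sdb1
   (with the blank disk out of the array, md reconstructs reads from the two
   good members instead of returning whatever happens to be on the new drive)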
I tried a last-ditch command:
~# pvcreate --uuid "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW" --restorefile /etc/lvm/archive/lvm-raid_00302.vg /dev/md0
*Now* you are writing to the md and destroying your data!
Then,
~# vgcfgrestore lvm-raid
Overwriting your LVM metadata. But maybe not the end of the world YET...
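If you ever reach for vgcfgrestore again, it is worth looking before
writing; assuming your LVM build accepts the common -t/--test flag,
something like:

~# vgcfgrestore --list lvm-raid
   (list the archived metadata versions under /etc/lvm/archive)
~# vgcfgrestore --test -f /etc/lvm/archive/lvm-raid_00302.vg lvm-raid
   (--test goes through the motions without updating the on-disk metadata)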
~# lvs -a -o +devices
LV     VG       Attr   LSize   Origin Snap%  Move Log Copy%  Convert Devices
lvdata lvm-raid -wi-a- 450,00g                                        /dev/md0(148480)
lvmp   lvm-raid -wi-a-  80,00g                                        /dev/md0(263680)
Then:
~# lvchange -ay /dev/lvm-raid/lv*
I was quite happy until now.
The problem appears when I try to mount those two LVs (lvdata & lvmp) as ext4 partitions:
~# mount /home/foo/RAID_mp/
~# mount | grep -i mp
/dev/mapper/lvm--raid-lvmp on /home/foo/RAID_mp type ext4 (rw)
~# df -h /home/foo/RAID_mp
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/lvm--raid-lvmp   79G   61G   19G  77% /home/foo/RAID_mp
Here is the big problem:
~# ls -la /home/foo/RAID_mp
total 0
Worse on the other LV:
~# mount /home/foo/RAID_data
mount: wrong fs type, bad option, bad superblock on /dev/mapper/lvm--raid-lvdata,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
Yes, you told md that the drive with random/blank data was good data!
If ONLY you had mounted those filesystems
READ ONLY while checking things out, you would still be ok. But now,
you have overwritten stuff!
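For reference, a read-only mount would have looked something like this
(noload is an ext3/ext4 option that also skips journal replay, which is
itself a write):

~# mount -o ro,noload /dev/mapper/lvm--raid-lvmp /home/foo/RAID_mp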
I bet I've recovered the LVM structure but the data is wiped out, don't you think?
~# fsck -n -v /dev/mapper/lvm--raid-lvdata
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
fsck.ext4: Group descriptors look bad... trying backup blocks...
fsck.ext4: Bad magic number in super-block when using the backup blocks
fsck.ext4: going back to original superblock
fsck.ext4: Device or resource busy while trying to open /dev/mapper/lvm--raid-lvdata
Filesystem mounted or opened exclusively by another program?
Any help is welcome if you have any idea how to rescue me, please!
Fortunately, your fsck was read-only. At this point, you need to
crash/halt your system without a clean shutdown (to avoid further writes to
the mounted filesystems).
Then REMOVE the new drive. Start up again, and add the new drive properly.
You should check stuff out READ ONLY. You will need fsck (READ ONLY at
first), and at least some data has been destroyed.
If the data is really important, you need to copy the two old drives
somewhere before you do ANYTHING else. Buy two more drives! That will
let you recover from any more mistakes typing Create instead of Assemble
or Manage. (Note that --assume-clean warns you that you really need to
know what you are doing!)
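One way to take that copy, assuming the two extra drives show up as
/dev/sde and /dev/sdf (made-up names here) and are at least as big as the
originals:

~# ddrescue -f /dev/sdc /dev/sde /root/sdc.map
~# ddrescue -f /dev/sdd /dev/sdf /root/sdd.map
   (GNU ddrescue keeps going past read errors and records progress in the
   map file; plain dd with conv=noerror,sync would also do if ddrescue
   isn't installed)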
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/