RE: device with newer data added as spare - data now gone?

"Leslie Rhorer" <lrhorer@xxxxxxxxxxx> · Sun, 28 Jun 2009 14:04:54 -0500



> Hi all
> 
> I've lost quite a lot of data on my /home raid partition and I'm wondering
> what exactly I did to make it happen. I'd like to know so something
> similar won't happen in the future.

	Well, first of all, learning from your mistakes goes a long way
toward keeping them from happening again.  Second, if you ask me, a very
important feature to enable is mdadm's ability to notify you via e-mail
whenever a significant event transpires.  You will then be notified quickly
of any significant change to any RAID array, such as the loss of a hard drive.
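
	For what it is worth, here is a minimal sketch of that e-mail setup.
The config path and the address are assumptions (some distributions use
/etc/mdadm/mdadm.conf instead), and the function names are just for
illustration.  Mail only goes out while mdadm is running in monitor mode,
which many distributions start as a service.

```shell
# Assumptions: adjust the config path and address for your own system.
MDADM_CONF="${MDADM_CONF:-/etc/mdadm.conf}"
ADMIN_MAIL="${ADMIN_MAIL:-root@localhost}"

# Append a MAILADDR line unless the config already has one.
set_mailaddr() {
    if ! grep -q '^MAILADDR' "$MDADM_CONF" 2>/dev/null; then
        printf 'MAILADDR %s\n' "$ADMIN_MAIL" >> "$MDADM_CONF"
    fi
}

# Send a test alert for every array found, then exit (needs root).
# Useful for confirming that mail actually gets delivered.
send_test_alert() {
    mdadm --monitor --scan --oneshot --test
}
```

	Run set_mailaddr once as root, then send_test_alert, and check that
the test message shows up in your inbox.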

	Finally, more important than anything else: BACK UP YOUR IMPORTANT
DATA.  If it is data that can be recovered through some external process,
but takes a bit of doing, back it up once, and keep it handy.  A different
drive or array on the same machine or a different machine in the same room
is fine for this level of backup.  If it is data that cannot be recovered
and would cause some heartache if lost, then include it in the local handy
backup, but also include it in an off-premise backup.  If it is critical
data, like financial information, then back it up 16 ways from Sunday.  I
keep all critical data backed up on two different servers with independent
RAID arrays, DVD-ROM backups offsite, and independent multi-generation
backups on every workstation which accesses the data.  If it is a commercial
application and the revenue supports it, or if it is important enough to you
and you can personally afford it, I suggest you might look into an online
storage solution.

	Remember, RAID arrays are fault tolerant, not fault-free, and while
hard drives are frail, the most likely cause of data loss by far is user
error.

> * Some time ago I did something to have one device fail which resulted in
> md3 having only 1 device.

	I presume md3 is the /home array and this was a 2 drive RAID1 array,
yes?

> * Time went by without me noticing (because I suck)

	See above.  We human beings all tend to suck from time to time.
Computers can help by reminding or notifying us of things - if we bother to
set them up to do so.
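
	As a rough sketch of such a reminder, the function below lists any
array whose status field in /proc/mdstat shows a missing member (an
underscore, as in [U_]).  The function name is made up, and it assumes the
usual mdstat layout where the array name starts the line and the [UU]
style field follows on a later line.

```shell
# Print the name of every md array whose status field shows a missing member.
# Reads /proc/mdstat by default; another file can be passed in for testing.
degraded_arrays() {
    awk '
        /^md/ { name = $1 }                      # remember the current array
        match($0, /\[[U_]+\]/) {                 # status field such as [UU] or [U_]
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
        }
    ' "${1:-/proc/mdstat}"
}
```

	Called from a daily cron job, any output at all means an array is
running short of a member and wants looking at.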

> * An update broke my raid setup and gave me a kernel panic (because I
> suck). Didn't put the mdadm and raid hooks in mkinitcpio.conf
> * Booted a live-cd, mounted the drives and chrooted back into the system
> and fixed the mkinitcpio.conf

	This all sounds like lessons learned.

> * Rebooted and noticed that md3 was running with only 1 device
> * Added sdb4 to md3 and it then read 1 device with 1 spare
> * cat /proc/mdstat started to say "recovery"
> * All data from approx. 1 year is gone

	Was sdb4 originally the second partition in the array?  What is the
first partition?  What was the apparent cause of the failure?

> I'm guessing that the old (not updated) device was set as "master" and the
> data on the drive (containing newer data) was overwritten by data on the
> old device - is this plausible?

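	Well, yes, that is plausible.  A partition added to a running
degraded array is rebuilt from the member that is already active, so
anything on the added partition gets overwritten.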
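
	One way to guard against that is to look at each member's superblock
before adding anything back.  The sketch below is only that, a sketch: the
function name is invented, and /dev/sda4 in the usage comment is a guess at
the other half of md3.  Generally the member with the later update time and
higher event count is the one holding the newer data.

```shell
# Show the device name, last update time and event count from each
# member's md superblock (needs root).  Device names come from the caller.
compare_members() {
    mdadm --examine "$@" | grep -E '^/dev/|Update Time|Events'
}

# Hypothetical usage; /dev/sda4 is an assumption, sdb4 is from the report:
# compare_members /dev/sda4 /dev/sdb4
```

	If the partition you are about to add turns out to be the newer one,
stop and copy the data off it before doing anything else.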

> If not what exactly did I do to delete all of the data?

	What command did you use to add the partition back to the array?
