Re: What happens if raid gets broken?

Molle Bestefich <molle.bestefich@xxxxxxxxx> · Fri, 25 Mar 2005 02:12:33 -0800

Roland wrote:
> What happens if, for example, one disk of a mirror RAID gets broken or has bad sectors?
> Will there be an appropriate error message

Probably up to the device-mapper mirror target (dm-mirror.(k)o), not
dmraid.  Hopefully you will get an error message in the syslog, which
your favorite syslog daemon can direct somewhere useful.  I'm not sure
what HighPoint's proprietary driver does.  I think it tries to write
the faulted sector back from the working drive immediately, freezing
the operating system in the meantime.  That's all personal guesswork,
though.  If you can't get an answer here, you could try the
device-mapper mailing list.
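
If an error does reach the syslog, something has to notice it.  Here is
a minimal sketch of a log scan, assuming the log lives at
/var/log/syslog and that failures show up with wording like "I/O error"
or an "hdX: ... error" line.  Both the path and the patterns are
assumptions, not something dm-mirror is known to emit; adjust them to
what your kernel actually logs.

```python
import re

# Assumed patterns: common kernel wording for block-layer and IDE errors.
# They are illustrative only; adjust them to what your drivers really log.
ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"\bhd[a-z]\d*: .*error", re.IGNORECASE),
    re.compile(r"device-mapper: .*(fail|error)", re.IGNORECASE),
]


def find_disk_errors(lines):
    """Return the log lines that match any of the assumed error patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in ERROR_PATTERNS)
    ]


def scan_log(path="/var/log/syslog"):
    """Scan a log file (the default path is an assumption) for disk errors."""
    with open(path, errors="replace") as log:
        return find_disk_errors(log)
```

Running scan_log() from cron and mailing any hits would be one way to
"direct it somewhere useful".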

> and will the metadata be changed?
Probably ought to be, but I think it won't.
AFAICT, dmraid currently only tells the device-mapper how to assemble
RAID arrays; it doesn't stay alive in any way to reflect drive status
in the array metadata or the like.  And I'm pretty sure that dm-mirror
doesn't do it either.  As I remember it, dmraid comes with good,
concise documentation, so this should be mentioned there.

> I am running dmraid with a hpt37x mirror (raid 1) on 2.6.10 debian amd64.

> When I copy some large files onto the raid, my computer "freezes" and I don't
> get any message in syslog or dmesg. I load dmraid in verbose mode and also
> have enabled debug symbols but don't see any
> error message.

> What's wrong?
Not sure.  I've seen the exact same thing happen with HPT37x's with
proprietary drivers, so perhaps it's a hardware kink that occurs under
specific circumstances.  Then again, maybe it's not; I've also seen
numerous bugs in the Linux IDE layer.

> Is this a problem of the device mapper?
Could be.  That or the HighPoint driver.  How reproducible is the
problem?  If you have a backup or your data is expendable, you could
try running several dd's in parallel, each writing a large amount of
data directly to one of the drives.  If it still freezes, it's not the
device-mapper ;-).
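
The parallel-write test can be sketched roughly as below.  It is a
stand-in for the dd runs, not a replacement for them: one thread per
target, each writing zeros and then calling fsync.  The function names
are made up for illustration, and the targets are whatever paths you
pass in.  Pointing it at raw drives destroys everything on them, so
only do that with expendable data.

```python
import os
import threading

CHUNK = 1024 * 1024  # write in 1 MiB pieces


def write_zeros(path, total_bytes, results):
    """Write total_bytes of zeros to path, fsync, and record the outcome."""
    buf = bytes(CHUNK)
    written = 0
    try:
        with open(path, "wb") as target:
            while written < total_bytes:
                n = min(CHUNK, total_bytes - written)
                target.write(buf[:n])
                written += n
            target.flush()
            os.fsync(target.fileno())  # push the data out to the device
        results[path] = written
    except OSError as exc:
        results[path] = exc  # keep the error so the caller can inspect it


def parallel_write(paths, total_bytes):
    """Write to every path at the same time, one thread per path."""
    results = {}
    threads = [
        threading.Thread(target=write_zeros, args=(p, total_bytes, results))
        for p in paths
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

If the box locks up during a run like this, with the writes going
straight to the drives, the mirror target was never involved.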

> Any Idea?
  Try upgrading to kernel 2.6.11, and upgrade the device-mapper too.
  I think the next step then is probably to enable SysRq support in
your kernel, read a kernel debugging tutorial and see if you can find
out where it's frozen / deadlocked / stuck in an infinite loop / what
not.
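
  For the SysRq part, here is a minimal sketch, assuming a kernel built
with CONFIG_MAGIC_SYSRQ and root privileges.  It writes 1 to
/proc/sys/kernel/sysrq to allow all SysRq functions and reads the value
back.  The function name is made up for illustration.

```python
def enable_sysrq(control_path="/proc/sys/kernel/sysrq"):
    """Enable all SysRq functions and return the value read back.

    Needs root and a kernel with CONFIG_MAGIC_SYSRQ.  The path is a
    parameter so the function can be tried against an ordinary file.
    """
    with open(control_path, "w") as ctl:
        ctl.write("1\n")
    with open(control_path) as ctl:
        return ctl.read().strip()
```

  Do this before reproducing the freeze.  Once the machine hangs,
Alt+SysRq+T on the console asks the kernel to dump the task list,
provided the kernel is still servicing the keyboard.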
  If you really want to know what happened, in order to make 100%
sure that it doesn't occur again, you should of course debug against
your current kernel version.  Find the bug, and check for its
existence in newer versions of the kernel / whatever.  But if you go
down this path, you probably can't expect any help whatsoever from the
kernel hackers or anyone else.

HTH...