Re: Linux Software RAID a bit of a weakness?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Colin Simpson wrote:
Hi,
We had a small server here that was configured with a RAID 1 mirror,
using two IDE disks.
Last week one of the drives failed in this. So we replaced the drive and
set the array to rebuild. The "good" disk then found a bad block and the
mirror failed.

Now I presume that the "good" disk must have had an underlying bad block
in either unallocated space or a file I never access. Now as RAID works
at the block level you only ever see this on an array rebuild when it's
often catastrophic. Is this a bit of a flaw?
I know there is the definite probability of two drives failing within a
short period of time. But this is a bit different as it's the
probability of two drives failing but over a much larger time scale if
one of the flaws is hidden in unallocated space (maybe a dirt particle
finds it's way onto the surface or something). This would make RAID buy
you a lot less in reliability, I'd have thought.
I seem to remember seeing in the log file for a Dell perc something
about scavenging for bad blocks. Do hardware RAID systems have a
mechanism that at times of low activity search the disks for bad blocks
to help guard against this sort of failure (so a disk error is reported
early)?

On Software RAID, I was thinking apart from a three way mirror, which I
don't think is at present supported. Is there any merit in say, cat'ing
the whole disk devices to /dev/null every so often to check that the
whole surface is readable (I presume just reading the raw device won't
upset thing, don't worry I don't plan on trying it on a production
system).
Any thoughts? As I presume people have thought of this before and I must
be missing something.

Yes, this is an important thing to keep on top of, both for hardware RAID and software RAID. For md:

	echo check > /sys/block/md0/md/sync_action

This should be done regularly. I have cron do it once a week.

Check out: http://neil.brown.name/blog/20050727141521-002

Good luck,

Steve
--
______________________________________________________________________
 Steve Cousins, Ocean Modeling Group    Email: cousins@xxxxxxxxxxxxxx
 Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
 Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302


-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux