Hi list,
I might be missing the point here... I lost my first RAID-5 array
(apparently) because one drive was kicked out after a drive seek error.
When reconstruction started at full speed, some blocks on another drive
turned out to have uncorrectable errors, so that drive was kicked as
well... you get it.
Now here are my questions: on a standalone drive, I would expect that a
seek error or a few uncorrectable blocks would not take out the entire
drive, but only corrupt the files that happen to sit on those blocks.
With RAID, a local error seems to get the whole drive ejected and can
render the entire array unusable. That seems like an extreme measure to
take for a few bad blocks.
- Is it correct that a relatively small corrupt area on a drive can
cause the RAID manager to kick out the whole drive?
- How does one prevent the scenario above?
- periodically run drive self-tests (smartctl -t ...) to detect
problems early, before multiple drives fail?
- periodically read over the entire drives (and rewrite the data) so
the drives can remap their bad sectors?
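For what it's worth, a rough sketch of what such periodic checks could
look like on Linux md follows. The device names (/dev/sda, /dev/sdb,
md0) are placeholders for illustration only; it defaults to a dry run
that just prints the commands.

```shell
#!/bin/sh
# Sketch of periodic health checks for an md RAID set.
# Device/array names below are examples -- adjust for your system.
# DRY_RUN=1 (the default) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Kick off a short SMART self-test on each member drive. The test
#    runs in the background on the drive itself; results appear later
#    in 'smartctl -l selftest <dev>'.
for dev in /dev/sda /dev/sdb; do
    run smartctl -t short "$dev"
done

# 2. Ask md to read every block of the array ("scrub"), so latent bad
#    sectors are found and rewritten from parity now, rather than
#    discovered in the middle of a real rebuild.
run sh -c 'echo check > /sys/block/md0/md/sync_action'
```

Running something like this from cron is exactly the kind of early
detection the questions above are asking about: the scrub forces the
array to touch every sector while full redundancy still exists.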
Thanks for any insight, tom