On Monday May 1, Dick.Snippe@xxxxxxxxxxxxxx wrote:
> Hello,
>
> Suppose a read action on a disk which is a member of a raid5 (or raid1 or
> any other raid where there's data redundancy) fails.
> What happens next is that the entire disk is marked as "failed" and a raid5
> rebuild is initiated.
>
> However, that seems like overkill to me. If only one sector on one disk
> failed, that sector could be re-calculated (using parity calculations)
> AND written back to the original disk (i.e. the disk with the bad sector).
> Any modern disk will do sector remapping, so the bad sector will simply be
> replaced by a good one and there's no need to fail the entire disk.
>

... and any modern Linux kernel (since about 2.6.15) will do exactly what
you suggest.

> The reason I bring this up is that I think raid5 rebuilds are 'scary'
> things. Suppose a raid5 rebuild is initiated while other members of the
> raid5 set have bad, but as yet undetected, sectors scattered around the
> disc (Current_Pending_Sector in smartd speak). Now this raid5 rebuild would
> fail, losing the entire raid5 set. While each and every bit in the raid5
> set might still be salvageable! (I've seen this happen on 5x250GB raid5
> sets.)
>

For this reason it is good to regularly do a background read check of the
entire array:

    echo check > /sys/block/mdX/md/sync_action

Any read error will trigger an attempt to overwrite the bad block with
good data, reconstructed from the redundancy in the array.
Do this regularly, *before* any drive has really failed.

NeilBrown
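
The regular background check described above could be wrapped in a small script, for example to run from cron. This is only a sketch, not a tested tool: the function name start_md_check and the SYSFS_ROOT variable are made up for illustration (SYSFS_ROOT exists only so the function can be tried against a scratch directory), and it assumes sync_action reads "idle" when the array is doing nothing.

```shell
#!/bin/sh
# Sketch: request a background read check on one md array.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.

start_md_check() {
    md_name="$1"
    action_file="${SYSFS_ROOT:-/sys}/block/${md_name}/md/sync_action"

    # The file must exist and be writable (normally this means running as root).
    if [ ! -w "$action_file" ]; then
        echo "cannot write $action_file" >&2
        return 1
    fi

    # Do not interrupt a resync, recovery, or check that is already running.
    current=$(cat "$action_file")
    if [ "$current" != "idle" ]; then
        echo "$md_name is busy ($current), skipping" >&2
        return 2
    fi

    # Same command as given above, applied to the chosen array.
    echo check > "$action_file"
}
```

It would be called as "start_md_check md0" (as root) for each array, at whatever interval suits the size of the disks.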