bit-rot, CRC errors, etc. question

Quick question:

I've been running a large ext3 filesystem on an LVM set with multiple Linux /dev/mdX RAID5 arrays underneath. Recently, while trying to do full, literally identical rewrites of every bit of data, I've started hitting cases where the server locks up or reboots, and the culprit seems to trace back to a first failure on one of the ATA drives reporting a bad CRC. Replacing that single bad drive fixes the issue.

My best guess is this: the filesystem is built on the LVM, which is composed of extents; the extents reside on physical volumes; and those physical volumes are developing uncorrectable errors through natural use/time/heat/secret alien plot. These silent failures sit latent until I try to access those areas of those drives, at which point a big catastrophic failure occurs, incurring downtime, potential data loss, and expense.

How can I 1) prevent this, 2) detect this, and 3) correct this without tossing a whole drive over a single small bad area?
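
(To make the "detect" part concrete, here's the sort of periodic surface check I'm imagining, assuming smartmontools is available; /dev/hde is just an example member drive from my setup:)

  # Kick off a full surface read of one member drive; the test runs
  # inside the drive firmware, so the array stays online:
  smartctl -t long /dev/hde

  # Later, read back the self-test log and the attributes that would
  # flag latent bad sectors or interface/cabling trouble:
  smartctl -l selftest /dev/hde
  smartctl -A /dev/hde | grep -iE 'pending|realloc|crc'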

Is the md driver smart enough to correct such physical media errors on its own? Are there ways, via mdadm or other tools, to actively scan for such bad areas? (Filesystem-level tools are obviously useless for this, right, since the damage is below the filesystem.) Can I keep using this "bad" drive by somehow applying a correction?
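
(And on the md side, is something like this the intended mechanism, assuming the kernel is recent enough to expose the md sysfs interface? md0 is just an example array name:)

  # Ask md to read every block of every member and verify the raid5
  # parity; a sector that returns a read error should get rebuilt from
  # the redundant data and rewritten in place:
  echo check > /sys/block/md0/md/sync_action

  # Progress shows up in /proc/mdstat; afterwards this counts any
  # inconsistencies that were found during the pass:
  cat /sys/block/md0/md/mismatch_cnt

  # "repair" instead of "check" is supposed to rewrite inconsistent
  # stripes rather than just count them:
  echo repair > /sys/block/md0/md/sync_action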

Regards-
Michael Stumpf








