Greg Freemyer wrote:
All, On the mdraid list, there was a recent thread about using raid functionality to detect / repair silent corruption. The issues brought up were that a lot of silent data corruption occurs when cables, controllers, power supplies, ram, cache, etc. goes bad. It made me think about another option for detecting silent corruption I have not seen discussed, but maybe I missed it. Aiui, the ATA spec allows for the reading of a long sector as well as the normal 512 byte sector. When you get a long sector you also get the CRC (or whatever checksum data there is on the disk that allows the drive itself to detect media errors). I don't have any idea how easy or hard it would be to do, but I would like to see the entire block subsystem enhanced to optionally allow long sector reads to be used in a "paranoid" fashion. Effectively it would be: 1) Read long sector from drive: verify CRC in kernel. This tests most everything on the i/o path. 2) maintain CRC type information in block subsystem. Verify no corruption just before handing off to userspace. This would potentially identify CPU/cache/RAM failures. Mark Lord has implemented long sector reads via hdparm. Mark can you comment on the feasibility of this idea?
.. The ATA READ/WRITE LONG commands have been obsoleted in the past few ATA specs, even though most drives continue to implement them. But not a good avenue. There's a separate effort, involving drive vendors and kernel hackers, to provide end-to-end CRC protection of data. I forget what it was called, but that's the future of this stuff for high-reliability requirements. Cheers -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html