[ ... ]

> In my opinion, any corruption noticed in a non-ECC system is
> most likely due to the RAM.

That's pretty common, but many disk drive models also have bugs,
and most hw RAID host adapters have many (terrible) bugs.

> You really need to run memtest86 on your system, preferably
> for 24 hours or more.

Even that is not conclusive. Some "memory" errors are caused by
activity/noise spikes on the PCI/PCIe bus, which in turn come from
hw bugs or from cards with poor electrical design.

>>> Hard drives write extensive ECC payloads to catch
>>> corruptions there; SATA and SAS protocols have CRC checks on
>>> every frame transferred;

A warning to the masses: USB mass storage is weak in this respect,
and in particular in error recovery, and most USB chipsets
(especially USB-drive ones, but also motherboard ones) are
massively buggy.

>>> the PCIe bus uses CRC checks on each lane, with low-level
>>> encoding very similar to SATA. Even modern processors are
>>> using PCIe-style encoded [ ... ]

> [ ... ] machine handling data you really care about

... should have end-to-end verification, that is, the data itself
should be checksummed, at least to detect corruption. For example
by putting it into checksummed containers (even just ZIP without
compression).

> should have ECC ram.

Oh yes, and any machine should have ECC RAM, as the cost is really
modest. Unfortunately the usual evil marketers like to artificially
segment the market into cheap stuff without ECC and premium stuff
with ECC, and will not put ECC into cheap stuff, to avoid tempting
business customers to buy it instead of the premium stuff.
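
As a rough illustration of the end-to-end verification suggested above (a ZIP container without compression), here is a minimal sketch using Python's standard zipfile module. The function names pack_stored and verify and the file names in the demo are made up for the example. ZIP keeps a CRC-32 per member, which only detects accidental corruption; it does not repair anything, and it is not a defence against deliberate tampering.

```python
import os
import tempfile
import zipfile


def pack_stored(archive_path, file_paths):
    """Store files in a ZIP without compression (hypothetical helper).

    Each member still gets a CRC-32 recorded in the archive.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in file_paths:
            zf.write(path, arcname=os.path.basename(path))


def verify(archive_path):
    """Re-read every member and check its CRC-32 (hypothetical helper).

    Returns the name of the first bad member, or None if all members check out.
    """
    with zipfile.ZipFile(archive_path) as zf:
        return zf.testzip()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "data.bin")
    archive = os.path.join(workdir, "data.zip")

    payload = b"important data " * 100
    with open(source, "wb") as f:
        f.write(payload)

    pack_stored(archive, [source])
    print("fresh archive, first bad member:", verify(archive))

    # Simulate silent corruption: flip one bit inside the stored payload.
    with open(archive, "rb") as f:
        raw = bytearray(f.read())
    offset = raw.find(b"important data ")
    raw[offset + 10] ^= 0x01
    with open(archive, "wb") as f:
        f.write(bytes(raw))

    print("damaged archive, first bad member:", verify(archive))
```

Running verify() after every copy or restore is the point of the exercise: the check travels with the data, so it covers RAM, bus, controller and disk together rather than any single link.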