> Date: Mon, 8 Feb 2016 17:57:48 +0100
> From: Arno Wagner <arno@xxxxxxxxxxx>
>
> From my experience shoveling a few hundred TBs of research data
> around when 200GB disks were standard, the only undetected errors
> I ever found were due to memory corruption from a weak RAM bit in
> one server that did not have ECC memory. Those amounted to 3
> errors in 30TB of recorded data. I never had undetected read
> errors from disk (and since all data was bzip2 compressed, errors
> would have been found), so I tend to view these as not a disk
> problem, but likely happening someplace after the data leaves the
> disk.

I can confirm scenarios like this. Some years ago, I moved a couple
of TB from one machine to another and was paranoid enough to
individually checksum the files, and I discovered a few that weren't
right.

Since both the source and destination disks were LUKS, I could
immediately rule out large swaths of the disk subsystems: corruption
there would have led to entire blocks of garbage due to the operation
of the cipher, whereas the errors I found were a handful of incorrect
bits in each case.

I narrowed this down reasonably quickly to the source machine getting
rare, data-dependent read errors from its RAM, but -only- when the
machine had automatically reduced its CPU clock rate because it was
unloaded. If I nailed the CPU at 100% in some other process, the RAM
errors went away, as they did if I simply disabled CPU clock rate
adjustment entirely.
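
For reference, the per-file verification I describe can be as simple
as the sketch below (Python; the mount points are hypothetical
placeholders for the two copies, and the hash choice is incidental):

    import hashlib
    from pathlib import Path

    def sha256sum(path, bufsize=1 << 20):
        # Stream the file so multi-GB files need not fit in RAM.
        h = hashlib.sha256()
        with path.open("rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    def compare_trees(src, dst):
        # Yield the relative paths whose two copies hash differently.
        for f in sorted(p for p in src.rglob("*") if p.is_file()):
            rel = f.relative_to(src)
            if sha256sum(f) != sha256sum(dst / rel):
                yield rel

    # /mnt/source and /mnt/dest are hypothetical mount points.
    for rel in compare_trees(Path("/mnt/source"), Path("/mnt/dest")):
        print("MISMATCH:", rel)

Hashing the same tree twice can also be revealing: with flaky RAM in
the read path, repeated hashes of the very same file may disagree.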
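
To make the cipher argument concrete, here is a minimal sketch using
the pyca cryptography package and AES-XTS (as in the common
aes-xts-plain64 dm-crypt setup; the fixed tweak below merely stands in
for the real per-sector tweak). One flipped ciphertext bit garbles the
entire 16-byte cipher block it lands in, which is why isolated
single-bit errors in the plaintext cannot have come from the encrypted
disks:

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(64)    # AES-256-XTS takes a double-length key
    tweak = os.urandom(16)  # stand-in for the per-sector tweak

    sector = bytes(512)     # an all-zero sector, so errors are easy to see

    enc = Cipher(algorithms.AES(key), modes.XTS(tweak)).encryptor()
    ct = bytearray(enc.update(sector) + enc.finalize())

    ct[40] ^= 0x01          # a single bit flipped "on disk"

    dec = Cipher(algorithms.AES(key), modes.XTS(tweak)).decryptor()
    pt = dec.update(bytes(ct)) + dec.finalize()

    # The 16-byte block containing byte 40 decrypts to random garbage
    # (roughly 64 of its 128 bits wrong); the rest of the sector is intact.
    for off in range(0, 512, 16):
        wrong = sum(bin(b).count("1") for b in pt[off:off + 16])
        if wrong:
            print(f"block at offset {off}: {wrong} wrong bits")

The picture with the older cbc-essiv mode is similar: the damaged
block decrypts to garbage and exactly one bit flips in the block after
it, still nothing like a handful of isolated wrong bits in a file.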
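
Arno's aside that bzip2 would have exposed disk-level corruption holds
because the format carries a CRC32 per compressed block plus a
whole-stream CRC, so a bit flipped after compression shows up as a
hard decompression error. A quick illustration:

    import bz2
    import os

    blob = bytearray(bz2.compress(os.urandom(4096)))
    blob[len(blob) // 2] ^= 0x01   # one bit of "disk" corruption

    try:
        bz2.decompress(bytes(blob))
        print("corruption went undetected")   # should not happen
    except (OSError, ValueError) as exc:
        print("decompression failed, corruption detected:", exc)

Note the converse, which matches his report: an error introduced in
RAM before compression gets compressed and checksummed along with
everything else, so that kind of corruption sails through undetected.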