You could throw a lot of RAID1 devices at it, heh, but I don't know if that's a good idea. Maybe change your hardware and try another, more mature filesystem. A RAID + LVM + filesystem stack may be a better solution; with LVM you can take online backups (see the snapshot sketch at the end of this message).

2011/2/18 NeilBrown <neilb@xxxxxxx>:
> On Thu, 17 Feb 2011 20:04:48 -0600 Steve Costaras <stevecs@xxxxxxxxxx> wrote:
>
>> I'm looking at alternatives to ZFS, since it still has some time to go
>> before large-scale deployment as a kernel-level file system (and btrfs has
>> years to go). I am running into problems with silent data corruption
>> on large deployments of disks. Currently no hardware RAID vendor
>> supports T10 DIF (which, even if supported, would only work with SAS/FC
>> drives anyway), nor does any of them do read-time parity checking.
>
> Maybe I'm just naive, but I find it impossible to believe that "silent data
> corruption" is ever acceptable. You should fix or replace your hardware.
>
> Yes, I know silent data corruption is theoretically possible at a very low
> probability, and that as you add more and more storage, that probability gets
> higher and higher.
>
> But my point is that the probability of unfixable but detectable corruption
> will ALWAYS be much (much much) higher than the probability of silent data
> corruption (on a correctly working system).
>
> So if you are getting unfixable errors reported on some component, replace
> that component. And if you aren't, then ask your vendor to replace the
> system, because it is broken.
>
>> I am hoping either that there is a way I don't know of to make mdadm
>> read the data plus P+Q parity blocks for every request and compare
>> them for accuracy (similar to what you need to do for a scrub, but
>> /ALWAYS/), or that this functionality could be added as an option.
>
> No, it is not currently possible to do this, nor do I have any plan to
> implement it. I guess it would be possible in theory, though.
>
> NeilBrown
>
>> With the large-capacity drives we have today, getting bit errors is
>> quite common (I have scripts that run complete file checks every two
>> weeks across 50TB arrays, and they turn up errors every month). I'm
>> looking at expanding to 200-300TB volumes shortly, so the problem will
>> only become that much more frequent. Checking the data against parity
>> would make it possible to detect, report, and correct errors at read
>> time, before they reach user space. That fixes bit rot as well as
>> torn/wild reads and writes, and mitigates transmission issues.
>>
>> I searched the list but couldn't find this being discussed before. Is
>> this possible?
>>
>> Steve Costaras
>> stevecs@xxxxxxxxxx

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
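
As a rough illustration of the LVM online-backup idea above, here is a minimal snapshot sketch. The volume group name vg0, logical volume name data, snapshot size, mount point /mnt/snap and backup path are assumptions, not anything from this thread; adjust them for your own layout:

    # Create a 10G copy-on-write snapshot of the live volume
    lvcreate --size 10G --snapshot --name data_snap /dev/vg0/data
    # Mount the frozen image read-only and back it up while the original stays online
    mount -o ro /dev/vg0/data_snap /mnt/snap
    tar -czf /backup/data-$(date +%Y%m%d).tar.gz -C /mnt/snap .
    # Tear the snapshot down when the backup is done
    umount /mnt/snap
    lvremove -f /dev/vg0/data_snap

The snapshot only has to hold blocks that change during the backup window, so it can be much smaller than the origin volume.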
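
For reference, the "like a scrub" comparison in Steve's message refers to md's existing check pass, which reads every stripe and compares data against parity; it can be triggered on demand through sysfs. The array name md0 below is just an example:

    # Start a read-and-compare pass over the whole array
    echo check > /sys/block/md0/md/sync_action
    # Watch progress and see how many mismatched blocks were counted
    cat /proc/mdstat
    cat /sys/block/md0/md/mismatch_cnt
    # "repair" instead of "check" also rewrites stripes that disagree:
    # echo repair > /sys/block/md0/md/sync_action

Note that a repair pass only makes the copies agree again; it cannot by itself tell which copy was the correct one.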
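
And a sketch of the kind of periodic whole-file verification Steve describes running every two weeks. The paths here are hypothetical; the script records SHA-256 checksums on its first run and re-verifies them on later runs, so it detects corruption after the fact rather than at read time:

    #!/bin/sh
    # Assumed layout: /data is the array mount point, /var/lib/filesums
    # holds the reference checksum list.
    SUMFILE=/var/lib/filesums/data.sha256
    if [ ! -f "$SUMFILE" ]; then
        # First run: record a baseline checksum for every file
        find /data -type f -print0 | xargs -0 sha256sum > "$SUMFILE"
    else
        # Later runs: re-read every file and report any that no longer match
        sha256sum --quiet -c "$SUMFILE" || echo "checksum mismatches found" >&2
    fi

Run from cron, this catches bit rot eventually, but unlike read-time parity checking it cannot stop a corrupted block from reaching user space first.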