} -----Original Message-----
} From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx]
} On Behalf Of David Lethe
} Sent: Saturday, May 17, 2008 3:10 PM
} To: LinuxRaid; linux-kernel@xxxxxxxxxxxxxxx
} Subject: Mechanism to safely force repair of single md stripe w/o hurting
} data integrity of file system
}
} I'm trying to figure out a mechanism to safely repair a stripe of data
} when I know a particular disk has an unrecoverable read error at a
} certain physical block (for 2.6 kernels).
}
} My original plan was to figure out the range of blocks in the md device
} that uses the known bad block, force a raw read on the physical device
} that covers the entire chunk, and let the md driver do all of the work.
}
} Well, this didn't pan out. Problems include cases where the bad block
} maps to the parity block in a stripe, in which case md won't necessarily
} read/verify parity, and cases where you are running RAID1, in which case
} load balancing might result in the kernel reading the block from the
} good disk rather than the bad one.
}
} So the degree of difficulty is much higher than I expected. I prefer
} not to patch kernels due to maintenance issues as well as a desire for
} the technique to work across numerous kernels and patch revisions, and
} frankly, the odds are I would screw it up. An application-level program
} that can be invoked as necessary would be ideal.
}
} As such, anybody up to the challenge of writing the code? I want it
} enough to paypal somebody $500 who can write it, and will gladly open
} source the solution.
}
} (And to clarify why, I know physical block x on disk y is bad before the
} O/S reads the block, and just want to rebuild the stripe, not the entire
} md device when this happens. I must not compromise any file system data,
} cached or non-cached, that is built on the md device.
} I have a system with >100TB, and if I did a rebuild every time I
} discovered a bad block somewhere, then a full parity repair would never
} complete before another physical bad block is discovered.)
}
} Contact me offline for the financial details, but I would certainly
} appreciate some thread discussion on an appropriate architecture. At
} least it is my opinion that such capability should eventually be native
} Linux, but as long as there is a program that can be run on demand that
} doesn't require rebuilding or patching kernels then that is all I need.
}
} David @ santools.com

I thought this would cause md to read all blocks in an array:

    echo repair > /sys/block/md0/md/sync_action

and rewrite any blocks that can't be read.

In the old days, md would kick out a disk on a read error. When you added
it back, md would rewrite everything on that disk, which corrected read
errors.

Guy
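
The repair command above walks the whole array. As a rough sketch of how the same pass might be limited to a small region, the function below sets a window before writing "repair". It assumes the kernel exposes sync_min and sync_max next to sync_action in the md sysfs directory, which may not be true of every 2.6 kernel. The function name md_repair_range and its arguments are made up for illustration, and the mapping from a bad physical block on a member disk to the sector numbers md expects here is not worked out.

```shell
#!/bin/sh
# Sketch only: ask md to run "repair" over a limited sector window.
# Assumption: the md sysfs directory has sync_min and sync_max files
# (sector counts, expected to be multiples of the chunk size).
# md_repair_range is a hypothetical helper name, not part of any tool.

md_repair_range() {
    md_dir=$1   # e.g. /sys/block/md0/md
    start=$2    # first sector of the window
    end=$3      # sector at which the pass should stop

    # Do not interfere with a resync/check/repair that is already running.
    state=$(cat "$md_dir/sync_action") || return 1
    if [ "$state" != "idle" ]; then
        echo "array is busy: $state" >&2
        return 1
    fi

    if [ "$start" -ge "$end" ]; then
        echo "empty or inverted range: $start..$end" >&2
        return 1
    fi

    # Widen the lower bound first so min <= max holds at every step,
    # whatever window was left behind by an earlier run.
    echo 0        > "$md_dir/sync_min" || return 1
    echo "$end"   > "$md_dir/sync_max" || return 1
    echo "$start" > "$md_dir/sync_min" || return 1

    # Start the pass; md reads the window and rewrites what it cannot read.
    echo repair > "$md_dir/sync_action"
}
```

It would be called as something like "md_repair_range /sys/block/md0/md 1024 2048" (sector values invented for the example). As I understand it, md pauses when it reaches sync_max rather than finishing, so afterwards you would write "idle" to sync_action and set sync_max back to "max" before any later full check.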