2 things about your comments:

1. You said:
"no one should be using md in an RT-critical application"

I am sorry to hear that! What do you recommend? Windows 2000 maybe?

2. You said:
"but the md-level approach might be better. But I'm not sure I see the point of it---unless you have raid 6 with multiple parity blocks, if a disk actually has the wrong information recorded on it I don't think you can detect which drive is bad, just that one of them is."

If there is a parity block that does not match the data, true, you do not know which device has the wrong data. However, if you do not "correct" the parity, then when a device fails, its contents will be reconstructed differently from what they were before it failed. This will just cause more corrupt data. The parity must be made consistent with whatever data is on the data blocks to prevent this corrosion of data.

With RAID6 it should be possible to determine which block is wrong. It would be a pain in the @$$, but I think it would be doable. I will explain my theory if someone asks.

Guy

-----Original Message-----
From: Bruce Lowekamp [mailto:brucelowekamp@xxxxxxxxx]
Sent: Wednesday, November 17, 2004 4:58 PM
To: Neil Brown
Cc: Guy Watkins; linux-raid@xxxxxxxxxxxxxxx
Subject: Re: Bad blocks are killing us!

2: Thanks for devoting the time for getting this done. Personally, for the PATA arrays I use, this approach is a bit overkill---if the rewrite succeeds, it's ok (unless I start to see repeated errors, in which case I yank the drive), if the rewrite doesn't succeed, it's dead and I have to yank the drive. I don't have any useful diagnostic tools at linux user-level other than smart badblocks scans, which would just confirm the bad sectors. Personally, I wouldn't go to the effort to keep (parts of) the drive in the array if it can't be rewritten successfully---I've never seen a drive last long in that situation, and I think that drive is really dead.
The only problems I've had in practice have been with multiple accumulated read errors---and rewriting those would make them go away quickly. I would just want the data rewritten at user level, and log the event so I can monitor the array for failures and look at the smart output or take a drive offline for testing (with vendor diag tools) if it starts to have frequent errors. Naturally, as long as the more complex approach of kicking to user level allows the user-level to return immediately to let the kernel rewrite the stripe, I think it's fine.

I agree that writing several megabytes is not an issue in any way. IMHO, feel free to hang the whole system for a few seconds if necessary---no one should be using md in an RT-critical application, and bad blocks are relatively rare.

3: The data scan is an interesting idea. Right now I run daily smart short scans and weekly smart long scans to try to catch any bad blocks before I get multiple errors. Assuming there aren't any uncaught CRC errors, I feel comfortable with that approach, but the md-level approach might be better. But I'm not sure I see the point of it---unless you have raid 6 with multiple parity blocks, if a disk actually has the wrong information recorded on it I don't think you can detect which drive is bad, just that one of them is. So I don't think you gain anything beyond what a standard smart long scan or just cat'ing the raw device would give you in terms of forcing the whole drive to be read.

Bruce

On Tue, 16 Nov 2004 09:27:17 +1100, Neil Brown <neilb@xxxxxxxxxxxxxxx> wrote:
> 2/ Look at recovering from failed reads that can be fixed by a
> write. I am considering leveraging the "bitmap resync" stuff for
> this. With the bitmap stuff in place, you can let the kernel kick
> out a drive that has a read error, let user-space have a quick
> look at the drive and see if it might be a recoverable error, and
> then give the drive back to the kernel.
> It will then do a partial
> resync based on the bitmap information, thus writing the bad
> blocks, and all should be fine. This would mean re-writing
> several megabytes instead of a few sectors, but I don't think that
> is a big cost. There are a few issues that make it a bit less
> trivial than that, but it will probably be my starting point.
>
> The new "faulty" personality will allow this to be tested easily.

--
Bruce Lowekamp (lowekamp@xxxxxxxxx)
Computer Science Dept, College of William and Mary
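
The two parity arguments in Guy's reply at the top of this message can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not md's code and not necessarily the theory Guy had in mind. The function names (xor_blocks, compute_pq, locate_bad_block) are hypothetical, the "disks" are short byte strings, and the Q parity is assumed to be the usual Reed-Solomon syndrome over GF(2^8) with polynomial 0x11d and generator 2. The sketch also assumes at most one block per stripe is wrong.

```python
# Illustrative sketch only: hypothetical helper names, in-memory "disks".
# Assumes GF(2^8) with polynomial 0x11d and generator 2 for the Q parity.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def xor_blocks(blocks):
    """RAID 5 style parity: byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for j, byte in enumerate(blk):
            out[j] ^= byte
    return bytes(out)


def compute_pq(data):
    """Return (P, Q) for a list of equal-length data blocks."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for idx, blk in enumerate(data):
        coef = GF_EXP[idx]  # generator raised to the disk index
        for j, byte in enumerate(blk):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def locate_bad_block(data, p, q):
    """Return None if the stripe is consistent, "P" or "Q" if only that
    parity block disagrees, the index of the single wrong data block,
    or "unknown" if no single-block explanation fits."""
    p2, q2 = compute_pq(data)
    sp = bytes(a ^ b for a, b in zip(p, p2))  # P syndrome
    sq = bytes(a ^ b for a, b in zip(q, q2))  # Q syndrome
    if not any(sp) and not any(sq):
        return None
    if not any(sq):
        return "P"
    if not any(sp):
        return "Q"
    # One bad data block z with error e gives sp = e and sq = g^z * e,
    # so log(sq) - log(sp) must equal z at every byte that differs.
    candidates = set()
    for a, b in zip(sp, sq):
        if a == 0 and b == 0:
            continue
        if a == 0 or b == 0:
            return "unknown"
        candidates.add((GF_LOG[b] - GF_LOG[a]) % 255)
    if len(candidates) == 1:
        z = candidates.pop()
        if z < len(data):
            return z
    return "unknown"


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    p, q = compute_pq(data)
    bad = [data[0], b"BxBB", data[2]]  # silent corruption on disk 1
    print("located:", locate_bad_block(bad, p, q))
    # Disk 0 fails: rebuild from the survivors with stale vs. fixed parity.
    print("stale parity rebuild:", xor_blocks([bad[1], bad[2], p]))
    print("fixed parity rebuild:", xor_blocks([bad[1], bad[2], xor_blocks(bad)]))
```

With P alone, the mismatch only says that something in the stripe is wrong, and rebuilding disk 0 from the stale parity returns bytes that disk 0 never held. Rebuilding after the parity has been recomputed returns disk 0's original contents. With both P and Q, the ratio of the two syndromes points at the disk index, which is the sense in which RAID 6 can say which block is wrong. If two or more blocks in the same stripe are bad, this check can return "unknown" or even a wrong index, so treat it as a sketch of the idea only.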