On Fri Mar 21, 2008 at 01:13:48PM -0400, Theodore Tso wrote:
> On Fri, Mar 21, 2008 at 03:52:31PM +0100, Peter Rabbitson wrote:
> > I was actually specifically advocating that md must _not_ do anything on
> > its own. Just provide the hooks to get information (what is the current
> > stripe state) and update information (the described repair extension). The
> > logic that you are describing can live only in an external app, it has no
> > place in-kernel.
>
> Why not? If md doesn't do anything on its own, then when it detects a
> disagreement between the data and the two parity blocks, it has two
> choices (a) return possibly incorrect data to the application, or (b)
> return an I/O error and cause the application to blow up.
>
> Sure, it could then give the information so that the external repair
> tool can fix it up after the fact, but that seems like a really lousy
> thing to do as far as the original application is concerned. (Or I
> suppose you could try to block the userspace application until the
> repair tool has a chance to do automatically what md could have done
> automatically in the kernel anyway, but that has other problems.)
>
> So what's the harm in having an option where md does exactly what ECC
> memory does, which is when it can fix things up, to do so? I bet most
> system administrators would turn it on in a heartbeat.
>

Depends on how you look at things. ECC memory is designed to deal with
occasional mismatches caused by such obscure and rare events as cosmic
radiation. RAID subsystems, on the other hand, are designed to deal with
catastrophic failures of one (or more) drives. There's no trivially
explainable reason why a drive would sporadically read or write incorrect
data (unlike with ECC memory), so there's no real case for automatic
recovery.
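For context, the ECC-like repair Ted describes is possible because RAID-6 keeps two independent syndromes per stripe: P, the XOR of the data blocks, and Q, a weighted sum of the data blocks over GF(2^8). If exactly one data block is silently wrong by some error e, the P mismatch is e and the Q mismatch is g^z * e, so their ratio identifies the bad disk z. Below is a minimal sketch for a single byte position, assuming the GF(2^8) field with polynomial 0x11d and generator g = 2 (the field Linux's RAID-6 code uses); the function names are hypothetical and this is not md's actual code.

```python
# Minimal sketch of RAID-6 single-error location and repair for ONE byte
# position across a stripe. Assumes GF(2^8) with polynomial 0x11d and
# generator g = 2. Function names are hypothetical, not md's code.

def build_tables():
    """Precompute exp/log tables for GF(2^8) under polynomial 0x11d."""
    exp = [0] * 512
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1              # multiply by g = 2
        if x & 0x100:        # reduce modulo the field polynomial
            x ^= 0x11d
    for i in range(255, 512):
        exp[i] = exp[i - 255]  # duplicate so LOG[a] + LOG[b] never overflows
    return exp, log

EXP, LOG = build_tables()

def gf_mul(a, b):
    """Multiply two field elements via the log/exp tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def syndromes(data):
    """P = XOR of all data bytes; Q = sum of g^i * D_i over GF(2^8)."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(EXP[i], d)
    return p, q

def check_and_repair(data, p, q):
    """Compare stored P/Q with recomputed ones; fix one bad data byte in place.

    Assumes at most one block in the stripe is wrong. With two or more bad
    blocks the arithmetic can point at the wrong disk.
    """
    p2, q2 = syndromes(data)
    dp, dq = p ^ p2, q ^ q2
    if dp == 0 and dq == 0:
        return ("ok", None)
    if dq == 0:
        return ("p_bad", None)   # only P mismatches: the P block itself is wrong
    if dp == 0:
        return ("q_bad", None)   # only Q mismatches: the Q block itself is wrong
    # A bad D_z off by e gives dp = e and dq = g^z * e, so z = log(dq) - log(dp).
    z = (LOG[dq] - LOG[dp]) % 255
    if z >= len(data):
        # No such data disk: inconsistent with a single bad data block.
        return ("uncorrectable", None)
    data[z] ^= dp                # XOR the error back out of D_z
    return ("repaired", z)
```

For example, taking a four-disk stripe data = [0x11, 0x22, 0x33, 0x44], computing p, q = syndromes(data), then flipping bits with data[2] ^= 0x5a and calling check_and_repair(data, p, q) returns ('repaired', 2) and restores the byte. This is also where the read penalty comes from: the check needs every data block in the stripe plus both parity blocks, not just the chunk the application asked for.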
Admittedly, it would be possible to do this, but that would mean adding an
extra read penalty on every RAID read (and, in some situations, throwing
away the advantages of parallelism) in order to cover the exceptionally
rare case where a drive has (for some unknown reason) written the wrong
data. Personally, this would be an option I'd avoid like the plague. If I
know there's an issue, I replace the hardware; otherwise I expect the
system to run as fast as possible on the assumption that all is correct.

A check/repair option to view/select how the blocks are recovered might be
useful, but I'd also see this sitting well outside the md code.

Cheers,
    Robin
--
     ___
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |