David Greaves <david@xxxxxxxxxxxx> wrote:
> >(If you want to say "but the fs is journalled", then consider what if
> >the write is to the journal ...).

> Hmm.
> In neither case would a journalling filesystem be corrupted.

A journalled file system is always _consistent_. That does not mean it
is correct!

> The md driver (somehow) gets to decide which half of the mirror is 'best'.

Yep - and which is correct?

> If the journal uses the fully written half of the mirror then it's replayed.
> If the journal uses the partially written half of the mirror then it's
> not replayed.

Which is correct?

> It's just the same as powering off a normal non-resilient device.

Well, I see what you mean - yes, it is the same in terms of the total
event space. It's just that with a single disk, the possible outcomes
are randomized only over time, as you repeat the experiment. Here you
have randomization of outcomes over space as well, depending on which
disk you test (or how you interleave the test across the disks) - see
the toy sketch further down. And the question remains - which outcome
is correct?

Well, I'll answer that. Assuming that the fs layer is only notified when
BOTH journal writes have happened, and that only then can tcp signals be
sent off-machine or something like that, then the correct result is the
rollback, not the completion, as the world does not expect there to have
been a completion given the data it has got.

It's as I said. One always wants to roll back. So one doesn't want the
journal to bother with data at all.

> (Is your point here back to the failure to guarantee write ordering? I
> thought Neil answered that?)

I don't see what that has to do with anything (Neil said that write
ordering is not preserved, but that writes are not acked until they have
occurred - which would allow write order to be preserved if you were
interested in doing so; you simply have to choose "synchronous write").

> >No. I made no such assumption. I don't know or care what you do with a
> >detectable error. I only say that whatever your test is, it detects it!
> >IF it looks at the right spot, of course. And on raid the chances of
> >doing that are halved, because it has to choose which disk to read.

> I did when I defined detectable.... tentative definitions:
> detectable = noticed by normal OS I/O. ie CRC sector failure etc
> undetectable = noticed by special analysis (fsck, md5sum verification etc)

A detectable error is one you detect with whatever your test is. If your
test is fsck, then that's the kind of error that is detected by the
detection that you do ... the only condition I imposed for the analysis
was that the test be conducted on the raid array, not on its underlying
components.

> And a detectable error occurs on the underlying non-raid device - so the
> chances are not halved since we're talking about write errors which go
> to both disks. Detectable read errors are retried until they succeed -
> if they fail then I submit that a "write (or after)" corruption occured.

I don't understand you here - you seem to be confusing hardware
mechanisms with ACTUAL errors/outcomes. It is the business of your
hardware to do something for you: how and what it does is immaterial to
the analysis. The question is whether that something ends up being
CORRECT or INCORRECT, in terms of YOUR wishes. Whether the hardware
considers something an error or not, and what it does about it, is
immaterial here. It may go back in time and ask your grandmother what
your favorite colour is, as far as I care - all that is important is
what ENDS UP on the disk, and whether YOU consider that an error or not.
So you are on some wild goose chase of your own here, I am afraid!
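Before going on - to make the "randomization over space" point concrete,
here is a toy model (python; the names and behaviour are invented for
illustration and have nothing to do with how md or any real journal is
actually implemented):

# Toy model of a 2-way mirror losing power midway through a journal
# commit write.  Purely illustrative - not the md algorithm.
import random

def crash_during_commit():
    # the commit record reaches disk 0; power dies before it reaches disk 1
    return {0: "commit record", 1: "no commit record"}

def after_reboot(mirror):
    # md picks one half to read; the fs replays the journal iff the
    # commit record happens to be on the half that was picked
    picked = random.choice([0, 1])
    if mirror[picked] == "commit record":
        return "transaction replayed (looks completed)"
    return "transaction rolled back"

for _ in range(5):
    print(after_reboot(crash_during_commit()))

With a single disk you get one outcome or the other depending on exactly
when the power went - randomized over time, as you repeat the experiment.
Here the very same crash can read back either way, depending on which
disk the test happens to hit.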
> It also occurs to me that undetectable errors are likely to be temporary

You are again on a trip of your own :( undetectable errors are errors
you cannot detect with your test, and that is all! There is no
implication.

> - nothing's broken but a bit flipped during the write/store process (or
> the power went before it hit the media). Detectable errors are more
> likely to be permanent (since most detection algorithms probably have a
> retry).

I think that for some reason you are considering that a test (a
detection test) is carried out at every moment of time. No. Only ONE
test is ever carried out. It is the test you apply when you do the
observation: the experiment you run decides at that single point whether
the disk (the raid array) has errors or not. In practical terms, you
usually do it when you boot the raid array and run fsck on its file
system. OK?

You simply leave an experiment running for a while (leave the array up,
let monkeys play on it, etc.) and then you test it. That test detects
some errors. However, there are two types of errors - those you can
detect with your test, and those you cannot detect. My analysis simply
gave the probabilities for those on the array, in terms of basic
parameters for the probabilities per individual disk.

I really do not see why people make such a fuss about this!

> >>However, we need to carry out risk analysis to decide if the increase in
> >>susceptibility to certain kinds of corruption (cosmic rays) is
> >>
> >
> >Ahh. Yes you do. No I don't! This is your own invention, and I said no
> >such thing. By "errors", I meant anything at all that you consider to be
> >an error. It's up to you. And I see no reason to restrict the term to
> >what is produced by something like "cosmic rays". "People hitting the
> >off switch at the wrong time" counts just as much, as far as I know.
>
> You're talking about causes - I'm talking about classes of error.

No, I'm talking about classes of error! You're talking about causes. :)

> Hitting the power off switch doesn't cause a physical failure - it
> causes inconsistency in the data.

I don't understand you - it causes errors just like cosmic rays do (and
we can even set out and describe the mechanisms involved). The word
"failure" is meaningless to me here.

> >I would guess that you are trying to classify errors by the way their
> >probabilities scale with number of disks.
>
> Nope - detectable vs undetectable.

Then what's the problem? An undetectable error is one you cannot detect
via your test. Those scale with real estate. A detectable error is one
you can spot with your test (on the array, not its components). The
missed detectable errors scale as n-1, where n is the number of disks in
the array. Thus a single disk suffers from no missed detectable errors,
and a 2-disk raid array does. That's all. No fuss, no muss!

> Also, it strikes me that raid can actually find undetectable errors by
> doing a bit-comparison scan.

No, it can't, by definition. Undetectable errors are undetectable. If
you change your test, you change the class of errors that are
undetectable. That's all.

> Non-resilient devices with only one copy of each bit can't do that.
> raid 6 could even fix undetectable errors.

Then they are not "undetectable". The analysis is not affected by your
changing the definition of what is in the undetectable class of error
and what is not. It stands. I have made no assumptions at all about what
they are; I simply pointed out how the probabilities scale for a raid
array.
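Since numbers seem to help, here is a back-of-envelope version of that
scaling (python; the per-disk rates p and q are invented placeholders -
only the shape of the formulas is the point):

# How the two classes of error scale on an n-way mirror.
# p = chance that a given disk carries an error the test CAN see,
# q = chance that it carries one the test CANNOT see.
# The test (fsck, say) is run once, on the array, and reads any given
# block from just one of the n halves.  Small-probability approximation.
p, q = 1e-6, 1e-7   # made-up per-disk figures, purely illustrative

def missed_detectable(n):
    # a detectable error is missed when it sits on one of the n-1 halves
    # the test did not happen to read: scales as (n-1)*p
    return (n - 1) * p

def undetectable(n):
    # undetectable errors simply accumulate with real estate: n*q
    return n * q

for n in (1, 2, 3):
    print(n, "disk(s):", missed_detectable(n), undetectable(n))

Nothing in that depends on what the errors ARE - cosmic rays, the off
switch, whatever - only on how the two classes scale with n.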
Peter