On 25.07.2011 01:01, Andi Kleen wrote:
>> I wasn't clear enough on that: We only track read errors here. And
>> error correction can only happen on the read path. So if the write
>> attempt fails, we can't go into a loop.
>
> Not in a loop, but you trigger more IO errors, which can be nasty
> if the IO error logging triggers more IO (pretty common because
> syslogd calls fsync). And then your code does even more IO, floods
> more, etc. And the user will be unhappy if their
> console gets flooded.

Okay, I see your point now. Thanks for pointing that out.

> We've had similar problems in the past with readahead causing
> error flooding.
>
> Any time where an error can cause more IO you have to be extremely
> careful.
>
> Right now this seems rather risky to me.

Hum. This brings up a lot of questions. Would you consider throttling
an appropriate solution to prevent error flooding? What would you use
as a base? A per device counter (which might be misleading if there
are more layers below)? A per filesystem counter (which might need
configurability)? Should those "counters" regenerate over time? Any
other approaches?

-Jan
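
To make the "counter that regenerates over time" question concrete, here is a minimal userspace sketch of a token-bucket throttle. It is only an illustration, not code from the patch under discussion: the struct, the function names, and the millisecond timestamps passed in by the caller are all hypothetical, and it assumes a monotonic clock (now_ms never goes backwards). Whether such a counter should hang off a device or a filesystem is exactly the open question above; the sketch does not answer it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical throttle state; none of these names come from the patch. */
struct err_throttle {
	uint32_t tokens;    /* error-handling attempts currently allowed */
	uint32_t capacity;  /* upper bound on saved-up tokens (burst size) */
	uint64_t refill_ms; /* one token regenerates every refill_ms ms */
	uint64_t last_ms;   /* point in time up to which refills are accounted */
};

static void err_throttle_init(struct err_throttle *t, uint32_t capacity,
			      uint64_t refill_ms, uint64_t now_ms)
{
	assert(capacity > 0 && refill_ms > 0);
	t->tokens = capacity;
	t->capacity = capacity;
	t->refill_ms = refill_ms;
	t->last_ms = now_ms;
}

/*
 * Returns true if one more error may be handled (logged or repaired)
 * at time now_ms, false if it should be dropped to avoid flooding.
 * Assumes now_ms is monotonic, i.e. now_ms >= t->last_ms.
 */
static bool err_throttle_allow(struct err_throttle *t, uint64_t now_ms)
{
	uint64_t earned = (now_ms - t->last_ms) / t->refill_ms;

	if (earned > 0) {
		uint64_t total = (uint64_t)t->tokens + earned;

		/* Regenerate, but never beyond the configured burst size. */
		t->tokens = total > t->capacity ? t->capacity : (uint32_t)total;
		/* Advance only by whole intervals so partial ones still count. */
		t->last_ms += earned * t->refill_ms;
	}

	if (t->tokens == 0)
		return false;

	t->tokens--;
	return true;
}
```

With a capacity of 2 and refill_ms of 1000, two errors in quick succession are handled, further ones are dropped, and one more is allowed per elapsed second. For the logging side alone, the kernel's existing printk_ratelimited() and __ratelimit() helpers follow a similar burst-per-interval idea.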