Theodore Ts'o <tytso@xxxxxxx> writes:
>
> It's actually pretty easy to test this particular one,

Note the error can happen at any time.

> and certainly
> one of the things I'd strongly encourage in this patch series is the
> introduction of an interface via madvise

It already exists, of course.

I would suggest studying the existing framework before making more
suggestions.

> simulate an ECC hard error event. So I don't think "it's hard to
> test" is a reason not to do the right thing. Let's make it easy to

What you can't test doesn't work. It's that simple.

And memory error handling is extremely hard to test. The errors
can happen at any time. It's not a well-defined event.
There are test suites for it, of course (mce-test, mce-inject[1]),
but they needed a lot of engineering effort to get to where
they are.

[1] despite the best efforts of some current RAS developers
at breaking it.

> Note that the problem that we're dealing with is buffered writes; so
> it's quite possible that the process which wrote the file, thus
> dirtying the page cache, has already exited; so there's no way we can
> guarantee we can inform the process which wrote the file via a signal
> or a error code return.

Is that any different from other IO errors? It doesn't need to
be better.

> Also, if you're going to keep this state in memory, what happens if
> the inode gets pushed out of memory?

You lose the error, just like you do today with any other IO error.

We had a lot of discussions on this when the memory error handling
was originally introduced; that was the conclusion.

I don't think a special panic knob for this makes sense either.
We already have multiple panic knobs for memory errors that can
be used.

-Andi

-- 
ak@xxxxxxxxxxxxxxx -- Speaking for myself only