On Thu, May 26, 2011 at 8:08 AM, Ted Ts'o <tytso@xxxxxxx> wrote: > On Thu, May 26, 2011 at 04:49:56PM +0200, Jan Kara wrote: >> No need to do this. If you make JBD2 use a separate slab for transaction >> structures (trivial and makes some sense anyway), you can use >> fault-injection framework to do exactly what you describe above (see >> Documentation/fault-injection/fault-injection.txt and look for failslab). > > Thanks for pointing me at the fault-injection framework; it's not > something I've used before. I'll have to take a look at it. > >> But if we just fail all transaction allocations with say 10% probability, >> it should work as well, shouldn't it? We'd just retry those allocations >> whose failure we cannot handle and eventually succeed. Or do I miss >> something? > > The reason why I only wanted to fail the transactions relating to the > writeback path is because other failures will get reflected back to > userspace, and would thus change the behavior of the stress test. (If > we used fsstress, it would cause fsstress to immediately stop and > fail, for example.) > > That is the one thing that worries me a little about this patch series > in general. If we suddenly start failing open() or rename() or > chmod() syscalls with ENOMEM in low memory situations, what of > programs that aren't doing adequate error checking? Sure, other file > systems will do this, but the bulk of the users use ext3/ext4, and > remember how much kvetching and complaining when xfs was the first > file system to require user space applications to actually use fsync() > if they wanted their files to be safe after a power failure. > > I worry that there are a lot of incompetently written editors out > there that aren't doing error checking, or worse yet, package managers > or other security-critical programs that aren't doing error checking, > and which won't notice when an syscall fails in a low-memory > situation, leading to either (a) user data loss (which the application > programers will lay at the feet of the file system developers, don't > doubt it), or (b) security holes. > > I'm not sure there's a way to address this concern, and I'm going not > NACK'ing this patch series on that basis --- but I do worry that it > might not improve the situation by a whole lot, and may in fact cause > some problems, at the end of the day. If there are applications which don't expect this behaviour or can't handle such cases, we can always add a sysfs flag to turn of this behavior completely and always retry like old behavior. This may be ugly, but would help till ppl start realising this behavior and writing applications appropriately. -- Thanks - Manish -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html