On Wed, 2011-09-21 at 10:56 +1000, Dave Chinner wrote: > On Tue, Sep 20, 2011 at 08:56:55AM -0500, Chandra Seetharaman wrote: > > Ran the xfstests (auto) overnight and didn't see any new issues. > > Sure, but xfstests won't be triggering the new failure paths. > > It looks to me like any failure to get a buffer will now result in a > cancelled transaction and a filesystem shutdown - the new failure > paths really need to be tested to ensure that failures are handled > gracefully and don't result in filesystem corruption. This is true. > As it is, I'm not sure we want to do this. The only reason we can > fail to get a buffer is allocation failures in extremely low memory > conditions. However, the last thing we want is for filesystem > shutdowns to be triggered by transient low memory conditions. But a failure to get a buffer, not checked, can't be good can it? In other words, the patch now adds handling for a xfs_buf_get() failure, which avoids a kernel-mode null pointer dereference in the off chance that would happen. That's worse than a filesystem shutdown. Now I grant you your earlier statement, namely that it's conceivable that the new error return could lead to a previously un-exercised path that leads to file system corruption. But I do believe that all these places handle *an* error (not the specific ENOMEM error), so those spots already are generally prepared for something to go wrong. > The current state of the code is that the xfs_buf_get() code tries > really, really hard to allocate memory, and we don't have any > evidence to point to the fact that is it failing to allocate memory. > We'd be seeing asserts firing and/or NULL pointer deref panics if > xfs_buf_get() was failing, and neither of these are happening. > > As it is, before we can gracefully handle memory allocation failures > in the xfs_buf layer, we need to be able to roll back dirty > transactions so that memory allocation failure does not result > filesystem shutdowns. That's actually possible to do now with the > delayed logging infrastructure (because the CIL keeps a copy of the > previous in memory modifications prior to the bad transaction), so > we should look towards implementing transaction rollback first > before allowing memory allocation to fail inside transaction > contexts.... Now that's a great thing to do--use the CIL to facilitate rolling back dirty transactions. Very cool. But this patch doesn't "allow" memory allocation to fail, it simply avoids a sudden panic if it ever did (which you point out is only slightly less likely than impossible). I didn't anticipate this, and the patch has already been committed. I need to know whether people think this is critical enough for me to revert the patch. -Alex _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs