On Wed, 2011-09-21 at 10:56 +1000, Dave Chinner wrote:
> On Tue, Sep 20, 2011 at 08:56:55AM -0500, Chandra Seetharaman wrote:
> > Ran the xfstests (auto) overnight and didn't see any new issues.
>
> Sure, but xfstests won't be triggering the new failure paths.

I meant to convey that I did not introduce any regressions.

> It looks to me like any failure to get a buffer will now result in a
> cancelled transaction and a filesystem shutdown - the new failure
> paths really need to be tested to ensure that failures are handled
> gracefully and don't result in filesystem corruption.
>
> As it is, I'm not sure we want to do this. The only reason we can
> fail to get a buffer is allocation failures in extremely low memory
> conditions. However, the last thing we want is for filesystem
> shutdowns to be triggered by transient low memory conditions.
>
> The current state of the code is that the xfs_buf_get() code tries
> really, really hard to allocate memory, and we don't have any
> evidence to point to the fact that it is failing to allocate memory.
> We'd be seeing asserts firing and/or NULL pointer deref panics if
> xfs_buf_get() was failing, and neither of these are happening.

I am really confused. Are you saying we should rather let the kernel
dereference a NULL pointer and panic than make the file system shut
down gracefully?

I do agree that we may not be seeing any buffer allocation failures (I
was very surprised to see allocations not being checked in such mature
code). By the same token, we will not exercise the failure paths these
patches introduce, and hence will not get into the file system shutdown
state. No?

In case the buffer allocation does fail, the changes in these patches
would prevent a kernel panic and lead to a graceful file system
shutdown. Isn't that a better option?
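
To make the trade-off concrete, here is a minimal user-space sketch of
the checked path. It is not the actual patch: buf_get(), modify_block(),
struct mount and the fail_alloc knob are all hypothetical stand-ins, and
setting shut_down only models "cancel the transaction and shut the file
system down".

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the real XFS structures. */
struct buf {
	char data[64];
};

struct mount {
	int shut_down;	/* set once the (model) file system is shut down */
};

/* Test knob (assumption): force the next allocation to fail. */
static int fail_alloc;

/* Models a buffer getter that may return NULL under memory pressure. */
static struct buf *buf_get(void)
{
	if (fail_alloc)
		return NULL;
	return calloc(1, sizeof(struct buf));
}

/*
 * Checked caller: on allocation failure, record a shutdown and return
 * -ENOMEM to the caller instead of dereferencing a NULL pointer.
 */
static int modify_block(struct mount *mp)
{
	struct buf *bp = buf_get();

	if (!bp) {
		mp->shut_down = 1;
		return -ENOMEM;
	}

	bp->data[0] = 1;	/* the modification the caller wanted */
	free(bp);
	return 0;
}
```

Without the NULL check, the write to bp->data[0] is where the
dereference would happen when the allocation fails. The sketch does not
address the concern about transient low memory conditions: any failure
still ends in a shutdown, not a retry or a rollback.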
>
> As it is, before we can gracefully handle memory allocation failures
> in the xfs_buf layer, we need to be able to roll back dirty
> transactions so that memory allocation failure does not result in
> filesystem shutdowns. That's actually possible to do now with the
> delayed logging infrastructure (because the CIL keeps a copy of the
> previous in memory modifications prior to the bad transaction), so
> we should look towards implementing transaction rollback first
> before allowing memory allocation to fail inside transaction
> contexts....
>
> Cheers,
>
> Dave.

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs