On Wed 15-03-17 01:07:43, Ted Tso wrote: > Currently, file system's writepages() function must not fail with an > ENOMEM, since if they do, it's possible for buffered data to be lost. > This is because on a data integrity writeback writepages() gets called > but once, and if it returns ENOMEM and you're lucky the error will get > reflected back to the userspace process calling fsync() --- at which > point the application may or may not be properly checking error codes. > If you aren't lucky, the user is unmounting the file system, and the > dirty pages will simply be lost. > > For this reason, file system code generally will use GFP_NOFS, and in > some cases, will retry the allocation in a loop, on the theory that > "kernel livelocks are temporary; data loss is forever". > Unfortunately, this can indeed cause livelocks, since inside the > writepages() call, the file system is holding various mutexes, and > these mutexes may prevent the OOM killer from killing its targetted > victim if it is also holding on to those mutexes. > > A better solution would be to allow writepages() to call the memory > allocator with flags that give greater latitude to the allocator to > fail, and then release its locks and return ENOMEM, and in the case of > background writeback, the writes can be retried at a later time. In > the case of data-integrity writeback retry after waiting a brief > amount of time. > > Signed-off-by: Theodore Ts'o <tytso@xxxxxxx> > --- > > As we had discussed in an e-mail thread last week, I'm interested in > allowing ext4_writepages() to return ENOMEM without causing dirty > pages from buffered writes getting list. It looks like doing so > should be fairly straightforward. What do folks think? Makes sense to me. One comment below: > + while (1) { > + if (mapping->a_ops->writepages) > + ret = mapping->a_ops->writepages(mapping, wbc); > + else > + ret = generic_writepages(mapping, wbc); > + if ((ret != ENOMEM) || (wbc->sync_mode != WB_SYNC_ALL)) -ENOMEM I guess... > + break; > + cond_resched(); > + congestion_wait(BLK_RW_ASYNC, HZ/50); > + } > return ret; > } Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>