On Wed, Mar 15, 2017 at 12:59:33PM +0100, Jan Kara wrote:
> > +	while (1) {
> > +		if (mapping->a_ops->writepages)
> > +			ret = mapping->a_ops->writepages(mapping, wbc);
> > +		else
> > +			ret = generic_writepages(mapping, wbc);
> > +		if ((ret != ENOMEM) || (wbc->sync_mode != WB_SYNC_ALL))
>
> -ENOMEM I guess...

Oops.  Thanks for noticing!

Unless anyone has any objections I plan to carry this in the ext4 tree.

						- Ted

From 063312672cf277b12e337e91309672499bc797f7 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@xxxxxxx>
Date: Tue, 14 Mar 2017 21:13:04 -0400
Subject: [PATCH -v2] mm: retry writepages() on ENOMEM when doing a data integrity writeback

Currently, a file system's writepages() function must not fail with
ENOMEM, since if it does, it's possible for buffered data to be lost.
This is because on a data integrity writeback writepages() gets called
but once, and if it returns ENOMEM, if you're lucky the error will get
reflected back to the userspace process calling fsync().  If you
aren't lucky, the user is unmounting the file system, and the dirty
pages will simply be lost.

For this reason, file system code generally will use GFP_NOFS, and in
some cases, will retry the allocation in a loop, on the theory that
"kernel livelocks are temporary; data loss is forever".
Unfortunately, this can indeed cause livelocks, since inside the
writepages() call, the file system is holding various mutexes, and
these mutexes may prevent the OOM killer from killing its targeted
victim if it is also holding on to those mutexes.

A better solution would be to allow writepages() to call the memory
allocator with flags that give greater latitude to the allocator to
fail, and then release its locks and return ENOMEM.  In the case of
background writeback, the writes can be retried at a later time.  In
the case of data-integrity writeback, retry after waiting a brief
amount of time.

Signed-off-by: Theodore Ts'o <tytso@xxxxxxx>
---
 mm/page-writeback.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 290e8b7d3181..c623cef68a8e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2352,10 +2352,16 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)

 	if (wbc->nr_to_write <= 0)
 		return 0;
-	if (mapping->a_ops->writepages)
-		ret = mapping->a_ops->writepages(mapping, wbc);
-	else
-		ret = generic_writepages(mapping, wbc);
+	while (1) {
+		if (mapping->a_ops->writepages)
+			ret = mapping->a_ops->writepages(mapping, wbc);
+		else
+			ret = generic_writepages(mapping, wbc);
+		if ((ret != -ENOMEM) || (wbc->sync_mode != WB_SYNC_ALL))
+			break;
+		cond_resched();
+		congestion_wait(BLK_RW_ASYNC, HZ/50);
+	}
 	return ret;
 }

-- 
2.11.0.rc0.7.gbe5a750
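
For anyone who wants to poke at the retry semantics outside the kernel,
here is a rough userspace sketch of the loop in the patch.  Everything
in it is a hypothetical stand-in and not kernel API: struct mock_fs,
struct wbc_sketch, enum sync_mode_sketch, mock_writepages(),
brief_wait() and do_writepages_sketch() are all made up for
illustration.  The mock "file system" fails with -ENOMEM a fixed number
of times, and brief_wait() only counts calls where the real code does
cond_resched() and congestion_wait(BLK_RW_ASYNC, HZ/50).

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's WB_SYNC_NONE / WB_SYNC_ALL. */
enum sync_mode_sketch { SKETCH_SYNC_NONE, SKETCH_SYNC_ALL };

/* Hypothetical, heavily reduced stand-in for struct writeback_control. */
struct wbc_sketch {
	enum sync_mode_sketch sync_mode;
};

/*
 * Hypothetical mock file system: its writepages fails with -ENOMEM
 * failures_left times and then succeeds.
 */
struct mock_fs {
	int failures_left;
	int calls;	/* number of writepages attempts */
	int waits;	/* number of back-off waits taken */
};

static int mock_writepages(struct mock_fs *fs)
{
	fs->calls++;
	if (fs->failures_left > 0) {
		fs->failures_left--;
		return -ENOMEM;
	}
	return 0;
}

/* Only counts; the patch waits via cond_resched() + congestion_wait(). */
static void brief_wait(struct mock_fs *fs)
{
	fs->waits++;
}

/*
 * Same control flow as the patched do_writepages(): retry only when
 * the error is -ENOMEM *and* this is a data-integrity writeback.
 */
static int do_writepages_sketch(struct mock_fs *fs,
				const struct wbc_sketch *wbc)
{
	int ret;

	while (1) {
		ret = mock_writepages(fs);
		if (ret != -ENOMEM || wbc->sync_mode != SKETCH_SYNC_ALL)
			break;
		brief_wait(fs);
	}
	return ret;
}
```

With failures_left set to 3, a SKETCH_SYNC_ALL caller keeps retrying
until the mock succeeds, while a SKETCH_SYNC_NONE caller gets -ENOMEM
back after a single attempt and is expected to retry later on its own
schedule.  Note the sign: the comparison has to be against -ENOMEM,
which is exactly what the -v1 patch got wrong.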