[PATCH] writeback: Avoid exhausting allocation reserves under memory pressure

Jan Kara <jack@xxxxxxx> · Thu, 5 May 2016 10:14:52 +0200

When the system is under memory pressure, memory management frequently
calls wakeup_flusher_threads() to write back pages so that they can be
freed. This was observed to exhaust reserves for atomic allocations, since
wakeup_flusher_threads() allocates one writeback work item with GFP_ATOMIC
for each device with dirty data.

However, it is pointless to allocate a new work item when the requested
work is identical to work that is already pending. Instead, we can merge
the new request into the pending work item and thus avoid the allocation.

Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Jan Kara <jack@xxxxxxx>
---
 fs/fs-writeback.c                | 37 +++++++++++++++++++++++++++++++++++++
 include/trace/events/writeback.h |  1 +
 2 files changed, 38 insertions(+)

This patch should (and in my basic testing does) address the issue with
many atomic allocations that Tetsuo reported. What do people think?
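
For readers who want to experiment with the merge rule outside the kernel,
here is a minimal userspace sketch of the same idea. It is only an
illustration under stated assumptions: struct work_item, struct work_queue
and merge_request() are hypothetical stand-ins for struct wb_writeback_work,
wb->work_list and wb_merge_request(), a plain singly linked list replaces
the kernel list, and the work_lock locking and the tracepoint are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct wb_writeback_work. */
struct work_item {
	long nr_pages;
	const void *sb;		/* NULL means no specific superblock */
	int reason;
	bool range_cyclic;
	bool auto_free;
	bool for_sync;
	struct work_item *next;
};

/* Hypothetical stand-in for wb->work_list; locking is omitted here. */
struct work_queue {
	struct work_item *head;
};

/*
 * Fold a new request into the first pending item that asks for the same
 * kind of work. Returns true if merged, false if the caller still has to
 * queue a new item.
 */
static bool merge_request(struct work_queue *q, long nr_pages,
			  const void *sb, bool range_cyclic, int reason)
{
	struct work_item *w;

	assert(q != NULL);
	for (w = q->head; w != NULL; w = w->next) {
		if (w->reason == reason &&
		    w->range_cyclic == range_cyclic &&
		    w->auto_free && w->sb == sb && !w->for_sync) {
			/* Same kind of work: just ask for more pages. */
			w->nr_pages += nr_pages;
			return true;
		}
	}
	return false;
}
```

A caller would try merge_request() first and only allocate and queue a new
item when it returns false, which mirrors the early return the patch adds
to wb_start_writeback().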

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index fee81e8768c9..bb6725f5b1ba 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -189,6 +189,35 @@ out_unlock:
 	spin_unlock_bh(&wb->work_lock);
 }
 
+/*
+ * Check whether the request to writeback some pages can be merged with some
+ * other request which is already pending. If yes, merge it and return true.
+ * If no, return false.
+ */
+static bool wb_merge_request(struct bdi_writeback *wb, long nr_pages,
+			     struct super_block *sb, bool range_cyclic,
+			     enum wb_reason reason)
+{
+	struct wb_writeback_work *work;
+	bool merged = false;
+
+	spin_lock_bh(&wb->work_lock);
+	list_for_each_entry(work, &wb->work_list, list) {
+		if (work->reason == reason &&
+		    work->range_cyclic == range_cyclic &&
+		    work->auto_free == 1 && work->sb == sb &&
+		    work->for_sync == 0) {
+			work->nr_pages += nr_pages;
+			merged = true;
+			trace_writeback_merged(wb, work);
+			break;
+		}
+	}
+	spin_unlock_bh(&wb->work_lock);
+
+	return merged;
+}
+
 /**
  * wb_wait_for_completion - wait for completion of bdi_writeback_works
  * @bdi: bdi work items were issued to
@@ -928,6 +957,14 @@ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
 		return;
 
 	/*
+	 * Try to merge the current request with a pending one. This saves an
+	 * atomic allocation, which matters e.g. when MM is under pressure and
+	 * calls wakeup_flusher_threads() a lot.
+	 */
+	if (wb_merge_request(wb, nr_pages, NULL, range_cyclic, reason))
+		return;
+
+	/*
 	 * This is WB_SYNC_NONE writeback, so if allocation fails just
 	 * wakeup the thread for old dirty data writeback
 	 */
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 73614ce1d204..84ad9fac475b 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -252,6 +252,7 @@ DEFINE_WRITEBACK_WORK_EVENT(writeback_exec);
 DEFINE_WRITEBACK_WORK_EVENT(writeback_start);
 DEFINE_WRITEBACK_WORK_EVENT(writeback_written);
 DEFINE_WRITEBACK_WORK_EVENT(writeback_wait);
+DEFINE_WRITEBACK_WORK_EVENT(writeback_merged);
 
 TRACE_EVENT(writeback_pages_written,
 	TP_PROTO(long pages_written),
-- 
2.6.6
