+ mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1.patch added to mm-hotfixes-unstable branch

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Fri, 18 Nov 2022 14:10:08 -0800

The patch titled
     Subject: mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1
has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
     mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1.patch

This patch will later appear in the mm-hotfixes-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx>
Subject: mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1
Date: Fri, 18 Nov 2022 12:36:03 +0530

balance_dirty_pages() doesn't do the required dirty throttling on cgroup
v1; see commit 9badce000e2c ("cgroup, writeback: don't enable cgroup
writeback on traditional hierarchies").  Instead, the kernel depends on
writeback throttling in shrink_folio_list() to achieve the same goal.  On
large-memory systems the flusher may not issue writeback quickly enough
for reclaim to start finding folios already under writeback in
shrink_folio_list(), so that throttling never takes effect.  Hence, for
cgroup v1, do a reclaim throttle after waking up the flusher.
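
The decision described above can be sketched as a small userspace model.
This is an illustration only, not kernel code: struct reclaim_model, the
model_* functions and enum model_action are hypothetical names, and the
sketch assumes that writeback_throttling_sane() returns true for global
reclaim and for a memory controller on the cgroup v2 hierarchy (the real
helper may also depend on kernel configuration).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the reclaim state; not kernel code. */
struct reclaim_model {
	bool cgroup_reclaim;             /* reclaim triggered by a memcg limit */
	bool memcg_on_v2;                /* memory controller on the v2 hierarchy */
	unsigned long nr_taken;          /* folios isolated in this batch */
	unsigned long nr_unqueued_dirty; /* dirty folios not yet queued for I/O */
};

enum model_action {
	MODEL_NONE,              /* nothing extra to do */
	MODEL_WAKE_FLUSHER,      /* only nudge the flusher threads */
	MODEL_WAKE_AND_THROTTLE, /* nudge the flusher, then throttle reclaim */
};

/* Assumed behaviour of writeback_throttling_sane(): sane unless cgroup v1. */
static bool model_throttling_sane(const struct reclaim_model *m)
{
	if (!m->cgroup_reclaim)
		return true;
	return m->memcg_on_v2;
}

/* Models the check the patch extends in shrink_inactive_list(). */
static enum model_action model_after_isolate(const struct reclaim_model *m)
{
	assert(m->nr_unqueued_dirty <= m->nr_taken);

	/* Only act when every isolated folio is dirty and unqueued. */
	if (m->nr_unqueued_dirty != m->nr_taken)
		return MODEL_NONE;

	/* cgroup v1 reclaim: waking the flusher alone is not enough. */
	if (!model_throttling_sane(m))
		return MODEL_WAKE_AND_THROTTLE;

	return MODEL_WAKE_FLUSHER;
}
```

In this model only cgroup v1 limit reclaim over a batch made up entirely of
unqueued dirty folios reaches MODEL_WAKE_AND_THROTTLE; global reclaim and
cgroup v2 keep the existing behaviour of only waking the flusher.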

With this change, the test below, which used to fail on a 256GB system,
runs until the file system is full.

root@lp2:/sys/fs/cgroup/memory# mkdir test
root@lp2:/sys/fs/cgroup/memory# cd test/
root@lp2:/sys/fs/cgroup/memory/test# echo 120M > memory.limit_in_bytes
root@lp2:/sys/fs/cgroup/memory/test# echo $$ > tasks
root@lp2:/sys/fs/cgroup/memory/test# dd if=/dev/zero of=/home/kvaneesh/test bs=1M
Killed

Link: https://lkml.kernel.org/r/20221118070603.84081-1-aneesh.kumar@xxxxxxxxxxxxx
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>
Suggested-by: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: zefan li <lizefan.x@xxxxxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/vmscan.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

--- a/mm/vmscan.c~mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1
+++ a/mm/vmscan.c
@@ -2514,8 +2514,20 @@ static unsigned long shrink_inactive_lis
 	 * the flushers simply cannot keep up with the allocation
 	 * rate. Nudge the flusher threads in case they are asleep.
 	 */
-	if (stat.nr_unqueued_dirty == nr_taken)
+	if (stat.nr_unqueued_dirty == nr_taken) {
 		wakeup_flusher_threads(WB_REASON_VMSCAN);
+		/*
+		 * For cgroupv1 dirty throttling is achieved by waking up
+		 * the kernel flusher here and later waiting on folios
+		 * which are in writeback to finish (see shrink_folio_list()).
+		 *
+		 * Flusher may not be able to issue writeback quickly
+		 * enough for cgroupv1 writeback throttling to work
+		 * on a large system.
+		 */
+		if (!writeback_throttling_sane(sc))
+			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+	}
 
 	sc->nr.dirty += stat.nr_dirty;
 	sc->nr.congested += stat.nr_congested;
_

Patches currently in -mm which might be from aneesh.kumar@xxxxxxxxxxxxx are

mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1.patch