+ mm-vmscan-get-number-of-pages-on-the-lru-list-in-memcgroup-base-on-lru_zone_size.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled
     Subject: mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
has been added to the -mm tree.  Its filename is
     mm-vmscan-get-number-of-pages-on-the-lru-list-in-memcgroup-base-on-lru_zone_size.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-vmscan-get-number-of-pages-on-the-lru-list-in-memcgroup-base-on-lru_zone_size.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-vmscan-get-number-of-pages-on-the-lru-list-in-memcgroup-base-on-lru_zone_size.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Honglei Wang <honglei.wang@xxxxxxxxxx>
Subject: mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size

lruvec_lru_size() is invokving lruvec_page_state_local() to get the
lru_size.  It's base on lruvec_stat_local.count[] of mem_cgroup_per_node. 
This counter is updated in a batched way.  It won't be charged if the
number of incoming pages doesn't meet the needs of MEMCG_CHARGE_BATCH
which is defined as 32.

The testcase in LTP madvise09[1] fails because small blocks of memory are
not charged.  It creates a new memcgroup and sets up 32 MADV_FREE pages. 
Then it forks a child who will introduce memory pressure in the memcgroup.
The MADV_FREE pages are expected to be released under the pressure, but
32 is not more than MEMCG_CHARGE_BATCH and these pages won't be charged in
lruvec_stat_local.count[] until some more pages come in to satisfy the
needs of batch charging.  So these MADV_FREE pages can't be freed in
memory pressure which is a bit conflicted with the definition of
MADV_FREE.

Get the lru_size base on lru_zone_size of mem_cgroup_per_node which is not
updated via batching can making it more accurate in this scenario.

This is effectively a partial reversion of 1a61ab8038e72 ("mm: memcontrol:
replace zone summing with lruvec_page_state()").

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c

Tim said:

: We were running a database benchmark in mem cgroup and found that
: lruvec_lru_size is taking up a huge chunk of CPU cycles (about 25% of our
: kernel time - bout 7% of total cpu cycles) on 5.3 kernel.
: 
: The main issue is the loop in lruvec_page_state_local called by
: lruvec_lru_size in the mem cgroup path:
: 
: for_each_possible_cpu(cpu) x += per_cpu(pn->lruvec_stat_local->count[idx],
: cpu); It is costly looping through all the cpus to get the lru vec size
: info.  And doing this on our workload with 96 cpu threads and 500 mem
: cgroups makes things much worse.  We might end up having 96 cpus * 500
: cgroups * 2 (main) LRUs pagevecs, which is a lot of data structures to be
: running through all the time.
: 
: Hongwei's patch restores the previous method for computing lru_size and is
: much more efficient in getting the lru_size.  We got a 20% throughput
: improvement in our database benchmark with Hongwei's patch, and
: lruvec_lru_size's cpu overhead completely disappeared from the cpu
: profile.

Link: http://lkml.kernel.org/r/20190905071034.16822-1-honglei.wang@xxxxxxxxxx
Fixes: 1a61ab8038e72 ("mm: memcontrol: replace zone summing with lruvec_page_state()")
Signed-off-by: Honglei Wang <honglei.wang@xxxxxxxxxx>
Reported-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Roman Gushchin <guro@xxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/vmscan.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/mm/vmscan.c~mm-vmscan-get-number-of-pages-on-the-lru-list-in-memcgroup-base-on-lru_zone_size
+++ a/mm/vmscan.c
@@ -351,12 +351,13 @@ unsigned long zone_reclaimable_pages(str
  */
 unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
 {
-	unsigned long lru_size;
+	unsigned long lru_size = 0;
 	int zid;
 
-	if (!mem_cgroup_disabled())
-		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
-	else
+	if (!mem_cgroup_disabled()) {
+		for (zid = 0; zid < MAX_NR_ZONES; zid++)
+			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
+	} else
 		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
 
 	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
_

Patches currently in -mm which might be from honglei.wang@xxxxxxxxxx are

mm-vmscan-get-number-of-pages-on-the-lru-list-in-memcgroup-base-on-lru_zone_size.patch




[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux