Re: [PATCH v2] mm/vmscan: get number of pages on the LRU list in memcgroup base on lru_zone_size

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 10/7/19 10:28 PM, Michal Hocko wrote:
On Thu 05-09-19 15:10:34, Honglei Wang wrote:
lruvec_lru_size() is involving lruvec_page_state_local() to get the
lru_size in the current code. It's base on lruvec_stat_local.count[]
of mem_cgroup_per_node. This counter is updated in batch. It won't
do charge if the number of coming pages doesn't meet the needs of
MEMCG_CHARGE_BATCH who's defined as 32 now.

The testcase in LTP madvise09[1] fails due to small block memory is
not charged. It creates a new memcgroup and sets up 32 MADV_FREE
pages. Then it forks child who will introduce memory pressure in the
memcgroup. The MADV_FREE pages are expected to be released under the
pressure, but 32 is not more than MEMCG_CHARGE_BATCH and these pages
won't be charged in lruvec_stat_local.count[] until some more pages
come in to satisfy the needs of batch charging. So these MADV_FREE
pages can't be freed in memory pressure which is a bit conflicted
with the definition of MADV_FREE.

The test case is simly wrong. The caching and the batch size is an
internal implementation detail. Moreover MADV_FREE is a _hint_ so all
you can say is that those pages will get freed at some point in time but
you cannot make any assumptions about when that moment happens.


This is a corner case, it makes extremely memory pressure which give the group no chance to satisfy the batch operation. There might be small chance to hit such problem in real workload -- 128K memory is really small in current amount of memory usage. I know exactly what you mean. The batch size is internal implementation detail, this *test case* just happen hit it in black box.

Getting lru_size base on lru_zone_size of mem_cgroup_per_node which
is not updated in batch can make it a bit more accurate in similar
scenario.

What does that mean? It would be more helpful to describe the code path
which will use this more precise value and what is the effect of that.


How about we describe it like this:

Get the lru_size base on lru_zone_size of mem_cgroup_per_node which is not updated via batching can help any related code path get more precise lru size in mem_cgroup case. This makes memory reclaim code won't ignore small blocks of memory(say, less than MEMCG_CHARGE_BATCH pages) in the lru list.

For this specific MADV_FREE page case, more precise lru size helps release the pages less than 32 as expected.

Thanks,
Honglei

As I've said in the previous version, I do not object to the patch
because a more precise lruvec_lru_size sounds like a nice thing as long
as we are not paying a high price for that. Just look at the global case
for mem_cgroup_disabled(). It uses node_page_state and that one is using
per-cpu accounting with regular global value refreshing IIRC.

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c

Signed-off-by: Honglei Wang <honglei.wang@xxxxxxxxxx>
---
  mm/vmscan.c | 9 +++++----
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c77d1e3761a7..c28672460868 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
   */
  unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
  {
-	unsigned long lru_size;
+	unsigned long lru_size = 0;
  	int zid;
- if (!mem_cgroup_disabled())
-		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
-	else
+	if (!mem_cgroup_disabled()) {
+		for (zid = 0; zid < MAX_NR_ZONES; zid++)
+			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
+	} else
  		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
--
2.17.0





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux