Hi, Yu,

Yu Zhao <yuzhao@xxxxxxxxxx> writes:

[snip]

> +static int get_swappiness(struct lruvec *lruvec, struct scan_control *sc)
> +{
> +	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> +	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> +
> +	if (!can_demote(pgdat->node_id, sc) &&
> +	    mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
> +		return 0;
> +
> +	return mem_cgroup_swappiness(memcg);
> +}
> +

We have tested v9 on a memory tiering system, and demotion now works
even without a swap device configured.  Thanks!

However, we found that the demotion (page reclaim on the DRAM nodes)
speed is lower than that of the original implementation.

The workload itself is just a memory-accessing micro-benchmark with a
Gauss distribution.  It is run on a system with DRAM and PMEM.
Initially, quite a few hot pages are placed in PMEM and quite a few
cold pages are placed in DRAM.  Then the page placement optimizing
mechanism based on NUMA balancing tries to promote some hot pages from
the PMEM node to the DRAM node.  When the DRAM node is nearly full
(reaches the high watermark), kswapd of the DRAM node is woken up to
demote (reclaim) some cold DRAM pages to PMEM.  Because quite a few
pages in DRAM are very cold (not accessed for at least several
seconds), the benchmark performance is better when the demotion speed
is higher.

Some data from /proc/vmstat and perf-profile is as follows.