[no subject]

    The used-once mapped file page detection patchset.

    It is meant to help workloads with large amounts of shortly used file
    mappings, like rtorrent hashing a file or git when dealing with loose
    objects (git gc on a bigger site?).

    Right now, the VM activates referenced mapped file pages on first
    encounter on the inactive list and it takes a full memory cycle to
    reclaim them again.  When those pages dominate memory, the system
    no longer has a meaningful notion of 'working set' and is required
    to give up the active list to make reclaim progress.  Obviously,
    this results in rather bad scanning latencies and the wrong pages
    being reclaimed.

    This patch makes the VM be more careful about activating mapped file
    pages in the first place.  The minimum granted lifetime without
    another memory access becomes an inactive list cycle instead of the
    full memory cycle, which is more natural given the mentioned loads.

Translating this to multigen, it seems fresh faults should really
start on the second oldest rather than on the youngest generation, to
get a second chance but without jeopardizing the workingset if they
don't take it.

> +	 * 2) If it can't be evicted immediately, i.e., it's an anon page and
> +	 *    not in swapcache, or a dirty page pending writeback, add it to the
> +	 *    second oldest generation.
> +	 * 3) If it may be evicted immediately, e.g., it's a clean page, add it
> +	 *    to the oldest generation.
> +	 */
> +	if (folio_test_active(folio))
> +		gen = lru_gen_from_seq(lrugen->max_seq);
> +	else if ((!type && !folio_test_swapcache(folio)) ||
> +		 (folio_test_reclaim(folio) &&
> +		  (folio_test_dirty(folio) || folio_test_writeback(folio))))
> +		gen = lru_gen_from_seq(lrugen->min_seq[type] + 1);
> +	else
> +		gen = lru_gen_from_seq(lrugen->min_seq[type]);

Condition #2 is not quite clear to me, and the comment is incomplete:
The code does put dirty/writeback pages on the oldest gen as long as
they haven't been marked for immediate reclaim by the scanner
yet. HOWEVER, once the scanner does see those pages and sets
PG_reclaim, it will also activate them to move them out of the way
until writeback finishes (see shrink_page_list()) - at which point
we'll trigger #1. So that second part of #2 appears unreachable.

It could be a good exercise to describe how cache pages move through
the generations, similar to the comment on lru_deactivate_file_fn().
It's a good example of intent vs implementation.

On another note, "!type" meaning "anon" is a bit rough. Please follow
the "bool file" convention used elsewhere.
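
For illustration, here is roughly how the selection above reads with a
"bool file" argument. This is only a userspace sketch: struct toy_folio
and toy_pick_seq() are made-up names, the flags are plain bools rather
than page flags, and a single min_seq stands in for min_seq[type].

```c
#include <stdbool.h>

/* Toy stand-in for the page flags tested in the quoted hunk. */
struct toy_folio {
	bool active, swapcache, reclaim, dirty, writeback;
};

/*
 * Hypothetical helper: pick the sequence number a folio is added at.
 * min_seq is the oldest sequence number for this folio's type.
 */
static unsigned long toy_pick_seq(const struct toy_folio *folio, bool file,
				  unsigned long max_seq, unsigned long min_seq)
{
	if (folio->active)
		return max_seq;			/* youngest */
	if ((!file && !folio->swapcache) ||
	    (folio->reclaim && (folio->dirty || folio->writeback)))
		return min_seq + 1;		/* second oldest */
	return min_seq;				/* oldest */
}
```

With this model, a dirty file folio that does not have the reclaim flag
set comes back with min_seq, i.e. the oldest generation, which is the
case the comment in the patch doesn't spell out.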

> @@ -113,6 +298,9 @@ void lruvec_add_folio_tail(struct lruvec *lruvec, struct folio *folio)
>  {
>  	enum lru_list lru = folio_lru_list(folio);
>  
> +	if (lru_gen_add_folio(lruvec, folio, true))
> +		return;
> +

bool parameters are notoriously hard to follow at the callsite. Can
you please add lru_gen_add_folio_tail() instead and have both variants
use a common helper?
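
Something along these lines, as a userspace sketch only. The struct
definitions are toy stand-ins, and lru_gen_add_folio_common() is a
hypothetical name for the shared helper; the real function would do the
generation bookkeeping instead of shuffling an array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_LRU_CAP 8

/* Toy stand-ins for the kernel types; only what the sketch needs. */
struct folio { int id; };
struct lruvec {
	struct folio *slots[TOY_LRU_CAP];	/* slots[0] is the head */
	size_t nr;
};

/* Common helper (hypothetical name): the only place that takes the bool. */
static bool lru_gen_add_folio_common(struct lruvec *lruvec,
				     struct folio *folio, bool tail)
{
	if (lruvec->nr == TOY_LRU_CAP)
		return false;			/* not handled here */

	if (tail) {
		lruvec->slots[lruvec->nr++] = folio;
		return true;
	}

	/* Make room at the head by shifting the existing entries down. */
	memmove(&lruvec->slots[1], &lruvec->slots[0],
		lruvec->nr * sizeof(lruvec->slots[0]));
	lruvec->slots[0] = folio;
	lruvec->nr++;
	return true;
}

static inline bool lru_gen_add_folio(struct lruvec *lruvec,
				     struct folio *folio)
{
	return lru_gen_add_folio_common(lruvec, folio, false);
}

static inline bool lru_gen_add_folio_tail(struct lruvec *lruvec,
					  struct folio *folio)
{
	return lru_gen_add_folio_common(lruvec, folio, true);
}
```

The callsites then read lru_gen_add_folio_tail(lruvec, folio) and
nobody has to look up what "true" means.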

> @@ -127,6 +315,9 @@ static __always_inline void add_page_to_lru_list_tail(struct page *page,
>  static __always_inline
>  void lruvec_del_folio(struct lruvec *lruvec, struct folio *folio)
>  {
> +	if (lru_gen_del_folio(lruvec, folio, false))
> +		return;
> +
>  	list_del(&folio->lru);
>  	update_lru_size(lruvec, folio_lru_list(folio), folio_zonenum(folio),
>  			-folio_nr_pages(folio));
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index aed44e9b5d89..0f5e8a995781 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -303,6 +303,78 @@ enum lruvec_flags {
>  					 */
>  };
>  
> +struct lruvec;
> +
> +#define LRU_GEN_MASK		((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
> +#define LRU_REFS_MASK		((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
> +
> +#ifdef CONFIG_LRU_GEN
> +
> +#define MIN_LRU_BATCH		BITS_PER_LONG
> +#define MAX_LRU_BATCH		(MIN_LRU_BATCH * 128)

Those two aren't used in this patch, so it's hard to say whether they
are chosen correctly.

> + * Evictable pages are divided into multiple generations. The youngest and the
> + * oldest generation numbers, max_seq and min_seq, are monotonically increasing.
> + * They form a sliding window of a variable size [MIN_NR_GENS, MAX_NR_GENS]. An
> + * offset within MAX_NR_GENS, gen, indexes the LRU list of the corresponding
> + * generation. The gen counter in folio->flags stores gen+1 while a page is on
> + * one of lrugen->lists[]. Otherwise it stores 0.
> + *
> + * A page is added to the youngest generation on faulting. The aging needs to
> + * check the accessed bit at least twice before handing this page over to the
> + * eviction. The first check takes care of the accessed bit set on the initial
> + * fault; the second check makes sure this page hasn't been used since then.
> + * This process, AKA second chance, requires a minimum of two generations,
> + * hence MIN_NR_GENS. And to be compatible with the active/inactive LRU, these
> + * two generations are mapped to the active; the rest of generations, if they
> + * exist, are mapped to the inactive. PG_active is always cleared while a page
> + * is on one of lrugen->lists[] so that demotion, which happens consequently
> + * when the aging produces a new generation, needs not to worry about it.
> + */
> +#define MIN_NR_GENS		2U
> +#define MAX_NR_GENS		((unsigned int)CONFIG_NR_LRU_GENS)
> +
> +struct lru_gen_struct {

struct lrugen?

In fact, "lrugen" for the general function and variable namespace
might be better, the _ doesn't seem to pull its weight.

CONFIG_LRUGEN
struct lrugen
lrugen_foo()
etc.

> +	/* the aging increments the youngest generation number */
> +	unsigned long max_seq;
> +	/* the eviction increments the oldest generation numbers */
> +	unsigned long min_seq[ANON_AND_FILE];

The singular max_seq vs the split min_seq raises questions. Please add
a comment that explains or points to an explanation.
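
To make the asymmetry concrete, a small userspace model of the window
arithmetic. All toy_* names are made up, TOY_MAX_NR_GENS is an
arbitrary stand-in for CONFIG_NR_LRU_GENS, and I'm assuming
lru_gen_from_seq() is a plain modulo, going by the "offset within
MAX_NR_GENS" wording in the comment above.

```c
#define TOY_MAX_NR_GENS	4U	/* assumption; stands in for CONFIG_NR_LRU_GENS */

enum { TOY_ANON, TOY_FILE, TOY_ANON_AND_FILE };

struct toy_lrugen {
	unsigned long max_seq;				/* shared by both types */
	unsigned long min_seq[TOY_ANON_AND_FILE];	/* one per type */
};

/* Assumed mapping from a sequence number to a list index. */
static unsigned int toy_gen_from_seq(unsigned long seq)
{
	return seq % TOY_MAX_NR_GENS;
}

/* Number of generations currently in the window for one type. */
static unsigned int toy_nr_gens(const struct toy_lrugen *lrugen, int type)
{
	return lrugen->max_seq - lrugen->min_seq[type] + 1;
}
```

With one max_seq and two min_seq values, anon and file can have
different window sizes at the same time. That is the part a comment
should explain.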

> +	/* the birth time of each generation in jiffies */
> +	unsigned long timestamps[MAX_NR_GENS];

This isn't in use until the thrashing-based OOM killing patch.

> +	/* the multigenerational LRU lists */
> +	struct list_head lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
> +	/* the sizes of the above lists */
> +	unsigned long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
> +	/* whether the multigenerational LRU is enabled */
> +	bool enabled;

Not (really) in use until the runtime switch. Best to keep everybody
checking the global flag for now, and have the runtime switch patch
introduce this flag and switch necessary callsites over.

> +void lru_gen_init_state(struct mem_cgroup *memcg, struct lruvec *lruvec);

"state" is what we usually init :) How about lrugen_init_lruvec()?

You can drop the memcg parameter and use lruvec_memcg().
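
Roughly like this, as a userspace model. The structs are cut down to
the minimum, the lrugen_owner_id field is invented purely so the init
function has something to store, and the toy lruvec_memcg() only
mimics the idea of deriving the memcg from the lruvec's container; the
real helper has more to it.

```c
#include <stddef.h>

struct mem_cgroup { int id; };

struct lruvec {
	int lrugen_owner_id;		/* invented field, for the sketch only */
};

/* The lruvec is embedded in a per-node structure that knows its memcg. */
struct mem_cgroup_per_node {
	struct lruvec lruvec;
	struct mem_cgroup *memcg;
};

/* Toy version: walk back from the embedded lruvec to its container. */
static struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec)
{
	struct mem_cgroup_per_node *pn;

	pn = (struct mem_cgroup_per_node *)
		((char *)lruvec - offsetof(struct mem_cgroup_per_node, lruvec));
	return pn->memcg;
}

/* No memcg parameter: it is looked up from the lruvec when needed. */
static void lrugen_init_lruvec(struct lruvec *lruvec)
{
	struct mem_cgroup *memcg = lruvec_memcg(lruvec);

	lruvec->lrugen_owner_id = memcg ? memcg->id : -1;
}
```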

> +#ifdef CONFIG_MEMCG
> +void lru_gen_init_memcg(struct mem_cgroup *memcg);
> +void lru_gen_free_memcg(struct mem_cgroup *memcg);

This should be either init+exit, or alloc+free.

Thanks