Hello Yu,

On Tue, Feb 08, 2022 at 01:18:54AM -0700, Yu Zhao wrote:
> @@ -92,11 +92,196 @@ static __always_inline enum lru_list folio_lru_list(struct folio *folio)
>  	return lru;
>  }
>  
> +#ifdef CONFIG_LRU_GEN
> +
> +static inline bool lru_gen_enabled(void)
> +{
> +	return true;
> +}
> +
> +static inline bool lru_gen_in_fault(void)
> +{
> +	return current->in_lru_fault;
> +}
> +
> +static inline int lru_gen_from_seq(unsigned long seq)
> +{
> +	return seq % MAX_NR_GENS;
> +}
> +
> +static inline bool lru_gen_is_active(struct lruvec *lruvec, int gen)
> +{
> +	unsigned long max_seq = lruvec->lrugen.max_seq;
> +
> +	VM_BUG_ON(gen >= MAX_NR_GENS);
> +
> +	/* see the comment on MIN_NR_GENS */
> +	return gen == lru_gen_from_seq(max_seq) || gen == lru_gen_from_seq(max_seq - 1);
> +}

I'm still reading the series, so correct me if I'm wrong: the "active"
set is split into two generations for the sole purpose of the
second-chance policy for fresh faults, right?

If so, it'd be better to have the comment here instead of down by
MIN_NR_GENS. This is the place that defines what "active" is, so this
is where the reader asks what it means and what it implies. The
definition of MIN_NR_GENS can be briefer: "need at least two for
second chance, see lru_gen_is_active() for details".

> +static inline void lru_gen_update_size(struct lruvec *lruvec, enum lru_list lru,
> +				       int zone, long delta)
> +{
> +	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> +
> +	lockdep_assert_held(&lruvec->lru_lock);
> +	WARN_ON_ONCE(delta != (int)delta);
> +
> +	__mod_lruvec_state(lruvec, NR_LRU_BASE + lru, delta);
> +	__mod_zone_page_state(&pgdat->node_zones[zone], NR_ZONE_LRU_BASE + lru, delta);
> +}

This is a duplicate of update_lru_size(), please use that instead.

Yeah technically you don't need the mem_cgroup_update_lru_size() but
that's not worth sweating over, better to keep it simple.
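To illustrate, the branches of lru_gen_balance_size() below could call
it directly. Untested sketch, assuming update_lru_size()'s current
(lruvec, lru, zone, nr_pages) signature, e.g. for the addition case:

	/* Addition */
	if (old_gen < 0) {
		if (lru_gen_is_active(lruvec, new_gen))
			lru += LRU_ACTIVE;
		update_lru_size(lruvec, lru, zone, delta);
		return;
	}

and likewise for the other branches, with the local helper deleted.
The mem_cgroup_update_lru_size() that update_lru_size() pulls in is
the "not worth sweating over" part.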
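And to make the lru_gen_is_active() suggestion above concrete, this is
roughly what I picture, assuming my reading of the second-chance
policy is right (wording is obviously yours to pick):

	static inline bool lru_gen_is_active(struct lruvec *lruvec, int gen)
	{
		unsigned long max_seq = lruvec->lrugen.max_seq;

		VM_BUG_ON(gen >= MAX_NR_GENS);

		/*
		 * The two youngest generations count as "active". The
		 * active set is split in two solely to implement the
		 * second-chance policy for freshly faulted pages.
		 */
		return gen == lru_gen_from_seq(max_seq) ||
		       gen == lru_gen_from_seq(max_seq - 1);
	}

with MIN_NR_GENS then carrying just the one-liner:

	/* need at least two for second chance, see lru_gen_is_active() */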
> +static inline void lru_gen_balance_size(struct lruvec *lruvec, struct folio *folio,
> +					int old_gen, int new_gen)

lru_gen_update_lru_sizes() for this one would be more descriptive imo
and in line with update_lru_size() that it's built on.

> +{
> +	int type = folio_is_file_lru(folio);
> +	int zone = folio_zonenum(folio);
> +	int delta = folio_nr_pages(folio);
> +	enum lru_list lru = type * LRU_INACTIVE_FILE;
> +	struct lru_gen_struct *lrugen = &lruvec->lrugen;
> +
> +	VM_BUG_ON(old_gen != -1 && old_gen >= MAX_NR_GENS);
> +	VM_BUG_ON(new_gen != -1 && new_gen >= MAX_NR_GENS);
> +	VM_BUG_ON(old_gen == -1 && new_gen == -1);

Could be a bit easier to read quickly with high-level descriptions:

> +	if (old_gen >= 0)
> +		WRITE_ONCE(lrugen->nr_pages[old_gen][type][zone],
> +			   lrugen->nr_pages[old_gen][type][zone] - delta);
> +	if (new_gen >= 0)
> +		WRITE_ONCE(lrugen->nr_pages[new_gen][type][zone],
> +			   lrugen->nr_pages[new_gen][type][zone] + delta);
> +	/* Addition */
> +	if (old_gen < 0) {
> +		if (lru_gen_is_active(lruvec, new_gen))
> +			lru += LRU_ACTIVE;
> +		lru_gen_update_size(lruvec, lru, zone, delta);
> +		return;
> +	}
> +	/* Removal */
> +	if (new_gen < 0) {
> +		if (lru_gen_is_active(lruvec, old_gen))
> +			lru += LRU_ACTIVE;
> +		lru_gen_update_size(lruvec, lru, zone, -delta);
> +		return;
> +	}
> +	/* Promotion */
> +	if (!lru_gen_is_active(lruvec, old_gen) && lru_gen_is_active(lruvec, new_gen)) {
> +		lru_gen_update_size(lruvec, lru, zone, -delta);
> +		lru_gen_update_size(lruvec, lru + LRU_ACTIVE, zone, delta);
> +	}
> +
> +	/* Promotion is legit while a page is on an LRU list, but demotion isn't. */
> +	/* Demotion happens during aging when pages are isolated, never on-LRU */
> +	VM_BUG_ON(lru_gen_is_active(lruvec, old_gen) && !lru_gen_is_active(lruvec, new_gen));
> +}

On that note, please move introduction of the promotion and demotion
bits to the next patch. They aren't used here yet, and I spent some
time jumping around patches to verify the promotion callers and
confirm the validity of the BUG_ON.

> +static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
> +{
> +	int gen;
> +	unsigned long old_flags, new_flags;
> +	int type = folio_is_file_lru(folio);
> +	int zone = folio_zonenum(folio);
> +	struct lru_gen_struct *lrugen = &lruvec->lrugen;
> +
> +	if (folio_test_unevictable(folio) || !lrugen->enabled)
> +		return false;

These two checks should be in the callsite and the function should
return void. Otherwise you can't understand the callsite without
drilling down into lrugen code, even if lrugen is disabled.
folio_add_lru() gets it right. (Rough sketch of what I mean at the
bottom of this mail.)

> +	/*
> +	 * There are three common cases for this page:
> +	 * 1) If it shouldn't be evicted, e.g., it was just faulted in, add it
> +	 *    to the youngest generation.

"shouldn't be evicted" makes it sound like mlock. But they should just
be evicted last, right? Maybe:

	/*
	 * Pages start in different generations depending on
	 * advance knowledge we have about their hotness and
	 * evictability:
	 *
	 * 1. Already active pages start out youngest. This can be
	 *    fresh faults, or refaults of previously hot pages.
	 * 2. Cold pages that require writeback before becoming
	 *    evictable start on the second oldest generation.
	 * 3. Everything else (clean, cold) starts old.
	 */

On that note, I think #1 is reintroducing a problem we have fixed
before, which is trashing the workingset with a flood of use-once
mmapped pages. It's the classic scenario where LFU beats LRU.

Mapped streaming IO isn't very common, but it does happen. See these
commits:

dfc8d636cdb95f7b792d5ba8c9f3b295809c125d
31c0569c3b0b6cc8a867ac6665ca081553f7984c
645747462435d84c6c6a64269ed49cc3015f753d
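Here is the kind of callsite I have in mind for the lru_gen_add_folio()
point above. Only a sketch: I'm assuming lruvec_add_folio() is the
relevant callsite and that lruvec->lrugen.enabled is the right way to
spell the "lrugen enabled" test at this point in the series; adjust as
needed:

	static __always_inline
	void lruvec_add_folio(struct lruvec *lruvec, struct folio *folio)
	{
		enum lru_list lru = folio_lru_list(folio);

		/* Evictable page, generations enabled: lrugen takes it */
		if (lruvec->lrugen.enabled && !folio_test_unevictable(folio)) {
			lru_gen_add_folio(lruvec, folio, false);
			return;
		}

		/* Otherwise fall back to the classic LRU lists */
		update_lru_size(lruvec, lru, folio_zonenum(folio),
				folio_nr_pages(folio));
		list_add(&folio->lru, &lruvec->lists[lru]);
	}

That way lru_gen_add_folio() can return void, and a reader of
lruvec_add_folio() sees both policies and the conditions for each
without descending into the lrugen code.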