Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 15, 2023 at 15:23 PM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>Regarding the change itself, it'd cause a slight regression to other
>use cases (details below).
>
> > @@ -3355,6 +3359,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
> >                 unsigned long pfn;
> >                 struct folio *folio;
> >                 pte_t ptent = ptep_get(pte + i);
> > +               bool is_pte_young;
> >
> >                 total++;
> >                 walk->mm_stats[MM_LEAF_TOTAL]++;
> > @@ -3363,16 +3368,20 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
> >                 if (pfn == -1)
> >                         continue;
> >
> > -               if (!pte_young(ptent)) {
> > -                       walk->mm_stats[MM_LEAF_OLD]++;
>
> Most overhead from page table scanning normally comes from
> get_pfn_folio() because it almost always causes a cache miss. This is
> like a pointer dereference, whereas scanning PTEs is like streaming an
> array (bad vs good cache performance).
>
> pte_young() is here to avoid an unnecessary cache miss from
> get_pfn_folio(). Also see the first comment in get_pfn_folio(). It
> should be easy to verify the regression -- FlameGraph from the
> memcached benchmark in the original commit message should do it.
>
> Would a tracepoint here work for you?
>
>
>
> > +               is_pte_young = !!pte_young(ptent);
> > +               folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, is_pte_young);
> > +               if (!folio) {
> > +                       if (!is_pte_young)
> > +                               walk->mm_stats[MM_LEAF_OLD]++;
> >                         continue;
> >                 }
> >
> > -               folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
> > -               if (!folio)
> > +               if (!folio_test_clear_young(folio) && !is_pte_young) {
> > +                       walk->mm_stats[MM_LEAF_OLD]++;
> >                         continue;
> > +               }
> >
> > -               if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
> > +               if (is_pte_young && !ptep_test_and_clear_young(args->vma, addr, pte + i))
> >                         VM_WARN_ON_ONCE(true);
> >
> >                 young++;

Thanks for replying.

For avoiding below:
1. confict between page_idle/bitmap and mglru scan
2. performance downgrade in mglru page-table scan if call get_pfn_folio for each pte.

We have a new idea:
1. Include a new api under /sys/kernel/mm/page_idle, support mark idle flag only, without
rmap walking or clearing pte young.
2. Use mglru proactive scan to clear page idle flag.

workflows:
      t1                      t2 
mark pages idle    mglru scan and check pages idle

It's easy for us to know that whether a page is accessed during t1~t2.

Some code changes like these:

We clear idle flags in get_pfn_folio, and in walk_pte_range we still follow
original design.

static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
					struct pglist_data *pgdat, bool can_swap, bool clear_idle)
{
	struct folio *folio;

	/* try to avoid unnecessary memory loads */
	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
		return NULL;

	folio = pfn_folio(pfn);
+
+	if (clear_idle && folio_test_idle(folio))
+		folio_clear_idle(folio);
+
	if (folio_nid(folio) != pgdat->node_id)
		return NULL;

	if (folio_memcg_rcu(folio) != memcg)
		return NULL;

	/* file VMAs can contain anon pages from COW */
	if (!folio_is_file_lru(folio) && !can_swap)
		return NULL;

	return folio;
}

static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
			struct mm_walk *args)
{
...
	for (i = pte_index(start), addr = start; addr != end; i++, addr += PAGE_SIZE) {
		unsigned long pfn;
		struct folio *folio;
		pte_t ptent = ptep_get(pte + i);

		total++;
		walk->mm_stats[MM_LEAF_TOTAL]++;

		pfn = get_pte_pfn(ptent, args->vma, addr);
		if (pfn == -1)
			continue;

		if (!pte_young(ptent)) {
			walk->mm_stats[MM_LEAF_OLD]++;
			continue;
		}

+		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap, true);
-		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
		if (!folio)
			continue;
...
}

Is it a good idea or not ?

-- 
2.43.0





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux