Re: [RFC PATCH] mm: support large folio numa balancing

Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> · Tue, 14 Nov 2023 21:12:51 +0800

On 2023/11/14 19:35, David Hildenbrand wrote:
On 13.11.23 23:15, John Hubbard wrote:
On 11/13/23 5:01 AM, Baolin Wang wrote:

On 11/13/2023 8:10 PM, Kefeng Wang wrote:

On 2023/11/13 18:53, David Hildenbrand wrote:
On 13.11.23 11:45, Baolin Wang wrote:
Currently, file pages already support large folios, and support for
anonymous pages is also under discussion[1]. Moreover, the NUMA balancing
code has been converted to use a folio by a previous thread[2], and the
migrate_pages function already supports large folio migration.

So I do not see any reason to keep restricting NUMA balancing for
large folios.

I recall John wanted to look into that. CCing him.

I'll note that the "head page mapcount" heuristic to detect sharers will
now strike on the PTE path and make us believe that a large folio is
exclusive, although it isn't.
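
To illustrate the concern, here is a minimal userspace sketch, not kernel code. It assumes the estimate looks only at the mapcount of the first subpage; struct toy_folio and every toy_* helper are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PAGES 16

/* Hypothetical stand-in for a large folio: one mapcount per subpage. */
struct toy_folio {
	size_t nr_pages;
	int mapcount[TOY_MAX_PAGES];	/* PTEs mapping each subpage */
};

/* Record that one more PTE maps subpage idx. */
static void toy_map_page(struct toy_folio *f, size_t idx)
{
	assert(idx < f->nr_pages);
	f->mapcount[idx]++;
}

/* Head-page heuristic: only subpage 0 is consulted. */
static int toy_estimated_sharers(const struct toy_folio *f)
{
	return f->mapcount[0];
}

/*
 * More thorough check for this toy model: the largest mapcount of any
 * subpage. This is still only a lower bound on the number of sharers.
 */
static int toy_max_subpage_mapcount(const struct toy_folio *f)
{
	int max = 0;

	for (size_t i = 0; i < f->nr_pages; i++)
		if (f->mapcount[i] > max)
			max = f->mapcount[i];
	return max;
}
```

If one process maps all four subpages of a four-page toy folio and a second process maps only subpage 2, toy_estimated_sharers() reports 1 while toy_max_subpage_mapcount() reports 2, so the folio looks exclusive although it is not.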

As spelled out in the commit you are referencing:

commit 6695cf68b15c215d33b8add64c33e01e3cbe236c
Author: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
Date:   Thu Sep 21 15:44:14 2023 +0800

      mm: memory: use a folio in do_numa_page()

      Numa balancing only try to migrate non-compound page in do_numa_page(),
      use a folio in it to save several compound_head calls, note we use
      folio_estimated_sharers(), it is enough to check the folio sharers since
      only normal page is handled, if large folio numa balancing is supported, a
      precise folio sharers check would be used, no functional change intended.

I'll send WIP patches for one approach that can improve the situation
soonish.

To be honest, I'm still catching up on the approximate vs. exact
sharers case. It wasn't clear to me why a precise sharers count
is needed in order to do this. Perhaps the cost of making a wrong
decision is considered just too high?

Good question, I didn't really look into the impact for the NUMA hinting
case where we might end up not setting TNF_SHARED although it is shared.
For other folio_estimated_sharers() users it's more obvious.

task_numa_group() checks TNF_SHARED: if processes share the same
page/folio, they are packed into a single NUMA group, and the group's
fault statistics are then used in should_numa_migrate_memory() to decide
whether to migrate. If TNF_SHARED is not set, that may lead to more
page/folio migration.
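
A rough sketch of that flag flow, as a simplified userspace model rather than the real code. It assumes the fault is flagged shared only when the estimate exceeds one on a shared mapping, and it ignores the other conditions task_numa_group() applies. TOY_TNF_SHARED and both helpers are hypothetical names, and the flag value is illustrative.

```c
#include <stdbool.h>

#define TOY_TNF_SHARED 0x04	/* illustrative value for this sketch */

/*
 * Decide the hinting-fault flags from the sharer estimate.
 * Assumption: only shared mappings with more than one estimated
 * sharer get the shared flag.
 */
static int toy_numa_fault_flags(int estimated_sharers, bool vma_shared)
{
	int flags = 0;

	if (estimated_sharers > 1 && vma_shared)
		flags |= TOY_TNF_SHARED;
	return flags;
}

/*
 * Simplified grouping filter: in this model, tasks are only
 * considered for a common NUMA group when the fault was flagged
 * as shared.
 */
static bool toy_may_join_group(int flags)
{
	return (flags & TOY_TNF_SHARED) != 0;
}
```

With an underestimate of 1 for a folio that is really shared, toy_numa_fault_flags() returns 0 and toy_may_join_group() returns false, so the sharing tasks are not grouped in this model and the group statistics never get a chance to hold back the migration.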

As a side note, it could have happened already in corner cases (e.g.,
concurrent page migration of a small folio).

Whether the precision documented in that commit is really required
remains to be seen -- just wanted to spell it out.