On 10/07/2023 10:01, Huang, Ying wrote: > Ryan Roberts <ryan.roberts@xxxxxxx> writes: > >> On 10/07/2023 06:37, Huang, Ying wrote: >>> Ryan Roberts <ryan.roberts@xxxxxxx> writes: >>> >>>> Somehow I managed to reply only to the linux-arm-kernel list on first attempt so >>>> resending: >>>> >>>> On 07/07/2023 09:21, Huang, Ying wrote: >>>>> Ryan Roberts <ryan.roberts@xxxxxxx> writes: >>>>> >>>>>> With the introduction of large folios for anonymous memory, we would >>>>>> like to be able to split them when they have unmapped subpages, in order >>>>>> to free those unused pages under memory pressure. So remove the >>>>>> artificial requirement that the large folio needed to be at least >>>>>> PMD-sized. >>>>>> >>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx> >>>>>> Reviewed-by: Yu Zhao <yuzhao@xxxxxxxxxx> >>>>>> Reviewed-by: Yin Fengwei <fengwei.yin@xxxxxxxxx> >>>>>> --- >>>>>> mm/rmap.c | 2 +- >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/mm/rmap.c b/mm/rmap.c >>>>>> index 82ef5ba363d1..bbcb2308a1c5 100644 >>>>>> --- a/mm/rmap.c >>>>>> +++ b/mm/rmap.c >>>>>> @@ -1474,7 +1474,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, >>>>>> * page of the folio is unmapped and at least one page >>>>>> * is still mapped. >>>>>> */ >>>>>> - if (folio_test_pmd_mappable(folio) && folio_test_anon(folio)) >>>>>> + if (folio_test_large(folio) && folio_test_anon(folio)) >>>>>> if (!compound || nr < nr_pmdmapped) >>>>>> deferred_split_folio(folio); >>>>>> } >>>>> >>>>> One possible issue is that even for large folios mapped only in one >>>>> process, in zap_pte_range(), we will always call deferred_split_folio() >>>>> unnecessarily before freeing a large folio. >>>> >>>> Hi Huang, thanks for reviewing! >>>> >>>> I have a patch that solves this problem by determining a range of ptes covered >>>> by a single folio and doing a "batch zap". This prevents the need to add the >>>> folio to the deferred split queue, only to remove it again shortly afterwards. >>>> This reduces lock contention and I can measure a performance improvement for the >>>> kernel compilation benchmark. See [1]. >>>> >>>> However, I decided to remove it from this patch set on Yu Zhao's advice. We are >>>> aiming for the minimal patch set to start with and wanted to focus people on >>>> that. I intend to submit it separately later on. >>>> >>>> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-8-ryan.roberts@xxxxxxx/ >>> >>> Thanks for your information! "batch zap" can solve the problem. >>> >>> And, I agree with Matthew's comments to fix the large folios interaction >>> issues before merging the patches to allocate large folios as in the >>> following email. >>> >>> https://lore.kernel.org/linux-mm/ZKVdUDuwNWDUCWc5@xxxxxxxxxxxxxxxxxxxx/ >>> >>> If so, we don't need to introduce the above problem or a large patchset. >> >> I appreciate Matthew's and others position about not wanting to merge a minimal >> implementation while there are some fundamental features (e.g. compaction) it >> doesn't play well with - I'm working to create a definitive list so these items >> can be tracked and tackled. > > Good to know this, Thanks! > >> That said, I don't see this "batch zap" patch as an example of this. It's just a >> performance enhancement that improves things even further than large anon folios >> on their own. I'd rather concentrate on the core changes first then deal with >> this type of thing later. Does that work for you? > > IIUC, allocating large folios upon page fault depends on splitting large > folios in page_remove_rmap() to avoid memory wastage. Splitting large > folios in page_remove_rmap() depends on "batch zap" to avoid performance > regression in zap_pte_range(). So we need them to be done earlier. Or > I miss something? My point was just that large anon folios improves performance significantly overall, despite a small perf regression in zap_pte_range(). That regression is reduced further by a patch from Yin Fengwei to reduce the lock contention [1]. So it doesn't seem urgent to me to get the "batch zap" change in. I'll add it to my list, then prioritize it against the other stuff. [1] https://lore.kernel.org/linux-mm/20230429082759.1600796-1-fengwei.yin@xxxxxxxxx/ > > Best Regards, > Huang, Ying