On 25 Apr 2024, at 23:28, Barry Song wrote: > On Fri, Apr 26, 2024 at 10:50 AM Zi Yan <ziy@xxxxxxxxxx> wrote: >> >> On 25 Apr 2024, at 22:23, Barry Song wrote: >> >>> On Fri, Apr 26, 2024 at 9:55 AM Zi Yan <ziy@xxxxxxxxxx> wrote: >>>> >>>> On 25 Apr 2024, at 21:45, Barry Song wrote: >>>> >>>>> On Fri, Apr 26, 2024 at 5:11 AM Zi Yan <zi.yan@xxxxxxxx> wrote: >>>>>> >>>>>> From: Zi Yan <ziy@xxxxxxxxxx> >>>>>> >>>>>> In __folio_remove_rmap(), a large folio is added to deferred split list >>>>>> if any page in a folio loses its final mapping. But it is possible that >>>>>> the folio is fully unmapped and adding it to deferred split list is >>>>>> unnecessary. >>>>>> >>>>>> For PMD-mapped THPs, that was not really an issue, because removing the >>>>>> last PMD mapping in the absence of PTE mappings would not have added the >>>>>> folio to the deferred split queue. >>>>>> >>>>>> However, for PTE-mapped THPs, which are now more prominent due to mTHP, >>>>>> they are always added to the deferred split queue. One side effect >>>>>> is that the THP_DEFERRED_SPLIT_PAGE stat for a PTE-mapped folio can be >>>>>> unintentionally increased, making it look like there are many partially >>>>>> mapped folios -- although the whole folio is fully unmapped stepwise. >>>>>> >>>>>> Core-mm now tries batch-unmapping consecutive PTEs of PTE-mapped THPs >>>>>> where possible starting from commit b06dc281aa99 ("mm/rmap: introduce >>>>>> folio_remove_rmap_[pte|ptes|pmd]()"). When it happens, a whole PTE-mapped >>>>>> folio is unmapped in one go and can avoid being added to deferred split >>>>>> list, reducing the THP_DEFERRED_SPLIT_PAGE noise. But there will still be >>>>>> noise when we cannot batch-unmap a complete PTE-mapped folio in one go >>>>>> -- or where this type of batching is not implemented yet, e.g., migration. >>>>>> >>>>>> To avoid the unnecessary addition, folio->_nr_pages_mapped is checked >>>>>> to tell if the whole folio is unmapped. If the folio is already on >>>>>> deferred split list, it will be skipped, too. >>>>>> >>>>>> Note: commit 98046944a159 ("mm: huge_memory: add the missing >>>>>> folio_test_pmd_mappable() for THP split statistics") tried to exclude >>>>>> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not >>>>>> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still >>>>>> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE, >>>>>> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside >>>>>> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable(). >>>>>> >>>>>> Signed-off-by: Zi Yan <ziy@xxxxxxxxxx> >>>>>> Reviewed-by: Yang Shi <shy828301@xxxxxxxxx> >>>>>> --- >>>>>> mm/rmap.c | 8 +++++--- >>>>>> 1 file changed, 5 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/mm/rmap.c b/mm/rmap.c >>>>>> index a7913a454028..220ad8a83589 100644 >>>>>> --- a/mm/rmap.c >>>>>> +++ b/mm/rmap.c >>>>>> @@ -1553,9 +1553,11 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, >>>>>> * page of the folio is unmapped and at least one page >>>>>> * is still mapped. >>>>>> */ >>>>>> - if (folio_test_large(folio) && folio_test_anon(folio)) >>>>>> - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped) >>>>>> - deferred_split_folio(folio); >>>>>> + if (folio_test_large(folio) && folio_test_anon(folio) && >>>>>> + list_empty(&folio->_deferred_list) && >>>>>> + ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) || >>>>>> + (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped))) >>>>>> + deferred_split_folio(folio); >>>>> >>>>> Hi Zi Yan, >>>>> in case a mTHP is mapped by two processed (forked but not CoW yet), if we >>>>> unmap the whole folio by pte level in one process only, are we still adding this >>>>> folio into deferred list? >>>> >>>> No. Because the mTHP is still fully mapped by the other process. In terms of code, >>>> nr will be 0 in that case and this if condition is skipped. nr is only increased >>>> from 0 when one of the subpages in the mTHP has no mapping, namely page->_mapcount >>>> becomes negative and last is true in the case RMAP_LEVEL_PTE. >>> >>> Ok. i see, so "last" won't be true? >>> >>> case RMAP_LEVEL_PTE: >>> do { >>> last = atomic_add_negative(-1, &page->_mapcount); >>> if (last && folio_test_large(folio)) { >>> last = atomic_dec_return_relaxed(mapped); >>> last = (last < ENTIRELY_MAPPED); >>> } >>> >>> if (last) >>> nr++; >>> } while (page++, --nr_pages > 0); >>> break; >> >> Right, because for every subpage its corresponding >> last = atomic_add_negative(-1, &page->_mapcount); is not true after the unmapping.2 > > if a mTHP is mapped only by one process, and we unmap it entirely, we will > get nr > 0, then we are executing adding it into deferred_list? so it seems > atomic_read(mapped) is preventing this case from adding deferred_list? Yes, that is what this patch is doing. When a mTHP is mapped by one process and later unmapped fully, there is no need to add it to deferred_list. The mTHP will be freed right afterwards. > > I wonder if it is possible to fixup nr to 0 from the first place? > for example > /* we are doing an entire unmapping */ > if (page==&folio->page && nr_pages == folio_nr_pages(folio)) > ... > >> >> >> -- >> Best Regards, >> Yan, Zi -- Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature