On 08/08/2024 16:55, David Hildenbrand wrote: > On 07.08.24 15:46, Usama Arif wrote: >> This is an attempt to mitigate the issue of running out of memory when THP >> is always enabled. During runtime whenever a THP is being faulted in >> (__do_huge_pmd_anonymous_page) or collapsed by khugepaged >> (collapse_huge_page), the THP is added to _deferred_list. Whenever memory >> reclaim happens in linux, the kernel runs the deferred_split >> shrinker which goes through the _deferred_list. >> >> If the folio was partially mapped, the shrinker attempts to split it. >> A new boolean is added to be able to distinguish between partially >> mapped folios and others in the deferred_list at split time in >> deferred_split_scan. Its needed as __folio_remove_rmap decrements >> the folio mapcount elements, hence it won't be possible to distinguish >> between partially mapped folios and others in deferred_split_scan >> without the boolean. > > Just so I get this right: Are you saying that we might now add fully mapped folios to the deferred split queue and that's what you want to distinguish? Yes > > If that's the case, then could we use a bit in folio->_flags_1 instead? Yes, thats a good idea. Will create the below flag for the next revision diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 5769fe6e4950..5825bd1cf6db 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -189,6 +189,11 @@ enum pageflags { #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1) +enum folioflags_1 { + /* The first 8 bits of folio->_flags_1 are used to keep track of folio order */ + FOLIO_PARTIALLY_MAPPED = 8, /* folio is partially mapped */ +} + #ifndef __GENERATING_BOUNDS_H and use set/clear/test_bit(FOLIO_PARTIALLY_MAPPED, &folio->_flags_1) in the respective places. > > Further, I think you forgot to update at least one instance if a list_empty(&folio->_deferred_list) check where we want to detect "partially mapped". Please go over all and see what needs adjustments. > Ah I think its the one in free_tail_page_prepare? The way I wrote this patch is by going through all instances of "folio->_deferred_list" and deciding if partially_mapped needs to be set/cleared/tested. I think I missed it when rebasing to mm-unstable. Double checked now and the only one missing is free_tail_page_prepare ([1] was removed recently by Barry) [1] https://lore.kernel.org/lkml/20240629234155.53524-1-21cnbao@xxxxxxxxx/ Will include the below diff in the next revision. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index aae00ba3b3bd..b4e1393cbd4f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -957,8 +957,9 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page) break; case 2: /* the second tail page: deferred_list overlaps ->mapping */ - if (unlikely(!list_empty(&folio->_deferred_list))) { - bad_page(page, "on deferred list"); + if (unlikely(!list_empty(&folio->_deferred_list) && + test_bit(FOLIO_PARTIALLY_MAPPED, &folio->_flags_1))) { + bad_page(page, "partially mapped folio on deferred list"); goto out; } break; > I would actually suggest to split decoupling of "_deferred_list" and "partially mapped" into a separate preparation patch. > Yes, will do. I will split it into 3 patches, 1st one that introduces FOLIO_PARTIALLY_MAPPED and sets/clear it in the right place without introducing any functional change, 2nd to split underutilized THPs and 3rd to add sysfs entry to enable/disable the shrinker. Should make the patches quite small and easy to review.