> On Sep 7, 2023, at 05:12, Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote: > > On 09/06/23 16:07, Muchun Song wrote: >>> On Sep 6, 2023, at 15:33, Muchun Song <muchun.song@xxxxxxxxx> wrote: >>> On 2023/9/6 05:44, Mike Kravetz wrote: >>>> When removing hugetlb pages from the pool, we first create a list >>>> of removed pages and then free those pages back to low level allocators. >>>> Part of the 'freeing process' is to restore vmemmap for all base pages >>>> if necessary. Pass this list of pages to a new routine >>>> hugetlb_vmemmap_restore_folios() so that vmemmap restoration can be >>>> performed in bulk. >>>> >>>> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> >>>> --- >>>> mm/hugetlb.c | 3 +++ >>>> mm/hugetlb_vmemmap.c | 13 +++++++++++++ >>>> mm/hugetlb_vmemmap.h | 5 +++++ >>>> 3 files changed, 21 insertions(+) >>>> >>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >>>> index 554be94b07bd..dd2dbc256172 100644 >>>> --- a/mm/hugetlb.c >>>> +++ b/mm/hugetlb.c >>>> @@ -1838,6 +1838,9 @@ static void update_and_free_pages_bulk(struct hstate *h, struct list_head *list) >>>> { >>>> struct folio *folio, *t_folio; >>>> + /* First restore vmemmap for all pages on list. */ >>>> + hugetlb_vmemmap_restore_folios(h, list); >>>> + >>>> list_for_each_entry_safe(folio, t_folio, list, lru) { >>>> update_and_free_hugetlb_folio(h, folio, false); >>>> cond_resched(); >>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c >>>> index ac5577d372fe..79de984919ef 100644 >>>> --- a/mm/hugetlb_vmemmap.c >>>> +++ b/mm/hugetlb_vmemmap.c >>>> @@ -481,6 +481,19 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head) >>>> return ret; >>>> } >>>> +/* >>>> + * This function will attempt to resore vmemmap for a list of folios. There >>>> + * is no guarantee that restoration will be successful for all or any folios. >>>> + * This is used in bulk operations, and no feedback is given to the caller. >>>> + */ >>>> +void hugetlb_vmemmap_restore_folios(const struct hstate *h, struct list_head *folio_list) >>>> +{ >>>> + struct folio *folio; >>>> + >>>> + list_for_each_entry(folio, folio_list, lru) >>>> + (void)hugetlb_vmemmap_restore(h, &folio->page); >>> >>> I am curious about the purpose of "void" here, seems it it not necessnary, >>> ritgh? We cound see so many palces where we do not add the void if the caller >>> does not care about the return value of the callee. >> >> Another question: should we stop restoring vmemmap pages when >> hugetlb_vmemmap_restore() fails? In which case, I suspect there >> is no memory probably, there is no need to continue, right? > > Recall that the list of hugetlb pages may be from multiple nodes. My first > thought was that we should continue because memory allocation may fail on one > node but succeed on another. However, with > https://lore.kernel.org/linux-mm/20230905031312.91929-1-yuancan@xxxxxxxxxx/ > memory allocation should fall back to other nodes. So, yes I do believe it > would make sense to stop when hugetlb_vmemmap_restore returns ENOMEM as > we are unlikely to make forward progress. Agree. > > Today's behavior will try to restore vmemmap for all pages. No stopping > on error. > > I have mixed thoughts on this. Quitting on error 'seems reasonable'. > However, if we continue we 'might' be able to allocate vmemmap for one > hugetlb page. And, if we free one hugetlb page that should provide > vmemmap for several more and we may be able to free most pages on the > list. Yes. A good point. But there should be a non-optimized huge page been freed somewhere in parallel, otherwise we still cannot allocate memory. However, the freeing operation happens after hugetlb_vmemmap_restore_folios. If we want to handle this, we should rework update_and_free_pages_bulk() to do a try when at least a huge pages is freed. Thanks. > -- > Mike Kravetz