On 09/19/23 17:52, Muchun Song wrote:
> 
> 
> On 2023/9/19 07:01, Mike Kravetz wrote:
> > The routine update_and_free_pages_bulk already performs vmemmap
> > restoration on the list of hugetlb pages in a separate step.  In
> > preparation for more functionality to be added in this step, create a
> > new routine hugetlb_vmemmap_restore_folios() that will restore
> > vmemmap for a list of folios.
> >
> > This new routine must provide sufficient feedback about errors and
> > actual restoration performed so that update_and_free_pages_bulk can
> > perform optimally.
> >
> > Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> > ---
> >  mm/hugetlb.c         | 36 ++++++++++++++++++------------------
> >  mm/hugetlb_vmemmap.c | 37 +++++++++++++++++++++++++++++++++++++
> >  mm/hugetlb_vmemmap.h | 11 +++++++++++
> >  3 files changed, 66 insertions(+), 18 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index d6f3db3c1313..814bb1982274 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1836,36 +1836,36 @@ static void update_and_free_hugetlb_folio(struct hstate *h, struct folio *folio,
> >  static void update_and_free_pages_bulk(struct hstate *h, struct list_head *list)
> >  {
> > +	int ret;
> > +	unsigned long restored;
> >  	struct folio *folio, *t_folio;
> > -	bool clear_dtor = false;
> >  	/*
> > -	 * First allocate required vmemmmap (if necessary) for all folios on
> > -	 * list.  If vmemmap can not be allocated, we can not free folio to
> > -	 * lower level allocator, so add back as hugetlb surplus page.
> > -	 * add_hugetlb_folio() removes the page from THIS list.
> > -	 * Use clear_dtor to note if vmemmap was successfully allocated for
> > -	 * ANY page on the list.
> > +	 * First allocate required vmemmmap (if necessary) for all folios.
> >  	 */
> > -	list_for_each_entry_safe(folio, t_folio, list, lru) {
> > -		if (folio_test_hugetlb_vmemmap_optimized(folio)) {
> > -			if (hugetlb_vmemmap_restore(h, &folio->page)) {
> > -				spin_lock_irq(&hugetlb_lock);
> > +	ret = hugetlb_vmemmap_restore_folios(h, list, &restored);
> > +
> > +	/*
> > +	 * If there was an error restoring vmemmap for ANY folios on the list,
> > +	 * add them back as surplus hugetlb pages.  add_hugetlb_folio() removes
> > +	 * the folio from THIS list.
> > +	 */
> > +	if (ret < 0) {
> > +		spin_lock_irq(&hugetlb_lock);
> > +		list_for_each_entry_safe(folio, t_folio, list, lru)
> > +			if (folio_test_hugetlb_vmemmap_optimized(folio))
> >  				add_hugetlb_folio(h, folio, true);
> > -				spin_unlock_irq(&hugetlb_lock);
> > -			} else
> > -				clear_dtor = true;
> > -		}
> > +		spin_unlock_irq(&hugetlb_lock);
> >  	}
> >  	/*
> > -	 * If vmemmmap allocation was performed on any folio above, take lock
> > -	 * to clear destructor of all folios on list.  This avoids the need to
> > +	 * If vmemmmap allocation was performed on ANY folio , take lock to
> > +	 * clear destructor of all folios on list.  This avoids the need to
> >  	 * lock/unlock for each individual folio.
> >  	 * The assumption is vmemmap allocation was performed on all or none
> >  	 * of the folios on the list.  This is true expect in VERY rare cases.
> >  	 */
> > -	if (clear_dtor) {
> > +	if (restored) {
> >  		spin_lock_irq(&hugetlb_lock);
> >  		list_for_each_entry(folio, list, lru)
> >  			__clear_hugetlb_destructor(h, folio);
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index 4558b814ffab..463a4037ec6e 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -480,6 +480,43 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
> >  	return ret;
> >  }
> > +/**
> > + * hugetlb_vmemmap_restore_folios - restore vmemmap for every folio on the list.
> > + * @h:		struct hstate.
> > + * @folio_list:	list of folios.
> > + * @restored:	Set to number of folios for which vmemmap was restored
> > + *		successfully if caller passes a non-NULL pointer.
> > + *
> > + * Return: %0 if vmemmap exists for all folios on the list.  If an error is
> > + *		encountered restoring vmemmap for ANY folio, an error code
> > + *		will be returned to the caller.  It is then the responsibility
> > + *		of the caller to check the hugetlb vmemmap optimized flag of
> > + *		each folio to determine if vmemmap was actually restored.
> > + */
> > +int hugetlb_vmemmap_restore_folios(const struct hstate *h,
> > +					struct list_head *folio_list,
> > +					unsigned long *restored)
> > +{
> > +	unsigned long num_restored;
> > +	struct folio *folio;
> > +	int ret = 0, t_ret;
> > +
> > +	num_restored = 0;
> > +	list_for_each_entry(folio, folio_list, lru) {
> > +		if (folio_test_hugetlb_vmemmap_optimized(folio)) {
> > +			t_ret = hugetlb_vmemmap_restore(h, &folio->page);
> 
> I still think we should free a non-optimized HugeTLB page if we
> encounter an OOM situation instead of continuing to restore
> vmemmap pages.  Restoring vmemmap pages will only aggravate
> the OOM situation.  The suitable approach is to free a non-optimized
> HugeTLB page to satisfy our allocation of vmemmap pages.  What's
> your opinion, Mike?

I agree.  As you mentioned previously, this may complicate this code path
a bit.  I will rewrite to make this happen.
-- 
Mike Kravetz
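For illustration only, here is a minimal, untested sketch of the direction
discussed above: stop restoring at the first allocation failure so the
caller can free the folios whose vmemmap has already been restored (and
which are therefore no longer optimized) to relieve memory pressure before
trying again.  The function name, the static qualifier, and the exact error
handling below are assumptions for the sketch, not the actual rewrite.

/*
 * Sketch only (not the real patch): restore vmemmap for folios on the
 * list, but stop at the first failure.  The caller can then free the
 * already-restored (non-optimized) folios to release memory before
 * attempting further restoration.
 */
static int hugetlb_vmemmap_restore_folios_sketch(const struct hstate *h,
						 struct list_head *folio_list,
						 unsigned long *restored)
{
	unsigned long num_restored = 0;
	struct folio *folio;
	int ret = 0;

	list_for_each_entry(folio, folio_list, lru) {
		if (!folio_test_hugetlb_vmemmap_optimized(folio))
			continue;

		ret = hugetlb_vmemmap_restore(h, &folio->page);
		if (ret)
			break;	/* likely -ENOMEM; let caller free restored folios */

		num_restored++;
	}

	if (restored)
		*restored = num_restored;
	return ret;
}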