Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 08.02.21 11:38, Oscar Salvador wrote:
Free hugetlb pages are trickier to handle as to in order to guarantee
no userspace appplication disruption, we need to replace the
current free hugepage with a new one.

In order to do that, a new function called alloc_and_dissolve_huge_page
in introduced.
This function will first try to get a new fresh hugetlb page, and if it
succeeds, it will dissolve the old one.


Thanks for looking into this! Can we move this patch to #1 in the series? It is the easier case.

I also wonder if we should at least try on the memory unplug path to keep nr_pages by at least trying to allocate at new one if required, and printing a warning if that fails (after all, we're messing with something configured by the admin - "nr_pages"). Note that gigantic pages are special (below).

Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>
---
  include/linux/hugetlb.h |  6 ++++++
  mm/compaction.c         | 11 +++++++++++
  mm/hugetlb.c            | 35 +++++++++++++++++++++++++++++++++++
  3 files changed, 52 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ebca2ef02212..f81afcb86e89 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -505,6 +505,7 @@ struct huge_bootmem_page {
  	struct hstate *hstate;
  };
+bool alloc_and_dissolve_huge_page(struct page *page);
  struct page *alloc_huge_page(struct vm_area_struct *vma,
  				unsigned long addr, int avoid_reserve);
  struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
@@ -773,6 +774,11 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
  #else	/* CONFIG_HUGETLB_PAGE */
  struct hstate {};
+static inline bool alloc_and_dissolve_huge_page(struct page *page)
+{
+	return false;
+}
+
  static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
  					   unsigned long addr,
  					   int avoid_reserve)
diff --git a/mm/compaction.c b/mm/compaction.c
index 89cd2e60da29..7969ddc10856 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -952,6 +952,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
  					low_pfn += compound_nr(page) - 1;
  					goto isolate_success_no_list;
  				}
+			} else {

} else if (alloc_and_dissolve_huge_page(page))) {

...

+				/*
+				 * Free hugetlb page. Allocate a new one and
+				 * dissolve this is if succeed.
+				 */
+				if (alloc_and_dissolve_huge_page(page)) {
+					unsigned long order = buddy_order_unsafe(page);
+
+					low_pfn += (1UL << order) - 1;
+					continue;
+				}



Note that there is a very ugly corner case we will have to handle gracefully (I think also in patch #1):

Assume you allocated a gigantic page (and assume that we are not using CMA for gigantic pages for simplicity). Assume you want to allocate another one. alloc_pool_huge_page()->...->alloc_contig_pages() will stumble over the first allocated page. It will try to alloc_and_dissolve_huge_page() the existing gigantic page. To do that, it will alloc_pool_huge_page()->...->alloc_contig_pages() ... and so on. Bad.

We really don't want to mess with gigantic pages (migrate, dissolve) while allocating a gigantic page. I think the easiest (and cleanest) way forward is to not mess (isolate, migrate, dissolve) with gigantic pages at all.

Gigantic pages are not movable, so they won't be placed on random CMA / ZONE_MOVABLE.

Some hstate_is_gigantic(h) calls (maybe inside alloc_and_dissolve_huge_page() ? ) along with a nice comment might be good enough to avoid having to pass down some kind of alloc_contig context. I even think that should be handled inside

(the main issue is that in contrast to CMA, plain alloc_contig_pages() has no memory about which parts were allocated and will simply try re-allocating what it previously allocated and never freed - which is usually fine, unless we're dealing with such special cases)

Apart from that, not messing with gigantic pages feels like the right approach (allocating/migrating gigantic pages is just horribly slow and most probably not worth it anyway).

--
Thanks,

David / dhildenb






[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux