Re: [External] Re: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 26/07/2023 12:01, Mike Rapoport wrote:
On Mon, Jul 24, 2023 at 02:46:42PM +0100, Usama Arif wrote:
This propagates the hugepage size from the memblock APIs
(memblock_alloc_try_nid_raw and memblock_alloc_range_nid)
so that it can be stored in struct memblock region. This does not
introduce any functional change and hugepage_size is not used in
this commit. It is just a setup for the next commit where huge_pagesize
is used to skip initialization of struct pages that will be freed later
when HVO is enabled.

Signed-off-by: Usama Arif <usama.arif@xxxxxxxxxxxxx>
---
  arch/arm64/mm/kasan_init.c                   |  2 +-
  arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
  arch/powerpc/platforms/pseries/setup.c       |  4 +-
  arch/powerpc/sysdev/dart_iommu.c             |  2 +-
  include/linux/memblock.h                     |  8 ++-
  mm/cma.c                                     |  4 +-
  mm/hugetlb.c                                 |  6 +-
  mm/memblock.c                                | 60 ++++++++++++--------
  mm/mm_init.c                                 |  2 +-
  mm/sparse-vmemmap.c                          |  2 +-
  tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
  11 files changed, 56 insertions(+), 38 deletions(-)


[ snip ]

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f71ff9f0ec81..bb8019540d73 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -63,6 +63,7 @@ struct memblock_region {
  #ifdef CONFIG_NUMA
  	int nid;
  #endif
+	phys_addr_t hugepage_size;
  };
/**
@@ -400,7 +401,8 @@ phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
  				      phys_addr_t start, phys_addr_t end);
  phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
  				      phys_addr_t align, phys_addr_t start,
-				      phys_addr_t end, int nid, bool exact_nid);
+				      phys_addr_t end, int nid, bool exact_nid,
+				      phys_addr_t hugepage_size);

Rather than adding yet another parameter to memblock_phys_alloc_range() we
can have an API that sets a flag on the reserved regions.
With this the hugetlb reservation code can set a flag when HVO is
enabled and memmap_init_reserved_pages() will skip regions with this flag
set.


Hi,

Thanks for the review.

I think you meant memblock_alloc_range_nid/memblock_alloc_try_nid_raw and not memblock_phys_alloc_range?

My initial approach was to use flags, but I think it looks worse than what I have done in this RFC (I have pushed the flags prototype at https://github.com/uarif1/linux/commits/flags_skip_prep_init_gigantic_HVO, top 4 commits for reference (the main difference is patch 2 and 4 from RFC)). The major points are (the bigger issue is in patch 4):

- (RFC vs flags patch 2 comparison) In the RFC, hugepage_size is propagated from memblock_alloc_try_nid_raw through function calls. When using flags, the "no_init" boolean is propogated from memblock_alloc_try_nid_raw through function calls until the region flags are available in memblock_add_range and the new MEMBLOCK_NOINIT flag is set. I think its a bit more tricky to introduce a new function to set the flag in the region AFTER the call to memblock_alloc_try_nid_raw has finished as the memblock_region can not be found. So something (hugepage_size/flag information) still has to be propagated through function calls and a new argument needs to be added.

- (RFC vs flags patch 4 comparison) We can't skip initialization of the whole region, only the tail pages. We still need to initialize the HUGETLB_VMEMMAP_RESERVE_SIZE (PAGE_SIZE) struct pages for each gigantic page. In the RFC, hugepage_size from patch 2 was used in the for loop in memmap_init_reserved_pages in patch 4 to reserve HUGETLB_VMEMMAP_RESERVE_SIZE struct pages for every hugepage_size. This looks very simple and not hacky. If we use a flag, there are 2 ways to initialize the HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage:

1. (implemented in github link patch 4) memmap_init_reserved_pages skips the region for initialization as you suggested, and then we initialize HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage somewhere later (I did it in gather_bootmem_prealloc). When calling reserve_bootmem_region in gather_bootmem_prealloc, we need to skip early_page_uninitialised and this makes it look a bit hacky.

2. We initialize the HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage in memmap_init_reserved_pages itself. As we have used a flag and havent passed hugepage_size, we need to get the gigantic page size somehow. There doesnt seem to be a nice way to determine the gigantic page size in that function which is architecture dependent. I think gigantic page size can be given by PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT), but not sure if this is ok for all architectures? If we can use PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT) it will look much better than point 1.

Both the RFC patches and the github flags implementation work, but I think RFC patches look much cleaner. If there is a strong preference for the the github patches I can send it to mailing list?

Thanks,
Usama


  phys_addr_t memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid);
static __always_inline phys_addr_t memblock_phys_alloc(phys_addr_t size,
@@ -415,7 +417,7 @@ void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
  				 int nid);
  void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
  				 phys_addr_t min_addr, phys_addr_t max_addr,
-				 int nid);
+				 int nid, phys_addr_t hugepage_size);
  void *memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align,
  			     phys_addr_t min_addr, phys_addr_t max_addr,
  			     int nid);
@@ -431,7 +433,7 @@ static inline void *memblock_alloc_raw(phys_addr_t size,
  {
  	return memblock_alloc_try_nid_raw(size, align, MEMBLOCK_LOW_LIMIT,
  					  MEMBLOCK_ALLOC_ACCESSIBLE,
-					  NUMA_NO_NODE);
+					  NUMA_NO_NODE, 0);
  }
static inline void *memblock_alloc_from(phys_addr_t size,





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux