On 7/6/23 2:37 PM, David Hildenbrand wrote:
> On 06.07.23 10:50, Aneesh Kumar K.V wrote:
>> Radix vmemmap mapping can map things correctly at the PMD level or PTE
>> level based on different device boundary checks. We also use the
>> altmap.reserve feature to align things correctly at pageblock
>> granularity. We can end up losing some pages in memory with this. For
>> example: with a 256MB memory block size, we require 4 pages to map the
>> vmemmap pages; in order to align things correctly we end up adding a
>> reserve of 28 pages. That is, for every 4096 pages, 28 pages get
>> reserved.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>
>> ---
>>   arch/powerpc/Kconfig                          |  1 +
>>   arch/powerpc/mm/book3s64/radix_pgtable.c      | 28 +++++++++++++++++++
>>   .../platforms/pseries/hotplug-memory.c        |  4 ++-
>>   3 files changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 116d6add0bb0..f890907e5bbf 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -157,6 +157,7 @@ config PPC
>>   	select ARCH_HAS_UBSAN_SANITIZE_ALL
>>   	select ARCH_HAVE_NMI_SAFE_CMPXCHG
>>   	select ARCH_KEEP_MEMBLOCK
>> +	select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE if PPC_RADIX_MMU
>>   	select ARCH_MIGHT_HAVE_PC_PARPORT
>>   	select ARCH_MIGHT_HAVE_PC_SERIO
>>   	select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> index a62729f70f2a..c0bd60b5fb64 100644
>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> @@ -1678,3 +1678,31 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
>>   	return 1;
>>   }
>> +
>> +/*
>> + * mm/memory_hotplug.c:mhp_supports_memmap_on_memory goes into detail
>> + * on some of the restrictions. We don't check for PMD_SIZE because our
>> + * vmemmap allocation code can fall back correctly.
>> + * vmemmap allocation code can fall back correctly. The pageblock
>
> x86 also has the fallback IIRC; the concern is rather that you end up
> with a PTE-mapped vmemmap, which is slower at runtime than a
> PMD-mapped vmemmap.

On power, the memory block size depends on a config option at the
hypervisor, and we can have values ranging from 16MB to 512MB. With
these different memory block sizes, I was thinking that relaxing these
checks makes the feature more useful. Also, with a 64K page size, using
a 2M vmemmap requires a larger memory block size.

>> + * alignment requirement is met using altmap->reserve blocks.
>> + */
>> +bool mhp_supports_memmap_on_memory(unsigned long size)
>> +{
>> +	if (!radix_enabled())
>> +		return false;
>> +	/*
>> +	 * The pageblock alignment requirement is met by using
>> +	 * reserve blocks in altmap.
>> +	 */
>> +	return size == memory_block_size_bytes();
>> +}
>
> I really dislike such arch overrides.
>
> I think the flow should be something like that, having a generic:
>
> bool mhp_supports_memmap_on_memory(unsigned long size)
> {
> 	...
> 	return arch_mhp_supports_memmap_on_memory(size) &&
> 	       size == memory_block_size_bytes() &&
> 	       ...
> }

Sure, we can do that. But for ppc64 we also want to skip the PMD_SIZE
and pageblock alignment restrictions.

> where we'll also keep the pageblock check here.

For ppc64, I converted that pageblock rule into a reserve blocks
allocation with altmap. I am not sure whether we want to do that
outside ppc64?

> And a ppc specific
>
> bool arch_mhp_supports_memmap_on_memory(unsigned long size)
> {
> 	/*
> 	 * Don't check for the vmemmap covering PMD_SIZE, we accept that
> 	 * we might get a PTE-mapped vmemmap.
> 	 */
> 	return radix_enabled();
> }
>
> We can then move the PMD_SIZE check into arch specific code
> (x86-aarch64).

Sure, will rework the changes accordingly.

-aneesh