On Fri, May 10, 2019 at 6:30 AM <osalvador@xxxxxxx> wrote:
>
> On 2019-05-07 01:39, Dan Williams wrote:
> > Towards enabling memory hotplug to track partial population of a
> > section, introduce 'struct mem_section_usage'.
> >
> > A pointer to a 'struct mem_section_usage' instance replaces the
> > existing pointer to a 'pageblock_flags' bitmap. Effectively it adds
> > one more 'unsigned long' beyond the 'pageblock_flags' (usemap)
> > allocation to house a new 'subsection_map' bitmap. The new bitmap
> > enables the memory hot{plug,remove} implementation to act on
> > incremental sub-divisions of a section.
> >
> > The default SUBSECTION_SHIFT is chosen to keep the 'subsection_map' no
> > larger than a single 'unsigned long' on the major architectures.
> > Alternatively an architecture can define ARCH_SUBSECTION_SHIFT to
> > override the default PMD_SHIFT. Note that PowerPC needs to use
> > ARCH_SUBSECTION_SHIFT to workaround PMD_SHIFT being a non-constant
> > expression on PowerPC.
> >
> > The primary motivation for this functionality is to support platforms
> > that mix "System RAM" and "Persistent Memory" within a single section,
> > or multiple PMEM ranges with different mapping lifetimes within a
> > single section. The section restriction for hotplug has caused an
> > ongoing saga of hacks and bugs for devm_memremap_pages() users.
> >
> > Beyond the fixups to teach existing paths how to retrieve the 'usemap'
> > from a section, and updates to usemap allocation path, there are no
> > expected behavior changes.
> >
> > Cc: Michal Hocko <mhocko@xxxxxxxx>
> > Cc: Vlastimil Babka <vbabka@xxxxxxx>
> > Cc: Logan Gunthorpe <logang@xxxxxxxxxxxx>
> > Cc: Oscar Salvador <osalvador@xxxxxxx>
> > Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
> > Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
> > Cc: Paul Mackerras <paulus@xxxxxxxxx>
> > Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
> > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> > ---
> >  arch/powerpc/include/asm/sparsemem.h |    3 +
> >  include/linux/mmzone.h               |   48 +++++++++++++++++++-
> >  mm/memory_hotplug.c                  |   18 ++++----
> >  mm/page_alloc.c                      |    2 -
> >  mm/sparse.c                          |   81 +++++++++++++++++----------------
> >  5 files changed, 99 insertions(+), 53 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
> > index 3192d454a733..1aa3c9303bf8 100644
> > --- a/arch/powerpc/include/asm/sparsemem.h
> > +++ b/arch/powerpc/include/asm/sparsemem.h
> > @@ -10,6 +10,9 @@
> >   */
> >  #define SECTION_SIZE_BITS 24
> >
> > +/* Reflect the largest possible PMD-size as the subsection-size constant */
> > +#define ARCH_SUBSECTION_SHIFT 24
> > +
>
> I guess this is done because PMD_SHIFT is defined at runtime rather than
> at compile time, right?

Correct, PowerPC has:

    #define PMD_SHIFT       (PAGE_SHIFT + PTE_INDEX_SIZE)
    #define PTE_INDEX_SIZE  __pte_index_size

...where __pte_index_size is a variable established at kernel init time.
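
To make the constraint concrete, here is a minimal userspace C sketch, not kernel code. Every `_SKETCH`-suffixed name, the `pte_index_size_at_boot` variable, and the two helper functions are hypothetical, and the values 16 and 8 are assumed for illustration rather than taken from this thread. It only shows that a shift computed from a runtime variable cannot size a struct member, while a fixed ARCH_SUBSECTION_SHIFT-style constant can:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a value that is only fixed during boot. */
static unsigned long pte_index_size_at_boot = 8;

/* Assumed page shift for the illustration (64K pages). */
#define PAGE_SHIFT_SKETCH 16

/* Not a constant expression: it reads a variable at run time. */
#define RUNTIME_PMD_SHIFT_SKETCH (PAGE_SHIFT_SKETCH + pte_index_size_at_boot)

/* Constants mirroring the values quoted in the patch. */
#define SECTION_SIZE_BITS_SKETCH     24
#define ARCH_SUBSECTION_SHIFT_SKETCH 24

#define SUBSECTIONS_PER_SECTION_SKETCH \
	(1UL << (SECTION_SIZE_BITS_SKETCH - ARCH_SUBSECTION_SHIFT_SKETCH))

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_SKETCH(n) \
	(((n) + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH)

struct usage_sketch {
	/* Fine: the array bound is an integer constant expression. */
	unsigned long subsection_map[
		BITS_TO_LONGS_SKETCH(SUBSECTIONS_PER_SECTION_SKETCH)];
	/*
	 * Sizing this member from RUNTIME_PMD_SHIFT_SKETCH instead would be
	 * rejected by the compiler, because a struct member cannot have a
	 * variably modified type.
	 */
};

/* The runtime shift is still usable in ordinary expressions. */
static unsigned long runtime_pmd_shift(void)
{
	return RUNTIME_PMD_SHIFT_SKETCH;
}

/* Number of 'unsigned long' words the constant-sized map occupies. */
static size_t subsection_map_longs(void)
{
	return sizeof(((struct usage_sketch *)0)->subsection_map) /
	       sizeof(unsigned long);
}
```

With these assumed values the runtime shift evaluates to 24, matching the constant chosen in the hunk above, and the map occupies a single 'unsigned long'.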

> >  #endif /* CONFIG_SPARSEMEM */
> >
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 70394cabaf4e..ef8d878079f9 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1160,6 +1160,44 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
> >  #define SECTION_ALIGN_UP(pfn) (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
> >  #define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK)
> >
> > +/*
> > + * SUBSECTION_SHIFT must be constant since it is used to declare
> > + * subsection_map and related bitmaps without triggering the generation
> > + * of variable-length arrays. The most natural size for a subsection is
> > + * a PMD-page. For architectures that do not have a constant PMD-size
> > + * ARCH_SUBSECTION_SHIFT can be set to a constant max size, or otherwise
> > + * fallback to 2MB.
> > + */
> > +#if defined(ARCH_SUBSECTION_SHIFT)
> > +#define SUBSECTION_SHIFT (ARCH_SUBSECTION_SHIFT)
> > +#elif defined(PMD_SHIFT)
> > +#define SUBSECTION_SHIFT (PMD_SHIFT)
> > +#else
> > +/*
> > + * Memory hotplug enabled platforms avoid this default because they
> > + * either define ARCH_SUBSECTION_SHIFT, or PMD_SHIFT is a constant, but
> > + * this is kept as a backstop to allow compilation on
> > + * !ARCH_ENABLE_MEMORY_HOTPLUG archs.
> > + */
> > +#define SUBSECTION_SHIFT 21
> > +#endif
> > +
> > +#define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT)
> > +#define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT)
> > +#define PAGE_SUBSECTION_MASK ((~(PAGES_PER_SUBSECTION-1)))
> > +
> > +#if SUBSECTION_SHIFT > SECTION_SIZE_BITS
> > +#error Subsection size exceeds section size
> > +#else
> > +#define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))
> > +#endif
>
> On powerpc, SUBSECTIONS_PER_SECTION will equal 1 (so one big section),
> is that to be expected?

Yes, it turns out that PowerPC has no real need for subsection support
since they were already using small 16MB sections from day one.

> Will subsection_map_init handle this right?

Yes, it should work, since subsection_map_index() will always return 0,
which means that 'end' will always be 0:

    pfns = min(nr_pages, PAGES_PER_SECTION - (pfn & ~PAGE_SECTION_MASK));
    end = subsection_map_index(pfn + pfns - 1);

...and then the bitmap manipulation:

    bitmap_set(ms->usage->subsection_map, idx, end - idx + 1);

...will only ever set bit 0.
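
The arithmetic above can be checked with a small userspace C sketch. This is an illustration under stated assumptions, not the kernel implementation: `struct layout`, `first_section_bits()`, and both `_sketch` configurations are hypothetical, the body of `subsection_map_index()` is a guess at the obvious definition, and the shift values (16/24/24 for a 64K-page PowerPC-like setup, 12/27/21 for an x86_64-like setup) are assumed for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the per-architecture constants. */
struct layout {
	unsigned int page_shift;
	unsigned int section_size_bits;
	unsigned int subsection_shift;
};

/* Assumed example configurations; real values live in arch headers. */
static const struct layout ppc64_64k_sketch   = { 16, 24, 24 };
static const struct layout x86_64_like_sketch = { 12, 27, 21 };

static unsigned long pages_per_section(const struct layout *l)
{
	return 1UL << (l->section_size_bits - l->page_shift);
}

static unsigned long pages_per_subsection(const struct layout *l)
{
	return 1UL << (l->subsection_shift - l->page_shift);
}

static unsigned long subsections_per_section(const struct layout *l)
{
	return 1UL << (l->section_size_bits - l->subsection_shift);
}

/* Index, within its section, of the subsection that contains pfn. */
static unsigned long subsection_map_index(const struct layout *l,
					  unsigned long pfn)
{
	return (pfn & (pages_per_section(l) - 1)) / pages_per_subsection(l);
}

/*
 * Bits that would be set in the first section touched by the range
 * [pfn, pfn + nr_pages), following the pfns/end computation quoted above.
 */
static uint64_t first_section_bits(const struct layout *l,
				   unsigned long pfn, unsigned long nr_pages)
{
	unsigned long left, pfns, idx, end;
	uint64_t map = 0;

	assert(nr_pages > 0);	/* pfn + pfns - 1 needs a non-empty range */

	/* Pages remaining in the section that contains pfn. */
	left = pages_per_section(l) - (pfn & (pages_per_section(l) - 1));
	pfns = nr_pages < left ? nr_pages : left;
	idx = subsection_map_index(l, pfn);
	end = subsection_map_index(l, pfn + pfns - 1);

	/* Equivalent of bitmap_set(map, idx, end - idx + 1). */
	for (unsigned long bit = idx; bit <= end; bit++)
		map |= UINT64_C(1) << bit;
	return map;
}
```

With the PowerPC-like values, subsections_per_section() is 1 and first_section_bits() yields 0x1 for any non-empty range. With the x86_64-like values there are 64 subsections per section, so a range starting at pfn 512 and covering 1024 pages sets bits 1 and 2.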