On Fri, Jun 17, 2022 at 11:25:20AM +0200, David Hildenbrand wrote: > On 17.06.22 11:10, Muchun Song wrote: > > On Fri, Jun 17, 2022 at 09:39:27AM +0200, David Hildenbrand wrote: > >> On 17.06.22 09:28, Muchun Song wrote: > >>> On Fri, Jun 17, 2022 at 07:46:53AM +0200, Oscar Salvador wrote: > >>>> On Thu, Jun 16, 2022 at 09:30:33AM +0200, David Hildenbrand wrote: > >>>>> IIRC, that was used to skip these patches on the offlining path before > >>>>> we provided the ranges to offline_pages(). > >>>> > >>>> Yeah, it was designed for that purpose back then. > >>>> > >>>>> I'd not mess with PG_reserved, and give them a clearer name, to not > >>>>> confuse them with other, ordinary, vmemmap pages that are not > >>>>> self-hosted (maybe in the future we might want to flag all vmemmap pages > >>>>> with a new type?). > >>>> > >>>> Not sure whether a new type is really needed, or to put it another way, I > >>>> cannot see the benefit. > >>>> > >>>>> > >>>>> I'd just try reusing the flag PG_owner_priv_1. And eventually, flag all > >>>>> (v)memmap pages with a type PG_memmap. However, the latter would be > >>>>> optional and might not be strictly required > >>>>> > >>>>> > >>>>> So what think could make sense is > >>>>> > >>>>> /* vmemmap pages that are self-hosted and cannot be optimized/freed. */ > >>>>> PG_vmemmap_self_hosted = PG_owner_priv_1, > >>>> > >>>> Sure, I just lightly tested the below, and seems to work, but not sure > >>>> whether that is what you are referring to. > >>>> @Munchun: thoughts? > >>>> > >>> > >>> I think it works and fits my requirement. > >>> > >>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > >>>> index e66f7aa3191d..a4556afd7bda 100644 > >>>> --- a/include/linux/page-flags.h > >>>> +++ b/include/linux/page-flags.h > >>>> @@ -193,6 +193,11 @@ enum pageflags { > >>>> > >>>> /* Only valid for buddy pages. Used to track pages that are reported */ > >>>> PG_reported = PG_uptodate, > >>>> + > >>>> +#ifdef CONFIG_MEMORY_HOTPLUG > >>>> + /* For self-hosted memmap pages */ > >>>> + PG_vmemmap_self_hosted = PG_owner_priv_1, > >>>> +#endif > >>>> }; > >>>> > >>>> #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1) > >>>> @@ -628,6 +633,10 @@ PAGEFLAG_FALSE(SkipKASanPoison, skip_kasan_poison) > >>>> */ > >>>> __PAGEFLAG(Reported, reported, PF_NO_COMPOUND) > >>>> > >>>> +#ifdef CONFIG_MEMORY_HOTPLUG > >>>> +PAGEFLAG(Vmemmap_self_hosted, vmemmap_self_hosted, PF_ANY) > >>>> +#endif > >>>> + > >>>> /* > >>>> * On an anonymous page mapped into a user virtual memory area, > >>>> * page->mapping points to its anon_vma, not to a struct address_space; > >>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c > >>>> index 1089ea8a9c98..e2de7ed27e9e 100644 > >>>> --- a/mm/hugetlb_vmemmap.c > >>>> +++ b/mm/hugetlb_vmemmap.c > >>>> @@ -101,6 +101,14 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head) > >>>> { > >>>> unsigned long vmemmap_addr = (unsigned long)head; > >>>> unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages; > >>>> + struct mem_section *ms = __pfn_to_section(page_to_pfn(head)); > >>>> + struct page *memmap; > >>>> + > >>>> + memmap = sparse_decode_mem_map(ms->section_mem_map, > >>>> + pfn_to_section_nr(page_to_pfn(head))); > >>>> + > >>>> + if (PageVmemmap_self_hosted(memmap)) > >>>> + return; > >>> > >>> I think here needs a loop if it is a 1GB page (spans multiple sections). > >>> Right? Here is an implementation based on another approach. But I think > >>> your implementation is more simpler and efficient. Would you mind me > >>> squash your diff into my patch and with your "Co-developed-by"? > >> > >> Due to hugtlb alignment requirements, and the vmemmap pages being at the > >> start of the hotplugged memory region, I think that cannot currently > >> happen. Checking the first vmemmap page might be good enough for now, > >> and probably for the future. > >> > > > > If the memory block size is 128MB, then a 1GB huge page spans 8 blocks. > > Is it possible that some blocks of them are vmemmap-hosted? > > No, don't think so. If you think about it, a huge/gigantic page can only > start in a memmap-on-memory region but never end in on (or overlap one) > -- because the reserved memmap part of the memory block always precedes > actually usable data. > > So even with variable-size memory blocks and weird address alignment, > checking the first memmap of a huge page for vmemmp-on-memory should be > sufficient. > > Unless I am missing something. > Got it. You are awesome. I ignored the fact that we have reserved some memory as vmemmap pages in memmap-on-memory case, the whole memory block cannot be used as a gigantic page. Thanks for your nice explanation.