Re: [External] Re: [PATCH v10 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page

On Mon, Dec 21, 2020 at 5:11 PM Oscar Salvador <osalvador@xxxxxxx> wrote:
>
> On Thu, Dec 17, 2020 at 08:12:55PM +0800, Muchun Song wrote:
> > +static inline void free_bootmem_page(struct page *page)
> > +{
> > +     unsigned long magic = (unsigned long)page->freelist;
> > +
> > +     /*
> > +      * The reserve_bootmem_region sets the reserved flag on bootmem
> > +      * pages.
> > +      */
> > +     VM_WARN_ON(page_ref_count(page) != 2);
> > +
> > +     if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
> > +             put_page_bootmem(page);
> > +     else
> > +             VM_WARN_ON(1);
>
> Ideally, I think we want to see how the page looks, since its state
> is not what we expected, so maybe join both conditions and use dump_page().

Agree. Will do. Thanks.
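
For illustration, here is a minimal userspace sketch of one possible reading
of "join both conditions". The struct page stand-in, the magic values, and the
dump_page()/put_page_bootmem() stubs are assumptions made so the snippet
compiles outside the kernel; the real helpers have different signatures.
Whether the page should still be released after the dump is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; the values are placeholders, not the kernel's. */
enum { SECTION_INFO = 1, MIX_SECTION_INFO = 2 };

struct page {
	int refcount;
	unsigned long freelist;	/* holds the bootmem magic in this model */
};

static int dumped_pages;	/* counts dumps so the behaviour can be checked */

/* Stub: the kernel's dump_page() prints far more state than this. */
static void dump_page(const struct page *page, const char *reason)
{
	dumped_pages++;
	fprintf(stderr, "page dump (%s): refcount=%d magic=%lu\n",
		reason, page->refcount, page->freelist);
}

/* Stub: only drops one reference. */
static void put_page_bootmem(struct page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Returns true if the page was handed to put_page_bootmem(). */
static bool free_bootmem_page(struct page *page)
{
	unsigned long magic = page->freelist;

	/* One combined check, so any unexpected state gets dumped. */
	if (page->refcount != 2 ||
	    (magic != SECTION_INFO && magic != MIX_SECTION_INFO)) {
		dump_page(page, "unexpected bootmem page state");
		return false;
	}

	put_page_bootmem(page);
	return true;
}
```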

>
> > + * By removing redundant page structs for HugeTLB pages, memory can returned to
>                                                                      ^^ be

Thanks.

> > + * the buddy allocator for other uses.
>
> [...]
>
> > +void free_huge_page_vmemmap(struct hstate *h, struct page *head)
> > +{
> > +     unsigned long vmemmap_addr = (unsigned long)head;
> > +
> > +     if (!free_vmemmap_pages_per_hpage(h))
> > +             return;
> > +
> > +     vmemmap_remap_free(vmemmap_addr + RESERVE_VMEMMAP_SIZE,
> > +                        free_vmemmap_pages_size_per_hpage(h));
>
> I am not sure what others think, but I would like to see vmemmap_remap_free taking
> three arguments: start, end, and reuse addr, e.g.:
>
>  void free_huge_page_vmemmap(struct hstate *h, struct page *head)
>  {
>       unsigned long vmemmap_addr = (unsigned long)head;
>       unsigned long vmemmap_end, vmemmap_reuse;
>
>       if (!free_vmemmap_pages_per_hpage(h))
>               return;
>
>       vmemmap_addr += RESERVE_VMEMMAP_SIZE;
>       vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
>       vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
>
>       vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse);
>  }
>
> The reason for me to do this is to let the callers of vmemmap_remap_free decide
> __what__ they want to remap.
>
> More on this below.
>
>
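
The address arithmetic in the proposed caller can be checked in isolation. The
following is a userspace sketch only: PAGE_SIZE, the number of reserved pages,
and the struct vmemmap_range/vmemmap_free_range() names are assumptions for
illustration and are not part of the patch.

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define PAGE_SIZE		4096UL
#define RESERVE_VMEMMAP_NR	2UL	/* assumption: pages kept per HugeTLB page */
#define RESERVE_VMEMMAP_SIZE	(RESERVE_VMEMMAP_NR * PAGE_SIZE)

/* Hypothetical container for the three values passed to the remap call. */
struct vmemmap_range {
	unsigned long start;	/* first vmemmap address to remap */
	unsigned long end;	/* one past the last address to remap */
	unsigned long reuse;	/* address whose page backs the range afterwards */
};

/*
 * head_addr: vmemmap address of the head struct page.
 * free_size: bytes of vmemmap to free, a multiple of PAGE_SIZE.
 */
static struct vmemmap_range vmemmap_free_range(unsigned long head_addr,
					       unsigned long free_size)
{
	struct vmemmap_range r;

	assert(head_addr % PAGE_SIZE == 0);
	assert(free_size % PAGE_SIZE == 0);

	r.start = head_addr + RESERVE_VMEMMAP_SIZE;
	r.end = r.start + free_size;
	/* The reused page is the one immediately before the remapped range. */
	r.reuse = r.start - PAGE_SIZE;
	return r;
}
```

With these assumed values, a head address of 0x100000 and six pages to free
give start = 0x102000, end = 0x108000 and reuse = 0x101000, so reuse always
sits exactly one page below start.
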
> > +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
> > +                           unsigned long end,
> > +                           struct vmemmap_remap_walk *walk)
> > +{
> > +     pte_t *pte;
> > +
> > +     pte = pte_offset_kernel(pmd, addr);
> > +
> > +     if (walk->reuse_addr == addr) {
> > +             BUG_ON(pte_none(*pte));
> > +             walk->reuse_page = pte_page(*pte++);
> > +             addr += PAGE_SIZE;
> > +     }
>
> Although it is quite obvious, a brief comment here pointing out what we are
> doing, and that this is meant to be set only once, would be nice.

OK. Will do.
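
To make the "set only once" point concrete, here is a toy userspace model of
the walk. It is a sketch under assumptions: each "PTE" is just a page index,
the loop after the reuse check is my guess at the rest of the function (it is
not shown in the quoted hunk), and struct walk/remap_pte_range() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL	/* assumed page size */

/* Toy model: each "PTE" records which physical page index it maps. */
struct walk {
	unsigned long reuse_addr;	/* virtual address whose page is reused */
	long reuse_page;		/* -1 until captured */
	size_t nr_freed;		/* pages collected for freeing */
};

/* Walk the entries covering [addr, end); ptes[0] maps addr. */
static void remap_pte_range(long *ptes, unsigned long addr, unsigned long end,
			    struct walk *walk)
{
	long *pte = ptes;

	/*
	 * The reuse address is the first page of the range. Capture its
	 * page once and skip it, so it is never remapped or freed.
	 */
	if (walk->reuse_addr == addr) {
		assert(*pte >= 0);
		walk->reuse_page = *pte++;
		addr += PAGE_SIZE;
	}

	for (; addr != end; addr += PAGE_SIZE, pte++) {
		walk->nr_freed++;		/* old page would go on the free list */
		*pte = walk->reuse_page;	/* point the entry at the reused page */
	}
}
```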

>
>
> > +static void vmemmap_remap_range(unsigned long start, unsigned long end,
> > +                             struct vmemmap_remap_walk *walk)
> > +{
> > +     unsigned long addr = start - PAGE_SIZE;
> > +     unsigned long next;
> > +     pgd_t *pgd;
> > +
> > +     VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
> > +     VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));
> > +
> > +     walk->reuse_page = NULL;
> > +     walk->reuse_addr = addr;
>
> With the change I suggested above, struct vmemmap_remap_walk should be
> initialized at once in vmemmap_remap_free, so this should no longer be needed.

You are right.

> (And btw, you do not need to set reuse_page to NULL, the way you init the struct
> in vmemmap_remap_free makes sure to null any field you do not explicitly set).
>
>
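
The zeroing behaviour is standard C: members omitted from a designated
initializer are initialized as if they had static storage duration, so
pointers become null. A small userspace sketch, with a simplified struct and a
hypothetical make_walk() helper standing in for the kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's list head. */
struct list_head { struct list_head *next, *prev; };

struct vmemmap_remap_walk {
	void (*remap_pte)(void);	/* simplified signature for this sketch */
	unsigned long reuse_addr;
	void *reuse_page;		/* struct page * in the kernel */
	struct list_head *vmemmap_pages;
};

static void dummy_remap_pte(void) { }

/* Hypothetical helper that builds the walk the way the review suggests. */
static struct vmemmap_remap_walk make_walk(unsigned long reuse,
					   struct list_head *pages)
{
	/* reuse_page is not named, so C initializes it to a null pointer. */
	struct vmemmap_remap_walk walk = {
		.reuse_addr	= reuse,
		.remap_pte	= dummy_remap_pte,
		.vmemmap_pages	= pages,
	};

	return walk;
}
```
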
> > +static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
> > +                           struct vmemmap_remap_walk *walk)
> > +{
> > +     /*
> > +      * Make the tail pages are mapped with read-only to catch
> > +      * illegal write operation to the tail pages.
>         "Remap the tail pages as read-only to ..."

Thanks.

>
> > +      */
> > +     pgprot_t pgprot = PAGE_KERNEL_RO;
> > +     pte_t entry = mk_pte(walk->reuse_page, pgprot);
> > +     struct page *page;
> > +
> > +     page = pte_page(*pte);
>
>  struct page *page = pte_page(*pte);
>
> since you did the same for the other two.

Yeah. Will change to this.

>
> > +     list_add(&page->lru, walk->vmemmap_pages);
> > +
> > +     set_pte_at(&init_mm, addr, pte, entry);
> > +}
> > +
> > +/**
> > + * vmemmap_remap_free - remap the vmemmap virtual address range
> > + *                      [start, start + size) to the page which
> > + *                      [start - PAGE_SIZE, start) is mapped,
> > + *                      then free vmemmap pages.
> > + * @start:   start address of the vmemmap virtual address range
> > + * @size:    size of the vmemmap virtual address range
> > + */
> > +void vmemmap_remap_free(unsigned long start, unsigned long size)
> > +{
> > +     unsigned long end = start + size;
> > +     LIST_HEAD(vmemmap_pages);
> > +
> > +     struct vmemmap_remap_walk walk = {
> > +             .remap_pte      = vmemmap_remap_pte,
> > +             .vmemmap_pages  = &vmemmap_pages,
> > +     };
>
> As stated above, this would become:
>
>  void vmemmap_remap_free(unsigned long start, unsigned long end,
>                          unsigned long reuse)
>  {
>        LIST_HEAD(vmemmap_pages);
>        struct vmemmap_remap_walk walk = {
>                .reuse_addr = reuse,
>                .remap_pte = vmemmap_remap_pte,
>                .vmemmap_pages = &vmemmap_pages,
>        };
>
>   You might have had your reasons to do it this way, but this looks more natural
>   to me, with the plus that callers of vmemmap_remap_free can specify
>   what they want to remap.

Should we add a BUG_ON in vmemmap_remap_free() for now?

        BUG_ON(reuse != start - PAGE_SIZE);

>
>
> --
> Oscar Salvador
> SUSE L3



-- 
Yours,
Muchun



