Re: [PATCH resend] mm: hugetlb_vmemmap: use bulk allocator in alloc_vmemmap_page_list()

> On Sep 6, 2023, at 22:58, Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
> 
> 
> 
> On 2023/9/6 22:32, Muchun Song wrote:
>>> On Sep 6, 2023, at 17:33, Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
>>> 
>>> 
>>> 
>>> On 2023/9/6 11:25, Muchun Song wrote:
>>>>> On Sep 6, 2023, at 11:13, Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 2023/9/6 10:47, Matthew Wilcox wrote:
>>>>>> On Tue, Sep 05, 2023 at 06:35:08PM +0800, Kefeng Wang wrote:
>>>>>>> alloc_vmemmap_page_list() needs to allocate 4095 pages (1G) or 7 pages
>>>>>>> (2M) at once, so let's add a bulk allocator variant,
>>>>>>> alloc_pages_bulk_list_node(), and switch alloc_vmemmap_page_list() to
>>>>>>> use it to accelerate page allocation.
>>>>>> Argh, no, please don't do this.
>>>>>> Iterating a linked list is _expensive_.  It is about 10x quicker to
>>>>>> iterate an array than a linked list.  Adding the list_head option
>>>>>> to __alloc_pages_bulk() was a colossal mistake.  Don't perpetuate it.
>>>>>> These pages are going into an array anyway.  Don't put them on a list
>>>>>> first.
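
[For reference: a minimal, uncompiled sketch of the array-side pattern being
recommended here, written against alloc_pages_bulk_array_node() as it existed
in mainline at the time; alloc_node_page_array() is a hypothetical helper, not
taken from any patch in this thread.]

static int alloc_node_page_array(gfp_t gfp, int nid, unsigned long nr,
                                 struct page **pages)
{
        unsigned long filled = 0, last;

        /* @pages must be zero-filled; the bulk allocator skips populated slots. */
        do {
                last = filled;
                filled = alloc_pages_bulk_array_node(gfp, nid, nr, pages);
                if (filled == last)             /* no forward progress, give up */
                        goto err;
        } while (filled < nr);

        /* Callers can now walk pages[0..nr - 1] directly; no list_head needed. */
        return 0;
err:
        while (filled--)
                __free_page(pages[filled]);
        return -ENOMEM;
}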
>>>>> 
>>>>> struct vmemmap_remap_walk - walk vmemmap page table
>>>>> 
>>>>> * @vmemmap_pages:  the list head of the vmemmap pages that can be freed
>>>>> *                  or is mapped from.
>>>>> 
>>>>> At present, struct vmemmap_remap_walk uses a list for the vmemmap page
>>>>> table walk, so do you mean we need to change vmemmap_pages from a list to
>>>>> an array first and then use the array bulk API, and even kill the list
>>>>> bulk API?
>>>> It'll be a little complex for hugetlb_vmemmap. Would it be reasonable to
>>>> use __alloc_pages_bulk directly in hugetlb_vmemmap itself?
>>> 
>>> 
>>> We could use alloc_pages_bulk_array_node() here without introducing a new
>>> alloc_pages_bulk_list_node(), and only focus on accelerating page
>>> allocation for now.
>>> 
>> No. Using alloc_pages_bulk_array_node() will add more complexity (you need
>> to allocate an array first) for hugetlb_vmemmap, and the path you optimized
>> is only a control path where the optimization is at the millisecond level.
>> So I don't think it is of great value to do this.
> I tried it, and yes, it is a little complex:
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 4b9734777f69..5f502e18f950 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -377,26 +377,53 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>          return ret;
>  }
>  
> +static int vmemmap_bulk_alloc_pages(gfp_t gfp, int nid, unsigned int nr_pages,
> +                                    struct page **pages)
> +{
> +        unsigned int last, allocated = 0;
> +
> +        do {
> +                last = allocated;
> +
> +                allocated = alloc_pages_bulk_array_node(gfp, nid, nr_pages, pages);
> +                if (allocated == last)
> +                        goto err;
> +
> +        } while (allocated < nr_pages);
> +
> +        return 0;
> +err:
> +        for (allocated = 0; allocated < nr_pages; allocated++) {
> +                if (pages[allocated])
> +                        __free_page(pages[allocated]);
> +        }
> +
> +        return -ENOMEM;
> +}
> +
>  static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>                                     struct list_head *list)
>  {
>          gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>          unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
>          int nid = page_to_nid((struct page *)start);
> -        struct page *page, *next;
> +        struct page **pages;
> +        int ret = -ENOMEM;
> +
> +        pages = kzalloc(array_size(nr_pages, sizeof(struct page *)), gfp_mask);
> +        if (!pages)
> +                return ret;
> +
> +        ret = vmemmap_bulk_alloc_pages(gfp_mask, nid, nr_pages, pages);
> +        if (ret)
> +                goto out;
>  
>          while (nr_pages--) {
> -                page = alloc_pages_node(nid, gfp_mask, 0);
> -                if (!page)
> -                        goto out;
> -                list_add_tail(&page->lru, list);
> +                list_add_tail(&pages[nr_pages]->lru, list);
>          }
> -
> -        return 0;
>  out:
> -        list_for_each_entry_safe(page, next, list, lru)
> -                __free_page(page);
> -        return -ENOMEM;
> +        kfree(pages);
> +        return ret;
>  }
> 
> or just use __alloc_pages_bulk in it, but as Matthew said, we should
> avoid the list usage; the list API needs to be cleaned up and no one
> should use it. Or make no change at all, since this is not a hot path :)
> Thanks.
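
[For reference: a rough, uncompiled sketch of the second option mentioned
above, calling __alloc_pages_bulk() with its list argument directly, written
against the signatures in mainline at the time; it keeps the list usage that
Matthew objects to, and is not taken from any patch in this thread.]

static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
                                   struct list_head *list)
{
        gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
        unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
        int nid = page_to_nid((struct page *)start);
        unsigned long allocated = 0, got;
        struct page *page, *next;

        while (allocated < nr_pages) {
                /*
                 * __alloc_pages_bulk() may return fewer pages than requested,
                 * so retry with the remainder and give up if no progress is
                 * made.  Note it uses list_add(), so the order on @list
                 * differs from the list_add_tail() order of the current code.
                 */
                got = __alloc_pages_bulk(gfp_mask, nid, NULL,
                                         nr_pages - allocated, list, NULL);
                if (!got)
                        goto out;
                allocated += got;
        }

        return 0;
out:
        list_for_each_entry_safe(page, next, list, lru)
                __free_page(page);
        return -ENOMEM;
}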

Let's keep it unchanged.

Thanks.
