On 09/11/2022 02:42, Muchun Song wrote:
>> On Nov 8, 2022, at 18:38, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>> On 08/11/2022 09:13, Muchun Song wrote:
>>>> On Nov 7, 2022, at 23:39, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>>> +	int nid = page_to_nid((struct page *)start);
>>>> +	struct page *page = NULL;
>>>> +
>>>> +	/*
>>>> +	 * Allocate a new head vmemmap page to avoid breaking a contiguous
>>>> +	 * block of struct page memory when freeing it back to page allocator
>>>> +	 * in free_vmemmap_page_list(). This will allow the likely contiguous
>>>> +	 * struct page backing memory to be kept contiguous and allowing for
>>>> +	 * more allocations of hugepages. Fallback to the currently
>>>> +	 * mapped head page in case should it fail to allocate.
>>>> +	 */
>>>> +	if (IS_ALIGNED((unsigned long)start, PAGE_SIZE))
>>>
>>> I'm curious why we need this check. IIUC, this is unnecessary.
>>>
>>
>> So if the start of the vmemmap range (the head page) we will remap isn't the
>> first struct page, then we would corrupt the other struct pages in
>> that vmemmap page unrelated to hugetlb? That was my thinking
>
> Actually, @start address should be always aligned with PAGE_SIZE. If not,
> vmemmap_remap_range() will complain. So the check can be removed.
>

True, but it will be a VM_BUG_ON(). I can remove the check here, but I
would suggest that, rather than crashing, we detect the error in the
caller (vmemmap_remap_free()) before proceeding.

>>
>>>> +	page = alloc_pages_node(nid, gfp_mask, 0);
>>>> +	walk.head_page = page;
>>>>
>>>> 	/*
>>>> 	 * In order to make remapping routine most efficient for the huge pages,
>>>> --
>>>> 2.17.2
>>>>
>>>
>>> I have implemented a version based on yours, which does not introduce
>>> ->head_page field (Not test if it works). Seems to be simple.
>>>
>>
>> Let me try out with the adjustment below
>>
>>> Thanks.
>>>
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index c98805d5b815..8ee94f6a6697 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -202,12 +202,7 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
>>> 		return ret;
>>> 	} while (pgd++, addr = next, addr != end);
>>>
>>> -	/*
>>> -	 * We only change the mapping of the vmemmap virtual address range
>>> -	 * [@start + PAGE_SIZE, end), so we only need to flush the TLB which
>>> -	 * belongs to the range.
>>> -	 */
>>> -	flush_tlb_kernel_range(start + PAGE_SIZE, end);
>>> +	flush_tlb_kernel_range(start, end);
>>>
>>> 	return 0;
>>> }
>>> @@ -246,6 +241,12 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
>>> 	pte_t entry = mk_pte(walk->reuse_page, pgprot);
>>> 	struct page *page = pte_page(*pte);
>>>
>>> +	/* The case of remapping the head vmemmap page. */

I changed this comment to:

	/* Remapping the head page requires r/w */

>>> +	if (unlikely(addr == walk->reuse_addr)) {
>>
>> You replace the head_page with checking the reuse_addr, but that is
>> set on the tail page. So if we want to rely on reuse_addr perhaps
>> best if we do:
>>
>> if (unlikely(addr == (walk->reuse_addr - PAGE_SIZE))) {
>> 	...
>> }
>
> I don't think so. The @addr here should be equal to @walk->reuse_addr
> when vmemmap_remap_pte() is fist called since @addr does not be updated
> from vmemmap_pte_range(). Right?
>

Ah, yes -- I misread it. I was confusing this with the 2-page vmemmap
reuse rather than the 1-page case (which is the latest, yes), and thus
reuse_addr points at the head page.
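
Just so we are looking at the same thing, below is roughly the shape I
have in mind for vmemmap_remap_pte() with your adjustment folded in.
Treat it as a sketch of my understanding rather than the exact hunk I
will send: the PAGE_KERNEL vs PAGE_KERNEL_RO split is my reading of the
"requires r/w" comment, and I am assuming the existing reuse_page,
reuse_addr and vmemmap_pages fields of struct vmemmap_remap_walk stay
as they are:

static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
			      struct vmemmap_remap_walk *walk)
{
	/*
	 * Remap the tail pages as read-only to catch illegal write
	 * operations to the tail pages.
	 */
	pgprot_t pgprot = PAGE_KERNEL_RO;
	struct page *page = pte_page(*pte);
	pte_t entry;

	/* Remapping the head page requires r/w */
	if (unlikely(addr == walk->reuse_addr))
		pgprot = PAGE_KERNEL;

	entry = mk_pte(walk->reuse_page, pgprot);
	list_add_tail(&page->lru, walk->vmemmap_pages);
	set_pte_at(&init_mm, addr, pte, entry);
}

Since @addr matches @walk->reuse_addr only on the first pte visited,
only the head page ends up with the r/w mapping and the tails keep the
read-only remap as before.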
Ok, this should be much cleaner with the adjustments you proposed, and it
works just as well as far as I have tested. I'll respin v3.