Re: [PATCH v3] mm/hugetlb: split hugetlb_cma in nodes with memory

Anshuman Khandual <anshuman.khandual@xxxxxxx> · Mon, 20 Jul 2020 11:44:04 +0530

On 07/17/2020 11:07 PM, Mike Kravetz wrote:
> On 7/17/20 2:51 AM, Anshuman Khandual wrote:
>>
>>
>> On 07/17/2020 02:06 PM, Will Deacon wrote:
>>> On Fri, Jul 17, 2020 at 10:32:53AM +0530, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 07/16/2020 11:55 PM, Mike Kravetz wrote:
>>>>> >From 17c8f37afbf42fe7412e6eebb3619c6e0b7e1c3c Mon Sep 17 00:00:00 2001
>>>>> From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>>>>> Date: Tue, 14 Jul 2020 15:54:46 -0700
>>>>> Subject: [PATCH] hugetlb: move cma reservation to code setting up gigantic
>>>>>  hstate
>>>>>
>>>>> Instead of calling hugetlb_cma_reserve() directly from arch specific
>>>>> code, call from hugetlb_add_hstate when adding a gigantic hstate.
>>>>> hugetlb_add_hstate is either called from arch specific huge page setup,
>>>>> or as the result of hugetlb command line processing.  In either case,
>>>>> this is late enough in the init process that all numa memory information
>>>>> should be initialized.  And, it is early enough to still use early
>>>>> memory allocator.
>>>>
>>>> This assumes that hugetlb_add_hstate() is called from the arch code at
>>>> the right point in time for the generic HugeTLB to do the required CMA
>>>> reservation which is not ideal. I guess it must have been a reason why
>>>> CMA reservation should always called by the platform code which knows
>>>> the boot sequence timing better.
>>>
>>> Ha, except we've moved it around two or three times already in the last
>>> month or so, so I'd say we don't have a clue when to call it in the arch
>>> code.
>>
>> The arch dependency is not going way with this change either. Just that
>> its getting transferred to hugetlb_add_hstate() which gets called from
>> arch_initcall() in every architecture.
>>
>> The perfect timing here happens to be because of arch_initcall() instead.
>> This is probably fine, as long as
>>
>> 0. hugetlb_add_hstate() is always called at arch_initcall()
> 
> In another reply, I give reasoning why it would be safe to call even later
> at hugetlb command line processing time.

Understood, but there is a time window in which CMA reservation is available
irrespective of whether it is called from arch or generic code. Finding this
right time window and ensuring that N_MEMORY nodemask is initialized, easier
done in the platform code.

> 
>> 1. N_MEMORY mask is guaranteed to be initialized at arch_initcall()
> 
> This is a bit more difficult to guarantee.  I find the init sequence hard to
> understand.  Looking at the arm code, arch_initcall(hugetlbpage_init)
> happens after N_MEMORY mask is setup.  I can't imagine any arch code setting
> up huge pages before N_MEMORY.  But, I suppose it is possible and we would
> need to somehow guarantee this.

Ensuring that N_MEMORY nodemask is initialized from the generic code is even
more difficult.

> 
>> 2. CMA reservation is available to be called at arch_initcall()
> 
> Since I am pretty sure we can delay the reservation until hugetlb command
> line processing time, it would be great if it was always done there.

But moving hugetlb CMA reservation completely during command line processing
has got another concern of mixing with existing memblock based pre-allocation.

> Unfortunately, I can not immediately think of an easy way to do this.
> 

It is rationale to move CMA reservation stuff into generic HugeTLB but there
are some challenges which needs to be solved comprehensively. The patch here
from Barry does solve a short term problem (N_ONLINE ---> N_MEMORY) for now
which IMHO should be considered. Moving CMA reservation into generic HugeTLB
would require some more thoughts and can be attempted later.