On 7/19/20 11:22 PM, Anshuman Khandual wrote:
>
>
> On 07/17/2020 10:32 PM, Mike Kravetz wrote:
>> On 7/16/20 10:02 PM, Anshuman Khandual wrote:
>>>
>>>
>>> On 07/16/2020 11:55 PM, Mike Kravetz wrote:
>>>> From 17c8f37afbf42fe7412e6eebb3619c6e0b7e1c3c Mon Sep 17 00:00:00 2001
>>>> From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>>>> Date: Tue, 14 Jul 2020 15:54:46 -0700
>>>> Subject: [PATCH] hugetlb: move cma reservation to code setting up gigantic
>>>>  hstate
>>>>
>>>> Instead of calling hugetlb_cma_reserve() directly from arch specific
>>>> code, call from hugetlb_add_hstate when adding a gigantic hstate.
>>>> hugetlb_add_hstate is either called from arch specific huge page setup,
>>>> or as the result of hugetlb command line processing. In either case,
>>>> this is late enough in the init process that all numa memory information
>>>> should be initialized. And, it is early enough to still use early
>>>> memory allocator.
>>>
>>> This assumes that hugetlb_add_hstate() is called from the arch code at
>>> the right point in time for the generic HugeTLB to do the required CMA
>>> reservation which is not ideal. I guess it must have been a reason why
>>> CMA reservation should always called by the platform code which knows
>>> the boot sequence timing better.
>>
>> Actually, the code does not make the assumption that hugetlb_add_hstate
>> is called from arch specific huge page setup. It can even be called later
>> at the time of hugetlb command line processing.
>
> Yes, now that hugetlb_cma_reserve() has been moved into hugetlb_add_hstate().
> But then there is an explicit warning while trying to mix both the command
> line options i.e hugepagesz= and hugetlb_cma=. The proposed code here have
> not changed that behavior and hence the following warning should have been
> triggered here as well.
>
> 1) hugepagesz_setup()
>        hugetlb_add_hstate()
>                hugetlb_cma_reserve()
>
> 2) hugepages_setup()
>        hugetlb_hstate_alloc_pages()    when order >= MAX_ORDER
>
>                if (hstate_is_gigantic(h)) {
>                        if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) {
>                                pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n");
>                                break;
>                        }
>                        if (!alloc_bootmem_huge_page(h))
>                                break;
>                }
>
> Nonetheless, it does not make sense to mix both memblock and CMA based huge
> page pre-allocations. But looking at this again, could this warning be ever
> triggered till now ? Unless, a given platform calls hugetlb_cma_reserve()
> before _setup("hugepages=", hugepages_setup). Anyways, there seems to be
> good reasons to keep both memblock and CMA based pre-allocations in place.
> But mixing them together (as done in the proposed code here) does not seem
> to be right.

I'm not sure if I follow the question.

This proposal does not change the trigger for the warning printed when one
tries to both reserve CMA and pre-allocate gigantic pages. If hugetlb_cma
is specified on the command line, and someone tries to pre-allocate gigantic
pages, they will get the warning. Such a command line on x86 might look like:
hugetlb_cma=4G hugepagesz=1G hugepages=4
You will then see:
[    0.065864] HugeTLB: hugetlb_cma is enabled, skip boot time allocation
[    0.065866] HugeTLB: allocating 4 of page size 1.00 GiB failed. Only allocated 0 hugepages.
Ideally we could/should eliminate the second message.
This behavior exists in the current code.

>> My 'reasoning' is that gigantic pages can currently be preallocated from
>> bootmem/memblock_alloc at the time of command line processing. Therefore,
>> we should be able to reserve bootmem for CMA at the same time. Is there
>> something wrong with this reasoning? I tested this on x86 by removing the
>> call to hugetlb_add_hstate from arch specific code and instead forced the
>> call at command line processing time. The ability to reserve CMA was the
>> same.
>
> There is no problem with that reasoning. __setup() triggered function should
> be able perform CMA reservation. But as pointed out before, it does not make
> sense to mix both CMA reservation and memblock based pre-allocation.

Agree. I am not proposing we do. Sorry if you got that impression.

>> Yes, the CMA reservation interface says it should be called from arch
>> specific code. However, if we currently depend on the ability to do
>> memblock_alloc at hugetlb command line processing time for gigantic page
>> preallocation, then I think we can do the CMA reservation here as well.
>
> IIUC, CMA reservation and memblock alloc have some differences in terms of
> how the memory can be used later on, will have to dig deeper on this. But
> the comment section near cma_declare_contiguous_nid() is a concern.
>
>  * This function reserves memory from early allocator. It should be
>  * called by arch specific code once the early allocator (memblock or bootmem)
>  * has been activated and all other subsystems have already allocated/reserved
>  * memory. This function allows to create custom reserved areas.
>

Yes, that is the comment I was looking at as well. However, note that hugetlb
pre-allocation of gigantic pages will end up calling memblock_alloc_range_nid.
This is the same routine used for CMA reservations/allocations from
cma_declare_contiguous_nid. This is why there should be no issue with doing
CMA reservations at this time. This may be the confusing part. I am not
saying we would do CMA reservations and pre-allocations together. Rather,
they both rely on the same underlying code, so we can call them at the same
time in the init process.

>> Thinking about it some more, I suppose there could be some arch code that
>> could call hugetlb_add_hstate too early in the boot process. But, I do
>> not think we have an issue with calling it too late.
>>
>
> Calling it too late might have got the page allocator initialized completely
> and then CMA reservation would not be possible afterwards. Also calling it
> too early would prevent other subsystems which might need memory reservation
> in specific physical ranges.

I thought about it some more and came up with a way to do all this at
command line processing time. It will take me a day or two to put together.

The patch from Barry which started this thread is indeed needed and is in
Andrew's tree. I'll start another thread with a patch to move CMA reservations
to command line processing.
-- 
Mike Kravetz
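
For reference, the skip behavior discussed in this thread can be modelled in
userspace. This is only a sketch, not kernel code: struct hstate_model,
model_is_gigantic(), model_boot_alloc() and MODEL_MAX_ORDER are hypothetical
names invented here, the value 11 for MAX_ORDER is an assumption (x86-like
defaults with 4 KiB base pages), and non-skipped allocations are simply
assumed to succeed. The real logic is in hugetlb_hstate_alloc_pages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: x86-like default; the kernel's MAX_ORDER is config dependent. */
#define MODEL_MAX_ORDER 11

/* Hypothetical stand-in for the fields of struct hstate used here. */
struct hstate_model {
	unsigned int order;           /* log2 of base pages per huge page */
	unsigned long max_huge_pages; /* count requested with hugepages= */
};

/* Mirrors the "order >= MAX_ORDER" condition quoted in the thread. */
static bool model_is_gigantic(const struct hstate_model *h)
{
	return h->order >= MODEL_MAX_ORDER;
}

/*
 * Model of the boot time allocation loop. cma_bytes stands for the size
 * given with hugetlb_cma= (0 means the option was not used). *warned
 * emulates pr_warn_once(). Returns the number of pages "allocated".
 */
static unsigned long model_boot_alloc(const struct hstate_model *h,
				      unsigned long long cma_bytes,
				      bool *warned)
{
	unsigned long i;

	assert(h != NULL && warned != NULL);

	for (i = 0; i < h->max_huge_pages; i++) {
		if (model_is_gigantic(h) && cma_bytes != 0) {
			if (!*warned) {
				puts("HugeTLB: hugetlb_cma is enabled, skip boot time allocation");
				*warned = true;
			}
			break;
		}
		/*
		 * The kernel would call alloc_bootmem_huge_page() here for a
		 * gigantic hstate and stop on failure; the model assumes
		 * every such allocation succeeds.
		 */
	}
	return i;
}
```

With order 18 (1 GiB pages on 4 KiB base pages), four requested pages and a
nonzero CMA size, the model returns 0, which corresponds to the "Only
allocated 0 hugepages" message. With cma_bytes set to 0 it returns 4.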