On 3/10/20 11:05 AM, Roman Gushchin wrote:
> On Tue, Mar 10, 2020 at 10:27:01AM -0700, Mike Kravetz wrote:
>> On 3/9/20 5:25 PM, Roman Gushchin wrote:
>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>> index a74262c71484..ceeb06ddfd41 100644
>>> --- a/arch/x86/kernel/setup.c
>>> +++ b/arch/x86/kernel/setup.c
>>> @@ -16,6 +16,7 @@
>>>  #include <linux/pci.h>
>>>  #include <linux/root_dev.h>
>>>  #include <linux/sfi.h>
>>> +#include <linux/hugetlb.h>
>>>  #include <linux/tboot.h>
>>>  #include <linux/usb/xhci-dbgp.h>
>>>
>>> @@ -1158,6 +1159,8 @@ void __init setup_arch(char **cmdline_p)
>>>  	initmem_init();
>>>  	dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);
>>>
>>> +	hugetlb_cma_reserve();
>>> +
>>
>> I know this is called from arch specific code here to fit in with the timing
>> of CMA setup/reservation calls.  However, there really is nothing architecture
>> specific about this functionality.  It would be great IMO if we could make
>> this architecture independent.  However, I can not think of a straight forward
>> way to do this.
>
> I agree. Unfortunately I have no better idea than having an arch-dependent hook.
>
>>
>>>  	/*
>>>  	 * Reserve memory for crash kernel after SRAT is parsed so that it
>>>  	 * won't consume hotpluggable memory.
>> <snip>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> <snip>
>>> +void __init hugetlb_cma_reserve(void)
>>> +{
>>> +	unsigned long totalpages = 0;
>>> +	unsigned long start_pfn, end_pfn;
>>> +	phys_addr_t size;
>>> +	int nid, i, res;
>>> +
>>> +	if (!hugetlb_cma_size && !hugetlb_cma_percent)
>>> +		return;
>>> +
>>> +	if (hugetlb_cma_percent) {
>>> +		for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn,
>>> +				       NULL)
>>> +			totalpages += end_pfn - start_pfn;
>>> +
>>> +		size = PAGE_SIZE * (hugetlb_cma_percent * 100 * totalpages) /
>>> +			10000UL;
>>> +	} else {
>>> +		size = hugetlb_cma_size;
>>> +	}
>>> +
>>> +	pr_info("hugetlb_cma: reserve %llu, %llu per node\n", size,
>>> +		size / nr_online_nodes);
>>> +
>>> +	size /= nr_online_nodes;
>>> +
>>> +	for_each_node_state(nid, N_ONLINE) {
>>> +		unsigned long min_pfn = 0, max_pfn = 0;
>>> +
>>> +		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
>>> +			if (!min_pfn)
>>> +				min_pfn = start_pfn;
>>> +			max_pfn = end_pfn;
>>> +		}
>>> +
>>> +		res = cma_declare_contiguous(PFN_PHYS(min_pfn), size,
>>> +					     PFN_PHYS(max_pfn), (1UL << 30),
>>
>> The alignment is hard coded for x86 gigantic page size.  If this supports
>> more architectures or becomes arch independent we will need to determine
>> what this alignment should be.  Perhaps an arch specific call back to get
>> the alignment for gigantic pages.  That will require a little thought as
>> some arch's support multiple gigantic page sizes.
>
> Good point!
> Should we take the biggest possible size as a reference?
> Or the smallest (larger than MAX_ORDER)?

As mentioned, it is pretty simple for architectures like x86 that only
have one gigantic page size.

Just a random thought, but since hugetlb_cma_reserve is called from arch
specific code perhaps the arch code could pass in at least alignment as
an argument to this function?  That way we can somewhat push the issue
to the architectures.  For example, power supports 16GB gigantic page
size but I believe they are allocated via firmware somehow.  So, they
would not pass 16G as alignment.  In this case setup of the CMA area is
somewhat architecture dependent.  So, perhaps the call from arch
specific code is not such a bad idea.
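Something like the following is what I had in mind (completely untested,
just to illustrate the idea; the hugetlb_cma[] array and the trailing
cma_declare_contiguous() arguments are my guesses since the hunk above
is truncated):

	/* include/linux/hugetlb.h: arch passes its gigantic page alignment */
	void __init hugetlb_cma_reserve(phys_addr_t align);

	/* arch/x86/kernel/setup.c: x86 gigantic pages are PUD sized (1GB) */
	hugetlb_cma_reserve(1UL << PUD_SHIFT);

	/* mm/hugetlb.c: use the arch supplied alignment instead of (1UL << 30) */
	res = cma_declare_contiguous(PFN_PHYS(min_pfn), size,
				     PFN_PHYS(max_pfn), align, 0, false,
				     "hugetlb", &hugetlb_cma[nid]);

That way each architecture that wants this opts in with the alignment it
can actually use, and architectures that allocate gigantic pages some
other way (like the powerpc 16GB case) simply never make the call.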
With that in mind, we may want some type of coordination between arch
specific and independent code.  Right now, cmdline_parse_hugetlb_cma
will accept a hugetlb_cma command line option without complaint even if
the architecture does not call hugetlb_cma_reserve.

Just a nit, but cma_declare_contiguous is going to round up size by
alignment.  So, the actual reserved size may not match what is printed
with,

+		pr_info("hugetlb_cma: successfully reserved %llu on node %d\n",
+			size, nid);

I found this out by testing the code and specifying hugetlb_cma=2M.
Messages in the log were:

	kernel: hugetlb_cma: reserve 2097152, 1048576 per node
	kernel: hugetlb_cma: successfully reserved 1048576 on node 0
	kernel: hugetlb_cma: successfully reserved 1048576 on node 1

But, it really reserved 1GB per node.  One way to print the actual
reservation is sketched below.
-- 
Mike Kravetz
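P.S. One way to make that message reflect the actual reservation would
be to print the size the CMA code reports for the declared area rather
than the requested size (again untested, and I am assuming the last
argument of your cma_declare_contiguous() call is a per node
'struct cma *' pointer, which the truncated hunk above does not show):

		if (!res)
			pr_info("hugetlb_cma: successfully reserved %lu on node %d\n",
				cma_get_size(hugetlb_cma[nid]), nid);

cma_get_size() returns the size of the area in bytes after any rounding,
so the 2M request above would be reported as the 1GB that was actually
reserved.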