Hi Michal,

On 07/12/18 at 02:32pm, Michal Hocko wrote:
> On Thu 12-07-18 14:01:15, Chao Fan wrote:
> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
> > >Hi Baoquan,
> > >
> > >At 07/11/2018 08:40 PM, Baoquan He wrote:
> > >> Please try this v3 patch:
> > >>
> > >> From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
> > >> From: Baoquan He <bhe@xxxxxxxxxx>
> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800
> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
> > >>
> > >> In find_zone_movable_pfns_for_nodes(), when trying to find the starting
> > >> PFN at which the movable zone begins in each node, the kernel text
> > >> position is not considered. KASLR may put the kernel after the point
> > >> where the movable zone begins.
> > >>
> > >> Fix it by finding the movable zone after the kernel text on that node.
> > >>
> > >> Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>
> > >
> > >You fix this on the zone-init side. This may make 'kernelcore=' or
> > >'movablecore=' fail if KASLR puts the kernel at the tail of the last
> > >node, or beyond.
> >
> > I think it may not fail.
> > There is a 'restart' to do another pass.
> >
> > >Since we have fixed the mirror memory handling on the KASLR side, and
> > >Chao is trying to fix 'movable_node' on the KASLR side, have you had a
> > >chance to fix this on the KASLR side too?
> >
> > I think it's better to fix it here, not on the KASLR side, because much
> > more code would have to change there. We don't parse 'kernelcore' in
> > the compressed code, and as you can see, distributing ZONE_MOVABLE
> > takes a lot of code, so we would have to duplicate a lot of that work
> > on the KASLR side. Here, a few lines will do.
>
> I am not able to find the beginning of the email thread right now. Could
> you summarize what is the actual problem please?

The bug is currently seen on x86. When "kernelcore=" or "movablecore=" is
added to the kernel command line, kernel memory is spread evenly among
nodes.
However, this only works out when KASLR is not enabled; then the kernel
sits at the fixed 16M location on x86. With KASLR enabled, the kernel can
be placed randomly anywhere from 16M to 64T.

Consider a scenario: we have 10 nodes, each node has 20G of memory, and
we specify "kernelcore=50%". That means each node should take 10G for
kernelcore and 10G for the movable area. But this doesn't take the kernel
position into consideration. E.g. if KASLR puts the kernel at the 15G
mark of the 2nd node, namely node1, then we still think node1 has 10G for
kernelcore and 10G for movable, while in fact only 5G is available for
movable, just after the kernel.

I made a v4 patch which possibly can fix it.

From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
From: Baoquan He <bhe@xxxxxxxxxx>
Date: Fri, 13 Jul 2018 07:49:29 +0800
Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text

In find_zone_movable_pfns_for_nodes(), when trying to find the starting
PFN at which the movable zone begins in each node, the kernel text
position is not considered. KASLR may put the kernel after the point
where the movable zone begins.

Fix it by finding the movable zone after the kernel text on that node.
Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>
---
 mm/page_alloc.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..5bc1a47dafda 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6547,7 +6547,7 @@ static unsigned long __init early_calculate_totalpages(void)
 static void __init find_zone_movable_pfns_for_nodes(void)
 {
 	int i, nid;
-	unsigned long usable_startpfn;
+	unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
 	unsigned long kernelcore_node, kernelcore_remaining;
 	/* save the state before borrow the nodemask */
 	nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 	if (!required_kernelcore || required_kernelcore >= totalpages)
 		goto out;
 
+	kernel_endpfn = PFN_UP(__pa_symbol(_end));
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
+	arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
 	/* Spread kernelcore memory as evenly as possible throughout nodes */
@@ -6659,6 +6660,16 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 		unsigned long start_pfn, end_pfn;
 
 		/*
+		 * KASLR may put kernel near tail of node memory,
+		 * start after kernel on that node to find PFN
+		 * at which zone begins.
+		 */
+		if (pfn_to_nid(kernel_endpfn) == nid)
+			usable_startpfn = max(arch_startpfn, kernel_endpfn);
+		else
+			usable_startpfn = arch_startpfn;
+
+		/*
 		 * Recalculate kernelcore_node if the division per node
 		 * now exceeds what is necessary to satisfy the requested
 		 * amount of memory for the kernel
-- 
2.13.6