>-----Original Message----- >From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> >Sent: Friday, June 18, 2021 3:47 AM >To: linux-mm@xxxxxxxxx >Cc: linux-kernel@xxxxxxxxxxxxxxx; Dave Hansen ><dave.hansen@xxxxxxxxxxxxxxx>; Du, Fan <fan.du@xxxxxxxxx>; Chatre, >Reinette <reinette.chatre@xxxxxxxxx>; jarkko@xxxxxxxxxx; Williams, Dan J ><dan.j.williams@xxxxxxxxx>; Hansen, Dave <dave.hansen@xxxxxxxxx>; >x86@xxxxxxxxxx; linux-sgx@xxxxxxxxxxxxxxx; luto@xxxxxxxxxx; >peterz@xxxxxxxxxxxxx >Subject: [PATCH] x86/mm: avoid truncating memblocks for SGX memory > > >From: Fan Du <fan.du@xxxxxxxxx> > >tl;dr: > >Several SGX users reported seeing the following message on NUMA systems: > > sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback >to the NUMA node 0. > >This turned out to be the 'memblock' code mistakenly throwing away >SGX memory. > >=== Full Changelog === > >The 'max_pfn' variable represents the highest known RAM address. It can >be used, for instance, to quickly determine for which physical addresses >there is mem_map[] space allocated. The numa_meminfo code makes an >effort to throw out ("trim") all memory blocks which are above 'max_pfn'. > >SGX memory is not considered RAM (it is marked as "Reserved" in the >e820) and is not taken into account by max_pfn. Despite this, SGX >memory areas have NUMA affinity and are enumerated in the ACPI SRAT. >The existing SGX code uses the numa_meminfo mechanism to look up the >NUMA affinity for its memory areas. > >In cases where SGX memory was above max_pfn (usually just the one EPC >section in the last highest NUMA node), the numa_memblock is truncated >at 'max_pfn', which is below the SGX memory. When the SGX code tries to >look up the affinity of this memory, it fails and produces an error message: > > sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback >to the NUMA node 0. > >and assigns the memory to NUMA node 0. > >Instead of silently truncating the memory block at 'max_pfn' and >dropping the SGX memory, add the truncated portion to >'numa_reserved_meminfo'. This allows the SGX code to later determine >the NUMA affinity of its 'Reserved' area. > >Without this patch, numa_meminfo looks like this (from 'crash'): > > blk = { start = 0x0, end = 0x2080000000, nid = 0x0 } > { start = 0x2080000000, end = 0x4000000000, nid = 0x1 } > >numa_reserved_meminfo is empty. > >After the patch, numa_meminfo looks like this: > > blk = { start = 0x0, end = 0x2080000000, nid = 0x0 } > { start = 0x2080000000, end = 0x4000000000, nid = 0x1 } > >and numa_reserved_meminfo has an entry for node 1's SGX memory: > > blk = { start = 0x4000000000, end = 0x4080000000, nid = 0x1 } > > [ daveh: completely rewrote/reworked changelog ] Really what's your PROBLEM?! Neither did I ask you to send my patch, nor do I agree to change it. Who grant you the right to do this ?! It's disgraceful to do this w/o my notice. If you have comments, please DO align with the other two maintainers Jarkko and Dan first, who already reviewed the patch in this format. https://lkml.org/lkml/2021/6/17/1151 >Signed-off-by: Fan Du <fan.du@xxxxxxxxx> >Reported-by: Reinette Chatre <reinette.chatre@xxxxxxxxx> >Reviewed-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx> >Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx> >Reviewed-by: Dave Hansen <dave.hansen@xxxxxxxxx> >Fixes: 5d30f92e7631 ("x86/NUMA: Provide a range-to-target_node lookup >facility") >Cc: x86@xxxxxxxxxx >Cc: linux-sgx@xxxxxxxxxxxxxxx >Cc: Andy Lutomirski <luto@xxxxxxxxxx> >Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> >--- > > b/arch/x86/mm/numa.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > >diff -puN arch/x86/mm/numa.c~sgx-srat arch/x86/mm/numa.c >--- a/arch/x86/mm/numa.c~sgx-srat 2021-06-17 11:23:05.116159990 -0700 >+++ b/arch/x86/mm/numa.c 2021-06-17 11:55:46.117155100 -0700 >@@ -254,7 +254,13 @@ int __init numa_cleanup_meminfo(struct n > > /* make sure all non-reserved blocks are inside the limits */ > bi->start = max(bi->start, low); >- bi->end = min(bi->end, high); >+ >+ /* preserve info for non-RAM areas above 'max_pfn': */ >+ if (bi->end > high) { >+ numa_add_memblk_to(bi->nid, high, bi->end, >+ &numa_reserved_meminfo); >+ bi->end = high; >+ } > > /* and there's no empty block */ > if (bi->start >= bi->end) >_