The one thing that currently needs doing from an architecture point of view
is associating the GI domain with its nearest memory domain. This allows all
the standard NUMA-aware code to get a 'reasonable' answer.

A clever driver might elect to do load balancing etc. if there are multiple
host / memory domains nearby, but that's a decision for the driver.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
---
I plan to test on x86 qemu, but if anyone has hardware where this makes
sense then that would be even better.

 arch/arm64/kernel/smp.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 824de7038967..7c419bf92374 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -731,6 +731,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 {
 	int err;
 	unsigned int cpu;
+	unsigned int node;
 	unsigned int this_cpu;
 
 	init_cpu_topology();
@@ -769,6 +770,13 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 		set_cpu_present(cpu, true);
 		numa_store_cpu_info(cpu);
 	}
+
+	/*
+	 * Walk the numa domains and set the node to numa memory reference
+	 * for any that are Generic Initiator Only.
+	 */
+	for_each_node_state(node, N_GENERIC_INITIATOR)
+		set_gi_numa_mem(node, local_memory_node(node));
 }
 
 void (*__smp_cross_call)(const struct cpumask *, unsigned int);
-- 
2.18.0
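
For illustration only (not part of the patch): a minimal sketch of what the
commit message means by the standard NUMA-aware code getting a 'reasonable'
answer. Once the GI-only node has been pointed at its nearest memory node,
a driver can use the ordinary helpers and allocations land in that memory.
The function name and the fallback policy here are hypothetical, chosen just
to show the idea.

#include <linux/device.h>
#include <linux/gfp.h>
#include <linux/slab.h>
#include <linux/topology.h>

/* Hypothetical helper for a driver whose device sits in a GI-only domain. */
static void *alloc_near_initiator(struct device *dev, size_t size)
{
	int nid = dev_to_node(dev);

	if (nid == NUMA_NO_NODE)
		return kzalloc(size, GFP_KERNEL);

	/*
	 * local_memory_node() now reflects the memory node that
	 * set_gi_numa_mem() associated with the GI domain, so the
	 * allocation comes from the nearest memory.
	 */
	return kzalloc_node(size, GFP_KERNEL, local_memory_node(nid));
}

A driver that wanted to balance across several nearby memory domains would
still have to make that choice itself, as noted above.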