Re: [PATCH] init: refactor the generic cpu_to_node for NUMA

Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> · Thu, 18 Jan 2024 10:27:48 +0100

On Thu, Jan 18, 2024 at 11:14:12AM +0800, Huang Shijie wrote:
> (0) We list the ARCHs which support the NUMA:
>        arm64, loongarch, powerpc, riscv,
>        sparc, mips, s390, x86,

I do not understand this format, what are you saying here?

Have you read the kernel documentation for how to write changelog texts?
It doesn't say "list a bunch of things", it's a bit more descriptive.

> 
> (1) Some ARCHs in (0) override the generic cpu_to_node(), such as:
>        sparc, mips, s390, x86.
> 
>     Since these ARCHs have their own cpu_to_node(), we do not care
>     about them.
> 
> (2) The ARCHs enable NUMA and use the generic cpu_to_node.
>     From (0) and (1), we can know that four ARCHs support NUMA and
>     use the generic cpu_to_node:
>         arm64, loongarch, powerpc, riscv,
> 
>     The generic cpu_to_node depends on percpu "numa_node".
> 
>     (2.1) The loongarch sets "numa_node" in:
>           start_kernel --> smp_prepare_boot_cpu()
> 
>     (2.2) The arm64, powerpc, riscv set "numa_node" in:
>        	  start_kernel --> arch_call_rest_init() --> rest_init()
>        	               --> kernel_init() --> kernel_init_freeable()
>                        --> smp_prepare_cpus()
> 
>     (2.3) The first place calling the cpu_to_node() is early_trace_init():
>           start_kernel --> early_trace_init()--> __ring_buffer_alloc()
> 	               --> rb_allocate_cpu_buffer()
> 
>     (2.4) So it safe for loongarch. But for arm64, powerpc and riscv,
>           there are at least four places in the common code where
> 	  the cpu_to_node() is called before it is initialized:
> 	   a.) early_trace_init()         in kernel/trace/trace.c
> 	   b.) sched_init()               in kernel/sched/core.c
> 	   c.) init_sched_fair_class()    in kernel/sched/fair.c
> 	   d.) workqueue_init_early()     in kernel/workqueue.c
> 
> (3) In order to fix the issue, the patch refactors the generic cpu_to_node:
>     (3.1) change cpu_to_node to function pointer,
>           and export it for kernel modules.
> 
>     (3.2) introduce _cpu_to_node() which is the original cpu_to_node().
> 
>     (3.3) introduce smp_prepare_boot_cpu_start() to wrap the original
>           smp_prepare_boot_cpu(), and set cpu_to_node with
> 	  early_cpu_to_node which works fine for arm64, powerpc,
> 	  riscv and loongarch.
> 
>     (3.4) introduce smp_prepare_cpus_done() to wrap the original
>           smp_prepare_cpus().
> 	  The "numa_node" is ready after smp_prepare_cpus(),
> 	  then set cpu_to_node with _cpu_to_node().

When you start listing different things in a changelog, that's a hint to
the reviewer to say "please break this up" as patches need to do only
one thing at a time.  As I can't follow the above text at all, that's
all the review comments I'm able to give here, sorry.

But as-is, this isn't acceptable :(

thanks,

greg k-h