On Wed, Apr 13, 2022 at 9:28 PM Libo Chen <libo.chen@xxxxxxxxxx> wrote:
> On 4/13/22 08:41, Randy Dunlap wrote:
> > On 4/12/22 23:56, Libo Chen wrote:
> >>> --- a/lib/Kconfig
> >>> +++ b/lib/Kconfig
> >>> @@ -511,7 +511,8 @@ config CHECK_SIGNATURE
> >>>         bool
> >>>  config CPUMASK_OFFSTACK
> >>> -       bool "Force CPU masks off stack" if DEBUG_PER_CPU_MAPS
> >>> +       bool "Force CPU masks off stack"
> >>> +       depends on DEBUG_PER_CPU_MAPS
> >> This forces every arch to enable DEBUG_PER_CPU_MAPS if they want to
> >> enable CPUMASK_OFFSTACK, it's even stronger than "if". My whole
> >> argument is CPUMASK_OFFSTACK should be enable/disabled independent
> >> of DEBUG_PER_CPU_MASK
> >>>         help
> >>>           Use dynamic allocation for cpumask_var_t, instead of putting
> >>>           them on the stack. This is a bit more expensive, but avoids
> >>>
> >>>
> >>> As I said earlier, the "if" on the "bool" line just controls the
> >>> prompt message.
> >>> This patch make CPUMASK_OFFSTACK require DEBUG_PER_CPU_MAPS -- which
> >>> might be overkill.
> >>>
> >> Okay I understand now "if" on the "boot" is not a dependency and it
> >> only controls the prompt message, then the question is why we cannot
> >> enable CPUMASK_OFFSTACK without DEBUG_PER_CPU_MAPS if it only
> >> controls prompt message? Is it not the behavior we expect?
> > Yes, it is. I don't know that the problem is...
> Masahiro explained that CPUMASK_OFFSTACK can only be configured by
> options not users if DEBUG_PER_CPU_MASK is not enabled. This doesn't
> seem to be what we want.

I think the correct way to do it is to follow x86 and powerpc, and
tie CPUMASK_OFFSTACK to "large" values of CONFIG_NR_CPUS. For smaller
values of NR_CPUS, the onstack masks are obviously cheaper, we just
need to decide what the cut-off point is.

In x86, the onstack masks can be selected for normal SMP builds with
up to 512 CPUs, while CONFIG_MAXSMP=y raises the limit to 8192 CPUs
and selects CPUMASK_OFFSTACK.
PowerPC does it the other way round, selecting CPUMASK_OFFSTACK
implicitly whenever NR_CPUS is set to 8192 or more.

I think we can easily do the same as powerpc on arm64. With the
ApacheBench test you cite in the patch description, what is the
value of NR_CPUS at which you start seeing a noticeable benefit
for offstack masks? Can you do the same test for NR_CPUS=1024
or 2048?

       Arnd
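
For illustration only, the powerpc-style approach applied to arm64
might look roughly like the fragment below. This is an untested
sketch, not a proposed patch: the cut-off of 1024 is a placeholder
picked from the values mentioned above, and the real number would
have to come from the benchmark results.

```kconfig
# arch/arm64/Kconfig (hypothetical sketch, untested)
config ARM64
	# Select offstack cpumasks implicitly once NR_CPUS is "large".
	# 1024 is an assumed placeholder threshold, not a measured value.
	select CPUMASK_OFFSTACK if NR_CPUS >= 1024
```

With a select like this, CPUMASK_OFFSTACK would follow NR_CPUS
automatically and would no longer depend on the prompt being made
visible through DEBUG_PER_CPU_MAPS.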