On Sun, Aug 28, 2022 at 10:35:38AM +0200, Sander Vanheule wrote: ... > > It's really a puzzle, and some of my thoughts are below. So. > > > > This is a question what for we need nr_cpumask_bits while we already > > have nr_cpu_ids. When OFFSTACK is ON, they are obviously the same. > > When it's of - the nr_cpumask_bits is an alias to NR_CPUS. > > > > I tried to wire the nr_cpumask_bits to nr_cpu_ids unconditionally, and > > it works even when OFFSTACK is OFF, no surprises. > > > > I didn't find any discussions describing what for we need nr_cpumask_bits, > > and the code adding it dates to a very long ago. > > > > If I alias nr_cpumask_bits to nr_cpu_ids unconditionally on my VM with > > NR_CPUs == 256 and nr_cpu_ids == 4, there's obviously a clear win in > > performance, but the Image size gets 2.5K bigger. Probably that's the > > reason for what nr_cpumask_bits was needed... > > I think it makes sense to have a compile-time-constant value for nr_cpumask_bits > in some cases. For example on embedded platforms, where every opportunity to > save a few kB should be used, or cases where NR_CPUS <= BITS_PER_LONG. > > > > > There's also a very old misleading comment in cpumask.h: > > > > * If HOTPLUG is enabled, then cpu_possible_mask is forced to have > > * all NR_CPUS bits set, otherwise it is just the set of CPUs that > > * ACPI reports present at boot. > > > > It lies, and I checked with x86_64 that cpu_possible_mask is populated > > during boot time with 0b1111, if I create a 4-cpu VM. Hence, the > > nr_cpu_ids is 4, while NR_CPUS == 256. > > > > Interestingly, there's no a single user of the cpumask_full(), > > obviously, because it's broken. This is really a broken dead code. > > > > Now that we have a test that checks sanity of cpumasks, this mess > > popped up. > > > > Your fix doesn't look correct, because it fixes one function, and > > doesn't touch others. For example, the cpumask subset() may fail > > if src1p will have set bits after nr_cpu_ids, while cpumask_full() > > will be returning true. > > It appears the documentation for cpumask_full() is also incorrect, because it > claims to check if all CPUs < nr_cpu_ids are set. Meanwhile, the implementation > checks if all CPUs < nr_cpumask_bits are set. > > cpumask_weight() has a similar issue, and maybe also other cpumask_*() functions > (I didn't check in detail yet). > > > > > In -next, there is an update from Sander for the cpumask test that > > removes this check, and probably if you rebase on top of -next, you > > can drop this and 2nd patch of your series. > > > > What about proper fix? I think that a long time ago we didn't have > > ACPI tables for possible cpus, and didn't populate cpumask_possible > > from that, so the > > > > #define nr_cpumask_bits NR_CPUS > > > > worked well. Now that we have cpumask_possible partially filled, > > we have to always > > > > #define nr_cpumask_bits nr_cpu_ids > > > > and pay +2.5K price in size even if OFFSTACK is OFF. At least, it wins > > at runtime... > > > > Any thoughts? > > It looks like both nr_cpumask_bits and nr_cpu_ids are used in a number of places > outside of lib/cpumask.c. Documentation for cpumask_*() functions almost always > refers to nr_cpu_ids as a highest valid value. > > Perhaps nr_cpumask_bits should become an variable for internal cpumask usage, > and external users should only use nr_cpu_ids? The changes in 6.0 are my first > real interaction with cpumask, so it's possible that there are things I'm > missing here. > > That being said, some of the cpumask tests compare results to nr_cpumask_bits, > so those should then probably be fixed to compare against nr_cpu_ids instead. Aha, and it kills me how we have such a mess in a very core subsystem. We have 3 problems here: - mess with nr_cpumask_bits and nr_cpu_ids; - ineffectiveness of cpumask routines when nr_cpumask_bits > nr_cpu_ids; - runtime nature of nr_cpu_ids, even for those embedded systems with taught memory constraints. So that if we just drop nr_cpumask_bits, it will add 2.5K to the Image. I think that dropping nr_cpumask_bits is our only choice, and to avoid Image bloating for embedded users, we can hint the kernel that NR_CPUS is an exact number, so that it will skip setting it in runtime. I added a EXACT_NR_CPUS option for this, which works like this: #if (NR_CPUS == 1) || defined(CONFIG_EXACT_NR_CPUS) #define nr_cpu_ids ((unsigned int)NR_CPUS) #else extern unsigned int nr_cpu_ids; #endif /* Deprecated */ #define nr_cpumask_bits nr_cpu_ids I tried it with arm64 4-CPU build. When the EXACT_NR_CPUS is enabled, the difference is: add/remove: 3/4 grow/shrink: 46/729 up/down: 652/-46952 (-46300) Total: Before=25670945, After=25624645, chg -0.18% Looks quite impressive to me. I'll send a patch soon. Thanks, Yury