On Mon, Mar 6, 2023 at 9:47 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> The drivers/char/random.c code is very wrong, and does
>
>         if (cpu == nr_cpumask_bits)
>                 cpu = cpumask_first(&timer_cpus);
>
> which fails miserably exactly because it doesn't use ">=".

Turns out this "cpu == nr_cpumask_bits" pattern exists in a couple of
other places too.

It was always wrong, but it always just happened to work. The lpfc
SCSI driver in particular seems to *love* this pattern:

        start_cpu = cpumask_next(new_cpu, cpu_present_mask);
        if (start_cpu == nr_cpumask_bits)
                start_cpu = first_cpu;

and has repeated it multiple times, all incorrect.

We do have "cpumask_next_wrap()", and that *seems* to be what the
lpfc driver actually wants to do.

.. and then we have kernel/sched/fair.c, which is actually not buggy,
just odd. It uses nr_cpumask_bits too, but it uses it purely for its
own internal nefarious reasons - it's not actually related to the
cpumask functions at all, it's just used as a "not valid CPU number".

I think that scheduler use is still very *wrong*, but it doesn't look
actively buggy.

The other cases all look very buggy indeed, but yes, they happened to
work, and now they don't. So commit 596ff4a09b89 ("cpumask:
re-introduce constant-sized cpumask optimizations") did break them.

I'd rather fix these bad users than revert, but there does seem to be
an alarming number of these things, which worries me:

        git grep '== nr_cpumask_bits'

and that's just checking for this *exact* thing.

               Linus
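
For illustration only, here is a minimal userspace sketch of why the "==" test is fragile. It is not kernel code: sketch_next_bit(), next_cpu_fragile(), next_cpu_robust() and SKETCH_NR_CPU_IDS are hypothetical names and values made up for this example. The one assumption it relies on is that a bit scan reports "nothing found" by returning the size it was asked to scan, and that this size need not be the same number the caller later compares against.

```c
#include <stdint.h>

/* Hypothetical: number of CPU ids that are actually valid on this machine. */
#define SKETCH_NR_CPU_IDS 8u

/*
 * Toy bit scan: index of the first set bit at or after 'start',
 * or 'size' when no such bit exists below 'size' (size <= 64).
 */
static unsigned int sketch_next_bit(uint64_t mask, unsigned int size,
                                    unsigned int start)
{
        for (unsigned int i = start; i < size; i++)
                if (mask & (UINT64_C(1) << i))
                        return i;
        return size;
}

/*
 * Fragile wrap-around: only wraps when the scan result is exactly
 * 'compare_limit'. If the scan used a different size, the "not found"
 * value slips through and an invalid CPU number is returned.
 */
static unsigned int next_cpu_fragile(uint64_t mask, unsigned int scan_size,
                                     unsigned int compare_limit,
                                     unsigned int cur)
{
        unsigned int cpu = sketch_next_bit(mask, scan_size, cur + 1);

        if (cpu == compare_limit)
                cpu = sketch_next_bit(mask, scan_size, 0);
        return cpu;
}

/*
 * Robust wrap-around: anything at or beyond the number of valid CPU ids
 * is treated as "not found", whatever size the scan happened to use.
 */
static unsigned int next_cpu_robust(uint64_t mask, unsigned int scan_size,
                                    unsigned int cur)
{
        unsigned int cpu = sketch_next_bit(mask, scan_size, cur + 1);

        if (cpu >= SKETCH_NR_CPU_IDS)
                cpu = sketch_next_bit(mask, scan_size, 0);
        return cpu;
}
```

With CPUs 0-3 set in the mask and the current CPU being 3, the fragile variant only wraps back to CPU 0 when the scan size and the compared limit happen to be the same number, which is the "always wrong, but happened to work" situation. The ">=" variant wraps in every combination.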