On Thu, 4 Feb 2021, Vincent Guittot wrote:

> > So what is preferable here now? Above or other quick fix or reverting
> > the original commit?
>
> I'm fine with whatever the solution as long as we can keep using
> nr_cpu_ids when other values like num_present_cpus, don't reflect
> correctly the system

AFAICT they are correctly reflecting the current state of the system.

The problem here is the bringup of the system and the tuning for it.

One additional thing that may help: The slab caches can work in a
degraded mode where no fastpath allocations can occur. That mode is
used primarily for debugging, but maybe it can also help during
bootstrap to avoid having to deal with the per cpu data and so on.

In degraded mode SLUB will take a lock for each operation on an object.

In this mode the following is true:

	kmem_cache_cpu->page == NULL
	kmem_cache_cpu->freelist == NULL
	kmem_cache_debug(s) == true

So if you define a new debug mode and include it in SLAB_DEBUG_FLAGS
then you can force SLUB to fall back to operations where a lock is
taken and where slab allocation can be stopped. This may be ok for
bringup.

The debug flags are also tied to some wizardry that can patch the code
at runtime to optimize for debugging or fast operations. You would tie
into that one as well. Start in debug mode by default and switch to
fast operations after all processors are up.
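
To make the idea concrete, here is a rough userspace sketch of "start in
locked mode, switch to the fastpath once all processors are up". It is
not SLUB code. Every name in it (toy_cache, boot_debug, toy_alloc,
toy_free, toy_all_cpus_up) is made up for illustration, a single
cpu_freelist pointer stands in for the per cpu data, and a plain boolean
stands in for both the new debug flag and the runtime code patching.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* A free object stores the link to the next free object in itself. */
union toy_obj {
	union toy_obj *next;
	char payload[32];
};

/* Hypothetical stand-in for a slab cache; not struct kmem_cache. */
struct toy_cache {
	atomic_flag lock;               /* taken for every slow path operation */
	bool boot_debug;                /* stand-in for the proposed debug flag */
	union toy_obj *cpu_freelist;    /* stays NULL while boot_debug is set */
	union toy_obj *shared_freelist; /* only touched with the lock held */
	unsigned long slow_ops;
	unsigned long fast_ops;
	union toy_obj slots[TOY_SLOTS];
};

static void toy_lock(struct toy_cache *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		; /* spin until the lock is free */
}

static void toy_unlock(struct toy_cache *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Start in degraded mode: no per cpu freelist, all objects on the shared list. */
void toy_init(struct toy_cache *c)
{
	atomic_flag_clear(&c->lock);
	c->boot_debug = true;
	c->cpu_freelist = NULL;
	c->shared_freelist = NULL;
	c->slow_ops = 0;
	c->fast_ops = 0;
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		c->slots[i].next = c->shared_freelist;
		c->shared_freelist = &c->slots[i];
	}
}

void *toy_alloc(struct toy_cache *c)
{
	union toy_obj *obj;

	/* Fastpath: only once boot is over and the local list has an object. */
	if (!c->boot_debug && c->cpu_freelist) {
		obj = c->cpu_freelist;
		c->cpu_freelist = obj->next;
		c->fast_ops++;
		return obj;
	}

	/* Slow path: one lock round trip per object. */
	toy_lock(c);
	obj = c->shared_freelist;
	if (obj)
		c->shared_freelist = obj->next;
	c->slow_ops++;
	toy_unlock(c);
	return obj;
}

void toy_free(struct toy_cache *c, void *p)
{
	union toy_obj *obj = p;

	if (!c->boot_debug) {
		/* Fastpath: push onto the local list without locking. */
		obj->next = c->cpu_freelist;
		c->cpu_freelist = obj;
		c->fast_ops++;
		return;
	}

	toy_lock(c);
	obj->next = c->shared_freelist;
	c->shared_freelist = obj;
	c->slow_ops++;
	toy_unlock(c);
}

/* Called once bringup is complete: allow fastpath operations from now on. */
void toy_all_cpus_up(struct toy_cache *c)
{
	c->boot_debug = false;
}
```

Until toy_all_cpus_up() is called, every toy_alloc() and toy_free() goes
through the lock and cpu_freelist stays NULL, which mirrors the three
conditions listed above. In the kernel the check would have to go through
the existing debug flag machinery rather than a bool in the cache.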