On Mon, Feb 06 2023 at 15:24, Vlastimil Babka wrote:
> On 2/1/23 14:27, Thomas Gleixner wrote:
>> This triggers because __kmem_cache_alloc_bulk() uses
>> local_lock_irq()/local_unlock_irq(). Seems nobody used it during the
>> early boot stage yet. Though the maple tree conversion of the
>> interrupt descriptor storage, which is the purpose of the patch in
>> question, makes that happen.
>>
>> Fix below.
>
> Looks like it should work. But I think we also need to adjust SLAB's
> mm/slab.c kmem_cache_alloc_bulk() which does local_irq_disable(); /
> local_irq_enable(); right?

Yup.

> Also if we enter this with IRQs disabled, then we should take care
> about the proper gfp flags. Looking at the patch [1] I see
>
>    WARN_ON(mas_store_gfp(&mas, desc, GFP_KERNEL) != 0);
>
> so GFP_KERNEL would be wrong with irqs disabled, looks like a case for
> GFP_ATOMIC.
> OTOH I can see the thing it replaces was:
>
>    static RADIX_TREE(irq_desc_tree, GFP_KERNEL);
>
> so that's also a GFP_KERNEL and we haven't seen debug splats from
> might_alloc() checks before in this code? That's weird, or maybe the
> case

  might_alloc()
    might_sleep_if()
      __might_sleep()
        WARN_ON(task->state != RUNNING);   <- Does not trigger
        __might_resched()
          if (.... || system_state == SYSTEM_BOOTING || ...)
              return;

As system_state is SYSTEM_BOOTING at this point the splats are not
happening.

> of "we didn't enable irqs yet on this cpu being bootstrapped" is
> handled differently than "we have temporarily disabled irqs"? Sure,
> during early boot we should have all the memory and no need to
> reclaim...

The point is that interrupts are fully disabled during early boot and
there is no scheduler, so there is no scheduling possible.

Quite some code in the kernel relies on GFP_KERNEL being functional
during that early boot stage. If the kernel runs out of memory that
early, then the chance of recovery is exactly 0.

Thanks,

        tglx
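
For illustration, a minimal sketch of the local_irq_disable()/
local_irq_enable() to local_irq_save()/local_irq_restore() conversion
discussed above. The helper below is hypothetical (it is not the actual
SLUB or SLAB patch); it only shows why saving and restoring the
caller's interrupt state matters when the function may be entered with
interrupts already disabled, e.g. from the early boot path:

    #include <linux/irqflags.h>
    #include <linux/slab.h>

    /*
     * Hypothetical bulk-allocation helper, for illustration only.
     *
     * local_irq_disable()/local_irq_enable() would unconditionally
     * re-enable interrupts on return, which is wrong when the caller
     * already runs with interrupts disabled (early boot).
     * local_irq_save()/local_irq_restore() preserve the caller's
     * interrupt state instead.
     *
     * As discussed above, the gfp flags are a separate question:
     * outside of early boot, GFP_KERNEL is not valid with interrupts
     * disabled.
     */
    static int example_alloc_bulk(struct kmem_cache *s, gfp_t gfp,
                                  size_t nr, void **p)
    {
        unsigned long flags;
        size_t i;

        local_irq_save(flags);          /* was: local_irq_disable(); */
        for (i = 0; i < nr; i++) {
            p[i] = kmem_cache_alloc(s, gfp);
            if (!p[i])
                break;
        }
        local_irq_restore(flags);       /* was: local_irq_enable(); */

        if (i < nr) {
            /* Partial failure: free what was allocated, report failure. */
            if (i)
                kmem_cache_free_bulk(s, i, p);
            return 0;
        }
        return nr;
    }

The same consideration applies to SLAB's kmem_cache_alloc_bulk() in
mm/slab.c mentioned above.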