Let me just express my general concern. SLUB was written because SLAB had become a Byzantine mess, with layer upon layer of debugging, queues here and there, and "maintenance" for these queues running every 2 seconds, staggered across all processors. This caused a degree of OS noise that kept HPC jobs (and today we see similar issues with AI jobs) from accomplishing a deterministic rendezvous. On some large machines we had ~10% of the whole memory vanish into one or the other queue on boot up, with the customers being a bit upset about where all the expensive memory went. It seems that we have nearly recreated the old nightmare again. I would suggest rewriting the whole allocator once again, trying to simplify things as much as possible and isolating specialized allocator functionality needed by some subsystems into different APIs. The main allocation / free path needs to be as simple and as efficient as possible. It may not be possible to accomplish something like that given all the special casing that we have been pushing into it. Also consider the security measures and verification stuff that is on by default at runtime as well.