Questions:
- Is there a reason that SLUB does not implement cache coloring? It would help make better use of the hardware cache, especially for the block layer, where people are literally *squeezing* out performance right now.
- In SLAB, do we really need to flush the queues (the per-cpu queue and the shared queue) every few seconds? Flushing the alien caches makes sense, but flushing the queues seems to hurt its fastpath, since allocations right after a flush have to refill them. Yes, we need to reclaim memory, but can we just defer this?

Idea:
- I don't like SLAB's per-node cache coloring, because the L1 cache isn't shared between cpus. For now, cpus in the same node share a single colour_next, but we can do better. What about splitting some variables out into a per-cpu structure, like SLUB's kmem_cache_cpu? I think cpu_cache, colour (and colour_next), alloc{hit,miss}, and free{hit,miss} can be per-cpu variables.
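
To make the idea concrete, here is a rough userspace sketch (not kernel code) comparing a colour_next counter shared by the whole node with one kept per cpu. Everything in it is an assumption for illustration: struct cpu_colour_state, struct colour_cache, NR_SKETCH_CPUS and the two helper functions are hypothetical names, and the wrap-around logic is simplified compared to what mm/slab.c actually does.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SKETCH_CPUS 4	/* hypothetical cpu count, for this sketch only */

/* Hypothetical per-cpu state, loosely modelled on SLUB's kmem_cache_cpu. */
struct cpu_colour_state {
	unsigned int colour_next;	/* next colour index for this cpu */
	unsigned long allochit, allocmiss;
	unsigned long freehit, freemiss;
};

/* Hypothetical stand-in for the colouring-related parts of a cache. */
struct colour_cache {
	unsigned int colour;		/* number of distinct colours */
	unsigned int colour_off;	/* bytes per colour step */
	unsigned int node_colour_next;	/* today: one counter per node */
	struct cpu_colour_state cpu[NR_SKETCH_CPUS];	/* idea: one per cpu */
};

/* Current scheme (simplified): all cpus of a node advance one counter. */
static size_t next_offset_per_node(struct colour_cache *c)
{
	unsigned int idx;

	if (c->colour == 0)
		return 0;	/* no room for colouring in this cache */
	idx = c->node_colour_next;
	c->node_colour_next = (idx + 1) % c->colour;
	return (size_t)idx * c->colour_off;
}

/* Proposed scheme: each cpu cycles through the colours on its own. */
static size_t next_offset_per_cpu(struct colour_cache *c, int cpu)
{
	struct cpu_colour_state *s;
	unsigned int idx;

	assert(cpu >= 0 && cpu < NR_SKETCH_CPUS);
	if (c->colour == 0)
		return 0;
	s = &c->cpu[cpu];
	idx = s->colour_next;
	s->colour_next = (idx + 1) % c->colour;
	return (size_t)idx * c->colour_off;
}
```

In this simplified model, with colour = 2 and two cpus that grow slabs alternately, the shared counter hands cpu 0 the offset 0 every time and cpu 1 the offset colour_off every time, so neither cpu ever sees a second colour. With the per-cpu counter each cpu walks through all colours regardless of what the others do. Whether that translates into a real win depends on how often a cpu ends up using slabs that another cpu grew, which this sketch does not model.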