Hello Christoph, thank you for answering.

On Mon, Oct 11, 2021 at 09:13:52AM +0200, Christoph Lameter wrote:
> On Sat, 9 Oct 2021, Hyeonggon Yoo wrote:
>
> > - Is there a reason that SLUB does not implement cache coloring?
> >   It would help utilize the hardware cache. Especially in the block
> >   layer, they are literally *squeezing* its performance now.
>
> Well, as Matthew says: the high associativity of caches

It does not seem useful on either of my machines (4-way / 8-way set
associative caches) either.

> and the execution of other code paths seems to make this not useful
> anymore.
>
> I am sure you can find a benchmark that shows some benefit. But please
> realize that in real life the OS must perform work. This means that
> multiple other code paths are executed that affect cache use and
> placement of data in cache lines.

Cache coloring can make benchmark results better, but because slab then
spreads its objects over more cache lines, it takes cache lines away
from other code paths. Did I get that right?

> > - In SLAB, do we really need to flush queues every few seconds
> >   (per-cpu queue and shared queue)? Flushing alien caches makes
> >   sense, but flushing queues seems to slow down the fastpath.
> >   But yeah, we need to reclaim memory. Can we just defer this?
>
> The queues are designed to track cache-hot objects (see the Bonwick
> paper). After a while the cachelines will be used for other purposes
> and no longer reflect what is in the caches. That is why they need to
> be expired.

I've read the Bonwick paper, but I thought expiring was needed for
reclaiming memory. Maybe I got it wrong; I should read it again.

> > - I don't like SLAB's per-node cache coloring, because the L1 cache
> >   isn't shared between cpus. For now, cpus in the same node share
> >   colour_next - but we can do better.
>
> This differs based on the cpu architecture in use. SLAB has an ideal
> model of how caches work and keeps objects cache hot based on that.
> In real life the cpu architecture differs from what SLAB thinks about
> how caches operate.
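For anyone following along, here is a rough userspace sketch of the
coloring scheme being discussed: SLAB offsets each newly allocated slab
by colour_next * colour_off, so the same object index in different
slabs lands in different cache sets. The struct and field names below
mirror SLAB's kmem_cache fields, but this is a toy model, not the
kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of SLAB-style cache coloring. Each new slab places its
 * first object at a different cache-line-sized offset, cycling through
 * the available colours. Field names mirror struct kmem_cache in
 * mm/slab.c, but this is an illustrative sketch only. */
struct toy_cache {
	unsigned int colour;      /* number of distinct colours */
	unsigned int colour_off;  /* offset step, typically the cache line size */
	unsigned int colour_next; /* colour to use for the next slab */
};

/* Return the byte offset at which the next slab places its first
 * object, advancing colour_next with wrap-around. */
static unsigned int next_slab_offset(struct toy_cache *c)
{
	unsigned int colour = c->colour_next;

	c->colour_next++;
	if (c->colour_next >= c->colour)
		c->colour_next = 0;

	return colour * c->colour_off;
}
```

With colour = 4 and colour_off = 64, successive slabs start their
objects at offsets 0, 64, 128, 192, then wrap back to 0, which is what
staggers hot object addresses across cache sets on low-associativity
caches.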
So the point is: as the cache hierarchy differs between architectures,
assuming that cpus have both a unique per-cpu cache and a cache shared
among cpus can be a poor fit on some architectures.

> > What about splitting some per-cpu variables into kmem_cache_cpu
> > like SLUB? I think cpu_cache, colour (and colour_next),
> > alloc{hit,miss}, and free{hit,miss} can be per-cpu variables.
>
> That would in turn increase memory use and potentially the cache
> footprint of the hot paths.

I thought splitting the per-cpu data was needed for coloring, but
coloring itself isn't useful, so that would just be unnecessary cost.

Thanks,
Hyeonggon.