On 4/4/19 5:57 PM, Christopher Lameter wrote: > On Thu, 4 Apr 2019, Vlastimil Babka wrote: > >> I looked a bit at SLUB debugging capabilities and first thing I noticed is >> there's no static key guarding the runtime enablement as is common for similar >> debugging functionalities, so here's a RFC to add it. Can be further improved >> if there's interest. > > Well the runtime enablement is per slab cache and static keys are global. Sure, but the most common scenario worth optimizing for is that no slab cache has debugging enabled, and thus the global static key is disabled. Once it becomes enabled, the flags are checked for each cache, no change there. > Adding static key adds code to the critical paths. It's effectively a NOP as long as it's disabled. When enabled, the NOP is livepatched into a jump (unconditional!) to the flags check. > Since the flags for a > kmem_cache have to be inspected anyways there may not be that much of a > benefit. The point is that as long as it's disabled (the common case), no flag check (most likely involving a conditional jump) is being executed at all (unlike now). NOP is obviously cheaper than a flag check. For the (uncommon) case with debugging enabled, it adds unconditional jump which is also rather cheap. So the tradeoff looks good. >> It's true that in the alloc fast path the debugging check overhead is AFAICS >> amortized by the per-cpu cache, i.e. when the allocation is from there, no >> debugging functionality is performed. IMHO that's kinda a weakness, especially >> for SLAB_STORE_USER, so I might also look at doing something about it, and then >> the static key might be more critical for overhead reduction. > > Moving debugging out of the per cpu fastpath allows that fastpath to be > much simpler and faster. > > SLAB_STORE_USER is mostly used only for debugging in which case we are > less concerned with performance. Agreed, so it would be nice if we could do e.g. SLAB_STORE_USER for all allocations in such case. > If you want to use SLAB_STORE_USER in the fastpath then we have to do some > major redesign there. Sure. Just saying that benefit of static key in alloc path is currently limited as the debugging itself is limited due to alloc fast path being effective. But there's still immediate benefit in free path. >> In the freeing fast path I quickly checked the stats and it seems that in >> do_slab_free(), the "if (likely(page == c->page))" is not as likely as it >> declares, as in the majority of cases, freeing doesn't happen on the object >> that belongs to the page currently cached. So the advantage of a static key in >> slow path __slab_free() should be more useful immediately. > > Right. The freeing logic is actuall a weakness in terms of performance for > SLUB due to the need to operate on a per page queue immediately. >