On 7/15/22 10:05, Rongwei Wang wrote: > > > On 6/17/22 5:40 PM, Vlastimil Babka wrote: >> On 6/8/22 14:23, Christoph Lameter wrote: >>> On Wed, 8 Jun 2022, Rongwei Wang wrote: >>> >>>> If available, I think document the issue and warn this incorrect >>>> behavior is >>>> OK. But it still prints a large amount of confusing messages, and >>>> disturbs us? >>> >>> Correct it would be great if you could fix this in a way that does not >>> impact performance. >>> >>>>> are current operations on the slab being validated. >>>> And I am trying to fix it in following way. In a short, these changes only >>>> works under the slub debug mode, and not affects the normal mode (I'm not >>>> sure). It looks not elegant enough. And if all approve of this way, I can >>>> submit the next version. >>> >>> >>>> >>>> Anyway, thanks for your time:). >>>> -wrw >>>> >>>> @@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s, >>> struct >>>> slab *slab, >>>> >>>> { >>>> void *prior; >>>> - int was_frozen; >>>> + int was_frozen, to_take_off = 0; >>>> struct slab new; >>> >>> to_take_off has the role of !n ? Why is that needed? >>> >>>> - do { >>>> - if (unlikely(n)) { >>>> + spin_lock_irqsave(&n->list_lock, flags); >>>> + ret = free_debug_processing(s, slab, head, tail, cnt, >>>> addr); >>> >>> Ok so the idea is to take the lock only if kmem_cache_debug. That looks >>> ok. But it still adds a number of new branches etc to the free loop. >> > Hi, Vlastimil, sorry for missing your message long time. Hi, no problem. >> It also further complicates the already tricky code. I wonder if we should >> make more benefit from the fact that for kmem_cache_debug() caches we don't >> leave any slabs on percpu or percpu partial lists, and also in >> free_debug_processing() we aready take both list_lock and slab_lock. If we >> just did the freeing immediately there under those locks, we would be >> protected against other freeing cpus by that list_lock and don't need the >> double cmpxchg tricks. > enen, I'm not sure get your "don't need the double cmpxchg tricks" means > completely. What you want to say is that replace cmpxchg_double_slab() here > with following code when kmem_cache_debug(s)? > > __slab_lock(slab); > if (slab->freelist == freelist_old && slab->counters == counters_old){ > slab->freelist = freelist_new; > slab->counters = counters_new; > __slab_unlock(slab); > local_irq_restore(flags); > return true; > } > __slab_unlock(slab); Pretty much, but it's more complicated. > If I make mistakes for your words, please let me know. > Thanks! >> >> What about against allocating cpus? More tricky as those will currently end >> up privatizing the freelist via get_partial(), only to deactivate it again, >> so our list_lock+slab_lock in freeing path would not protect in the >> meanwhile. But the allocation is currently very inefficient for debug >> caches, as in get_partial() it will take the list_lock to take the slab from >> partial list and then in most cases again in deactivate_slab() to return it. > It seems that I need speed some time to eat these words. Anyway, thanks. >> >> If instead the allocation path for kmem_cache_debug() cache would take a >> single object from the partial list (not whole freelist) under list_lock, it >> would be ultimately more efficient, and protect against freeing using >> list_lock. Sounds like an idea worth trying to me? > > Hyeonggon had a similar advice that split freeing and allocating slab from > debugging, likes below: > > > __slab_alloc() { > if (kmem_cache_debug(s)) > slab_alloc_debug() > else > ___slab_alloc() > } > > I guess that above code aims to solve your mentioned problem (idea)? > > slab_free() { > if (kmem_cache_debug(s)) > slab_free_debug() > else > __do_slab_free() > } > > Currently, I only modify the code of freeing slab to fix the confusing > messages of "slabinfo -v". If you agree, I can try to realize above > mentioned slab_alloc_debug() code. Maybe it's also a challenge to me. I already started working on this approach and hope to post a RFC soon. > Thanks for your time. > >> And of course we would stop creating the 'validate' sysfs files for >> non-debug caches.