On Tue, May 31, 2022 at 11:47:22AM +0800, Muchun Song wrote:
> On Sun, May 29, 2022 at 11:37:06AM +0000, Hyeonggon Yoo wrote:
> > On Sun, May 29, 2022 at 04:15:33PM +0800, Rongwei Wang wrote:
> > > In use cases where allocating and freeing slab frequently, some
> > > error messages, such as "Left Redzone overwritten", "First byte
> > > 0xbb instead of 0xcc" would be printed when validating slabs.
> > > That's because an object has been filled with SLAB_RED_INACTIVE,
> > > but has not been added to slab's freelist. And between these
> > > two states, the behaviour of validating slab is likely to occur.
> > >
> > > Actually, it doesn't mean the slab can not work stably. But, these
> > > confusing messages will disturb slab debugging more or less.
> > >
> > > Signed-off-by: Rongwei Wang <rongwei.wang@xxxxxxxxxxxxxxxxx>
> >
> > Have you observed it, or is it from code inspection?
> >
> > > ---
> > >  mm/slub.c | 40 +++++++++++++++++-----------------------
> > >  1 file changed, 17 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index ed5c2c03a47a..310e56d99116 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -1374,15 +1374,12 @@ static noinline int free_debug_processing(
> > >  	void *head, void *tail, int bulk_cnt,
> > >  	unsigned long addr)
> > >  {
> > > -	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> > >  	void *object = head;
> > >  	int cnt = 0;
> > > -	unsigned long flags, flags2;
> > > +	unsigned long flags;
> > >  	int ret = 0;
> > >
> > > -	spin_lock_irqsave(&n->list_lock, flags);
> > > -	slab_lock(slab, &flags2);
> > > -
> > > +	slab_lock(slab, &flags);
> > >  	if (s->flags & SLAB_CONSISTENCY_CHECKS) {
> > >  		if (!check_slab(s, slab))
> > >  			goto out;
> > > @@ -1414,8 +1411,7 @@ static noinline int free_debug_processing(
> > >  		slab_err(s, slab, "Bulk freelist count(%d) invalid(%d)\n",
> > >  			 bulk_cnt, cnt);
> > >
> > > -	slab_unlock(slab, &flags2);
> > > -	spin_unlock_irqrestore(&n->list_lock, flags);
> > > +	slab_unlock(slab, &flags);
> > >  	if (!ret)
> > >  		slab_fix(s, "Object at 0x%p not freed", object);
> > >  	return ret;
> > > @@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
> > >
> > >  {
> > >  	void *prior;
> > > -	int was_frozen;
> > > +	int was_frozen, to_take_off = 0;
> > >  	struct slab new;
> > >  	unsigned long counters;
> > >  	struct kmem_cache_node *n = NULL;
> > > @@ -3315,15 +3311,19 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
> > >  	if (kfence_free(head))
> > >  		return;
> > >
> > > +	n = get_node(s, slab_nid(slab));
> > > +	spin_lock_irqsave(&n->list_lock, flags);
> > > +
> >
> > Oh please don't do this.
> >
> > SLUB free slowpath can be hit a lot depending on workload.
> >
> > __slab_free() tries its best not to take n->list_lock. It currently takes
> > n->list_lock only when the slab needs to be taken off the list.
> >
> > Unconditionally taking n->list_lock will degrade performance.
> >
>
> I can confirm you are right. We have encountered this issue in practice.
> We have deployed some HDFS instances on a 256-CPU machine. When there
> are lots of IO requests from users, we can see lots of threads contending
> on n->list_lock.
> Lots of call traces are like the following:
>
>   ffffffff810dfbb4 native_queued_spin_lock_slowpath+0x1a4
>   ffffffff81780ffb _raw_spin_lock+0x1b
>   ffffffff8127327e get_partial_node.isra.81+0x5e
>   ffffffff812752d3 ___slab_alloc+0x2f3
>   ffffffff8127559c __slab_alloc+0x1c
>   ffffffff81275828 kmem_cache_alloc+0x278
>   ffffffff812e9e3d alloc_buffer_head+0x1d
>   ffffffff812e9f74 alloc_page_buffers+0xa4
>   ffffffff812eb0e9 create_empty_buffers+0x19
>   ffffffff812eb37d create_page_buffers+0x7d
>   ffffffff812ed78d __block_write_begin_int+0x9d
>
> I thought it was because there are lots of threads which consume the local
> CPU slab cache quickly and then all of them try to get a new slab
> from the node partial list. Since there are 256 CPUs, the contention
> is more intense and easy to observe.
>
> Any thoughts on this issue (e.g. how to ease contention)? Comments
> are welcome.

How does increasing the number of per-CPU partial slabs affect your
situation? (i.e. increasing /sys/kernel/slab/<cache name>/cpu_partial)

> Thanks.
>
>
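By the way, for anyone following the thread who is less familiar with the
free slowpath: the point about n->list_lock is easier to see with a toy
example. Below is a very rough userspace sketch, not the kernel code, and
every name in it is invented. The idea is that __slab_free() returns the
object to the per-slab freelist with a lock-free compare-and-swap and only
falls back to the shared per-node lock when the slab actually has to be
moved on or off a list; the patch above takes that lock unconditionally, so
every slowpath free on a 256-CPU machine serializes on it.

/*
 * Toy userspace analogy of the "CAS first, node lock only when the slab
 * changes lists" pattern. NOT the kernel code; all names are made up.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct fake_object {
	struct fake_object *next;
};

struct fake_slab {
	_Atomic(struct fake_object *) freelist;	/* free objects in this slab */
	atomic_int inuse;			/* objects still allocated */
};

struct fake_node {
	pthread_mutex_t list_lock;		/* stand-in for n->list_lock */
	/* partial/full lists would live here */
};

static void fake_slab_free(struct fake_node *n, struct fake_slab *slab,
			   struct fake_object *obj)
{
	struct fake_object *old = atomic_load(&slab->freelist);

	/* Fast path: push the object back with CAS, node lock not taken. */
	do {
		obj->next = old;
	} while (!atomic_compare_exchange_weak(&slab->freelist, &old, obj));

	/*
	 * Slow path: only when this free empties the slab (or it would have
	 * to join the partial list) is the shared node lock touched.
	 */
	if (atomic_fetch_sub(&slab->inuse, 1) == 1) {
		pthread_mutex_lock(&n->list_lock);
		/* take the slab off the partial list / discard it here */
		pthread_mutex_unlock(&n->list_lock);
	}
}

int main(void)
{
	struct fake_node node = { .list_lock = PTHREAD_MUTEX_INITIALIZER };
	struct fake_object obj = { 0 };
	struct fake_slab slab;

	atomic_init(&slab.freelist, NULL);
	atomic_init(&slab.inuse, 1);

	fake_slab_free(&node, &slab, &obj);
	printf("inuse after free: %d\n", atomic_load(&slab.inuse));
	return 0;
}

This builds with cc -pthread and is only meant to show the shape of the
fast/slow split; the real __slab_free() does a double-word cmpxchg on the
freelist and counters and handles more cases (frozen slabs, debug caches,
and so on).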