On 2020/7/2 PM 7:59, Pekka Enberg wrote:
> On Thu, Jul 2, 2020 at 11:32 AM Xunlei Pang <xlpang@xxxxxxxxxxxxxxxxx> wrote:
>> The node list_lock in count_partial() spends a long time iterating
>> when the partial page lists are long, which can cause a thundering
>> herd effect on list_lock contention, e.g. it causes business
>> response-time jitter when "/proc/slabinfo" is accessed in our
>> production environments.
>
> Would you have any numbers to share to quantify this jitter? I have no

We have HSF RT (High-speed Service Framework Response-Time) monitors,
and the RT figures fluctuated randomly. We then deployed a tool that
detects "irq off" and "preempt off" periods and dumps the culprit's
calltrace; it captured list_lock being held with irqs off for up to
100ms, triggered by "ss", which also caused network timeouts.

> objections to this approach, but I think the original design
> deliberately made reading "/proc/slabinfo" more expensive to avoid
> atomic operations in the allocation/deallocation paths. It would be
> good to understand what is the gain of this approach before we switch
> to it. Maybe even run some slab-related benchmark (not sure if there's
> something better than hackbench these days) to see if the overhead of
> this approach shows up.

I thought about that before, but most of the counter updates are
already serialized by the list_lock. Another possible way is to hold
list_lock in __slab_free() as well; then these two counters could be
changed from atomic to plain long (see the rough sketch at the bottom
of this mail).

I also have no idea what the standard SLUB benchmark for regression
testing is, any specific suggestion?

>
>> This patch introduces two counters to maintain the actual number
>> of partial objects dynamically instead of iterating the partial
>> page lists with list_lock held.
>>
>> New counters of kmem_cache_node are: pfree_objects, ptotal_objects.
>> The main operations are under list_lock in slow paths, so the
>> performance impact is minimal.
>>
>> Co-developed-by: Wen Yang <wenyang@xxxxxxxxxxxxxxxxx>
>> Signed-off-by: Xunlei Pang <xlpang@xxxxxxxxxxxxxxxxx>
>> ---
>>  mm/slab.h |  2 ++
>>  mm/slub.c | 38 +++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/slab.h b/mm/slab.h
>> index 7e94700..5935749 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -616,6 +616,8 @@ struct kmem_cache_node {
>>  #ifdef CONFIG_SLUB
>>         unsigned long nr_partial;
>>         struct list_head partial;
>> +       atomic_long_t pfree_objects; /* partial free objects */
>> +       atomic_long_t ptotal_objects; /* partial total objects */
>
> You could rename these to "nr_partial_free_objs" and
> "nr_partial_total_objs" for readability.

Sounds good, thanks!

>
> - Pekka
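
To illustrate the idea with the renaming applied, roughly (this is a
simplified sketch only, not the exact mm/slub.c hunk from this patch;
the helper name below is just a placeholder):

/*
 * Called wherever a page is added to or removed from n->partial, or
 * when the number of free objects on a partial page changes. These
 * call sites are in the slow paths and already hold n->list_lock,
 * except for the lock-free part of __slab_free(), which is why the
 * counters are atomic_long_t for now.
 */
static inline void partial_counters_add(struct kmem_cache_node *n,
					long total_delta, long free_delta)
{
	atomic_long_add(total_delta, &n->nr_partial_total_objs);
	atomic_long_add(free_delta, &n->nr_partial_free_objs);
}

With that in place, the "/proc/slabinfo" read side becomes a couple of
atomic_long_read() calls instead of walking the whole partial list with
list_lock held and irqs disabled, which is where the 100ms spikes came
from. If we end up taking list_lock in __slab_free() as well, the same
helper degrades to plain long arithmetic.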