On Tue, 7 Jul 2020, Pekka Enberg wrote:

> On Fri, Jul 3, 2020 at 12:38 PM xunlei <xlpang@xxxxxxxxxxxxxxxxx> wrote:
> >
> > On 2020/7/2 PM 7:59, Pekka Enberg wrote:
> > > On Thu, Jul 2, 2020 at 11:32 AM Xunlei Pang <xlpang@xxxxxxxxxxxxxxxxx> wrote:
> > >> The node list_lock in count_partial() spends a long time iterating
> > >> when there is a large number of partial page lists, which can cause
> > >> a thundering-herd effect on list_lock contention, e.g. it causes
> > >> business response-time jitter when accessing "/proc/slabinfo"
> > >> in our production environments.
> > >
> > > Would you have any numbers to share to quantify this jitter? I have no
> >
> > We have HSF RT (High-speed Service Framework Response-Time) monitors; the
> > RT figures fluctuated randomly. We then deployed a tool detecting "irq
> > off" and "preempt off" to dump the culprit's calltrace, and captured the
> > list_lock being held for up to 100ms with irqs off, triggered by "ss",
> > which also caused network timeouts.
>
> Thanks for the follow-up. This sounds like a good enough motivation
> for this patch, but please include it in the changelog.

Well, this is access via sysfs causing a holdoff. Another way of accessing
the same information without adding atomics and counters would be best.

> > I also have no idea what the standard SLUB benchmark for the
> > regression test is, any specific suggestion?
>
> I don't know what people use these days. When I did benchmarking in
> the past, hackbench and netperf were known to be slab-allocation
> intensive macro-benchmarks. Christoph also had some SLUB
> micro-benchmarks, but I don't think we ever merged them into the tree.

They are still where they have been for the last decade or so: in my git
tree on kernel.org. They were also reworked a couple of times and posted
to linux-mm. There are historical posts going back over the years where
individuals have modified them and used them to create multiple other
tests.
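
For context, the path being discussed looks roughly like the sketch below
(a simplified rendering of count_partial() in mm/slub.c around this time,
not the exact upstream source). The whole walk of the per-node partial
list runs with n->list_lock held and interrupts disabled, so the irq-off
window grows linearly with the number of partial slabs, which is what the
"irq off" tracer caught when slabinfo was read:

	/*
	 * Simplified sketch of the count_partial() walk: every
	 * /proc/slabinfo read ends up doing this for each node, and the
	 * time under the irq-disabled list_lock scales with the length
	 * of the partial list.
	 */
	static unsigned long count_partial(struct kmem_cache_node *n,
					   int (*get_count)(struct page *))
	{
		unsigned long flags;
		unsigned long x = 0;
		struct page *page;

		spin_lock_irqsave(&n->list_lock, flags);
		list_for_each_entry(page, &n->partial, slab_list)
			x += get_count(page);
		spin_unlock_irqrestore(&n->list_lock, flags);
		return x;
	}

The concern above is that avoiding this walk by maintaining counters would
add atomics to the allocation fast paths, so an alternative way of
exporting the same information would be preferable.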