On 3/17/21 8:54 AM, Xunlei Pang wrote:
> The node list_lock in count_partial() spends long time iterating
> in case of large amount of partial page lists, which can cause
> thunder herd effect to the list_lock contention.
>
> We have HSF RT(High-speed Service Framework Response-Time) monitors,
> the RT figures fluctuated randomly, then we deployed a tool detecting
> "irq off" and "preempt off" to dump the culprit's calltrace, capturing
> the list_lock cost nearly 100ms with irq off issued by "ss", this also
> caused network timeouts.

I forgot to ask, how does "ss" come into this? It displays network
connections AFAIK. Does it read any SLUB counters or slabinfo?