On 3/18/21 8:18 PM, Vlastimil Babka wrote:
> On 3/17/21 8:54 AM, Xunlei Pang wrote:
>> The node list_lock in count_partial() spends long time iterating
>> in case of large amount of partial page lists, which can cause
>> thunder herd effect to the list_lock contention.
>>
>> We have HSF RT(High-speed Service Framework Response-Time) monitors,
>> the RT figures fluctuated randomly, then we deployed a tool detecting
>> "irq off" and "preempt off" to dump the culprit's calltrace, capturing
>> the list_lock cost nearly 100ms with irq off issued by "ss", this also
>> caused network timeouts.
>
> I forgot to ask, how does "ss" come into this? It displays network connections
> AFAIK. Does it read any SLUB counters or slabinfo?
>

ss may access /proc/slabinfo to acquire network related slab statistics.
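
To illustrate the kind of access meant here, below is a minimal sketch of a userspace reader that pulls a few network related caches out of slabinfo-formatted text. It is not the actual ss code. The cache names in NETWORK_CACHES, the parse_slabinfo() helper and the numbers in SAMPLE are assumptions made up for illustration. Only the "slabinfo - version: 2.1" column layout is taken from the real file. The sketch parses an embedded sample so it runs without privileges, since reading the real /proc/slabinfo typically requires root.

```python
# Hypothetical sketch: extract network related slab statistics from
# slabinfo-formatted text. Cache names and sample figures are assumptions.
NETWORK_CACHES = ("sock", "tcp_bind_bucket", "skbuff_head_cache")


def parse_slabinfo(text, wanted=NETWORK_CACHES):
    """Return {cache_name: active_objs} for the caches listed in `wanted`."""
    stats = {}
    for line in text.splitlines():
        # Skip the version banner and the '#' column-header line.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        name, active_objs = fields[0], fields[1]
        if name in wanted and active_objs.isdigit():
            stats[name] = int(active_objs)
    return stats


# Made-up sample in the slabinfo 2.1 layout (not captured from a real system).
SAMPLE = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
tcp_bind_bucket      128    128    128   32    1 : tunables    0    0    0 : slabdata      4      4      0
skbuff_head_cache   1024   1280    256   16    1 : tunables    0    0    0 : slabdata     80     80      0
dentry             50000  52000    192   21    1 : tunables    0    0    0 : slabdata   2476   2476      0
"""

if __name__ == "__main__":
    print(parse_slabinfo(SAMPLE))
```

On a real system the same function could be fed the contents of /proc/slabinfo instead of SAMPLE. The userspace side is cheap. The cost described in the patch is on the kernel side, where producing each slabinfo line involves count_partial() walking the partial lists under list_lock.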