On Wed, Mar 19, 2025 at 10:53 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> On Wed, Mar 19, 2025 at 12:13:30AM -0700, Greg Thelen wrote:
> > From: Eric Dumazet <edumazet@xxxxxxxxxx>
> >
> > cgroup_rstat_flush_locked() grabs the irq-safe cgroup_rstat_lock while
> > iterating all possible cpus. It only drops the lock if there is
> > scheduler or spin lock contention. If neither, then interrupts can be
> > disabled for a long time. On large machines this can disable interrupts
> > for a long enough time to drop network packets. On 400+ CPU machines
> > I've seen interrupts disabled for over 40 msec.
>
> Which kernel was this observed on in production?
>
> >
> > Prevent rstat from disabling interrupts while processing all possible
> > cpus. Instead drop and reacquire cgroup_rstat_lock for each cpu.
>
> Doing it for each cpu might be too extreme. Have you tried with some
> batching?
>
> > This
> > approach was previously discussed in
> > https://lore.kernel.org/lkml/ZBz%2FV5a7%2F6PZeM7S@xxxxxxxxxxxxxxx/,
> > though this was in the context of a non-irq rstat spin lock.
> >
> > Benchmark this change with:
> > 1) a single stat_reader process with 400 threads, each reading a test
> >    memcg's memory.stat repeatedly for 10 seconds.
> > 2) 400 memory hog processes running in the test memcg and repeatedly
> >    charging memory until oom killed. Then they repeat charging and oom
> >    killing.
> >
>
> Though this benchmark seems extreme, userspace holding off irqs for
> that long is bad. BTW, are these memory hoggers creating anon memory
> or file memory? Is [z]swap enabled?

The memory hoggers were anon, without any form of swap. I think the
other questions were answered in other replies, but feel free to
re-ask and I'll provide details.

> For the long term, I think we can make this work without disabling
> irqs, similar to how networking manages the sock lock.
>
> > v6.14-rc6 with CONFIG_IRQSOFF_TRACER, running stat_reader and the hogs,
> > finds interrupts disabled by rstat for 45341 usec:
> >  # => started at: _raw_spin_lock_irq
> >  # => ended at:   cgroup_rstat_flush
> >  #
> >  #
> >  #                    _------=> CPU#
> >  #                   / _-----=> irqs-off/BH-disabled
> >  #                  | / _----=> need-resched
> >  #                  || / _---=> hardirq/softirq
> >  #                  ||| / _--=> preempt-depth
> >  #                  |||| / _-=> migrate-disable
> >  #                  ||||| /     delay
> >  #  cmd     pid     |||||| time  |   caller
> >  #     \   /        ||||||  \    |    /
> >  stat_rea-96532    52d....    0us*: _raw_spin_lock_irq
> >  stat_rea-96532    52d.... 45342us : cgroup_rstat_flush
> >  stat_rea-96532    52d.... 45342us : tracer_hardirqs_on <-cgroup_rstat_flush
> >  stat_rea-96532    52d.... 45343us : <stack trace>
> >   => memcg1_stat_format
> >   => memory_stat_format
> >   => memory_stat_show
> >   => seq_read_iter
> >   => vfs_read
> >   => ksys_read
> >   => do_syscall_64
> >   => entry_SYSCALL_64_after_hwframe
> >
> > With this patch CONFIG_IRQSOFF_TRACER no longer finds rstat to be the
> > longest irqs-off holder. The longest holder has irqs disabled for
> > 4142 usec, a huge reduction from the previous 45341 usec rstat finding.
> >
> > Running the stat_reader memory.stat reader for 10 seconds:
> > - without memory hogs: 9.84M accesses => 12.7M accesses
> > - with memory hogs:    9.46M accesses => 11.1M accesses
> > The throughput of memory.stat access improves.
> >
> > The mode of memory.stat access latency, after grouping into power-of-2
> > buckets:
> > - without memory hogs: 64 usec => 16 usec
> > - with memory hogs:    64 usec =>  8 usec
> > The memory.stat latency improves.
>
> So, things are improving even without batching. I wonder how this will
> look with fewer readers. Can you try with a single reader as well?
>
> >
> > Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> > Signed-off-by: Greg Thelen <gthelen@xxxxxxxxxx>
> > Tested-by: Greg Thelen <gthelen@xxxxxxxxxx>
>
> Reviewed-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
>
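
To make the discussion above concrete, the shape of the change is roughly
the following. This is a simplified sketch rather than the actual diff;
cgroup_rstat_flush_cpu() is a placeholder name for the per-cpu flush work
done inside cgroup_rstat_flush_locked():

	/*
	 * Before: the irq-safe lock is held across every possible CPU and
	 * is only dropped on scheduler or spin lock contention, so irqs
	 * can stay off for the whole loop on large machines.
	 */
	spin_lock_irq(&cgroup_rstat_lock);
	for_each_possible_cpu(cpu) {
		cgroup_rstat_flush_cpu(cgrp, cpu);	/* placeholder */

		if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
			spin_unlock_irq(&cgroup_rstat_lock);
			cond_resched();
			spin_lock_irq(&cgroup_rstat_lock);
		}
	}
	spin_unlock_irq(&cgroup_rstat_lock);

	/*
	 * After: drop and reacquire the lock for each CPU, bounding the
	 * irq-off window to a single CPU's worth of flushing.
	 */
	for_each_possible_cpu(cpu) {
		spin_lock_irq(&cgroup_rstat_lock);
		cgroup_rstat_flush_cpu(cgrp, cpu);	/* placeholder */
		spin_unlock_irq(&cgroup_rstat_lock);
	}

The tradeoff is more lock acquire/release traffic in exchange for a
bounded irqs-off window, which is what the irqsoff numbers above show.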
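
A batched variant along the lines Shakeel suggests would look something
like this; BATCH is an arbitrary placeholder here, not a tuned value:

	#define BATCH	8	/* placeholder, would need tuning */

	unsigned int done = 0;

	spin_lock_irq(&cgroup_rstat_lock);
	for_each_possible_cpu(cpu) {
		cgroup_rstat_flush_cpu(cgrp, cpu);	/* placeholder */

		/* Re-enable irqs every BATCH CPUs instead of every CPU. */
		if ((++done % BATCH) == 0) {
			spin_unlock_irq(&cgroup_rstat_lock);
			cond_resched();
			spin_lock_irq(&cgroup_rstat_lock);
		}
	}
	spin_unlock_irq(&cgroup_rstat_lock);

That would cap the irqs-off window at BATCH CPUs' worth of flushing while
paying for fewer lock round trips than the strict per-cpu version.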