> -----Original Message-----
> From: Zhang, Cathy
> Sent: Thursday, May 11, 2023 8:53 AM
> To: Shakeel Butt <shakeelb@xxxxxxxxxx>
> Cc: Eric Dumazet <edumazet@xxxxxxxxxx>; Linux MM <linux-mm@xxxxxxxxx>;
> Cgroups <cgroups@xxxxxxxxxxxxxxx>; Paolo Abeni <pabeni@xxxxxxxxxx>;
> davem@xxxxxxxxxxxxx; kuba@xxxxxxxxxx; Brandeburg, Jesse
> <jesse.brandeburg@xxxxxxxxx>; Srinivas, Suresh <suresh.srinivas@xxxxxxxxx>;
> Chen, Tim C <tim.c.chen@xxxxxxxxx>; You, Lizhen <Lizhen.You@xxxxxxxxx>;
> eric.dumazet@xxxxxxxxx; netdev@xxxxxxxxxxxxxxx
> Subject: RE: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a
> proper size
>
> > -----Original Message-----
> > From: Shakeel Butt <shakeelb@xxxxxxxxxx>
> > Sent: Thursday, May 11, 2023 3:00 AM
> > To: Zhang, Cathy <cathy.zhang@xxxxxxxxx>
> > Cc: Eric Dumazet <edumazet@xxxxxxxxxx>; Linux MM <linux-mm@xxxxxxxxx>;
> > Cgroups <cgroups@xxxxxxxxxxxxxxx>; Paolo Abeni <pabeni@xxxxxxxxxx>;
> > davem@xxxxxxxxxxxxx; kuba@xxxxxxxxxx; Brandeburg, Jesse
> > <jesse.brandeburg@xxxxxxxxx>; Srinivas, Suresh <suresh.srinivas@xxxxxxxxx>;
> > Chen, Tim C <tim.c.chen@xxxxxxxxx>; You, Lizhen <lizhen.you@xxxxxxxxx>;
> > eric.dumazet@xxxxxxxxx; netdev@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a
> > proper size
> >
> > On Wed, May 10, 2023 at 9:09 AM Zhang, Cathy <cathy.zhang@xxxxxxxxx>
> > wrote:
> > >
> > [...]
> > > > > > Have you tried to increase batch sizes ?
> > > > >
> > > > > I just picked up 256 and 1024 for a try, but no help, the
> > > > > overhead still exists.
> > > >
> > > > This makes no sense at all.
> > >
> > > Eric,
> > >
> > > I added a pr_info in try_charge_memcg() to print nr_pages if
> > > nr_pages >= MEMCG_CHARGE_BATCH. Except that it prints 64 during the
> > > initialization of the instances, there is no other output while the
> > > workload runs. That means nr_pages never exceeds 64, which I guess
> > > is why increasing MEMCG_CHARGE_BATCH doesn't affect this case.
> >
> > I am assuming you increased MEMCG_CHARGE_BATCH to 256 and 1024
> > but that did not help. To me that just means there is a different
> > bottleneck in the memcg charging codepath. Can you please share the
> > perf profile? Please note that memcg charging does a lot of other
> > things as well, like updating memcg stats and checking (and enforcing)
> > memory.high even if you have not set memory.high.
>
> Thanks Shakeel! I will check more details on what you mentioned. We use
> "sudo perf top -p $(docker inspect -f '{{.State.Pid}}' memcached_2)" to
> monitor one of those instances, and also use "sudo perf top" to check the
> overhead system-wide.
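For completeness, the pr_info instrumentation mentioned in the quoted mail
was along these lines (a reconstructed sketch, not the exact hunk; the hunk
line numbers are omitted and the placement at the top of try_charge_memcg()
is an assumption):

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 			    unsigned int nr_pages)
 {
 	unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
+
+	/* debug only: report charge requests at or above the batch size */
+	if (nr_pages >= MEMCG_CHARGE_BATCH)
+		pr_info("try_charge_memcg: nr_pages=%u\n", nr_pages);

Any charge that reaches try_charge_memcg() with at least one batch worth of
pages would be reported; per the discussion above, only the nr_pages=64
prints at instance startup ever show up.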
Here is the annotate output of perf top for the three memcg hot paths:

Showing cycles for page_counter_try_charge
Events  Pcnt (>=5%)
 Percent | Source code & Disassembly of elf for cycles (543288 samples, percent: local period)
---------------------------------------------------------------------------------------------------
    0.00 : ffffffff8141388d:  mov    %r12,%rax
   76.82 : ffffffff81413890:  lock xadd %rax,(%rbx)
   22.10 : ffffffff81413895:  lea    (%r12,%rax,1),%r15

Showing cycles for page_counter_cancel
Events  Pcnt (>=5%)
 Percent | Source code & Disassembly of elf for cycles (1004744 samples, percent: local period)
----------------------------------------------------------------------------------------------------
         : 160    return i + xadd(&v->counter, i);
   77.42 : ffffffff81413759:  lock xadd %rax,(%rdi)
   22.34 : ffffffff8141375e:  sub    %rsi,%rax

Showing cycles for try_charge_memcg
Events  Pcnt (>=5%)
 Percent | Source code & Disassembly of elf for cycles (256531 samples, percent: local period)
---------------------------------------------------------------------------------------------------
         : 22     return __READ_ONCE((v)->counter);
   77.53 : ffffffff8141df86:  mov    0x100(%r13),%rdx
         : 2826   READ_ONCE(memcg->memory.high);
   19.45 : ffffffff8141df8d:  mov    0x190(%r13),%rcx
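The lock xadd dominating page_counter_try_charge() and page_counter_cancel()
is the atomic RMW on page_counter->usage: every charge walks the memcg
hierarchy and does one atomic_long_add_return() per level, and every
uncharge does the matching atomic_long_sub_return(), so all CPUs contend on
the same usage cachelines. A simplified excerpt of mm/page_counter.c
(mainline around v6.3; failcnt, watermark and protection bookkeeping
trimmed for brevity):

bool page_counter_try_charge(struct page_counter *counter,
			     unsigned long nr_pages,
			     struct page_counter **fail)
{
	struct page_counter *c;

	for (c = counter; c; c = c->parent) {
		long new;

		/*
		 * The "lock xadd" in the profile above: one atomic RMW
		 * on the shared usage cacheline per hierarchy level,
		 * for every charge from every CPU.
		 */
		new = atomic_long_add_return(nr_pages, &c->usage);
		if (new > c->max) {
			atomic_long_sub(nr_pages, &c->usage);
			*fail = c;
			goto failed;
		}
	}
	return true;

failed:
	/*
	 * Roll back the levels already charged; page_counter_cancel()
	 * is the atomic_long_sub_return() seen in the second profile.
	 */
	for (c = counter; c != *fail; c = c->parent)
		page_counter_cancel(c, nr_pages);

	return false;
}

The try_charge_memcg() hot spots are plain loads rather than atomics: per
the interleaved source lines, the two movs read the usage counter and
READ_ONCE(memcg->memory.high) for the high-limit check Shakeel mentioned,
and they presumably stall because the usage cacheline is kept in modified
state by the charging CPUs.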