On Thu, Jun 1, 2023 at 4:48 AM Zhang, Cathy <cathy.zhang@xxxxxxxxx> wrote:
>
> > -----Original Message-----
> > From: Shakeel Butt <shakeelb@xxxxxxxxxx>
> > Sent: Thursday, June 1, 2023 3:45 AM
> > To: Sang, Oliver <oliver.sang@xxxxxxxxx>
> > Cc: Zhang, Cathy <cathy.zhang@xxxxxxxxx>; Yin, Fengwei
> > <fengwei.yin@xxxxxxxxx>; Tang, Feng <feng.tang@xxxxxxxxx>; Eric Dumazet
> > <edumazet@xxxxxxxxxx>; Linux MM <linux-mm@xxxxxxxxx>; Cgroups
> > <cgroups@xxxxxxxxxxxxxxx>; Paolo Abeni <pabeni@xxxxxxxxxx>;
> > davem@xxxxxxxxxxxxx; kuba@xxxxxxxxxx; Brandeburg, Jesse
> > <jesse.brandeburg@xxxxxxxxx>; Srinivas, Suresh
> > <suresh.srinivas@xxxxxxxxx>; Chen, Tim C <tim.c.chen@xxxxxxxxx>; You,
> > Lizhen <lizhen.you@xxxxxxxxx>; eric.dumazet@xxxxxxxxx;
> > netdev@xxxxxxxxxxxxxxx; Li, Philip <philip.li@xxxxxxxxx>; Liu, Yujie
> > <yujie.liu@xxxxxxxxx>
> > Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper
> > size
> >
> > Hi Oliver,
> >
> > On Wed, May 31, 2023 at 04:46:08PM +0800, Oliver Sang wrote:
> > [...]
> > >
> > > we applied the below patch upon v6.4-rc2; so far, we didn't spot
> > > performance impacts of it on other tests,
> > >
> > > but we found a -7.6% regression in netperf.Throughput_Mbps
> > >
> >
> > Thanks, this is what I was looking for. I will dig deeper and decide how to
> > proceed (i.e. improve this patch or work on the long-term approach).
>
> Hi Shakeel,
>
> If I understand correctly, the long-term goal you mentioned is to
> implement a per-memcg per-cpu cache or a percpu_counter solution, right?
> A per-memcg per-cpu cache can avoid the uncharge overhead and reduce the
> charge overhead, while a percpu_counter solution can ultimately avoid
> the charge overhead. It seems both are necessary, and both are complex.
>
> That covers the memory side. Regarding our original proposal, which is
> to tune the reclaim threshold on the network side, you and Eric worried
> that it might re-introduce the OOM issue.
> I see the following two interfaces in the network stack, which indicate
> the memory usage status. Is it possible to tune the reclaim threshold
> down when entering memory pressure and set it back to 64K when leaving
> memory pressure? What do you think?
>
> void (*enter_memory_pressure)(struct sock *sk);
> void (*leave_memory_pressure)(struct sock *sk);
>

No, it is not possible to reclaim 'the threshold' when 10,000,000
sockets are alive, each keeping a budget of around 64KB. We do not have
a shrinker, and we do not want one.

The only sensible solution is a per-cpu cache. This was done in the core
TCP stack; a similar solution is needed in memcg, if memcg has to be
used.