> -----Original Message-----
> From: Shakeel Butt <shakeelb@xxxxxxxxxx>
> Sent: Wednesday, May 10, 2023 1:58 AM
> To: Zhang, Cathy <cathy.zhang@xxxxxxxxx>; Linux MM <linux-mm@xxxxxxxxx>; Cgroups <cgroups@xxxxxxxxxxxxxxx>
> Cc: Eric Dumazet <edumazet@xxxxxxxxxx>; Paolo Abeni <pabeni@xxxxxxxxxx>; davem@xxxxxxxxxxxxx; kuba@xxxxxxxxxx; Brandeburg, Jesse <jesse.brandeburg@xxxxxxxxx>; Srinivas, Suresh <suresh.srinivas@xxxxxxxxx>; Chen, Tim C <tim.c.chen@xxxxxxxxx>; You, Lizhen <lizhen.you@xxxxxxxxx>; eric.dumazet@xxxxxxxxx; netdev@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size
>
> On Tue, May 9, 2023 at 8:07 AM Zhang, Cathy <cathy.zhang@xxxxxxxxx> wrote:
> >
> [...]
> > >
> > > Something must be wrong in your setup, because the only small issue
> > > that was noticed was the memcg one that Shakeel solved last year.
> >
> > As mentioned in the commit log, the test creates 8 memcached-memtier
> > pairs on the same host. When the server and client of a pair connect
> > to the same CPU socket and share the same CPU set (28 CPUs), the memcg
> > overhead is obviously high, as shown in the commit log. If they are
> > given separate CPU sets from different CPU sockets, the overhead is
> > not as high, but still observed. Here are the server/client commands
> > in our test:
> >
> > server:
> > memcached -p ${port_i} -t ${threads_i} -c 10240
> >
> > client:
> > memtier_benchmark --server=${memcached_id} --port=${port_i} \
> >   --protocol=memcache_text --test-time=20 --threads=${threads_i} \
> >   -c 1 --pipeline=16 --ratio=1:100 --run-count=5
> >
> > So, is there anything wrong you see?
> >
>
> What is the memcg hierarchy of this workload? Is each server and client
> process running in its own memcg? How many levels of memcgs? Are you
> setting memory.max and memory.high to some value? Also, how are you
> limiting the processes to CPUs? cpusets?
Here is the full command to start a memcached instance:

docker run -d --name ${memcached_name} --privileged --memory 1G --network bridge \
  -p ${port_i}:${port_i} ${cpu_pinning_s[set]} memcached memcached -p ${port_i} \
  -t ${threads_i} -c 10240

We have a script that picks a CPU set from the same NUMA node; both the CPU
count and the thread count for each instance are
Num(system online CPUs) / Num(memcached instances). That is, if we run 8
memcached instances, 224 / 8 = 28, so each instance is assigned 28 CPUs and
28 threads.

Here is the full command to start a memtier instance:

docker run --rm --network bridge ${cpu_pinning_s[set]} --memory 1G \
  redislabs/memtier_benchmark memtier_benchmark --server=${memcached_id} --port=${port_i} \
  --protocol=memcache_text --test-time=20 --threads=${threads_i} -c 1 --pipeline=16 --ratio=1:100 \
  --run-count=5 --hide-histogram

Each memtier instance is pinned to the same CPU set as the server it connects
to, and uses the same thread count. That is all for the server and client
settings.
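For clarity, the CPU-set assignment described above can be sketched roughly as
follows. The 224-CPU total and the 8-instance split are from this thread; the
contiguous-range indexing and the variable names are assumptions, since the
actual pinning script was not posted:

```shell
#!/bin/sh
# Rough sketch (assumption, not the actual script): split the online CPUs
# evenly across the memcached instances, one contiguous range per instance.
total_cpus=224   # Num(system online CPUs) on this host
instances=8      # number of memcached-memtier pairs
cpus_per_instance=$(( total_cpus / instances ))   # 224 / 8 = 28

i=2   # example 0-based instance index
start=$(( i * cpus_per_instance ))
end=$(( start + cpus_per_instance - 1 ))
cpuset="${start}-${end}"

echo "$cpuset"
# This range would then be passed to docker, e.g.:
#   docker run ... --cpuset-cpus="$cpuset" ...
# with the same value reused for the matching memtier client, and
# ${threads_i} set to $cpus_per_instance.
```

For instance 2 in this scheme the range comes out as 56-83, i.e. 28 CPUs,
matching the 28 threads per instance described above.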