Re: [PATCH] ipc/msg.c: mitigate the lock contention with percpu counter

"Sun, Jiebin" <jiebin.sun@xxxxxxxxx> · Mon, 5 Sep 2022 20:02:34 +0800

On 9/3/2022 12:27 AM, Shakeel Butt wrote:
On Fri, Sep 2, 2022 at 12:04 AM Jiebin Sun <jiebin.sun@xxxxxxxxx> wrote:
The msg_bytes and msg_hdrs atomic counters are frequently
updated when the IPC msg queue is in heavy use, causing heavy
cache bouncing and overhead. Changing them to percpu_counters
greatly improves performance. Since there is one unique
ipc namespace, the additional memory cost is minimal. The
counts are read only in the msgctl call, which is infrequent,
so the need to sum up the per-CPU counts is also infrequent.

Tested with the patch applied, using pts/stress-ng-1.4.0
-- System V message passing (160 threads).

Score gain: 3.38x

CPU: ICX 8380 x 2 sockets
Core number: 40 x 2 physical cores
Benchmark: pts/stress-ng-1.4.0
-- system v message passing (160 threads)

Signed-off-by: Jiebin Sun <jiebin.sun@xxxxxxxxx>
[...]
+void percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
+{
+       this_cpu_add(*fbc->counters, amount);
+}
+EXPORT_SYMBOL(percpu_counter_add_local);
Why not percpu_counter_add()? This may let fbc->count drift by more
than batch*nr_cpus. I am assuming that is not an issue for you, as you
always do an expensive sum in the slow path. As Andrew asked, this
should be a separate patch.

Yes. msgctl_info always does a full sum, so there is no need for
percpu_counter_add to fold the per-CPU value into the global count
when it reaches the batch size. That is why we add
percpu_counter_add_local for this case. The sum in the slow path is
infrequent, so its additional cost is much smaller than that of the
atomic updates in do_msgsnd and do_msgrcv on every call. I have
separated the original patch into two patches.
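
To make the trade-off concrete, here is a minimal user-space sketch of
the two update styles. It is not kernel code: struct toy_counter, the
toy_* helpers, NR_CPUS and BATCH are all hypothetical names and values
chosen for illustration, the "CPUs" are just array slots, and no
locking is modelled. It only shows that a batched add keeps the shared
count within roughly batch*nr_cpus of the true value, while a
local-only add never touches the shared count and relies on the full
sum in the slow path.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4   /* assumed number of simulated CPUs */
#define BATCH   32  /* assumed batch size */

/* Toy stand-in for struct percpu_counter; not the kernel layout. */
struct toy_counter {
    int64_t count;              /* shared count, the contended cache line */
    int64_t counters[NR_CPUS];  /* one private delta per simulated CPU */
};

/* Batched add: fold the local delta into the shared count once it
 * reaches the batch size (the kernel takes a lock for this step). */
void toy_add_batch(struct toy_counter *c, int cpu, int64_t amount)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    int64_t v = c->counters[cpu] + amount;
    if (v >= BATCH || v <= -BATCH) {
        c->count += v;
        c->counters[cpu] = 0;
    } else {
        c->counters[cpu] = v;
    }
}

/* Local-only add: update the per-CPU slot, never the shared count. */
void toy_add_local(struct toy_counter *c, int cpu, int64_t amount)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    c->counters[cpu] += amount;
}

/* Slow path: exact value is the shared count plus every per-CPU delta. */
int64_t toy_sum(const struct toy_counter *c)
{
    int64_t ret = c->count;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        ret += c->counters[cpu];
    return ret;
}

/* Driver: n increments of 1, spread round-robin over the simulated CPUs. */
void toy_run(struct toy_counter *c, int use_local, int n)
{
    for (int i = 0; i < n; i++) {
        int cpu = i % NR_CPUS;
        if (use_local)
            toy_add_local(c, cpu, 1);
        else
            toy_add_batch(c, cpu, 1);
    }
}
```

Running toy_run() with 1000 increments in each mode gives the same
toy_sum() for both counters, but only the batched one has a non-zero
shared count. That is fine for a reader that always sums, as
msgctl_info does, and it is the reason the fast path can skip the
shared update entirely.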

Thanks.