On Fri, Sep 2, 2022 at 12:04 AM Jiebin Sun <jiebin.sun@xxxxxxxxx> wrote:
>
> The msg_bytes and msg_hdrs atomic counters are frequently
> updated when IPC msg queue is in heavy use, causing heavy
> cache bounce and overhead. Change them to percpu_counters
> greatly improve the performance. Since there is one unique
> ipc namespace, additional memory cost is minimal. Reading
> of the count done in msgctl call, which is infrequent. So
> the need to sum up the counts in each CPU is infrequent.
>
> Apply the patch and test the pts/stress-ng-1.4.0
> -- system v message passing (160 threads).
>
> Score gain: 3.38x
>
> CPU: ICX 8380 x 2 sockets
> Core number: 40 x 2 physical cores
> Benchmark: pts/stress-ng-1.4.0
> -- system v message passing (160 threads)
>
> Signed-off-by: Jiebin Sun <jiebin.sun@xxxxxxxxx>
[...]
>
> +void percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
> +{
> +	this_cpu_add(*fbc->counters, amount);
> +}
> +EXPORT_SYMBOL(percpu_counter_add_local);

Why not percpu_counter_add()? This may let fbc->count drift from the
true value by more than batch*nr_cpus. I am assuming that is not an
issue for you, as you always do an expensive sum in the slow path.

As Andrew asked, this should be a separate patch.
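To make the drift concern concrete, here is a minimal userspace sketch. It is a hypothetical model, not the kernel API: the names sim_counter, sim_add_batch, sim_add_local, sim_read, sim_sum and sim_drift, and the constants SIM_NR_CPUS=4 and SIM_BATCH=32, are all made up for illustration. The batched add folds a CPU's private delta into the shared count once it reaches the batch, so a cheap read of the shared count stays within batch*nr_cpus of the exact sum. The "local" add never folds, so the cheap read can lag arbitrarily far behind, and only the exact (expensive) sum is correct.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model of a per-CPU counter; NOT the kernel API. */
#define SIM_NR_CPUS 4
#define SIM_BATCH   32

struct sim_counter {
	int64_t count;                 /* shared approximation (like fbc->count) */
	int64_t counters[SIM_NR_CPUS]; /* one private delta per CPU */
};

/* Models the batched add: fold the local delta into ->count once its
 * magnitude reaches the batch, so each local stays in (-batch, batch). */
static void sim_add_batch(struct sim_counter *c, int cpu, int64_t amount)
{
	int64_t local = c->counters[cpu] + amount;

	if (llabs(local) >= SIM_BATCH) {
		c->count += local;      /* the kernel does this under a lock */
		c->counters[cpu] = 0;
	} else {
		c->counters[cpu] = local;
	}
}

/* Models the proposed "local" add: only the private delta changes. */
static void sim_add_local(struct sim_counter *c, int cpu, int64_t amount)
{
	c->counters[cpu] += amount;
}

/* Cheap read: the shared count only. */
static int64_t sim_read(const struct sim_counter *c)
{
	return c->count;
}

/* Exact read: shared count plus every CPU's private delta. */
static int64_t sim_sum(const struct sim_counter *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sum += c->counters[cpu];
	return sum;
}

/* Apply n unit increments round-robin over the CPUs and return the
 * drift |exact sum - cheap read|. */
static int64_t sim_drift(int use_local, int n)
{
	struct sim_counter c = {0};

	for (int i = 0; i < n; i++) {
		if (use_local)
			sim_add_local(&c, i % SIM_NR_CPUS, 1);
		else
			sim_add_batch(&c, i % SIM_NR_CPUS, 1);
	}
	return llabs(sim_sum(&c) - sim_read(&c));
}
```

With 1000 increments, sim_drift(0, 1000) stays below SIM_NR_CPUS * SIM_BATCH (each CPU is left holding 26 unfolded units), while sim_drift(1, 1000) is the full 1000, since nothing ever reaches the shared count. That is harmless only if every reader uses the exact sum, as the msgctl path in this patch is described as doing.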