Any performance gains from using per-thread (thread-local) urings?

Dmitry Sychov <dmitry.sychov@xxxxxxxxx> · Tue, 12 May 2020 23:20:57 +0300

Hello,

I'm writing a small web + embedded database application that takes
advantage of the multicore performance of the latest AMD Epyc (up to
128 threads per CPU).

Is there any performance advantage to using per-thread io_uring setups,
where every thread owns its own SQ+CQ pair?
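To make the per-thread layout concrete, here is a minimal sketch, assuming one ring per worker thread. It deliberately does not call liburing: `struct toy_sq` and `submit_local()` are hypothetical stand-ins for a ring's submission queue and for the `io_uring_get_sqe()` + `io_uring_submit()` path. The point it shows is that each thread writes only to its own queue, so the submit path takes no lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for one ring's submission queue (NOT the io_uring API). */
#define TOY_SQ_ENTRIES 256

struct toy_sq {
    unsigned long entries[TOY_SQ_ENTRIES];
    size_t tail;                /* number of entries queued so far */
};

struct worker {
    pthread_t tid;
    struct toy_sq sq;           /* owned exclusively by this thread */
    unsigned long n_ops;        /* how many requests this thread submits */
};

/* Queue one request on the caller's own queue. No lock is needed
 * because no other thread ever touches this toy_sq. */
static int submit_local(struct toy_sq *sq, unsigned long user_data)
{
    if (sq->tail == TOY_SQ_ENTRIES)
        return -1;              /* queue full, request dropped */
    sq->entries[sq->tail++] = user_data;
    return 0;
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (unsigned long i = 0; i < w->n_ops; i++)
        submit_local(&w->sq, i);
    return NULL;
}

/* Start n workers, each with its own queue, and return the total
 * number of entries queued across all of them. */
static size_t run_per_thread(struct worker *workers, size_t n,
                             unsigned long ops_each)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        workers[i].sq.tail = 0;
        workers[i].n_ops = ops_each;
        int rc = pthread_create(&workers[i].tid, NULL, worker_main, &workers[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t i = 0; i < n; i++) {
        pthread_join(workers[i].tid, NULL);
        total += workers[i].sq.tail;
    }
    return total;
}
```

This only shows where the locking disappears on the application side; it says nothing about whether the kernel processes separate rings in parallel, which is the actual question.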

My feeling is that there are no gains, since internally, in the Linux
kernel, io_uring is handled by a single queue-pickup thread anyway (?),
so sharing one SQ+CQ pair among all threads (through exclusive locks)
would be enough to achieve maximum throughput.
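For comparison, a minimal sketch of the shared layout, assuming a single ring whose submission side is guarded by one mutex. Again no liburing calls are made: `struct shared_sq` and `submit_shared()` are hypothetical stand-ins, used only to show that every submitter serializes on the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SHARED_SQ_ENTRIES 4096

/* Hypothetical stand-in for one submission queue shared by every
 * thread (NOT the io_uring API). */
struct shared_sq {
    pthread_mutex_t lock;       /* serializes all submitters */
    unsigned long entries[SHARED_SQ_ENTRIES];
    size_t tail;
};

struct submitter {
    pthread_t tid;
    struct shared_sq *sq;
    unsigned long n_ops;
};

/* Every submission takes the same mutex, so at most one thread
 * can be filling the queue at any moment. */
static int submit_shared(struct shared_sq *sq, unsigned long user_data)
{
    int rc = 0;
    pthread_mutex_lock(&sq->lock);
    if (sq->tail == SHARED_SQ_ENTRIES)
        rc = -1;                /* queue full, request dropped */
    else
        sq->entries[sq->tail++] = user_data;
    pthread_mutex_unlock(&sq->lock);
    return rc;
}

static void *submitter_main(void *arg)
{
    struct submitter *s = arg;
    for (unsigned long i = 0; i < s->n_ops; i++)
        submit_shared(s->sq, i);
    return NULL;
}

/* Start n threads that all submit to one shared queue and return
 * how many entries ended up in it. */
static size_t run_shared(struct shared_sq *sq, struct submitter *subs,
                         size_t n, unsigned long ops_each)
{
    pthread_mutex_init(&sq->lock, NULL);
    sq->tail = 0;
    for (size_t i = 0; i < n; i++) {
        subs[i].sq = sq;
        subs[i].n_ops = ops_each;
        int rc = pthread_create(&subs[i].tid, NULL, submitter_main, &subs[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t i = 0; i < n; i++)
        pthread_join(subs[i].tid, NULL);
    pthread_mutex_destroy(&sq->lock);
    return sq->tail;
}
```

In a real shared ring the completion side (reaping CQEs) would need the same kind of serialization. Whether that lock actually limits throughput is exactly what is being asked here; this sketch does not measure it.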

I want to squeeze the maximum performance out of io_uring in a
multithreaded client <-> server environment, where the number of
threads is always bounded by the number of CPU cores.

Regards, Dmitry