Re: Any performance gains from using per thread(thread local) urings?

"H. de Vries" <hdevries@xxxxxxxxxxxx> · Wed, 13 May 2020 08:07:07 +0200

Hi Dmitry,

If you want max performance, what you will generally see in non-blocking servers is one event loop per core/thread, which means one ring per core/thread. Of course, there is no simple answer to this. Compare how thread-based servers work with non-blocking servers, e.g. Apache vs Nginx or Tomcat vs Netty.

—
Hielke de Vries

On Tue, May 12, 2020, at 22:20, Dmitry Sychov wrote:
> Hello,
> 
> I'm writing a small web + embedded database application taking
> advantage of the multicore performance of the latest AMD Epyc (up to
> 128 threads/CPU).
> 
> Is there any performance advantage to using per-thread uring setups,
> such that every thread owns its own unique sq+cq?
> 
> My feeling is there are no gains since internally, in the Linux kernel,
> the uring system is represented as a single queue pickup thread
> anyway(?), and sharing one pair of sq+cq (through exclusive locks)
> among all threads would be enough to achieve maximum throughput.
> 
> I want to squeeze the max performance out of uring in a multithreaded
> client <-> server environment, where the number of threads is
> always bounded by the number of CPU cores.
> 
> Regards, Dmitry
>