Re: Any performance gains from using per thread(thread local) urings?

Hi Dmitry,

If you want maximum performance, the pattern you will generally see in non-blocking servers is one event loop per core/thread. Here that means one ring per core/thread. Of course, there is no simple answer to this. Compare how thread-based servers work with how non-blocking servers work, e.g. Apache vs. Nginx or Tomcat vs. Netty.
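
To make the "one ring per thread" layout concrete, here is a minimal sketch, not a tuned implementation. It assumes Linux 5.1+ with kernel headers that ship <linux/io_uring.h>, and it calls the raw io_uring_setup(2) syscall so that no extra library is needed (a real server would more likely use liburing). The names struct worker, worker_main and spawn_ring_workers are made up for this illustration. Each thread creates and owns its own sq+cq pair, so nothing on the submission path is shared or locked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/io_uring.h>
#include <pthread.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425   /* fallback for older libc headers */
#endif

#define MAX_WORKERS 128
#define RING_ENTRIES 64           /* illustrative queue depth */

/* Hypothetical per-thread state: each worker owns exactly one ring. */
struct worker {
    pthread_t tid;
    int created;          /* 1 if this thread got its own ring */
    int err;              /* errno from io_uring_setup on failure */
    unsigned sq_entries;  /* sizes the kernel actually allocated */
    unsigned cq_entries;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    struct io_uring_params p;

    memset(&p, 0, sizeof(p));
    /* The ring is created inside the thread that will use it. */
    long fd = syscall(__NR_io_uring_setup, RING_ENTRIES, &p);
    if (fd < 0) {
        w->err = errno;   /* e.g. ENOSYS or EPERM if io_uring is unavailable */
        return NULL;
    }
    w->created = 1;
    w->sq_entries = p.sq_entries;
    w->cq_entries = p.cq_entries;

    /* A real event loop would mmap the rings here, then submit and
     * reap completions on this ring only, without any locking. */

    close((int)fd);
    return NULL;
}

/* Starts nthreads workers, waits for them, and returns how many
 * managed to create a private ring. */
int spawn_ring_workers(struct worker *workers, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_WORKERS);

    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        memset(&workers[i], 0, sizeof(workers[i]));
        if (pthread_create(&workers[i].tid, NULL, worker_main, &workers[i]) != 0)
            break;
        started++;
    }

    int ok = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(workers[i].tid, NULL);
        ok += workers[i].created;
    }
    return ok;
}
```

Calling spawn_ring_workers(workers, n) with n set to the number of cores gives the layout described above. The shared-ring alternative would instead need a lock around every submission and completion.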

—
Hielke de Vries

On Tue, May 12, 2020, at 22:20, Dmitry Sychov wrote:
> Hello,
> 
> I'm writing a small web + embedded database application taking
> advantage of the multicore performance of the latest AMD Epyc (up to
> 128 threads/CPU).
> 
> Is there any performance advantage to using per-thread uring setups,
> i.e. every thread owning its own sq+cq?
> 
> My feeling is that there are no gains, since internally, in the Linux
> kernel, the uring system is represented as a single queue pickup
> thread anyway(?), so sharing one sq+cq pair (through exclusive locks)
> across all threads would be enough to achieve maximum throughput.
> 
> I want to squeeze the max performance out of uring in a multithreaded
> clients <-> server environment, where the max number of threads is
> always bounded by the max number of CPU cores.
> 
> Regards, Dmitry
>



