Hi Dmitry,

If you want max performance, what you will generally see in non-blocking
servers is one event loop per core/thread. This means one ring per
core/thread.

Of course there is no simple answer to this. Compare how thread-based
servers work with non-blocking servers, e.g. Apache vs. Nginx or Tomcat
vs. Netty.

--
Hielke de Vries

On Tue, May 12, 2020, at 22:20, Dmitry Sychov wrote:
> Hello,
>
> I'm writing a small web + embedded database application taking
> advantage of the multicore performance of the latest AMD Epyc (up to
> 128 threads/CPU).
>
> Is there any performance advantage to using per-thread uring setups,
> such that every thread owns its own sq+cq?
>
> My feeling is that there are no gains since internally, in the Linux
> kernel, the uring system is represented as a single queue pickup thread
> anyway(?), and sharing one sq+cq pair (through exclusive locks)
> across all threads would be enough to achieve maximum throughput.
>
> I want to squeeze the max performance out of uring in a multithreaded
> clients <-> server environment, where the max number of threads is
> always bounded by the max number of CPU cores.
>
> Regards, Dmitry
>
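
To make the "one event loop per core/thread" layout from the reply concrete, here is a minimal sketch of its shape. It is not io_uring code: Python's asyncio loop stands in for a ring, `asyncio.sleep(0)` stands in for a non-blocking I/O wait, and the names `run_loop` and `run_workers` are hypothetical. The point it tries to show is only that each thread creates and drives its own loop, so nothing needs a lock.

```python
import asyncio
import threading


def run_loop(worker_id, jobs, results):
    """Create and drive a private event loop inside the calling thread."""

    async def serve():
        done = []
        for job in jobs:
            # Stand-in for waiting on a non-blocking I/O completion.
            await asyncio.sleep(0)
            done.append((worker_id, job))
        return done

    # The loop is created by, and only ever touched from, this thread.
    loop = asyncio.new_event_loop()
    try:
        results[worker_id] = loop.run_until_complete(serve())
    finally:
        loop.close()


def run_workers(num_workers, jobs_per_worker):
    """Start one thread per worker, each with its own loop, and collect results."""
    # Each thread writes only to its own slot, so no lock is required.
    results = [None] * num_workers
    threads = [
        threading.Thread(target=run_loop, args=(i, range(jobs_per_worker), results))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for per_thread in run_workers(4, 3):
        print(per_thread)
```

In a real server the worker count would typically match the number of cores, and each thread would set up its own sq+cq pair where this sketch creates a loop. Python threads do not run bytecode in parallel, so this illustrates the ownership layout only, not the throughput.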