Hi Hielke,

> If you want max performance, what you generally will see in non-blocking servers is one event loop per core/thread.
> This means one ring per core/thread. Of course there is no simple answer to this.
> See how thread-based servers work vs non-blocking servers. E.g. Apache vs Nginx or Tomcat vs Netty.

I think a lot depends on the internal uring implementation: to what degree the kernel is able to handle multiple urings independently, without many contention points (like updates of the same memory locations from multiple threads), and thus take advantage of one ring per CPU core. For example, if the tasks from multiple rings are later funneled into a single input queue inside the kernel (effectively forming a contention point), I see no reason to use an exclusive ring per core in user space. [BTW, in Windows, IOCP is always one input+output queue for all (active) threads.]

Also, we could pop multiple completion events from a single CQ at once and spread the handling across core-bound threads.

I thought about one uring per core at first, but now I'm not sure - maybe the kernel devs have something to add to the discussion?

P.S. uring is the main reason I'm switching from Windows to Linux dev for a client-server app, so I want to extract the max performance possible out of this new exciting uring stuff. :)

Thanks,
Dmitry
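To make the two layouts I'm comparing concrete, here are rough sketches assuming liburing; error handling, CPU pinning, socket setup and the actual SQE preparation are omitted, and any helper names in comments are hypothetical placeholders.

First, the "one ring per core/thread" layout - each thread owns a private ring and runs its own event loop:

#include <liburing.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define QUEUE_DEPTH 256
#define MAX_THREADS 64

/* One event loop per thread, each thread owning a private ring.
 * Pinning each thread to a core (CPU affinity) is left out here. */
static void *event_loop(void *arg)
{
    struct io_uring ring;
    struct io_uring_cqe *cqe;
    (void)arg;

    if (io_uring_queue_init(QUEUE_DEPTH, &ring, 0) < 0) {
        perror("io_uring_queue_init");
        return NULL;
    }

    for (;;) {
        /* ... prepare SQEs for the sockets owned by this thread ... */
        io_uring_submit(&ring);

        if (io_uring_wait_cqe(&ring, &cqe) < 0)
            break;

        /* handle_completion(cqe);  -- application specific */
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    return NULL;
}

int main(void)
{
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    pthread_t tids[MAX_THREADS];
    long i;

    /* one event-loop thread (and thus one ring) per online CPU */
    for (i = 0; i < ncpu && i < MAX_THREADS; i++)
        pthread_create(&tids[i], NULL, event_loop, NULL);
    for (i = 0; i < ncpu && i < MAX_THREADS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}

And the alternative I mentioned: a single shared ring whose CQ is drained in batches and fanned out to core-bound worker threads. io_uring_peek_batch_cqe() and io_uring_cq_advance() are the liburing helpers I have in mind for consuming several CQEs in one pass; dispatch_to_worker() is just a placeholder for the hand-off:

#include <liburing.h>

#define CQE_BATCH 64

/* Drain up to CQE_BATCH completions from one shared ring in a single
 * pass, then mark them all as seen with one CQ-ring update. */
static void drain_completions(struct io_uring *ring)
{
    struct io_uring_cqe *cqes[CQE_BATCH];
    unsigned n, i;

    n = io_uring_peek_batch_cqe(ring, cqes, CQE_BATCH);
    for (i = 0; i < n; i++) {
        /* copy user_data and res out before advancing, since the CQEs
         * become invalid afterwards:
         * dispatch_to_worker(io_uring_cqe_get_data(cqes[i]), cqes[i]->res);
         */
    }

    io_uring_cq_advance(ring, n);
}

Whether the second variant is worth it presumably comes down to exactly the question above - how much the rings contend inside the kernel versus how much the single user-space CQ contends across threads.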