Hey,

With the introduction of io_uring there is finally a sane AIO interface that allows straightforward porting of Windows IOCP-based apps to Linux - hooray.

The question, though, which I raised in the previous thread but which went largely unanswered, is how well submits and retrieves against a single io_uring scale across multiple active user threads doing socket I/O, with the thread count still bounded (to prevent excessive context switches) by the overall number of system cores. With IOCP I am observing raw socket TPS increase with every added core thread, up to something like 32..64 of them, all using a single submit+completion queue pair.

All the io_uring benchmarks I've found lack specific context such as the number of running threads, and there is no data on the gains from multicore submits/retrieves against a single io_uring, which makes them somewhat beside the point...

Dmitry