> Can you try the attached patch and see if it fixes it for you?

Thank you very much, that worked like a charm for both O_DIRECT and page
cache. Below is the output for O_DIRECT read submission on the same machine:

root@localhost:~/io_uring# ./io_uring_read_blkdev /dev/sda8
submitted_already = 0, submitted_now = 32, submit_time = 277 us
submitted_already = 32, submitted_now = 32, submit_time = 131 us
submitted_already = 64, submitted_now = 32, submit_time = 213 us
submitted_already = 96, submitted_now = 32, submit_time = 170 us
submitted_already = 128, submitted_now = 32, submit_time = 161 us
submitted_already = 160, submitted_now = 32, submit_time = 169 us
submitted_already = 192, submitted_now = 32, submit_time = 184 us

> Not sure how best to convey that bit of information. If you're using
> the sq thread for submission, then we cannot reliably tell the
> application when an sqe has been consumed. The application must look for
> completions (successful or errors) in the CQ ring.

I know that SQPOLL support is not fully implemented in liburing yet, so for
now it seems that io_uring_get_sqe() can return an SQE that has not actually
been submitted yet, and editing it could lead to a race between the kernel
polling thread and user space. I just think this is worth mentioning in the
documentation.

> You could wait on cq ring completions, each sqe should trigger one.

Unfortunately, a few issues seem to arise if this approach is taken in an
IO-intensive application. As a disclaimer, I should note that SQ ring
overflow is a rare event given enough entries; nevertheless, applications,
especially those using SQPOLL, should handle this situation gracefully and
efficiently. So what we have is a highly IO-intensive application that
submits very slow IOs*** (that's why it uses async IO in the first place)
and cares a lot about the progress of the submitting threads (the most
probable reason to use the SQPOLL feature).
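The SQE race mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical `struct sq_snapshot` holding plain copies of the ring indices that an application would read through pointers such as `*khead` and `*ktail` (in real code the head has to be read with acquire semantics, because the SQPOLL thread updates it concurrently). It also assumes that a slot may be rewritten only once the kernel's head has moved past it. The helpers `sq_pending()` and `sqe_consumed()` are illustrative only, not liburing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of SQ ring indices. The indices are free-running
 * 32-bit counters; a slot is addressed as (index & ring_mask). */
struct sq_snapshot {
    uint32_t head;      /* kernel side: next SQE the kernel will consume */
    uint32_t tail;      /* application side: next SQE slot to fill */
    uint32_t ring_mask; /* ring_entries - 1 (ring size is a power of two) */
};

/* Number of SQEs queued but not yet consumed by the kernel. Unsigned
 * subtraction keeps this correct when the counters wrap around. */
static uint32_t sq_pending(const struct sq_snapshot *s)
{
    uint32_t pending = (uint32_t)(s->tail - s->head);

    assert(pending <= s->ring_mask + 1);
    return pending;
}

/* Has the kernel consumed the SQE queued at sequence number seq?
 * Precondition: seq was already queued (it lies before tail).
 * While seq is still in the pending window [head, tail), its slot must
 * not be rewritten: with SQPOLL the kernel thread may be reading it. */
static bool sqe_consumed(const struct sq_snapshot *s, uint32_t seq)
{
    return (uint32_t)(seq - s->head) >= sq_pending(s);
}
```

For example, with head = 5 and tail = 9, the SQEs queued at sequence numbers 5 to 8 are still pending, so their slots must not be rewritten yet, whereas sequence number 4 has already been consumed.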
*** by 'very slow' I mean IOs whose completion takes significantly more time
than their submission.

Given such prerequisites, the following scenario is probable:

1. Put @sq_entries very slow IOs in the SQ...

  PENDING     SQ     INFLIGHT       CQ
   +---+     +---+     +---+     +---+---+
============>| X |     |   |     |   |   |
   +---+     +---+     +---+     +---+---+

...which will be submitted by the polling thread:

  PENDING     SQ     INFLIGHT       CQ
   +---+     +---+     +---+     +---+---+
   |   |     |   |====>| X |     |   |   |
   +---+     +---+     +---+     +---+---+

2. Then try to add (@sq_entries + @pending) entries to the SQ, but only
succeed with @sq_entries.

  PENDING     SQ     INFLIGHT       CQ
   +---+     +---+     +---+     +---+---+
==>| X |====>| X |     | X |     |   |   |
   +---+     +---+     +---+     +---+---+

3. Wait a very long time in io_uring_enter(GETEVENTS) for a CQ ring
completion...

  PENDING     SQ     INFLIGHT       CQ
   +---+     +---+     +---+     +---+---+
   | X |     | X |     |   |====>| X |   |
   +---+     +---+     +---+     +---+---+

...and there is still no guarantee that a slot in the SQ ring has become
available.

Should we call io_uring_enter(GETEVENTS, min_complete = 1) in a loop,
checking (*khead == *ktail) at every iteration?

In conclusion, it seems reasonable to instruct applications using SQPOLL to
submit SQEs until the queue is full, and then call io_uring_enter(),
probably with some flag, to wait for a slot in the submission queue rather
than for completions, since:

1) The time needed to complete an IO tends to be much greater than the time
   needed to submit it.
2) A CQ ring completion does not imply that a slot has become available in
   the SQ (see the diagrams above).
3) Busy waiting in the submitting thread is probably not what SQPOLL users
   want.

Side note: event-loop-driven applications could find it convenient to
epoll() the io_uring fd with EPOLLOUT to wait for an available entry in the
SQ. Do I understand correctly that spurious wakeups are currently possible,
since io_uring_poll() waiters are only woken up on io_commit_cqring(),
which, as shown above, doesn't guarantee that EPOLLOUT can be set?

Thank you again!

__
Best regards,
Filipp Mikoian
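A minimal sketch of the wait-for-a-slot strategy proposed in this message, under stated assumptions: `struct sq_model`, `wait_fn`, and the `fake_kernel` test double are hypothetical, and `wait_fn` stands in for whatever the application actually sleeps on (io_uring_enter() with GETEVENTS, or an epoll() wait on the ring fd). What it illustrates is that every wakeup is only a hint: because a CQ completion does not imply that the SQ head advanced, the loop re-reads the head after each wait instead of counting completions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the SQ ring indices shared with an SQPOLL thread. */
struct sq_model {
    _Atomic unsigned head; /* advanced by the kernel thread as it consumes SQEs */
    unsigned tail;         /* advanced by the application as it queues SQEs */
    unsigned entries;      /* ring size */
};

/* Hypothetical blocking primitive: stands in for whatever the application
 * sleeps on, e.g. io_uring_enter() with GETEVENTS or an epoll() wait. */
typedef void (*wait_fn)(void *ctx);

/* Free SQ slots; unsigned arithmetic tolerates index wrap-around. */
static unsigned sq_free(struct sq_model *sq)
{
    unsigned head = atomic_load_explicit(&sq->head, memory_order_acquire);

    assert(sq->tail - head <= sq->entries);
    return sq->entries - (sq->tail - head);
}

/* Block until at least one SQ slot is free. A wakeup is only a hint: a CQ
 * completion does not imply that the SQ head moved, so re-check the head
 * after every wait. Returns the number of waits performed. */
static unsigned wait_for_sq_slot(struct sq_model *sq, wait_fn wait_cb, void *ctx)
{
    unsigned waits = 0;

    while (sq_free(sq) == 0) {
        wait_cb(ctx);
        waits++;
    }
    return waits;
}

/* Test double: the first `spurious` wakeups model completions that leave
 * the SQ full; after that, the "kernel" consumes one SQE. */
struct fake_kernel {
    struct sq_model *sq;
    unsigned spurious;
};

static void fake_wait(void *ctx)
{
    struct fake_kernel *k = ctx;

    if (k->spurious > 0)
        k->spurious--;
    else
        atomic_fetch_add_explicit(&k->sq->head, 1, memory_order_release);
}
```

With fake_kernel configured to deliver three completions that leave the SQ head untouched, wait_for_sq_slot() on a full 8-entry ring returns after four waits, leaving exactly one free slot.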