On Fri, Jan 31, 2020 at 08:39:46AM -0700, Jens Axboe wrote:
> On 1/31/20 7:29 AM, Stefano Garzarella wrote:
> > Hi Jens,
> > this is a v2 of the epoll test.
> >
> > v1 -> v2:
> >     - if IORING_FEAT_NODROP is not available, avoid to overflow the CQ
> >     - add 2 new tests to test epoll with IORING_FEAT_NODROP
> >     - cleanups
> >
> > There are 4 sub-tests:
> >     1. test_epoll
> >     2. test_epoll_sqpoll
> >     3. test_epoll_nodrop
> >     4. test_epoll_sqpoll_nodrop
> >
> > In the first 2 tests, I try to avoid to queue more requests than we have room
> > for in the CQ ring. These work fine, I have no faults.
>
> Thanks!
>
> > In the tests 3 and 4, if IORING_FEAT_NODROP is supported, I try to submit as
> > much as I can until I get a -EBUSY, but they often fail in this way:
> > the submitter manages to submit everything, the receiver receives all the
> > submitted bytes, but the cleaner loses completion events (I also tried to put a
> > timeout to epoll_wait() in the cleaner to be sure that it is not related to the
> > patch that I send some weeks ago, but the situation doesn't change, it's like
> > there is still overflow in the CQ).
> >
> > Next week I'll try to investigate better which is the problem.
>
> Does it change if you have an io_uring_enter() with GETEVENTS set? I wonder if
> you just pruned the CQ ring but didn't flush the internal side.
>

Just an update: after the "io_uring: flush overflowed CQ events in the
io_uring_poll()" patch, test 3 works well.

Now the problem is test 4 (with sqpoll). It works in most cases, but it
occasionally fails in this way:
- the submitter freezes after submitting X requests
- the cleaner and the consumer see X-2 requests (2 are the entries in the queue)

I tried putting a timeout on the submitter's epoll and calling
io_uring_submit() to wake up the kthread (in case we lose some
notifications), but the problem seems to be somewhere else.

I think there is a race somewhere. Any suggestions on how to debug this
case? I'll try with tracing.

Thanks,
Stefano