On Thu, Dec 09, 2021 at 03:02:12PM +0000, Pavel Begunkov wrote:
> Don't see how a CQE may get missing, so let me ask a bunch of questions:
>
> First, let's try out my understanding of your problem. At the beginning you
> submit MAX_CONNECTIONS/2 accept requests and _all_ of them complete.

correct.

> In the main loop you add another bunch of accepts, but you're not getting CQEs
> from them. Right ?

yes, the io_uring_prep_accept_direct() submissions made before entering the
main loop complete. any io_uring_prep_accept_direct() submitted from within
the main loop goes missing.

> 1) Anything in dmesg? Please when it got stuck (or what the symptoms are),
> don't kill it but wait for 3 minutes and check dmesg again.
>

nothing in dmesg!

> Or you to reduce the waiting time:
> "echo 10 > /proc/sys/kernel/hung_task_timeout_secs"

oh, my kernel[mek] is missing that; rebuilding right now with
`CONFIG_DETECT_HUNG_TASK=y`; will report back after reboot.

btw, i also enabled CONFIG_WQ_WATCHDOG=y for workqueue.watchdog_thresh; don't
know if that would help too. let me know.

also, any magic with bpftrace you would suggest?

> And then should if anything wrong it should appear in dmesg max in 20-30 secs
>
> 2) What kernel version are you running?

[mek]: Linux 5.15.6-gentoo-p51 #5 SMP PREEMPT x86_64 i7-7700HQ

> 3) Have you tried normal accept (non-direct)?

no, will try, but accept_direct worked for me before introducing pthread into
the code. don't know if it matters.

> 4) Can try increase the max number io-wq workers exceeds the max number
> of inflight requests? Increase RLIMIT_NPROC, E.g. set it to
> RLIMIT_NPROC = nr_threads + max inflight requests.

i only have 1 thread atm, but will try this with the new kernel and report
back.

> 5) Do you get CQEs when you shutdown listening sockets?

yes! the CQEs for the io_uring_prep_close_direct() call (there is only one,
inside dq_msg()) come in on subsequent arrival of connect() requests from the
client. tested with and without IOSQE_ASYNC set.

> 6) Do you check return values of io_uring_submit()?
>
> 7) Any variability during execution? E.g. a different number of
> sockets get accepted.

with IORING_SETUP_SQPOLL, i was getting different numbers for:

    pending = io_uring_sq_ready(ring);
vs
    submitted = io_uring_submit(ring);

as in the commented block at the beginning of the event loop. don't know if
that's the way to check what you're asking. let me know please.

thanks for the help,
- jrun