On 12/8/21 19:07, jrun wrote:
hello,

this may very well be something simple i'm missing, so apologies in advance.

_some_ calls to io_uring_prep_accept_direct() never make it back from the kernel! or so it seems... since io_uring_prep_accept_direct() is a new addition to io_uring, i thought i'd check with you first and get some help if possible.
I don't see how a CQE could go missing, so let me ask a bunch of questions.

First, let me check my understanding of your problem. At the beginning you submit MAX_CONNECTIONS/2 accept requests and _all_ of them complete. In the main loop you add another bunch of accepts, but you're not getting CQEs from them. Right?

1) Anything in dmesg? When it gets stuck (or whatever the symptoms are), please don't kill it, but wait for 3 minutes and check dmesg again. Or, to reduce the waiting time:

"echo 10 > /proc/sys/kernel/hung_task_timeout_secs"

Then, if anything is wrong, it should appear in dmesg within 20-30 secs at most.

2) What kernel version are you running?

3) Have you tried normal accept (non-direct)?

4) Can you try increasing the max number of io-wq workers so that it exceeds the max number of inflight requests? Increase RLIMIT_NPROC, e.g. set it to RLIMIT_NPROC = nr_threads + max inflight requests.

5) Do you get CQEs when you shut down the listening sockets?

6) Do you check the return values of io_uring_submit()?

7) Any variability during execution? E.g. a different number of sockets get accepted.
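For 4), a minimal sketch of what raising the limit could look like, assuming the process is allowed to move its soft limit up to the hard limit. The helper names `nproc_target()` and `raise_nproc_soft_limit()` are made up for illustration and are not part of liburing or of the test program:

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: the target suggested in 4), i.e.
 * nr_threads + max inflight requests. */
rlim_t nproc_target(unsigned nr_threads, unsigned max_inflight)
{
    return (rlim_t)nr_threads + (rlim_t)max_inflight;
}

/* Hypothetical helper: raise the soft RLIMIT_NPROC to at least `target`,
 * never above the hard limit. Returns 0 on success, -1 on error. */
int raise_nproc_soft_limit(rlim_t target)
{
    struct rlimit rl;

    assert(target > 0); /* a zero target would not make sense here */

    if (getrlimit(RLIMIT_NPROC, &rl) < 0)
        return -1;

    /* Nothing to do if the soft limit is already high enough (or unlimited). */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= target)
        return 0;

    /* An unprivileged process cannot exceed the hard limit, so clamp to it. */
    if (rl.rlim_max != RLIM_INFINITY && target > rl.rlim_max)
        target = rl.rlim_max;

    rl.rlim_cur = target;
    return setrlimit(RLIMIT_NPROC, &rl);
}
```

Something like `raise_nproc_soft_limit(nproc_target(nr_threads, MAX_CONNECTIONS))` early in main(), before the ring is set up, would be one way to try it. The thread count and the use of MAX_CONNECTIONS as the inflight bound are assumptions about your program.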
---------
TEST_PROG:
---------

this msg has a git repo bundled which has the crap i've put together where i encounter this. to compile/run it, save the bundle somewhere, say under `/tmp/`, and then do:

```
cd /tmp/
git clone wsub.git wsub
cd wsub
# maybe have a look at build.sh before running the following
# it will install a single binary under ~/.local/bin
# also it will fire up the binary, the server part, wsub, right away
sh build.sh
# then from a different terminal
cd /tmp/wsub/client
# this is zsh syntax; use seq in bash
MAX_CONNECTIONS=4; for i in {0..$MAX_CONNECTIONS}; do ./client foo; done
```

srv starts listening on an *abstract* unix socket, named after the binary, which should turn up in the output of this, if you have ss(8) installed:

`ss -l -x --socket=unix_seqpacket`

it will be called `@wsub` if you don't change anything. the client bit just sends its first arg, "foo" in this case, to the server, and srv prints it out to its stderr.

--------
PROBLEM:
--------

every call to io_uring_prep_accept_direct() via q_accept() made before entering event_loop(), main.c:587, gets properly completed, but subsequent calls to io_uring_prep_accept_direct() after entering event_loop(), main.c:487 `case ACCEPT:`, never turn up on the ring's cq! you will notice that all other submissions inside event_loop(), to the same ring, get completed fine.

note also that io_uring_prep_accept_direct() completions make it once there is a new connection! running the client bit one-by-one might illustrate the point better.

i also experimented with using IORING_SETUP_SQPOLL, different articles but same result for io_uring_prep_accept_direct() submissions.

thoughts?

- jrun
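For readers without the bundle at hand, here is a rough sketch of how a listening SOCK_SEQPACKET socket on an abstract unix address (the kind that ss(8) shows as `@wsub`) can be created. It is not taken from the wsub sources; the helper name `listen_abstract_seqpacket()` and its parameters are assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper, not from the wsub repo: create a listening
 * SOCK_SEQPACKET socket bound to an abstract unix address.
 * Returns the fd on success, -1 on error. */
int listen_abstract_seqpacket(const char *name, int backlog)
{
    struct sockaddr_un addr;
    socklen_t addrlen;
    size_t len;
    int fd;

    assert(name != NULL);
    len = strlen(name);

    /* One byte of sun_path is taken by the leading '\0' marker. */
    if (len == 0 || len > sizeof(addr.sun_path) - 1)
        return -1;

    fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* sun_path[0] stays '\0', which selects the abstract namespace. */
    memcpy(addr.sun_path + 1, name, len);

    /* Pass only the bytes in use, so the name is not padded with NULs. */
    addrlen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);

    if (bind(fd, (struct sockaddr *)&addr, addrlen) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned fd is what accept requests would then be queued against. Abstract addresses are Linux-specific and disappear once the last socket bound to them is closed, so no unlink() is needed.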
-- Pavel Begunkov