Hi

BTW, my apologies but I should have specified the kernel I am running:
90206ac99c1f25b7f7a4c2c40a0b9d4561ffa9bf

On Sat, Feb 8, 2020 at 9:26 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>
> Hi
>
> On 2/8/2020 4:55 PM, Glauber Costa wrote:
> > Hi
> >
> > I've been trying to make sense of some weird behavior with the seastar
> > implementation of io_uring, and started to suspect a bug in io_uring's
> > connect.
> >
> > The situation is as follows:
> >
> > - A connect() call is issued (and in the backend I can choose if I use
> > uring or not)
> > - The connection is supposed to take a while to establish.
> > - I call shutdown on the file descriptor
> >
> > If io_uring is not used:
> > - connect() starts by returning EINPROGRESS as expected, and after
> > the shutdown the file descriptor is finally made ready for epoll. I
> > call getsockopt(SOL_SOCKET, SO_ERROR), and see the error (104)
> >
> > if io_uring is used:
> > - if the SQE has the IOSQE_ASYNC flag on, connect() never returns.
> > - if the SQE *does not* have the IOSQE_ASYNC flag on, then most of the
> > time the test works as intended and connect() returns 104, but
> > occasionally it hangs too. Note that, seastar may choose not to call
> > io_uring_enter immediately and batch sqes.
> >
> > Sounds like some kind of race?
> >
> > I know C++ probably stinks like the devil for you guys, but if you are
> > curious to see the code, this fails one of our unit tests:
> >
> > https://github.com/scylladb/seastar/blob/master/tests/unit/connect_test.cc
> > See test_connection_attempt_is_shutdown
> > (above is the master seastar tree, not including the io_uring implementation)
> >
>
> Is this chaining with connect().then_wrapped() asynchronous? Like kind
> of future/promise stuff?

Correct.
then_wrapped executes eventually when connect returns either success or failure

> I wonder, if connect() and shutdown() there may
> be executed in the reverse order.

The methods connect and shutdown will execute in this order.
But connect will just queue something that will later be sent down to
the kernel.

I initially suspected an ordering issue on my side. What made me start
suspecting a bug are two reasons:
- I can force the code to grab an sqe and call io_uring_enter at the
moment the connect() call happens : I see no change.
- that IOSQE_ASYNC changes this behavior, as you acknowledged yourself.

It seems to me that if shutdown happens when the sqe is sitting on a
kernel queue somewhere the connection will hang forever instead of
failing right away as I would expect - if shutdown happens after the
call to io_uring_enter

>
> The hung with IOSQE_ASYNC sounds strange anyway.
>
> > Please let me know if this rings a bell and if there is anything I
> > should be verifying here
> >
>
> --
> Pavel Begunkov
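
For reference, a minimal sketch of the non-io_uring sequence described in the
quoted report (non-blocking connect, shutdown, wait for epoll readiness, read
SO_ERROR). This is not the seastar test and does not touch io_uring. The
function name, the target address 10.255.255.1 (assumed to be a host that does
not answer) and the timeout are illustrative assumptions. On Linux, with a
connect that stays in progress, the report above says SO_ERROR comes back as
104 (ECONNRESET).

```python
import errno
import select
import socket


def connect_then_shutdown(addr, timeout_s=2.0):
    """Return (connect_rc, so_error) for a non-blocking connect + shutdown.

    so_error is None if connect() did not report EINPROGRESS, or if the
    descriptor did not become ready within timeout_s (the "hang" case).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ep = select.epoll()
    try:
        sock.setblocking(False)
        # Non-blocking connect: EINPROGRESS is the expected result when the
        # connection takes a while to establish.
        rc = sock.connect_ex(addr)
        if rc != errno.EINPROGRESS:
            return rc, None

        # Shut the descriptor down while the connect is still pending.
        try:
            sock.shutdown(socket.SHUT_RDWR)
        except OSError:
            # The attempt may already have failed (e.g. ENOTCONN);
            # SO_ERROR is still read below.
            pass

        # Wait for the descriptor to be reported ready by epoll.
        ep.register(sock.fileno(), select.EPOLLIN | select.EPOLLOUT)
        if not ep.poll(timeout_s):
            return rc, None

        return rc, sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        ep.close()
        sock.close()


if __name__ == "__main__":
    # Assumed-unresponsive address; substitute any host that delays the handshake.
    print(connect_then_shutdown(("10.255.255.1", 9)))
```

A (115, None) result from this sketch would correspond to the descriptor never
becoming ready, which is the hang seen on the io_uring path.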