On 10/29/24 10:11 PM, Ming Lei wrote:
> On Wed, Oct 30, 2024 at 11:08:16AM +0800, Ming Lei wrote:
>> On Tue, Oct 29, 2024 at 08:43:39PM -0600, Jens Axboe wrote:
>
> ...
>
>>> You could avoid the OP dependency with just a flag, if you really wanted
>>> to. But I'm not sure it makes a lot of sense. And it's a hell of a lot
>>
>> Yes, IO_LINK won't work for submitting multiple IOs concurrently, extra
>> syscall makes application too complicated, and IO latency is increased.
>>
>>> simpler than the sqe group scheme, which I'm a bit worried about as it's
>>> a bit complicated in how deep it needs to go in the code. This one
>>> stands alone, so I'd strongly encourage we pursue this a bit further and
>>> iron out the kinks. Maybe it won't work in the end, I don't know, but it
>>> seems pretty promising and it's soooo much simpler.
>>
>> If buffer register and lookup are always done in ->prep(), OP dependency
>> may be avoided.
>
> Even all buffer register and lookup are done in ->prep(), OP dependency
> still can't be avoided completely, such as:
>
> 1) two local buffers for sending to two sockets
>
> 2) group 1: IORING_OP_LOCAL_KBUF1 & [send(sock1), send(sock2)]
>
> 3) group 2: IORING_OP_LOCAL_KBUF2 & [send(sock1), send(sock2)]
>
> group 1 and group 2 needs to be linked, but inside each group, the two
> sends may be submitted in parallel.

That is where groups of course work, in that you can submit 2 groups and
have each member inside each group run independently. But I do think we
need to decouple the local buffer and group concepts entirely. For the
first step, getting local buffers working with zero copy would be ideal,
and then just live with the fact that group 1 needs to be submitted
first and group 2 once the first ones are done.

Once local buffers are done, we can look at doing the sqe grouping in a
nice way.
I do think it's a potentially powerful concept, but we're going to make
a lot more progress on this issue if we carefully separate dependencies
and get each of them done separately.

-- 
Jens Axboe