On Mon, Oct 14, 2024 at 07:40:40PM +0100, Pavel Begunkov wrote:
> On 10/11/24 16:45, Ming Lei wrote:
> > On Fri, Oct 11, 2024 at 08:41:03AM -0600, Jens Axboe wrote:
> > > On 10/11/24 8:20 AM, Ming Lei wrote:
> > > > On Fri, Oct 11, 2024 at 07:24:27AM -0600, Jens Axboe wrote:
> > > > > On 10/10/24 9:07 PM, Ming Lei wrote:
> > > > > > On Thu, Oct 10, 2024 at 08:39:12PM -0600, Jens Axboe wrote:
> > > > > > > On 10/10/24 8:30 PM, Ming Lei wrote:
> > > > > > > > Hi Jens,
> ...
> > > > > > Suppose we have N consumers OPs which depends on OP_BUF_UPDATE.
> > > > > >
> > > > > > 1) all N OPs are linked with OP_BUF_UPDATE
> > > > > >
> > > > > > Or
> > > > > >
> > > > > > 2) submit OP_BUF_UPDATE first, and wait its completion, then submit N
> > > > > > OPs concurrently.
> > > > >
> > > > > Correct
> > > > >
> > > > > > But 1) and 2) may slow the IO handing. In 1) all N OPs are serialized,
> > > > > > and 1 extra syscall is introduced in 2).
> > > > >
> > > > > Yes you don't want do do #1. But the OP_BUF_UPDATE is cheap enough that
> > > > > you can just do it upfront. It's not ideal in terms of usage, and I get
> > > > > where the grouping comes from. But is it possible to do the grouping in
> > > > > a less intrusive fashion with OP_BUF_UPDATE? Because it won't change any
> > > >
> > > > The most of 'intrusive' change is just on patch 4, and Pavel has commented
> > > > that it is good enough:
> > > >
> > > > https://lore.kernel.org/linux-block/ZwZzsPcXyazyeZnu@fedora/T/#m551e94f080b80ccbd2561e01da5ea8e17f7ee15d
>
> Trying to catch up on the thread. I do think the patch is tolerable and
> mergeable, but I do it adds quite a bit of complication to the path if
> you try to have a map in what state a request can be and what

I admit that sqe group adds some complexity to the submission and
completion code, especially on the completion side.
But with your help, patch 4 has become easy to follow and sqe group is
well-defined now. It also adds a new feature, N:M dependency. Without
it, supporting N:M dependency requires one extra syscall, so sqe group
not only saves a syscall but also simplifies the application.

> dependencies are there, and then patches after has to go to every each
> io_uring opcode and add support for leased buffers. And I'm afraid

Only fast IO (net, fs) needs it; I don't see other OPs needing such
support.

> that we'll also need to feedback from completion of those to let
> the buffer know what ranges now has data / initialised. One typical
> problem for page flipping rx, for example, is that you need to have
> a full page of data to map it, otherwise it should be prezeroed,
> which is too expensive, same problem you can have without mmap'ing
> and directly exposing pages to the user.
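
Coming back to the N:M dependency point earlier in this mail, a rough
cost model may make the trade-off easier to see. This is only a sketch
and not kernel or liburing code: the struct and function names are made
up for illustration. It assumes N >= 1 consumers, no SQPOLL, and that a
group issues its members concurrently once the leader has completed. It
counts submission syscalls and the longest run of OPs that must execute
one after another.

```c
#include <assert.h>

/* Hypothetical cost model for illustration; none of these names exist
 * in the kernel or in liburing. Assumes n >= 1 and no SQPOLL. */
struct submit_cost {
    unsigned syscalls;     /* submission syscalls needed */
    unsigned serial_steps; /* longest chain of OPs run back to back */
};

/* Option 1: OP_BUF_UPDATE and all N consumers in one link chain.
 * One submission, but every OP waits for the previous one. */
static inline struct submit_cost cost_linked_chain(unsigned n)
{
    struct submit_cost c = { .syscalls = 1, .serial_steps = 1 + n };
    return c;
}

/* Option 2: submit OP_BUF_UPDATE, wait for its completion, then
 * submit the N consumers concurrently. Two submissions. */
static inline struct submit_cost cost_two_phase(unsigned n)
{
    struct submit_cost c = { .syscalls = 2, .serial_steps = 2 };
    (void)n; /* the N consumers run in parallel in the second phase */
    return c;
}

/* SQE group (assumed semantics): leader plus N members go in with one
 * submission; members start concurrently after the leader. */
static inline struct submit_cost cost_sqe_group(unsigned n)
{
    struct submit_cost c = { .syscalls = 1, .serial_steps = 2 };
    (void)n;
    return c;
}

/* Sanity check of the model: the group needs no more syscalls than
 * option 2 and no more serial steps than option 1. */
static inline void check_model(unsigned n)
{
    assert(cost_sqe_group(n).syscalls <= cost_two_phase(n).syscalls);
    assert(cost_sqe_group(n).serial_steps <= cost_linked_chain(n).serial_steps);
}
```

For N = 4, this model gives 1 syscall and 5 serial steps for the link
chain, 2 syscalls and 2 serial steps for the two-phase submit, and
1 syscall and 2 serial steps for the group. Real numbers depend on the
workload; the model only captures the counting argument.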