Re: [PATCH V8 0/8] io_uring: support sqe group and leased group kbuf

Jens Axboe <axboe@xxxxxxxxx> · Thu, 31 Oct 2024 08:29:37 -0600

On 10/31/24 7:25 AM, Pavel Begunkov wrote:
> On 10/30/24 02:43, Jens Axboe wrote:
>> On 10/29/24 8:03 PM, Ming Lei wrote:
>>> On Tue, Oct 29, 2024 at 03:26:37PM -0600, Jens Axboe wrote:
>>>> On 10/29/24 2:06 PM, Jens Axboe wrote:
>>>>> On 10/29/24 1:18 PM, Jens Axboe wrote:
> ...
>>>> +    node->buf = imu;
>>>> +    node->kbuf_fn = kbuf_fn;
>>>> +    return node;
>>>
>>> Also this function needs to register the buffer to table with one
>>> pre-defined buf index, then the following request can use it by
>>> the way of io_prep_rw_fixed().
>>
>> It should not register it with the table, the whole point is to keep
>> this node only per-submission discoverable. If you're grabbing random
>> request pages, then it very much is a bit finicky
> 
> Registering it into the table has enough design and flexibility
> merits: error handling, allowing any type of dependencies of requests
> by handling it in the user space, etc.

Right, but it has to be a special table. See my lengthier reply to Ming.
The initial POC did install it into a table; it's just a one-slot table,
io_submit_state. I think the right approach is to have an actual struct
io_rsrc_data local_table in the ctx, with refs put at the end of submit.
Same kind of concept, just allows for more entries (potentially), with
the same requirement that nodes get put when submit ends. IOW, requests
need to find it within the same submit.
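
Roughly, as a userspace sketch rather than actual kernel code. Every
name below (submit_ctx, buf_node, local_install() and friends) is made
up for illustration and none of it is existing io_uring API. It only
shows the lifetime rule: the table holds one ref per slot, lookups take
their own ref, and the table's refs get put when submit ends.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SLOTS 4	/* hypothetical table size */

/* Hypothetical stand-in for a buffer node; not the kernel struct. */
struct buf_node {
	int refs;	/* outstanding references */
	int released;	/* set once the last reference is put */
};

/* Hypothetical per-ring context holding the submit-local table. */
struct submit_ctx {
	struct buf_node *local_table[LOCAL_SLOTS];
};

static void node_put(struct buf_node *node)
{
	assert(node->refs > 0);
	if (--node->refs == 0)
		node->released = 1;
}

/* Install a node for the current submit; the table owns one ref. */
static int local_install(struct submit_ctx *ctx, unsigned int idx,
			 struct buf_node *node)
{
	if (idx >= LOCAL_SLOTS || ctx->local_table[idx])
		return -1;
	node->refs++;
	ctx->local_table[idx] = node;
	return 0;
}

/* Lookup by a later request in the same submit; takes its own ref. */
static struct buf_node *local_lookup(struct submit_ctx *ctx,
				     unsigned int idx)
{
	struct buf_node *node;

	if (idx >= LOCAL_SLOTS)
		return NULL;
	node = ctx->local_table[idx];
	if (node)
		node->refs++;
	return node;
}

/* End of submit: put the table's refs, nothing is discoverable after. */
static void submit_end(struct submit_ctx *ctx)
{
	unsigned int i;

	for (i = 0; i < LOCAL_SLOTS; i++) {
		if (ctx->local_table[i]) {
			node_put(ctx->local_table[i]);
			ctx->local_table[i] = NULL;
		}
	}
}
```

A request that looked the node up during the submit keeps its own ref,
so the node stays alive past submit_end() until that request does its
node_put(). A request queued in a later submit simply gets NULL back.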

Obviously you would not NEED to do that, but if the use case is grabbing
bvecs out of a request, then it very much should not be discoverable
past the initial assignments within that submit scope.

>> and needs to be of
>> limited scope.
> 
> And I don't think we can force it, neither with limiting exposure to
> submission only nor with Ming's group-based approach. The user can
> always queue a request that will never complete and/or by using
> DEFER_TASKRUN and just not letting it run. In this sense it might be
> dangerous to block requests of an average system shared block device,
> but if it's fine with ublk it sounds like it should be fine for any of
> the aforementioned approaches.

As long as the resource remains valid until the last put of the node,
then it should be OK. Yes, the application can mess things up in terms of
latency if it uses one of these bufs for e.g. a read on a pipe that never
gets any data, but the data will remain valid regardless. And that's
very much a "doctor it hurts when I..." case, it should not cause any
safety issues. It'll just prevent progress for the other requests that
are using that buffer, if they need the final put to happen before
making progress.
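
To illustrate the "valid until the last put" part, here is a minimal
userspace sketch. Again, these are hypothetical names (leased_buf,
lease_get(), lease_put()) and a hypothetical release callback, not the
kernel implementation. The point is just that a stalled holder delays
the release but never leaves anyone with an invalid buffer.

```c
#include <assert.h>

/* Hypothetical leased buffer; 'valid' models the pages being usable. */
struct leased_buf {
	int refs;
	int valid;
	void (*release)(struct leased_buf *buf); /* run on the last put */
};

/* Hypothetical release hook: hand the buffer back to its owner. */
static void lease_release(struct leased_buf *buf)
{
	buf->valid = 0;
}

/* The provider starts out holding one reference. */
static void lease_init(struct leased_buf *buf)
{
	buf->refs = 1;
	buf->valid = 1;
	buf->release = lease_release;
}

static void lease_get(struct leased_buf *buf)
{
	assert(buf->refs > 0);
	buf->refs++;
}

/* Only the final put releases; earlier puts leave the buffer valid. */
static void lease_put(struct leased_buf *buf)
{
	assert(buf->refs > 0);
	if (--buf->refs == 0)
		buf->release(buf);
}
```

With one provider ref plus two request refs, putting the provider ref
and one request ref leaves refs at 1 and the buffer still valid. It is
only released once the slow request (say, the pipe read) finally does
its put.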

-- 
Jens Axboe