Re: relative openat dirfd reference on submit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/2/20 5:41 PM, Pavel Begunkov wrote:
> On 03/11/2020 00:34, Jens Axboe wrote:
>> On 11/2/20 5:17 PM, Pavel Begunkov wrote:
>>> On 03/11/2020 00:05, Jens Axboe wrote:
>>>> On 11/2/20 1:52 PM, Vito Caputo wrote:
>>>>> Hello list,
>>>>>
>>>>> I've been tinkering a bit with some async continuation passing style
>>>>> IO-oriented code employing liburing.  This exposed a kind of awkward
>>>>> behavior I suspect could be better from an ergonomics perspective.
>>>>>
>>>>> Imagine a bunch of OPENAT SQEs have been prepared, and they're all
>>>>> relative to a common dirfd.  Once io_uring_submit() has consumed all
>>>>> these SQEs across the syscall boundary, logically it seems the dirfd
>>>>> should be safe to close, since these dirfd-dependent operations have
>>>>> all been submitted to the kernel.
>>>>>
>>>>> But when I attempted this, the subsequent OPENAT CQE results were all
>>>>> -EBADFD errors.  It appeared the submit didn't add any references to
>>>>> the dependent dirfd.
>>>>>
>>>>> To work around this, I resorted to stowing the dirfd and maintaining a
>>>>> shared refcount in the closures associated with these SQEs and
>>>>> executed on their CQEs.  This effectively forced replicating the
>>>>> batched relationship implicit in the shared parent dirfd, where I
>>>>> otherwise had zero need to.  Just so I could defer closing the dirfd
>>>>> until once all these closures had run on their respective CQE arrivals
>>>>> and the refcount for the batch had reached zero.
>>>>>
>>>>> It doesn't seem right.  If I ensure sufficient queue depth and
>>>>> explicitly flush all the dependent SQEs beforehand
>>>>> w/io_uring_submit(), it seems like I should be able to immediately
>>>>> close(dirfd) and have the close be automagically deferred until the
>>>>> last dependent CQE removes its reference from the kernel side.
>>>>
>>>> We pass the 'dfd' straight on, and only the async part acts on it.
>>>> Which is why it needs to be kept open. But I wonder if we can get
>>>> around it by just pinning the fd for the duration. Since you didn't
>>>> include a test case, can you try with this patch applied? Totally
>>>> untested...
>>>
>>> afaik this doesn't pin an fd in a file table, so the app closes and
>>> dfd right after submit and then do_filp_open() tries to look up
>>> closed dfd. Doesn't seem to work, and we need to pass that struct
>>> file to do_filp_open().
>>
>> Yeah, I just double checked, and it's just referenced, but close() will
>> still make it NULL in the file table. So won't work... We'll have to
>> live with it for now, I'm afraid.
> 
> Is there a problem with passing in a struct file? Apart from it
> being used deep in open callchains?

No technical problems as far as I can tell, just needs doing...

-- 
Jens Axboe




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux