On 11/2/20 5:41 PM, Pavel Begunkov wrote: > On 03/11/2020 00:34, Jens Axboe wrote: >> On 11/2/20 5:17 PM, Pavel Begunkov wrote: >>> On 03/11/2020 00:05, Jens Axboe wrote: >>>> On 11/2/20 1:52 PM, Vito Caputo wrote: >>>>> Hello list, >>>>> >>>>> I've been tinkering a bit with some async continuation passing style >>>>> IO-oriented code employing liburing. This exposed a kind of awkward >>>>> behavior I suspect could be better from an ergonomics perspective. >>>>> >>>>> Imagine a bunch of OPENAT SQEs have been prepared, and they're all >>>>> relative to a common dirfd. Once io_uring_submit() has consumed all >>>>> these SQEs across the syscall boundary, logically it seems the dirfd >>>>> should be safe to close, since these dirfd-dependent operations have >>>>> all been submitted to the kernel. >>>>> >>>>> But when I attempted this, the subsequent OPENAT CQE results were all >>>>> -EBADFD errors. It appeared the submit didn't add any references to >>>>> the dependent dirfd. >>>>> >>>>> To work around this, I resorted to stowing the dirfd and maintaining a >>>>> shared refcount in the closures associated with these SQEs and >>>>> executed on their CQEs. This effectively forced replicating the >>>>> batched relationship implicit in the shared parent dirfd, where I >>>>> otherwise had zero need to. Just so I could defer closing the dirfd >>>>> until once all these closures had run on their respective CQE arrivals >>>>> and the refcount for the batch had reached zero. >>>>> >>>>> It doesn't seem right. If I ensure sufficient queue depth and >>>>> explicitly flush all the dependent SQEs beforehand >>>>> w/io_uring_submit(), it seems like I should be able to immediately >>>>> close(dirfd) and have the close be automagically deferred until the >>>>> last dependent CQE removes its reference from the kernel side. >>>> >>>> We pass the 'dfd' straight on, and only the async part acts on it. >>>> Which is why it needs to be kept open. But I wonder if we can get >>>> around it by just pinning the fd for the duration. Since you didn't >>>> include a test case, can you try with this patch applied? Totally >>>> untested... >>> >>> afaik this doesn't pin an fd in a file table, so the app closes and >>> dfd right after submit and then do_filp_open() tries to look up >>> closed dfd. Doesn't seem to work, and we need to pass that struct >>> file to do_filp_open(). >> >> Yeah, I just double checked, and it's just referenced, but close() will >> still make it NULL in the file table. So won't work... We'll have to >> live with it for now, I'm afraid. > > Is there a problem with passing in a struct file? Apart from it > being used deep in open callchains? No technical problems as far as I can tell, just needs doing... -- Jens Axboe