On Wed, Feb 12, 2025 at 10:49:15AM +0800, Ming Lei wrote:
> On Mon, Feb 10, 2025 at 04:56:44PM -0800, Keith Busch wrote:
> > From: Keith Busch <kbusch@xxxxxxxxxx>
> >
> > Provide new operations for the user to request mapping an active
> > request to an io uring instance's buf_table. The user has to provide
> > the index at which it wants to install the buffer.
> >
> > A reference count is taken on the request to ensure it can't be
> > completed while it is active in a ring's buf_table.
> >
> > Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>
> > ---
> >  drivers/block/ublk_drv.c      | 145 +++++++++++++++++++++++++---------
> >  include/uapi/linux/ublk_cmd.h |   4 +
> >  2 files changed, 113 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 529085181f355..ccfda7b2c24da 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -51,6 +51,9 @@
> >  /* private ioctl command mirror */
> >  #define UBLK_CMD_DEL_DEV_ASYNC		_IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
> >
> > +#define UBLK_IO_REGISTER_IO_BUF		_IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
> > +#define UBLK_IO_UNREGISTER_IO_BUF	_IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
>
> The UBLK_IO_REGISTER_IO_BUF command may be completed while the buffer
> hasn't been used by RW_FIXED yet, in the following cases:
>
> - the application doesn't submit any RW_FIXED consumer OP
>
> - io_uring_enter() only issued UBLK_IO_REGISTER_IO_BUF, and the other
>   OPs couldn't be issued because resources ran out
>
> ...
>
> Then io_uring_enter() returns and the application panics or is killed;
> how is a buffer leak avoided?

The death of the uring that registered the node tears down the table
that it's registered with, which releases its reference. All good.

> It needs to be dealt with in the io_uring cancel code by calling
> ->release() if the kbuffer node isn't released.

There should be no situation here where it isn't released after its use
is completed. Either the resource was gracefully unregistered or the
ring closed while it was still active, but either one drops its
reference.

> UBLK_IO_UNREGISTER_IO_BUF still needs to call ->release() if the node
> buffer isn't used.

Only once the last reference is dropped, which should happen no matter
which way the node is freed.

> > +static void ublk_io_release(void *priv)
> > +{
> > +	struct request *rq = priv;
> > +	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > +
> > +	ublk_put_req_ref(ubq, rq);
> > +}
>
> It isn't enough to just get & put the request reference here between
> registering the buffer and freeing the registered node buf, because the
> same reference can be dropped from ublk_commit_completion(), which runs
> when UBLK_IO_COMMIT_AND_FETCH_REQ is queued, and a buggy app may queue
> this command multiple times to free the request.
>
> One solution is to not allow request completion until ->release() has
> returned.

Double completions are tricky because the same request id can be reused
pretty quickly and there's no immediate way to tell whether the second
completion is a duplicate or a genuine completion of the reused
request. We have rotating sequence numbers in the nvme driver to try to
detect a similar situation. So far it hasn't revealed any real bugs as
far as I know. This feels like the other side screwed up, and that's
their fault.
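
To make that detection idea concrete, here is a sketch of the scheme
(not the actual nvme code; all names here are made up): fold a small
generation counter into the id carried by each command, bump it every
time the slot is reallocated, and reject completions that carry a stale
generation.

#include <linux/types.h>

#define SLOT_TAG_BITS	10
#define SLOT_TAG_MASK	((1u << SLOT_TAG_BITS) - 1)
#define SLOT_GEN_MASK	(0xffffu & ~SLOT_TAG_MASK)

struct io_slot {
	u16	genctr;		/* bumped on every reuse of this slot */
	bool	inflight;
};

/* build the wire id for a new command issued on this slot */
static u16 slot_issue(struct io_slot *slot, u16 tag)
{
	slot->genctr++;
	slot->inflight = true;
	return (slot->genctr << SLOT_TAG_BITS) | (tag & SLOT_TAG_MASK);
}

/* returns false for a duplicate/stale completion of a recycled slot */
static bool slot_complete(struct io_slot *slot, u16 command_id)
{
	u16 expected = (slot->genctr << SLOT_TAG_BITS) & SLOT_GEN_MASK;

	if ((command_id & SLOT_GEN_MASK) != expected || !slot->inflight)
		return false;
	slot->inflight = false;
	return true;
}

It doesn't make the confusion impossible (the generation counter
wraps), but it makes an accidental double completion very likely to be
caught rather than silently completing the reused request.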
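
And going back to the leak question earlier in the thread, the
reference flow I have in mind is roughly the following. This is a
simplified sketch in the ublk_drv.c context, not the patch itself;
install_buf_node() is a made-up stand-in for the io_uring plumbing that
parks the buffer in the ring's buf_table at the given index and
guarantees the release callback runs exactly once when the node goes
away, whether by explicit unregister or by ring teardown.

/* made-up stand-in for the io_uring buf_table plumbing */
int install_buf_node(struct io_uring_cmd *cmd, unsigned int index,
		     void *priv, void (*release)(void *priv));

/*
 * Runs once, when the last user of the node drops it: explicit
 * UBLK_IO_UNREGISTER_IO_BUF, or the ring dying with the node still
 * installed. Either way, the request reference taken at registration
 * time goes away here.
 */
static void ublk_io_release(void *priv)
{
	struct request *rq = priv;
	struct ublk_queue *ubq = rq->mq_hctx->driver_data;

	ublk_put_req_ref(ubq, rq);
}

static int register_io_buf(struct io_uring_cmd *cmd, struct ublk_queue *ubq,
			   struct request *rq, unsigned int index)
{
	int ret;

	/* pin the request so it can't complete while the ring holds it */
	if (!ublk_get_req_ref(ubq, rq))
		return -EINVAL;

	ret = install_buf_node(cmd, index, rq, ublk_io_release);
	if (ret)
		ublk_put_req_ref(ubq, rq);	/* node never took ownership */
	return ret;
}

With that extra reference held, a single, well-behaved commit/fetch
completion can't free the request while the node still points at it;
the repeated-commit abuse above is the remaining way for a buggy
application to hurt itself.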