On 5/9/22 3:23 AM, Ming Lei wrote:
> This is the driver part of userspace block driver(ubd driver), the other
> part is userspace daemon part(ubdsrv)[1].
>
> The two parts communicate by io_uring's IORING_OP_URING_CMD with one
> shared cmd buffer for storing io command, and the buffer is read only for
> ubdsrv, each io command is indexed by io request tag directly, and
> is written by ubd driver.
>
> For example, when one READ io request is submitted to ubd block driver, ubd
> driver stores the io command into cmd buffer first, then completes one
> IORING_OP_URING_CMD for notifying ubdsrv, and the URING_CMD is issued to
> ubd driver beforehand by ubdsrv for getting notification of any new io request,
> and each URING_CMD is associated with one io request by tag.
>
> After ubdsrv gets the io command, it translates and handles the ubd io
> request, such as, for the ubd-loop target, ubdsrv translates the request
> into same request on another file or disk, like the kernel loop block
> driver. In ubdsrv's implementation, the io is still handled by io_uring,
> and share same ring with IORING_OP_URING_CMD command. When the target io
> request is done, the same IORING_OP_URING_CMD is issued to ubd driver for
> both committing io request result and getting future notification of new
> io request.
>
> Another thing done by ubd driver is to copy data between kernel io
> request and ubdsrv's io buffer:
>
> 1) before ubsrv handles WRITE request, copy the request's data into
> ubdsrv's userspace io buffer, so that ubdsrv can handle the write
> request
>
> 2) after ubsrv handles READ request, copy ubdsrv's userspace io buffer
> into this READ request, then ubd driver can complete the READ request
>
> Zero copy may be switched if mm is ready to support it.
>
> ubd driver doesn't handle any logic of the specific user space driver,
> so it should be small/simple enough.

This is pretty interesting!
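To make sure I read the per-tag handoff right, here is a small userspace
model of it. This is only a sketch: the flag names are borrowed from the
patch's ubd_queue_rq(), but the flag values, QUEUE_DEPTH, struct slot,
struct queue_model and both helper functions are made up for
illustration, and clearing OWNED_BY_SRV when the daemon re-issues the
command is my assumption, not something the description states.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names follow the quoted patch; the numeric values are assumed. */
#define UBD_IO_FLAG_ACTIVE        0x01u
#define UBD_IO_FLAG_OWNED_BY_SRV  0x02u

#define QUEUE_DEPTH 4  /* hypothetical queue depth */

/* One slot per request tag, standing in for struct ubd_io. */
struct slot {
	unsigned int flags;
};

struct queue_model {
	struct slot ios[QUEUE_DEPTH];
};

/*
 * Daemon side: a URING_CMD is issued for this tag, so the slot can take
 * a new request. Dropping OWNED_BY_SRV here is an assumption.
 */
void srv_issue_cmd(struct queue_model *q, unsigned int tag)
{
	assert(tag < QUEUE_DEPTH);
	q->ios[tag].flags &= ~UBD_IO_FLAG_OWNED_BY_SRV;
	q->ios[tag].flags |= UBD_IO_FLAG_ACTIVE;
}

/*
 * Driver side: mirrors the flag handling in the quoted ubd_queue_rq().
 * Returns false where the patch would return BLK_STS_IOERR.
 */
bool drv_queue_rq(struct queue_model *q, unsigned int tag)
{
	struct slot *io;

	assert(tag < QUEUE_DEPTH);
	io = &q->ios[tag];

	/* no command pending from the daemon, nothing to complete */
	if (!(io->flags & UBD_IO_FLAG_ACTIVE))
		return false;

	io->flags |= UBD_IO_FLAG_OWNED_BY_SRV;  /* daemon owns the request */
	io->flags &= ~UBD_IO_FLAG_ACTIVE;       /* cmd slot is consumed */
	return true;
}
```

In this model a tag can only be queued once per issued command, which is
what the WARN_ON_ONCE() in the patch appears to guard.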
Just one small thing I noticed, since you want to make sure batching is
Good Enough:

> +static blk_status_t ubd_queue_rq(struct blk_mq_hw_ctx *hctx,
> +		const struct blk_mq_queue_data *bd)
> +{
> +	struct ubd_queue *ubq = hctx->driver_data;
> +	struct request *rq = bd->rq;
> +	struct ubd_io *io = &ubq->ios[rq->tag];
> +	struct ubd_rq_data *data = blk_mq_rq_to_pdu(rq);
> +	blk_status_t res;
> +
> +	if (ubq->aborted)
> +		return BLK_STS_IOERR;
> +
> +	/* this io cmd slot isn't active, so have to fail this io */
> +	if (WARN_ON_ONCE(!(io->flags & UBD_IO_FLAG_ACTIVE)))
> +		return BLK_STS_IOERR;
> +
> +	/* fill iod to slot in io cmd buffer */
> +	res = ubd_setup_iod(ubq, rq);
> +	if (res != BLK_STS_OK)
> +		return BLK_STS_IOERR;
> +
> +	blk_mq_start_request(bd->rq);
> +
> +	/* mark this cmd owned by ubdsrv */
> +	io->flags |= UBD_IO_FLAG_OWNED_BY_SRV;
> +
> +	/*
> +	 * clear ACTIVE since we are done with this sqe/cmd slot
> +	 *
> +	 * We can only accept io cmd in case of being not active.
> +	 */
> +	io->flags &= ~UBD_IO_FLAG_ACTIVE;
> +
> +	/*
> +	 * run data copy in task work context for WRITE, and complete io_uring
> +	 * cmd there too.
> +	 *
> +	 * This way should improve batching, meantime pinning pages in current
> +	 * context is pretty fast.
> +	 */
> +	task_work_add(ubq->ubq_daemon, &data->work, TWA_SIGNAL);
> +
> +	return BLK_STS_OK;
> +}

It'd be better to use bd->last to indicate what kind of signaling you need
here. TWA_SIGNAL will force an immediate transition if the app is running
in userspace, which may not be what you want. Also see:

https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.19/io_uring&id=e788be95a57a9bebe446878ce9bf2750f6fe4974

But regardless of signaling needed, you don't need it except if bd->last
is true. Would need a commit_rqs() as well, but that's trivial.

More importantly, what prevents ubq->ubq_daemon from going away after
it's been assigned?
I didn't look at the details, but is this relying on io_uring being
closed to cancel pending requests? That should work, but we need some way
to ensure that ->ubq_daemon is always valid here.

-- 
Jens Axboe
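
For reference, a rough userspace model of the bd->last idea, just to show
the shape of it. This is a sketch and not driver code: struct batch_model,
notify_daemon(), model_queue_rq() and model_commit_rqs() are hypothetical
names, and notify_daemon() only counts wakeups where the real driver
would signal the daemon. A real driver would still have to queue the
per-request work; only the signaling is modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one hardware queue. */
struct batch_model {
	unsigned int pending;  /* requests queued since the last wakeup */
	unsigned int wakeups;  /* how many times the daemon was signaled */
};

/* Stand-in for signaling the daemon; flushes whatever is pending. */
void notify_daemon(struct batch_model *b)
{
	assert(b);
	if (b->pending == 0)
		return;  /* nothing queued, no wakeup needed */
	b->wakeups++;
	b->pending = 0;
}

/* Models ->queue_rq(): only signal when this is the last of the batch. */
void model_queue_rq(struct batch_model *b, bool last)
{
	assert(b);
	b->pending++;
	if (last)
		notify_daemon(b);
}

/*
 * Models ->commit_rqs(): the block layer calls this when it promised
 * more requests (last == false) but the batch ended early, so anything
 * still pending has to be flushed here.
 */
void model_commit_rqs(struct batch_model *b)
{
	notify_daemon(b);
}
```

With three requests in a batch and last set only on the third, the model
signals the daemon once rather than three times.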