On 5/31/22 7:31 AM, Ankit Kumar wrote: > From: Anuj Gupta <anuj20.g@xxxxxxxxxxx> > > Add a new I/O engine (io_uring_cmd) for sending uring passthrough > commands. It will also use most of the existing helpers from the > I/O engine io_uring. The I/O preparation, completion, file open, > file close and post init paths are going to differ and hence > io_uring_cmd will have its own helper for them. > > Add a new io_uring_cmd engine specific option to support nvme > passthrough commands. Filename name for this specific option > must specify nvme-ns generic character device (dev/ngXnY). > This provides io_uring_cmd I/O engine a bandwidth to support > various passthrough commands in future. > > The engine_pos and engine_data fields in struct fio_file are > separated now. This will help I/O engine io_uring_cmd to store > specific data as well as keep track of register files. > > Added a new option cmd_type. This specifies the type of uring > command to submit. Currently it only supports nvme uring command A few minor nits which we don't need to respin this for, unless other changes end up being required. In general this to me looks like it's ready to go, people please holler if they agree / disagree so we can get this finished up. > +static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u) > +{ > + struct ioring_data *ld = td->io_ops_data; > + struct ioring_options *o = td->eo; > + struct fio_file *f = io_u->file; > + struct io_uring_sqe *sqe; > + int ret; > + > + /* nvme_uring_cmd case */ > + if (o->cmd_type == FIO_URING_CMD_NVME) { > + struct nvme_uring_cmd *cmd; This should just be: if (o->cmd_type != FIO_URING_CMD_NVME) return -EINVAL; ... other case ... to prevent unnecessary nesting. > @@ -728,6 +829,61 @@ retry: > return fio_ioring_mmap(ld, &p); > } > > +static int fio_ioring_cmd_queue_init(struct thread_data *td) > +{ > + struct ioring_data *ld = td->io_ops_data; > + struct ioring_options *o = td->eo; > + int depth = td->o.iodepth; > + struct io_uring_params p; > + int ret; > + > + memset(&p, 0, sizeof(p)); > + > + if (o->hipri) > + p.flags |= IORING_SETUP_IOPOLL; > + if (o->sqpoll_thread) { > + p.flags |= IORING_SETUP_SQPOLL; > + if (o->sqpoll_set) { > + p.flags |= IORING_SETUP_SQ_AFF; > + p.sq_thread_cpu = o->sqpoll_cpu; > + } > + } > + if (o->cmd_type == FIO_URING_CMD_NVME) { > + p.flags |= IORING_SETUP_SQE128; > + p.flags |= IORING_SETUP_CQE32; > + } > + > + /* > + * Clamp CQ ring size at our SQ ring size, we don't need more entries > + * than that. > + */ > + p.flags |= IORING_SETUP_CQSIZE; > + p.cq_entries = depth; > + > +retry: > + ret = syscall(__NR_io_uring_setup, depth, &p); > + if (ret < 0) { > + if (errno == EINVAL && p.flags & IORING_SETUP_CQSIZE) { > + p.flags &= ~IORING_SETUP_CQSIZE; > + goto retry; > + } > + return ret; > + } This retry doesn't really have any reason for existing anymore, as we know we have SETUP_CQSIZE if we have SQE128/CQE32. Hence I think it should just go away, and we should probably have a nice warning print of some sort if we get EINVAL as it means the kernel is too old to support SQE128/CQE32 and hence passthrough in general. That prevents people from emailing me that it doesn't work just because their kernel is too old. Apart from that, this looks good to me. -- Jens Axboe