On Wed, Aug 24, 2022 at 02:09:53PM -0600, Jens Axboe wrote: > On 8/24/22 3:31 AM, Anuj Gupta wrote: > > @@ -457,12 +509,13 @@ static unsigned file_depth(struct submitter *s) > > > > static void init_io(struct submitter *s, unsigned index) > > { > > - struct io_uring_sqe *sqe = &s->sqes[index]; > > + struct io_uring_sqe *sqe = NULL; > > unsigned long offset; > > struct file *f; > > long r; > > > > if (do_nop) { > > + sqe = &s->sqes[index]; > > sqe->opcode = IORING_OP_NOP; > > return; > > I'd just leave the initial assignment rather than have two manual ones > for the same thing, and then just have the 'pt' path override it. > right, makes sense > > @@ -490,6 +543,41 @@ static void init_io(struct submitter *s, unsigned index) > > f->cur_off = 0; > > } > > > > + if (pt) { > > + struct nvme_uring_cmd *cmd; > > + unsigned long long slba; > > + unsigned long long nlb; > > + > > + sqe = &s->sqes[index<<1]; > > sqe = &s->sqes[index << 1]; > > please. > acked > > + if (register_files) { > > + sqe->fd = f->fixed_fd; > > + sqe->flags = IOSQE_FIXED_FILE; > > + } else { > > + sqe->fd = f->real_fd; > > + sqe->flags = 0; > > + } > > + sqe->opcode = IORING_OP_URING_CMD; > > + sqe->user_data = (unsigned long) f->fileno; > > + if (stats) > > + sqe->user_data |= ((unsigned long)s->clock_index << 32); > > + sqe->cmd_op = NVME_URING_CMD_IO; > > + slba = offset / lbs; > > + nlb = bs/lbs - 1; > > You have two expensive integer divisions here. Would make sense to turn > lbs into a shift instead, those are a lot cheaper to do. And then you > can put the expensive part in the setup part rather than have it in the > fast path. > sure, will change division to shift here. Will see if I can move the expensive nvme command formation part in the setup (to the submitter_init function) > Another thing to consider is to make the pt path separate rather than > need a bunch of 'if (pt) ... else ...' in here? > right, for passthru we do things differently compared to existing path primarily during "init_io" and "reap_events_uring". I think we can have a "init_io_pt" and "reap_events_uring_pt" specific to passthru. But i think we can keep using the same "submitter_uring_fn" for both existing path and passthru as that part should still remain the same > > @@ -587,11 +690,14 @@ static int reap_events_uring(struct submitter *s) > > > > f = &s->files[fileno]; > > f->pending_ios--; > > - if (cqe->res != bs) { > > + if (!pt && cqe->res != bs) { > > printf("io: unexpected ret=%d\n", cqe->res); > > if (polled && cqe->res == -EOPNOTSUPP) > > printf("Your filesystem/driver/kernel doesn't support polled IO\n"); > > return -1; > > + } else if (pt && cqe->res != 0) { > > + printf("io: unexpected ret=%d\n", cqe->res); > > + return -1; > > } > > } > > if (stats) { > > These two would be examples of things that would be cleaner with a > separate path too. > > > @@ -1306,11 +1428,12 @@ static void usage(char *argv, int status) > > " -a <bool> : Use legacy aio, default %d\n" > > " -S <bool> : Use sync IO (preadv2), default %d\n" > > " -X <bool> : Use registered ring %d\n" > > - " -P <bool> : Automatically place on device home node %d\n", > > + " -P <bool> : Automatically place on device home node %d\n" > > + " -u <bool> : Use nvme-passthrough I/O, default %d\n", > > argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled, > > fixedbufs, dma_map, register_files, nthreads, !buffered, do_nop, > > stats, runtime == 0 ? "unlimited" : runtime_str, random_io, aio, > > - use_sync, register_ring, numa_placement); > > + use_sync, register_ring, numa_placement, pt); > > exit(status); > > Reminds me, should the nsid and lbs be in the submitter struct? > right, will make these two a part of submitter struct > Thanks for doing this work!! > > -- > Jens Axboe > -- Anuj Gupta