On Wed, Oct 23, 2024 at 03:15:19PM -0600, Uday Shankar wrote: > From: Xinyu Zhang <xizhang@xxxxxxxxxxxxxxx> > > blk_rq_map_user_bvec contains a check bytes + bv->bv_len > nr_iter which > causes unnecessary failures in NVMe passthrough I/O, reproducible as > follows: > > - register a 2 page, page-aligned buffer against a ring > - use that buffer to do a 1 page io_uring NVMe passthrough read > > The second (i = 1) iteration of the loop in blk_rq_map_user_bvec will > then have nr_iter == 1 page, bytes == 1 page, bv->bv_len == 1 page, so > the check bytes + bv->bv_len > nr_iter will succeed, causing the I/O to > fail. This failure is unnecessary, as when the check succeeds, it means > we've checked the entire buffer that will be used by the request - i.e. > blk_rq_map_user_bvec should complete successfully. Therefore, terminate > the loop early and return successfully when the check bytes + bv->bv_len > > nr_iter succeeds. For anyone interested, here are the details on how to reproduce the issue described above: # cat test.c #include <fcntl.h> #include <stdio.h> #include <string.h> #include <sys/ioctl.h> #include <liburing.h> #include <stdlib.h> #include <assert.h> #include <linux/nvme_ioctl.h> int main(int argc, char *argv[]) { struct io_uring ring; assert(io_uring_queue_init(1, &ring, IORING_SETUP_SQE128 | IORING_SETUP_CQE32) == 0); void *buf = memalign(4096, 2 * 4096); printf("buf %p\n", buf); struct iovec iov = { .iov_base = buf, .iov_len = 2 * 4096, }; assert(io_uring_register_buffers(&ring, &iov, 1) == 0); struct io_uring_sqe *sqe = io_uring_get_sqe(&ring); assert(sqe != NULL); int fd = open("/dev/ng0n1", O_RDONLY); assert(fd > 0); sqe->fd = fd; sqe->opcode = IORING_OP_URING_CMD; sqe->cmd_op = NVME_URING_CMD_IO; sqe->buf_index = 0; sqe->flags = 0; sqe->uring_cmd_flags = IORING_URING_CMD_FIXED; struct nvme_passthru_cmd *cmd = &sqe->cmd; cmd->opcode = 2; // read cmd->nsid = 1; cmd->data_len = 1 * 4096; cmd->addr = buf; struct io_uring_cqe *cqe; assert(io_uring_submit(&ring) == 1); assert(io_uring_wait_cqe(&ring, &cqe) == 0); printf("res %d\n", cqe->res); return 0; } # gcc -o test -luring test.c test.c: In function ‘main’: test.c:15:17: warning: implicit declaration of function ‘memalign’ [-Wimplicit-function-declaration] 15 | void *buf = memalign(4096, 2 * 4096); | ^~~~~~~~ test.c:15:17: warning: initialization of ‘void *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion] test.c:36:37: warning: initialization of ‘struct nvme_passthru_cmd *’ from incompatible pointer type ‘__u8 (*)[0]’ {aka ‘unsigned char (*)[]’} [-Wincompatible-pointer-types] 36 | struct nvme_passthru_cmd *cmd = &sqe->cmd; | ^ test.c:40:15: warning: assignment to ‘__u64’ {aka ‘long long unsigned int’} from ‘void *’ makes integer from pointer without a cast [-Wint-conversion] 40 | cmd->addr = buf; | # ./test buf 0x406000 res -22