On 2/12/21 8:33 AM, Aneesh Kumar K.V wrote: > On 2/12/21 8:45 PM, Jens Axboe wrote: >> On 2/11/21 11:59 PM, Aneesh Kumar K.V wrote: >>> >>> Hi, >>> >>> I am trying to estabilish the behaviour we should expect when passing a >>> buffer with memory keys attached to io_uring syscalls. As show in the >>> blow test >>> >>> /* >>> * gcc -Wall -O2 -D_GNU_SOURCE -o pkey_uring pkey_uring.c -luring >>> */ >>> #include <stdio.h> >>> #include <fcntl.h> >>> #include <string.h> >>> #include <stdlib.h> >>> #include <unistd.h> >>> #include <sys/mman.h> >>> #include "liburing.h" >>> >>> #define PAGE_SIZE (64 << 10) >>> >>> int main(int argc, char *argv[]) >>> { >>> int fd, ret, pkey; >>> struct io_uring ring; >>> struct io_uring_sqe *sqe; >>> struct io_uring_cqe *cqe; >>> struct iovec iovec; >>> void *buf; >>> >>> if (argc < 2) { >>> printf("%s: file\n", argv[0]); >>> return 1; >>> } >>> >>> ret = io_uring_queue_init(1, &ring, IORING_SETUP_SQPOLL); >>> if (ret < 0) { >>> fprintf(stderr, "queue_init: %s\n", strerror(-ret)); >>> return 1; >>> } >>> >>> fd = open(argv[1], O_RDONLY | O_DIRECT); >>> if (fd < 0) { >>> perror("open"); >>> return 1; >>> } >>> >>> if (posix_memalign(&buf, PAGE_SIZE, PAGE_SIZE)) >>> return 1; >>> iovec.iov_base = buf; >>> iovec.iov_len = PAGE_SIZE; >>> >>> //mprotect(buf, PAGE_SIZE, PROT_NONE); >>> pkey = pkey_alloc(0, PKEY_DISABLE_WRITE); >>> pkey_mprotect(buf, PAGE_SIZE, PROT_READ | PROT_WRITE, pkey); >>> >>> >>> sqe = io_uring_get_sqe(&ring); >>> if (!sqe) { >>> perror("io_uring_get_sqe"); >>> return 1; >>> } >>> io_uring_prep_readv(sqe, fd, &iovec, 1, 0); >>> >>> ret = io_uring_submit(&ring); >>> if (ret != 1) { >>> fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret)); >>> return 1; >>> } >>> >>> ret = io_uring_wait_cqe(&ring, &cqe); >>> >>> if (cqe->res < 0) >>> fprintf(stderr, "iouring submit failed %s\n", strerror(-cqe->res)); >>> else >>> fprintf(stderr, "iouring submit success\n"); >>> >>> io_uring_cqe_seen(&ring, cqe); >>> >>> /* >>> * let's access this via a read syscall >>> */ >>> ret = read(fd, buf, PAGE_SIZE); >>> if (ret < 0) >>> fprintf(stderr, "read failed : %s\n", strerror(errno)); >>> >>> close(fd); >>> io_uring_queue_exit(&ring); >>> >>> return 0; >>> } >>> >>> A read syscall do fail with EFAULT. But we allow read via io_uring >>> syscalls. Is that ok? Considering memory keys are thread-specific we >>> could debate that kernel thread can be considered to be the one that got all access >>> allowed via keys or we could update that access is denied via kernel >>> thread for any key value other than default key (key 0). Other option >>> is to inherit the memory key restrictions when doing >>> io_uring_submit() and use the same when accessing the userspace from >>> kernel thread. >>> >>> Any thoughts here with respect to what should be behaviour? >> >> It this a powerpc thing? I get -EFAULT on x86 for both reads, io_uring >> and regular syscall. That includes SQPOLL, not using SQPOLL, or >> explicitly setting IOSQE_ASYNC on the sqe. >> > > Interesting, I didn't check x86 because i don't have hardware that > supports memory keys. I am trying to make ppc64 behavior compatible with > other archs here. > > IIUC, in your test io_wqe/sqe kernel thread did hit access fault when > touching the buffer on x86? That is different from what Dave explained > earlier. Yes, all four methods (task inline, task_work, SQPOLL, io-wq offload) return -EFAULT for me on x86. > With the patch 8c511eff1827 ("powerpc/kuap: Allow kernel thread to > access userspace after kthread_use_mm") I now have key 0 access allowed > but all other keys denied with ppc64. I was planning to change that to > allow all key access based on reply from Dave. I would be curious to > understand what made x86 deny the access and how did kthread inherit the > key details. I'm not very familiar with the memory protection for pkeys and how it's done on various archs, so not going to be of much help there... But io_uring assumes the right mm for any of these accesses, so if it's tied to that, then it should work as it does on x86. -- Jens Axboe