On 1/28/19 6:07 PM, Jann Horn wrote:
> On Mon, Jan 28, 2019 at 10:35 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>> The submission queue (SQ) and completion queue (CQ) rings are shared
>> between the application and the kernel. This eliminates the need to
>> copy data back and forth to submit and complete IO.
> [...]
>> +static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)
>> +{
>> +	struct io_sq_ring *ring = ctx->sq_ring;
>> +	unsigned head;
>> +
>> +	/*
>> +	 * The cached sq head (or cq tail) serves two purposes:
>> +	 *
>> +	 * 1) allows us to batch the cost of updating the user visible
>> +	 *    head updates.
>> +	 * 2) allows the kernel side to track the head on its own, even
>> +	 *    though the application is the one updating it.
>> +	 */
>> +	head = ctx->cached_sq_head;
>> +	smp_rmb();
>> +	if (head == READ_ONCE(ring->r.tail))
>> +		return false;
>> +
>> +	head = ring->array[head & ctx->sq_mask];
>> +	if (head < ctx->sq_entries) {
>> +		s->index = head;
>> +		s->sqe = &ctx->sq_sqes[head];
>
> ring->array can be mapped writable into userspace, right? If so: This
> looks like a double-read issue; the compiler might assume that
> ring->array is not modified concurrently and perform separate memory
> accesses for the "if (head < ctx->sq_entries)" check and the
> "&ctx->sq_sqes[head]" computation. Please use READ_ONCE()/WRITE_ONCE()
> for all accesses to memory that userspace could concurrently modify in
> a malicious way.
>
> There have been some pretty severe security bugs caused by missing
> READ_ONCE() annotations around accesses to shared memory; see, for
> example, https://www.blackhat.com/docs/us-16/materials/us-16-Wilhelm-Xenpwn-Breaking-Paravirtualized-Devices.pdf
> . Slides 35-48 show how the code "switch (op->cmd)", where "op" is a
> pointer to shared memory, allowed an attacker to break out of a Xen
> virtual machine because the compiler generated multiple memory
> accesses.

Thanks, I'll update these to use READ/WRITE_ONCE.

>> +		ctx->cached_sq_head++;
>> +		return true;
>> +	}
>> +
>> +	/* drop invalid entries */
>> +	ctx->cached_sq_head++;
>> +	ring->dropped++;
>> +	smp_wmb();
>> +	return false;
>> +}
> [...]
>> +SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>> +		u32, min_complete, u32, flags, const sigset_t __user *, sig,
>> +		size_t, sigsz)
>> +{
>> +	struct io_ring_ctx *ctx;
>> +	long ret = -EBADF;
>> +	struct fd f;
>> +
>> +	f = fdget(fd);
>> +	if (!f.file)
>> +		return -EBADF;
>> +
>> +	ret = -EOPNOTSUPP;
>> +	if (f.file->f_op != &io_uring_fops)
>> +		goto out_fput;
>
> Oh, by the way: If you feel like it, maybe you could add a helper
> fdget_typed(int fd, struct file_operations *f_op), or something like
> that, so that there is less boilerplate code for first doing fdget(),
> then checking ->f_op, and then coding an extra bailout path for that?
> But that doesn't really have much to do with your patchset, feel free
> to ignore this comment.

That's not a bad idea, I think this is a fairly common code pattern.
I'll look into it.

-- 
Jens Axboe
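
For reference, a minimal sketch of what the READ_ONCE()/WRITE_ONCE()
annotations could look like in the io_get_sqring() hunk quoted above,
assuming the surrounding structure stays as posted (the actual
follow-up patch may differ):

	/*
	 * Sketch only: read the array entry exactly once so the bounds
	 * check and the ctx->sq_sqes[] index are guaranteed to use the
	 * same value, even if userspace rewrites ring->array concurrently.
	 */
	head = READ_ONCE(ring->array[head & ctx->sq_mask]);
	if (head < ctx->sq_entries) {
		s->index = head;
		s->sqe = &ctx->sq_sqes[head];
		ctx->cached_sq_head++;
		return true;
	}

	/* drop invalid entries; make the userspace-visible store a single store */
	ctx->cached_sq_head++;
	WRITE_ONCE(ring->dropped, ring->dropped + 1);
	smp_wmb();
	return false;

The key point is that the value used for the bounds check and the
value used to index ctx->sq_sqes[] come from one load, so a concurrent
writer cannot race the check against the use.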
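
Likewise, a rough sketch of the fdget_typed() helper Jann suggests;
the name, signature, and placement here are purely illustrative, no
such helper exists in the tree at this point:

	/*
	 * Like fdget(), but return an empty fd (and drop the reference we
	 * just took) if the file's f_op does not match the expected ops.
	 */
	static inline struct fd fdget_typed(int fd,
					    const struct file_operations *f_op)
	{
		struct fd f = fdget(fd);

		if (f.file && f.file->f_op != f_op) {
			fdput(f);
			return (struct fd){ NULL, 0 };
		}
		return f;
	}

A caller like io_uring_enter() could then collapse the fdget() and
->f_op check into one call and a single !f.file bailout. One wrinkle
is that this folds the bad-fd and wrong-file-type cases together, so
the distinct -EBADF/-EOPNOTSUPP returns in the quoted code would
either have to be given up or the helper would need some way to report
which check failed.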