On 5/1/19 9:09 AM, Mark Rutland wrote:
> On Wed, May 01, 2019 at 06:41:43AM -0600, Jens Axboe wrote:
>> On 5/1/19 4:30 AM, Mark Rutland wrote:
>>> On Tue, Apr 30, 2019 at 12:11:59PM -0600, Jens Axboe wrote:
>>>> On 4/30/19 11:03 AM, Mark Rutland wrote:
>>>>> I've just had a go at that, but when using kvmalloc() with or without
>>>>> GFP_KERNEL_ACCOUNT I hit OOM and my system hangs within a few seconds
>>>>> with the syzkaller prog below:
>>>>>
>>>>> ----
>>>>> Syzkaller reproducer:
>>>>> # {Threaded:false Collide:false Repeat:false RepeatTimes:0 Procs:1 Sandbox: Fault:false FaultCall:-1 FaultNth:0 EnableTun:false EnableNetDev:false EnableNetReset:false EnableCgroups:false EnableBinfmtMisc:false EnableCloseFds:false UseTmpDir:false HandleSegv:false Repro:false Trace:false}
>>>>> r0 = io_uring_setup(0x378, &(0x7f00000000c0))
>>>>> sendmsg$SEG6_CMD_SET_TUNSRC(0xffffffffffffffff, &(0x7f0000000240)={&(0x7f0000000000)={0x10, 0x0, 0x0, 0x40000000}, 0xc, 0x0, 0x1, 0x0, 0x0, 0x10}, 0x800)
>>>>> io_uring_register$IORING_REGISTER_BUFFERS(r0, 0x0, &(0x7f0000000000), 0x1)
>>>>> ----
>>>>>
>>>>> ... I'm a bit worried that opens up a trivial DoS.
>>>>>
>>>>> Thoughts?
>>>>
>>>> Can you post the patch you used?
>>>
>>> Diff below.
>>
>> And the reproducer, that was never posted.
>
> It was; the "Syzkaller reproducer" above is the reproducer I used with
> syz-repro.
>
> I've manually minimized that to C below. AFAICT, that hits a leak, which
> is what's triggering the OOM after the program is run a number of times
> with the previously posted kvmalloc patch.
>
> Per /proc/meminfo, that memory isn't accounted anywhere.
>
>> Patch looks fine to me. Note that buffer registration is under the
>> protection of RLIMIT_MEMLOCK. That's usually very limited for non-root,
>> as root you can of course consume as much as you want and OOM the
>> system.
>
> Sure.
>
> As above, it looks like there's a leak, regardless.

The leak is that we're not releasing imu->bvec in case of error.
I fixed a missing kfree -> kvfree as well in your patch; with this rolled
up version it works for me.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 18cecb6a0151..3e817d40fb96 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2443,7 +2443,7 @@ static int io_sqe_buffer_unregister(struct io_ring_ctx *ctx)
 
 		if (ctx->account_mem)
 			io_unaccount_mem(ctx->user, imu->nr_bvecs);
-		kfree(imu->bvec);
+		kvfree(imu->bvec);
 		imu->nr_bvecs = 0;
 	}
 
@@ -2533,11 +2533,11 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 
 		ret = 0;
 		if (!pages || nr_pages > got_pages) {
-			kfree(vmas);
-			kfree(pages);
-			pages = kmalloc_array(nr_pages, sizeof(struct page *),
+			kvfree(vmas);
+			kvfree(pages);
+			pages = kvmalloc_array(nr_pages, sizeof(struct page *),
						GFP_KERNEL);
-			vmas = kmalloc_array(nr_pages,
+			vmas = kvmalloc_array(nr_pages,
					sizeof(struct vm_area_struct *),
					GFP_KERNEL);
 			if (!pages || !vmas) {
@@ -2549,7 +2549,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 			got_pages = nr_pages;
 		}
 
-		imu->bvec = kmalloc_array(nr_pages, sizeof(struct bio_vec),
+		imu->bvec = kvmalloc_array(nr_pages, sizeof(struct bio_vec),
						GFP_KERNEL);
 		ret = -ENOMEM;
 		if (!imu->bvec) {
@@ -2588,6 +2588,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 			}
 			if (ctx->account_mem)
 				io_unaccount_mem(ctx->user, nr_pages);
+			kvfree(imu->bvec);
 			goto err;
 		}
 
@@ -2610,12 +2611,12 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 		ctx->nr_user_bufs++;
 	}
-	kfree(pages);
-	kfree(vmas);
+	kvfree(pages);
+	kvfree(vmas);
 	return 0;
 err:
-	kfree(pages);
-	kfree(vmas);
+	kvfree(pages);
+	kvfree(vmas);
 	io_sqe_buffer_unregister(ctx);
 	return ret;
 }

-- 
Jens Axboe
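[Editorial note on the kfree -> kvfree change above: kvmalloc() may satisfy large requests from vmalloc() rather than the slab allocator, so a pointer it returns must be released with kvfree(), which handles both backing allocators; plain kfree() on a vmalloc-backed pointer is a bug. The following is a minimal userspace sketch of that pairing discipline; kv_alloc(), kv_free(), and kv_is_fallback() are hypothetical stand-ins, not kernel APIs.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy analogue of kvmalloc(): small requests notionally come from one
 * allocator, large ones from a fallback, and a hidden header records
 * which path was taken. In the kernel, kvfree() makes the equivalent
 * decision by inspecting the address (is_vmalloc_addr()). */

#define SMALL_LIMIT 4096

struct kv_hdr {
	int from_fallback; /* nonzero if the "vmalloc-like" path was used */
};

static void *kv_alloc(size_t size)
{
	struct kv_hdr *h = malloc(sizeof(*h) + size);

	if (!h)
		return NULL;
	h->from_fallback = size > SMALL_LIMIT;
	return h + 1;
}

static int kv_is_fallback(const void *p)
{
	return ((const struct kv_hdr *)p - 1)->from_fallback;
}

static void kv_free(void *p)
{
	if (!p) /* like kfree()/kvfree(), NULL is a no-op */
		return;
	/* Only the paired free routine can recover how the memory was
	 * allocated; calling the wrong one (the kfree-on-kvmalloc bug
	 * fixed in the patch above) misinterprets the allocation. */
	free((struct kv_hdr *)p - 1);
}
```

Usage mirrors the kernel rule: whatever kv_alloc() returns goes back through kv_free(), regardless of which path the allocator chose.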