On 25/06/2020 21:27, Jens Axboe wrote:
> Currently links are always done in an async fashion, unless we
> catch them inline after we successfully complete a request without
> having to resort to blocking. This isn't necessarily the most efficient
> approach, it'd be more ideal if we could just use the task_work handling
> for this.

Well, you beat me to this. As mentioned, I was going to rebase it after
landing the iopoll fixes. Nice numbers! A small comment below, but LGTM.
I'll review more formally with a fresh head. Could you push it to a
branch? My other patches would conflict.

> 
> Outside of saving an async jump, we can also do less prep work for
> these kinds of requests.
> 
> Running dependent links from the task_work handler yields some nice
> performance benefits. As an example, examples/link-cp from the liburing
> repository uses read+write links to implement a copy operation. Without
> this patch, a cache cold read of a 4G file from a VM runs in about
> 3 seconds:
> 
> $ time examples/link-cp /data/file /dev/null
> 
> real    0m2.986s
> user    0m0.051s
> sys     0m2.843s
> 
> and a subsequent cache hot run looks like this:
> 
> $ time examples/link-cp /data/file /dev/null
> 
> real    0m0.898s
> user    0m0.069s
> sys     0m0.797s
> 
> With this patch in place, the cold case takes about 2.4 seconds:
> 
> $ time examples/link-cp /data/file /dev/null
> 
> real    0m2.400s
> user    0m0.020s
> sys     0m2.366s
> 
> and the cache hot case looks like this:
> 
> $ time examples/link-cp /data/file /dev/null
> 
> real    0m0.676s
> user    0m0.010s
> sys     0m0.665s
> 
> As expected, the (mostly) cache hot case yields the biggest improvement,
> running about 25% faster with this change, while the cache cold case
> yields about a 20% increase in performance. Outside of the performance
> increase, we're using less CPU as well, as we're not using the async
> offload threads at all for this anymore.
> 
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> 
> ---
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 0bba12e4e559..389274a078c8 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
...
> 
> +static void io_sq_thread_drop_mm(struct io_ring_ctx *ctx)
> +{
> +        struct mm_struct *mm = current->mm;
> +
> +        if (mm) {
> +                kthread_unuse_mm(mm);
> +                mmput(mm);
> +        }
> +}
> +
> +static int io_sq_thread_acquire_mm(struct io_ring_ctx *ctx,
> +                                   struct io_kiocb *req)
> +{
> +        if (io_op_defs[req->opcode].needs_mm && !current->mm) {
> +                if (unlikely(!mmget_not_zero(ctx->sqo_mm)))
> +                        return -EFAULT;
> +                kthread_use_mm(ctx->sqo_mm);
> +        }
> +
> +        return 0;
> +}
...
> +static void __io_req_task_submit(struct io_kiocb *req)
> +{
> +        struct io_ring_ctx *ctx = req->ctx;
> +
> +        __set_current_state(TASK_RUNNING);
> +        if (!io_sq_thread_acquire_mm(ctx, req)) {

My last patch replaced this call with the "__" version. Is this a merge
artifact, or is it intended?

> +                mutex_lock(&ctx->uring_lock);
> +                __io_queue_sqe(req, NULL, NULL);
> +                mutex_unlock(&ctx->uring_lock);
> +        } else {
> +                __io_req_task_cancel(req, -EFAULT);
> +        }
> +}
> +

-- 
Pavel Begunkov
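
For context on the read+write link pattern being benchmarked above, a
minimal sketch of one linked read+write pair with liburing could look
like the following. This is an illustration, not code from the patch or
from examples/link-cp itself: the helper name copy_chunk and its
parameters are made up, error paths are trimmed, and the real link-cp
loops this over the file in chunks.

#include <liburing.h>

/*
 * Hypothetical helper: submit a read linked to a write. IOSQE_IO_LINK
 * makes the write start only after the read completes; a failed or
 * short read severs the link and the write completes with -ECANCELED.
 */
static int copy_chunk(struct io_uring *ring, int infd, int outfd,
                      void *buf, unsigned len, off_t off)
{
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        int i, ret;

        /* read from the source file... */
        sqe = io_uring_get_sqe(ring);
        io_uring_prep_read(sqe, infd, buf, len, off);
        /* ...and link the following write so it runs after the read */
        sqe->flags |= IOSQE_IO_LINK;

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_write(sqe, outfd, buf, len, off);

        ret = io_uring_submit(ring);
        if (ret < 0)
                return ret;

        /* reap both completions, remembering any failure */
        ret = 0;
        for (i = 0; i < 2; i++) {
                if (io_uring_wait_cqe(ring, &cqe))
                        return -1;
                if (cqe->res < 0)
                        ret = cqe->res;
                io_uring_cqe_seen(ring, cqe);
        }
        return ret;
}

The point of the patch is that the kernel side of the write in this
chain no longer has to bounce through the async offload threads; it can
be issued from task_work once the read completes.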