On 18/07/2020 17:37, Jens Axboe wrote:
> On 7/18/20 2:32 AM, Pavel Begunkov wrote:
>> For my a bit exaggerated test case perf continues to show high CPU
>> cosumption by io_dismantle(), and so calling it io_iopoll_complete().
>> Even though the patch doesn't yield throughput increase for my setup,
>> probably because the effect is hidden behind polling, but it definitely
>> improves relative percentage. And the difference should only grow with
>> increasing number of CPUs. Another reason to have this is that atomics
>> may affect other parallel tasks (e.g. which doesn't use io_uring)
>>
>> before:
>> io_iopoll_complete: 5.29%
>> io_dismantle_req:   2.16%
>>
>> after:
>> io_iopoll_complete: 3.39%
>> io_dismantle_req:   0.465%
>
> Still not seeing a win here, but it's clean and it _should_ work. For
> some reason I end up getting the offset in task ref put growing the
> fput_many(). Which doesn't (on the surface) make a lot of sense, but
> may just mean that we have some weird side effects.

It grows because the patch is garbage, the second condition is always false.
See the diff. Could you please drop both patches?

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 87a772eee0c4..2f02f85269eb 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1847,8 +1847,9 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
 	io_queue_next(req);
 
 	if (req->flags & REQ_F_TASK_PINNED) {
-		if (req->task != rb->task && rb->task) {
-			put_task_struct_many(rb->task, rb->task_refs);
+		if (req->task != rb->task) {
+			if (rb->task)
+				put_task_struct_many(rb->task, rb->task_refs);
 			rb->task = req->task;
 			rb->task_refs = 0;
 		}

-- 
Pavel Begunkov
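
To illustrate why the second condition is always false, here is a minimal
userspace sketch, not kernel code. The types struct task and struct batch and
the helper put_many() are hypothetical stand-ins for task_struct, req_batch
and put_task_struct_many(). It also assumes that the batch starts with a NULL
task and that each request bumps task_refs after the check; neither is shown
in the diff above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct task { int usage; };
struct batch { struct task *task; int task_refs; };

/* Stand-in for put_task_struct_many(): drop nr references at once. */
static void put_many(struct task *t, int nr)
{
	t->usage -= nr;
}

/*
 * Original check: rb->task starts out NULL (assumption), so
 * "t != rb->task && rb->task" can never be true, and the only
 * place that assigns rb->task is inside that branch.
 */
static void batch_put_broken(struct batch *rb, struct task *t)
{
	if (t != rb->task && rb->task) {
		put_many(rb->task, rb->task_refs);
		rb->task = t;
		rb->task_refs = 0;
	}
	rb->task_refs++;	/* assumed per-request accounting */
}

/*
 * Fixed check: always switch to the new task, and flush the old
 * task's accumulated references only if there was an old task.
 */
static void batch_put_fixed(struct batch *rb, struct task *t)
{
	if (t != rb->task) {
		if (rb->task)
			put_many(rb->task, rb->task_refs);
		rb->task = t;
		rb->task_refs = 0;
	}
	rb->task_refs++;	/* assumed per-request accounting */
}
```

With the broken variant rb->task stays NULL no matter how many requests are
batched; with the fixed variant the first request installs the task and a
later request from a different task flushes the accumulated references.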