On 7/18/20 2:32 AM, Pavel Begunkov wrote:
> For my a bit exaggerated test case perf continues to show high CPU
> consumption by io_dismantle(), and so calling it io_iopoll_complete().
> Even though the patch doesn't yield throughput increase for my setup,
> probably because the effect is hidden behind polling, but it definitely
> improves relative percentage. And the difference should only grow with
> increasing number of CPUs. Another reason to have this is that atomics
> may affect other parallel tasks (e.g. which doesn't use io_uring)
>
> before:
> io_iopoll_complete: 5.29%
> io_dismantle_req: 2.16%
>
> after:
> io_iopoll_complete: 3.39%
> io_dismantle_req: 0.465%

Still not seeing a win here, but it's clean and it _should_ work. For
some reason, what I gain on the task ref put ends up being offset by
fput_many() growing. That doesn't (on the surface) make a lot of sense,
but may just mean that we have some weird side effects.

I have applied it, thanks.

-- 
Jens Axboe