On 7/20/20 9:22 AM, Pavel Begunkov wrote:
> On 18/07/2020 17:37, Jens Axboe wrote:
>> On 7/18/20 2:32 AM, Pavel Begunkov wrote:
>>> For my a bit exaggerated test case perf continues to show high CPU
>>> consumption by io_dismantle(), and so calling it io_iopoll_complete().
>>> Even though the patch doesn't yield throughput increase for my setup,
>>> probably because the effect is hidden behind polling, but it definitely
>>> improves relative percentage. And the difference should only grow with
>>> increasing number of CPUs. Another reason to have this is that atomics
>>> may affect other parallel tasks (e.g. which doesn't use io_uring)
>>>
>>> before:
>>> io_iopoll_complete: 5.29%
>>> io_dismantle_req: 2.16%
>>>
>>> after:
>>> io_iopoll_complete: 3.39%
>>> io_dismantle_req: 0.465%
>>
>> Still not seeing a win here, but it's clean and it _should_ work. For
>> some reason I end up getting the offset in task ref put growing the
>> fput_many(). Which doesn't (on the surface) make a lot of sense, but
>> may just mean that we have some weird side effects.
>
> It grows because the patch is garbage, the second condition is always false.
> See the diff. Could you please drop both patches?

Hah, indeed. With this on top, it looks like it should in terms of
performance and profiles.

I can just fold this into the existing one, if you'd like.

-- 
Jens Axboe