On 20/07/2020 18:49, Jens Axboe wrote:
> On 7/20/20 9:22 AM, Pavel Begunkov wrote:
>> On 18/07/2020 17:37, Jens Axboe wrote:
>>> On 7/18/20 2:32 AM, Pavel Begunkov wrote:
>>>> For my somewhat exaggerated test case, perf continues to show high
>>>> CPU consumption by io_dismantle(), and so by its caller,
>>>> io_iopoll_complete(). The patch doesn't yield a throughput increase
>>>> on my setup, probably because the effect is hidden behind polling,
>>>> but it definitely improves the relative percentages. And the
>>>> difference should only grow with an increasing number of CPUs.
>>>> Another reason to have this is that atomics may affect other
>>>> parallel tasks (e.g. ones which don't use io_uring).
>>>>
>>>> before:
>>>> io_iopoll_complete: 5.29%
>>>> io_dismantle_req: 2.16%
>>>>
>>>> after:
>>>> io_iopoll_complete: 3.39%
>>>> io_dismantle_req: 0.465%
>>>
>>> Still not seeing a win here, but it's clean and it _should_ work. For
>>> some reason I end up getting the offset in task ref put growing the
>>> fput_many(). Which doesn't (on the surface) make a lot of sense, but
>>> may just mean that we have some weird side effects.
>>
>> It grows because the patch is garbage, the second condition is always
>> false. See the diff. Could you please drop both patches?
>
> Hah, indeed. With this on top, it looks like it should in terms of
> performance and profiles.

It just shows that it doesn't really matter for a single-threaded app,
as expected. It's worth throwing some contention at it, though. I'll
try to find some time to get/borrow a multi-threaded one.

> I can just fold this into the existing one, if you'd like.

Would be nice. I'm going to double-check the counter and re-measure
anyway.

BTW, how did you find it? A tool or a proc file would be awesome.

-- 
Pavel Begunkov