On 3/1/20 9:18 AM, Pavel Begunkov wrote:
> There are several independent parts in the patchset, but bundled
> to make a point.
> 1-2: random stuff that is implicitly used later.
> 3-5: restore @nxt propagation
> 6-8: optimise locking in io_worker_handle_work()
> 9: optimise io_uring refcounting
>
> The next propagation bits are done similarly to how it was before, but
> - nxt stealing is now at top-level, not hidden in handlers
> - ensure there is no with REQ_F_DONT_STEAL_NEXT
>
> [6-8] is the reason to dismiss the previous @nxt propagation approach,
> I didn't find a good way to do the same. Even though it looked
> clearer and without a new flag.
>
> Performance tested it with link-of-nops + IOSQE_ASYNC:
>
> link size: 100
> orig: 501 (ns per nop)
> 0-8: 446
> 0-9: 416
>
> link size: 10
> orig: 826
> 0-8: 776
> 0-9: 756

This looks nice, I'll take a closer look tomorrow or later today. It
seems that at least patch 2 should go into 5.6, however, so it may make
sense to order the series accordingly.

BTW, Andres brought up a good point about hashed file write work items.
Generally they complete super fast (just copying into the page cache),
which means that the worker will be hammering the wq lock a lot. Since
work N+1 can't make any progress before N completes (that's how hashed
work works), we should pull a bigger batch of these work items instead
of just one at a time. I think that'd potentially make a huge difference
for the performance of buffered writes.

Just throwing it out there, since you're working in that space anyway
and the rewards will be much larger.

-- 
Jens Axboe
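
To make the batching suggestion for hashed work concrete, here is a toy model in Python. It is not the io-wq code and makes several assumptions: a work item is a hypothetical `(hash_key, payload)` tuple, `CountingLock` stands in for the wq lock, and only a contiguous run of same-hash items at the head of the queue is pulled as one batch. It only illustrates how taking the lock once per batch, instead of once per item, cuts the number of lock acquisitions while preserving the order of items that share a hash.

```python
from collections import deque
import threading


class CountingLock:
    """Stand-in for the wq lock; counts how often it is taken."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


def drain_one_at_a_time(queue, lock, run):
    """Take the lock once for every single work item."""
    while True:
        with lock:
            if not queue:
                return
            work = queue.popleft()
        run(work)  # executed outside the lock


def drain_batched(queue, lock, run):
    """Take the lock once per run of same-hash items at the queue head."""
    while True:
        with lock:
            if not queue:
                return
            batch = [queue.popleft()]
            key = batch[0][0]
            # Items with the same hash must run serially anyway,
            # so grab all of the adjacent ones in a single lock hold.
            while queue and queue[0][0] == key:
                batch.append(queue.popleft())
        for work in batch:  # executed outside the lock, in order
            run(work)


if __name__ == "__main__":
    items = [("A", i) for i in range(4)] + [("B", i) for i in range(2)]
    for drain in (drain_one_at_a_time, drain_batched):
        lock = CountingLock()
        drain(deque(items), lock, lambda work: None)
        print(drain.__name__, lock.acquisitions)
```

Both loops take the lock one extra time to find the queue empty. The sketch ignores everything a real worker pool has to handle, such as multiple workers, non-adjacent items with the same hash, and cancellation.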