On 25/01/2021 16:31, Jens Axboe wrote:
> On 1/25/21 9:25 AM, Pavel Begunkov wrote:
>> On 25/01/2021 16:00, Jens Axboe wrote:
>>> On 1/25/21 4:42 AM, Pavel Begunkov wrote:
>>>> struct io_submit_state is quite big (168 bytes) and going to grow. It's
>>>> better to not keep it on stack as it is now. Move it to context, it's
>>>> always protected by uring_lock, so it's fine to have only one instance
>>>> of it.
>>>
>>> I don't like this one. Unless you have plans to make it much bigger,
>>> I think it should stay on the stack. On the stack, the ownership is
>>> clear.
>>
>> Thinking of it, it's not needed for this series, just traversing a list
>> twice is not nice but bearable.
>>
>> For experiments I was using its persistency across syscalls + grew it
>> to 32 to match up completion flush (allocating still by 8) to add req
>> memory reuse, but that's out of scope of these patches.
>> I haven't got a strong opinion on that one yet, even though
>> alloc/dealloc are pretty heavy, this approach may loose allocation
>> locality.
>
> Agree on all of that. Locality is important, but reuse usually gets
> pretty useful as long as the total number (and life time) can be
> managed.

That all was about reqs completed inline, and for those it is
pretty easy and without any extra synchronisation. Depending
on QD/etc. it slashes 5-25% of overhead (~5-33% t-put boost),
from what's left with this series.

There are also other tricks extending it to async reqs, but
that's rather for hi QD with plugging off and ultra-fast devices.

Let's forget about these patches for now and I'll wrap experiments
into a patchset sometime later.

-- 
Pavel Begunkov
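
For readers following the thread, the req memory reuse idea can be roughly sketched in userspace C. This is only an illustration under stated assumptions, not the io_uring code or the experimental patches: `struct req`, `struct req_cache`, `req_get()`, `req_put()` and `req_cache_drain()` are hypothetical names, `malloc()`/`free()` stand in for the kernel allocator, and the caller is assumed to hold the one lock protecting the cache (uring_lock in the thread), which is why the sketch needs no synchronisation of its own. The sizes 8 and 32 are the ones mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

#define REQ_ALLOC_BATCH 8   /* refill size ("allocating still by 8") */
#define REQ_CACHE_MAX   32  /* cache capacity ("grew it to 32") */

/* Hypothetical stand-in for a request object. */
struct req {
    int opcode;
};

/* Lives in the long-lived context, so it persists across submit calls. */
struct req_cache {
    struct req *slots[REQ_CACHE_MAX];
    unsigned int nr;        /* number of cached, unused requests */
    unsigned long refills;  /* how many times the allocator was hit */
};

/* Take a request from the cache, refilling in a batch when it is empty. */
static struct req *req_get(struct req_cache *c)
{
    if (!c->nr) {
        for (unsigned int i = 0; i < REQ_ALLOC_BATCH; i++) {
            struct req *r = malloc(sizeof(*r));

            if (!r)
                break;
            c->slots[c->nr++] = r;
        }
        if (!c->nr)
            return NULL;    /* allocator gave us nothing */
        c->refills++;
    }
    return c->slots[--c->nr];
}

/* Return an inline-completed request; keep it for reuse if there is room. */
static void req_put(struct req_cache *c, struct req *r)
{
    assert(r != NULL);
    if (c->nr < REQ_CACHE_MAX)
        c->slots[c->nr++] = r;
    else
        free(r);
}

/* Release everything still cached, e.g. on context teardown. */
static void req_cache_drain(struct req_cache *c)
{
    while (c->nr)
        free(c->slots[--c->nr]);
}
```

In this sketch, a get/put loop at low queue depth touches the allocator only once, which is where the saving would come from. The locality concern raised above also shows up here: a reused request may be cold memory that was allocated many submissions ago.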