On 8/10/21 6:25 AM, Kanchan Joshi wrote:
> On Tue, Aug 10, 2021 at 6:40 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> Initialize a bio allocation cache, and mark it as being used for
>> IOPOLL. We could use it for non-polled IO as well, but it'd need some
>> locking and probably would negate much of the win in that case.
>
> For regular (non-polled) IO, will it make sense to tie a bio-cache to
> each fixed-buffer slot (ctx->user_bufs array).
> One bio cache (along with the lock) per slot. That may localize the
> lock contention. And it will happen only when multiple IOs are spawned
> from the same fixed-buffer concurrently?

I don't think it's worth it - the slub overhead is already pretty low,
basically turning into a cmpxchg16 for the fast path. But that's a big
enough hit for polled IO of this magnitude that it's worth getting rid
of.

I've attempted bio caches before for non-polled, but the lock + irq
dance required for them just means it ends up being moot. Or even if
you have per-cpu caches, just doing irq enable/disable means you're
back at the same perf where you started, except now you've got extra
code...

Here's an example from a few years ago:

https://git.kernel.dk/cgit/linux-block/log/?h=cpu-alloc-cache

-- 
Jens Axboe
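
For illustration only, here is a rough userspace sketch of the trade-off discussed in the reply above. It is not the patch and not kernel code. All names (struct obj, shared_alloc, owner_cache and so on) are made up for the example. The shared free list pays one atomic compare-and-swap per alloc/free, loosely standing in for the slub fast path; a 64-bit CAS over an {index, tag} pair is used here as an assumption in place of the kernel's double-word cmpxchg over a pointer and transaction id. The owner cache is only ever touched by one context, so it needs no atomics, lock or irq masking, which is the property the polled case relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_OBJS   8
#define CACHE_MAX 4
#define NIL       UINT32_MAX

/* Hypothetical pooled object; stands in for a bio. */
struct obj {
    _Atomic uint32_t next;  /* index of the next free object, or NIL */
    int payload;
};

static struct obj pool[NR_OBJS];

/* Shared free-list head: {tag:32, index:32} packed into one word so a
 * single CAS updates both. The tag changes on every update, which
 * guards against ABA when an object is freed and reused. */
static _Atomic uint64_t shared_head;

static uint64_t pack(uint32_t tag, uint32_t idx)
{
    return ((uint64_t)tag << 32) | idx;
}

static void shared_init(void)
{
    for (uint32_t i = 0; i < NR_OBJS; i++)
        atomic_store(&pool[i].next, (i + 1 < NR_OBJS) ? i + 1 : NIL);
    atomic_store(&shared_head, pack(0, 0));
}

/* Shared path: one compare-and-swap per allocation. */
static struct obj *shared_alloc(void)
{
    uint64_t old = atomic_load(&shared_head);

    for (;;) {
        uint32_t idx = (uint32_t)old;
        if (idx == NIL)
            return NULL;  /* pool exhausted */
        uint32_t nxt = atomic_load_explicit(&pool[idx].next,
                                            memory_order_relaxed);
        uint64_t next_head = pack((uint32_t)(old >> 32) + 1, nxt);
        /* On failure, 'old' is reloaded with the current head. */
        if (atomic_compare_exchange_weak(&shared_head, &old, next_head))
            return &pool[idx];
    }
}

/* Shared path: one compare-and-swap per free. */
static void shared_free(struct obj *o)
{
    assert(o >= pool && o < pool + NR_OBJS);
    uint32_t idx = (uint32_t)(o - pool);
    uint64_t old = atomic_load(&shared_head);

    for (;;) {
        atomic_store_explicit(&o->next, (uint32_t)old,
                              memory_order_relaxed);
        uint64_t next_head = pack((uint32_t)(old >> 32) + 1, idx);
        if (atomic_compare_exchange_weak(&shared_head, &old, next_head))
            return;
    }
}

/* Single-owner cache: plain loads and stores only. This is safe only
 * because exactly one context ever touches it. */
struct owner_cache {
    struct obj *slots[CACHE_MAX];
    unsigned int nr;
};

static struct obj *cache_alloc(struct owner_cache *c)
{
    if (c->nr)
        return c->slots[--c->nr];  /* no atomics on a cache hit */
    return shared_alloc();         /* miss: fall back to shared path */
}

static void cache_free(struct owner_cache *c, struct obj *o)
{
    if (c->nr < CACHE_MAX)
        c->slots[c->nr++] = o;     /* recycle locally */
    else
        shared_free(o);            /* cache full: return to shared list */
}
```

If the owner cache could also be reached from an interrupt handler, every cache_alloc/cache_free would need the equivalent of irq disable/enable around it, and per the reply above that cost is about what the cache was supposed to save.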