On 4/26/21 10:15 AM, Christoph Hellwig wrote:
> On Mon, Apr 26, 2021 at 09:12:09AM -0600, Jens Axboe wrote:
>> Here's the series. It's not super clean (yet), but basically allows
>> users like io_uring to set up a bio cache, and pass that in through
>> iocb->ki_bi_cache. With that, we can recycle bios instead of going
>> through free+alloc continually. If you look at profiles for high
>> IOPS workloads, we're spending more time than desired doing just
>> that.
>>
>> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-bio-cache
>
> So where do you spend the cycles? The "do not memset the whole bio"
> optimization is pretty obvious and is something we should do
> independent of the allocator.

The memset is just a small optimization on top. In current profiles,
the alloc+free side looks something like:

+    2.71%  io_uring  [kernel.vmlinux]  [k] bio_alloc_bioset
+    2.03%  io_uring  [kernel.vmlinux]  [k] kmem_cache_alloc

and

+    2.82%  io_uring  [kernel.vmlinux]  [k] __slab_free
+    1.73%  io_uring  [kernel.vmlinux]  [k] kmem_cache_free
     0.36%  io_uring  [kernel.vmlinux]  [k] mempool_free_slab
     0.27%  io_uring  [kernel.vmlinux]  [k] mempool_free

That is a substantial number of cycles spent just to repeatedly reuse
the same set of bios for IO. With the caching patchset, all of the
above is eliminated entirely, and the only thing we still allocate
dynamically is the request, which is a lot cheaper (it ends up being
1-2% on either kernel).

> The other thing that sucks is the mempool implementation, as it
> forces each allocation and free to do an indirect call. I think it
> might be worth trying to frontend it with a normal slab cache and
> only fall back to the mempool if that fails.

That is minor as well, I believe, but yes, it will eat cycles too.
FWIW, the testing above was done without RETPOLINE.

--
Jens Axboe
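
PS: To make the recycling idea concrete, it boils down to something
like the sketch below. This is not the actual patchset interface; the
names (struct bio_alloc_cache, bio_cache_get/put) and the freelist cap
are made up for illustration, and it assumes bios that use their
inline bvecs only:

#include <linux/bio.h>

/*
 * Illustrative per-context bio cache: a small freelist of bios,
 * chained through bi_next, reused instead of going through
 * mempool free + alloc on every IO.
 */
struct bio_alloc_cache {
	struct bio	*free_list;	/* singly linked via bi_next */
	unsigned int	nr;		/* entries currently cached */
};

static struct bio *bio_cache_get(struct bio_alloc_cache *cache,
				 struct bio_set *bs,
				 unsigned short nr_vecs, gfp_t gfp)
{
	struct bio *bio = cache->free_list;

	if (bio) {
		cache->free_list = bio->bi_next;
		cache->nr--;
		/* reinitialize the recycled bio for reuse */
		bio_init(bio, bio->bi_inline_vecs, nr_vecs);
		return bio;
	}
	/* cache empty, fall back to the normal bioset allocation */
	return bio_alloc_bioset(gfp, nr_vecs, bs);
}

static void bio_cache_put(struct bio_alloc_cache *cache, struct bio *bio)
{
	if (cache->nr < 64) {		/* arbitrary cap for the sketch */
		bio->bi_next = cache->free_list;
		cache->free_list = bio;
		cache->nr++;
	} else {
		bio_put(bio);
	}
}

The owner calls bio_cache_put() instead of bio_put() on completion, so
the alloc+free pairs in the profile above disappear from the fast path.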
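
Similarly, the slab-frontend idea for the mempool would look roughly
like this (again purely illustrative; it assumes the mempool was
created with mempool_create_slab_pool() on the same kmem_cache, so
elements from either path can be freed the same way):

#include <linux/mempool.h>
#include <linux/slab.h>

/*
 * Try a plain slab allocation first, avoiding mempool_alloc()'s
 * indirect call through pool->alloc on the fast path.  Mask out
 * direct reclaim so the attempt fails fast, then fall back to the
 * mempool, which can block and guarantees forward progress.
 */
static void *frontend_alloc(struct kmem_cache *cache, mempool_t *pool,
			    gfp_t gfp)
{
	void *element;

	element = kmem_cache_alloc(cache,
			(gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN);
	if (element)
		return element;
	/* slow path: dip into the mempool reserves */
	return mempool_alloc(pool, gfp);
}

static void frontend_free(void *element, mempool_t *pool)
{
	/*
	 * mempool_free() refills the reserve if it is below the
	 * minimum, otherwise hands the element back to the slab;
	 * the indirect call remains on this path in the sketch.
	 */
	mempool_free(element, pool);
}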