On Jul 14, 2019, at 9:38 PM, Zhengyuan Liu <liuzhengyuan@xxxxxxxxxx> wrote:
>
>> On 7/14/19 5:44 AM, Jens Axboe wrote:
>>> On 7/12/19 10:54 PM, Zhengyuan Liu wrote:
>>> As we introduced three lists (async, defer, link), there could be
>>> many sqe allocations. A natural idea is using kmem_cache to satisfy
>>> the allocation just like io_kiocb does.
>> A change like this needs to come with some performance numbers
>> or utilization numbers showing the benefit. I have considered
>> doing this before, but just never got around to testing if it's
>> worthwhile or not.
>> Have you?
> I only did some simple testing with fio. The benefit depends heavily
> on the IO scenario. For random and direct IO, there appears to be
> nearly no sqe copying, but for buffered sequential rw, more than 60%
> of submissions appear to involve copying compared to the original
> submission.

Right, which is great as it’s then working as designed! But my question
was, for that sequential case, what kind of speedup (or reduction in
overhead) do you see from allocating the unit out of slab vs kmalloc?
There has to be a win there for the change to be worthwhile.

--
Jens Axboe