Pavel Begunkov <asml.silence@xxxxxxxxx> writes:

> On 4/1/23 01:04, Gabriel Krisman Bertazi wrote:
>> Pavel Begunkov <asml.silence@xxxxxxxxx> writes:
>>> I didn't try it, but kmem_cache vs kmalloc, IIRC, doesn't bring us
>>> much, definitely doesn't spare us from locking, and the overhead
>>> definitely wasn't satisfactory for requests before.
>>
>> There are no locks in the fast path of slub, as far as I know. It has
>> a per-cpu cache that is refilled once empty, quite similar to the
>> fast path of this cache. I imagine the performance hit in slub comes
>> from the barrier and atomic operations?
>
> Yeah, I mean all kinds of synchronisation. And I don't think
> that's the main offender here; the test is single threaded without
> contention and the system was mostly idle.
>
>> kmem_cache works fine for most hot paths of the kernel. I think this
>
> It doesn't for io_uring. There are caches for the net side and now
> in the block layer as well. I wouldn't say it necessarily halves
> performance, but it definitely takes a share of CPU.

Right. My point is that all these caches (block, io_uring) duplicate
what the slab cache is meant to do. Since slab became a bottleneck, I'm
looking at how to improve the situation on their side, to see if we can
drop the caching here and in block/.

>> If it is indeed a significant performance improvement, I guess it is
>> fine to have another user of the cache. But I'd be curious to know
>> how much of the performance improvement you mentioned in the cover
>> letter is due to this patch!
>
> It was definitely sticking out in profiles, 5-10% of cycles, maybe
> more.

That's surprisingly high. Hopefully we can avoid this caching soon.

For now, feel free to add to this patch:

Reviewed-by: Gabriel Krisman Bertazi <krisman@xxxxxxx>

-- 
Gabriel Krisman Bertazi