On Fri, 8 Jan 2016 12:03:48 +0900 Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> wrote:

> On Thu, Jan 07, 2016 at 03:04:23PM +0100, Jesper Dangaard Brouer wrote:
> > This patch introduces a new API call, kfree_bulk(), for bulk freeing
> > memory objects not bound to a single kmem_cache.
> >
> > Christoph pointed out that it is possible to implement freeing of
> > objects without knowing the kmem_cache pointer, as that information is
> > available from the object's page->slab_cache. He proposed removing the
> > kmem_cache argument from the bulk free API.
> >
> > Jesper demonstrated that these extra steps per object come at a
> > performance cost. It is only when CONFIG_MEMCG_KMEM is compiled in
> > and activated at runtime that these steps are done anyhow. The extra
> > cost is most visible for the SLAB allocator, because the SLUB
> > allocator does the page lookup (virt_to_head_page()) anyhow.
> >
> > Thus, the conclusion was to keep the kmem_cache bulk free API with a
> > kmem_cache pointer, but we can still implement a kfree_bulk() API
> > fairly easily, simply by handling the case where kmem_cache_free_bulk()
> > gets called with a NULL kmem_cache pointer.
> >
> > This does increase the code size a bit, but implementing a separate
> > kfree_bulk() call would likely increase code size even more.
> >
> > Below benchmarks show the cost of alloc+free (obj size 256 bytes) on
> > CPU i7-4790K @ 4.00GHz, no PREEMPT and CONFIG_MEMCG_KMEM=y.
> >
> > Code size increase for SLAB:
> >
> >  add/remove: 0/0 grow/shrink: 1/0 up/down: 74/0 (74)
> >  function                     old     new   delta
> >  kmem_cache_free_bulk         660     734     +74
> >
> > SLAB fastpath: 85 cycles(tsc) 21.468 ns (step:0)
> >   sz - fallback             - kmem_cache_free_bulk  - kfree_bulk
> >    1 - 101 cycles 25.291 ns -  41 cycles 10.499 ns  - 130 cycles 32.522 ns
>
> This looks like experimental error. Why does kfree_bulk() take more
> time than the fallback?

This does look like experimental error. Sometimes instabilities occur
when slab_caches get merged, but I tried to counter that by using the
boot param slab_nomerge.

In the SLAB case, kfree_bulk() with a single object can be slower than
the fallback, because it will likely always hit a branch mispredict for
the kfree (NULL cache) case. That is okay, as single-object free is not
the case we optimize for.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
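
For anyone reading the thread without the patch at hand, the idea in the
quoted changelog boils down to roughly the following. This is only a
simplified sketch of the NULL-cache handling, not the actual SLAB/SLUB
code (which keeps its optimized bulk paths); the fallback-style loop and
its exact structure here are illustrative assumptions:

	#include <linux/slab.h>
	#include <linux/mm.h>

	/*
	 * kfree_bulk(): bulk free objects without knowing their kmem_cache.
	 * It simply reuses the existing bulk free entry point and passes a
	 * NULL cache pointer.
	 */
	static __always_inline void kfree_bulk(size_t size, void **p)
	{
		kmem_cache_free_bulk(NULL, size, p);
	}

	/*
	 * Illustrative fallback-style bulk free: when called with s == NULL
	 * (the kfree_bulk() case), the cache is looked up per object from
	 * the object's page, as described in the changelog above.
	 */
	void kmem_cache_free_bulk(struct kmem_cache *s, size_t nr, void **p)
	{
		size_t i;

		for (i = 0; i < nr; i++) {
			struct kmem_cache *c = s;

			if (!c)	/* kfree_bulk() passed NULL */
				c = virt_to_head_page(p[i])->slab_cache;
			kmem_cache_free(c, p[i]);
		}
	}

The per-object "if (!c)" branch is also why the single-object kfree_bulk()
number above can suffer from the branch mispredict mentioned in the reply.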