Hint, this leads up to discussing if current bulk *ALLOC* API need to be changed... Alex and I have been working hard on practical use-case for SLAB bulking (mostly slUb), in the network stack. Here is a summary of what we have learned so far. Bulk free'ing SKBs during TX completion is a big and easy win. Specifically for slUb, normal path for freeing these objects (which are not on c->freelist) require a locked double_cmpxchg per object. The bulk free (via detached freelist patch) allow to free all objects belonging to the same slab-page, to be free'ed with a single locked double_cmpxchg. Thus, the bulk free speedup is quite an improvement. The slUb alloc is hard to beat on speed: * accessing c->freelist, local cmpxchg 9 cycles (38% of cost) * c->freelist is refilled with single locked cmpxchg In micro benchmarking it looks like we can beat alloc, because we do a local_irq_{disable,enable} (cost 7 cycles). And then pull out all objects in c->freelist. Thus, saving 9 cycles per object (counting from the 2nd object). However, in practical use-cases we are seeing the single object alloc win over bulk alloc, we believe this to be due to prefetching. When c->freelist get (semi) cache-cold, then it gets more expensive to walk the freelist (which is a basic single linked list to next free object). For bulk alloc the full freelist is walked (right-way) and objects pulled out into the array. For normal single object alloc only a single object is returned, but it does a prefetch on the next object pointer. Thus, next time single alloc is called the object will have been prefetched. Doing prefetch in bulk alloc only helps a little, as it does not have enough "time" between accessing/walking the freelist for objects. So, how can we solve this and make bulk alloc faster? Alex and I had the idea of bulk alloc returns an "allocator specific cache" data-structure (and we add some helpers to access this). In the slUb case, the freelist is a single linked pointer list. In the network stack the skb objects have a skb->next pointer, which is located at the same position as freelist pointer. Thus, simply returning the freelist directly, could be interpreted as a skb-list. The helper API would then do the prefetching, when pulling out objects. For the slUb case, we would simply cmpxchg either c->freelist or page->freelist with a NULL ptr, and then own all objects on the freelist. This also reduce the time we keep IRQs disabled. API wise, we don't (necessary) know how many objects are on the freelist (without first walking the list, which would cause stalls on data, which we are trying to avoid). Thus, the API of always returning the exact number of requested objects will not work... -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer (related to http://thread.gmane.org/gmane.linux.kernel.mm/137469) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>