On Thu, 2009-01-22 at 11:47 +0200, Pekka Enberg wrote:
> On Thu, 2009-01-22 at 17:28 +0800, Zhang, Yanmin wrote:
> > On Thu, 2009-01-22 at 11:15 +0200, Pekka Enberg wrote:
> > > On Thu, 2009-01-22 at 16:36 +0800, Zhang, Yanmin wrote:
> > > > On Wed, 2009-01-21 at 18:58 -0500, Christoph Lameter wrote:
> > > > > On Tue, 20 Jan 2009, Zhang, Yanmin wrote:
> > > > >
> > > > > > kmem_cache skbuff_head_cache's object size is just 256, so it shares the
> > > > > > kmem_cache with :0000256. Their order is 1, which means every slab consists
> > > > > > of 2 physical pages.
> > > > >
> > > > > That order can be changed. Try specifying slub_max_order=0 on the kernel
> > > > > command line to force an order 0 alloc.
> > > >
> > > > I tried slub_max_order=0 and there is no improvement on this UDP-U-4k issue.
> > > > The CPU time of both get_page_from_freelist and __free_pages_ok is still
> > > > very high.
> > > >
> > > > I checked my instrumentation in the kernel and found it's caused by
> > > > allocation/free of large objects whose size is more than PAGE_SIZE. Here
> > > > the order is 1.
> > > >
> > > > The relevant free call chain is __kfree_skb => skb_release_all =>
> > > > skb_release_data.
> > > >
> > > > So this case isn't the issue where batched allocation/free might defeat
> > > > the partial page functionality.
> > >
> > > So is this the kfree(skb->head) in skb_release_data() or the put_page()
> > > calls in the same function in a loop?
> >
> > It's kfree(skb->head).
> >
> > > If it's the former, with big enough size passed to __alloc_skb(), the
> > > networking code might be taking a hit from the SLUB page allocator
> > > pass-through.
>
> Do we know what kind of size is being passed to __alloc_skb() in this
> case?

In __alloc_skb, the original parameter size=4155, SKB_DATA_ALIGN(size)=4224,
and sizeof(struct skb_shared_info)=472, so __kmalloc_track_caller is called
with size=4696.

> Maybe we want to do something like this.
>
> 			Pekka
>
> SLUB: revert page allocator pass-through

This patch almost fixes the netperf UDP-U-4k issue.

#slabinfo -AD
Name               Objects      Alloc       Free   %Fast
:0000256              1658   70350463   70348946   99  99
kmalloc-8192            31   70322309   70322293   99  99
:0000168              2592     143154     140684   93  28
:0004096              1456      91072      89644   99  96
:0000192              3402      63838      60491   89  11
:0000064              6177      49635      43743   98  77

So kmalloc-8192 shows up. Without the patch, kmalloc-8192 doesn't appear
(those allocations bypass the kmalloc caches and go straight to the page
allocator). kmalloc-8192's default order on my 8-core Stoakley machine is 2.

1) If I start CPU_NUM clients and servers, SLUB's result is about 2% better
   than SLQB's;
2) If I start 1 client and 1 server and bind them to different physical CPUs,
   SLQB's result is about 10% better than SLUB's.

I don't know why there is still a 10% difference in case 2). Maybe cache
misses cause it?

> This is a revert of commit aadb4bc4a1f9108c1d0fbd121827c936c2ed4217 ("SLUB:
> direct pass through of page size or higher kmalloc requests").
> ---
>
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index 2f5c16b..3bd3662 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
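
As a side note, the size arithmetic above can be sanity-checked outside the
kernel. The small userspace sketch below is only illustrative: it assumes
SKB_DATA_ALIGN() rounds up to SMP_CACHE_BYTES = 128 (the value that reproduces
the 4155 -> 4224 step reported in this mail) and hard-codes
sizeof(struct skb_shared_info) = 472 from this thread. Its point is simply
that the resulting kmalloc size exceeds PAGE_SIZE, which is why the request
hit the page allocator pass-through before the revert.

/*
 * Illustrative userspace sketch (not kernel code) of the __alloc_skb()
 * size arithmetic quoted above. Assumptions, not taken from the mail:
 * SMP_CACHE_BYTES = 128 and skb_shared_info size = 472 on this config.
 */
#include <stdio.h>

#define SMP_CACHE_BYTES      128   /* assumed cache line size for this box */
#define SKB_DATA_ALIGN(x)    (((x) + (SMP_CACHE_BYTES - 1)) & \
                              ~(SMP_CACHE_BYTES - 1))
#define SKB_SHARED_INFO_SIZE 472   /* sizeof(struct skb_shared_info) here  */
#define PAGE_SIZE            4096

int main(void)
{
	unsigned int size = 4155;                       /* __alloc_skb() request */
	unsigned int aligned = SKB_DATA_ALIGN(size);    /* -> 4224               */
	unsigned int ksize = aligned + SKB_SHARED_INFO_SIZE;  /* -> 4696         */

	printf("aligned=%u kmalloc size=%u exceeds PAGE_SIZE: %s\n",
	       aligned, ksize, ksize > PAGE_SIZE ? "yes" : "no");
	/*
	 * 4696 > 4096, so with the pass-through this request skipped the
	 * kmalloc caches and went to the page allocator as an order-1
	 * allocation; with the revert it is served from kmalloc-8192.
	 */
	return 0;
}

Whether 128 is really SMP_CACHE_BYTES on the Stoakley box is a guess from the
reported numbers; the point is just that the request is bigger than one page
but much smaller than two.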