Re: [patch 0/3] Allow inlined spinlocks again V3

On Fri, Aug 14, 2009 at 09:04:49AM -0700, Linus Torvalds wrote:
> On Fri, 14 Aug 2009, Heiko Carstens wrote:
> > 
> > All of this started when we compared a 2.6.27 based distro kernel
> > to a 2.6.16 based one. It turned out that 2.6.27 burns a lot more cpu
> > cycles than 2.6.16 does for the same workloads.
> > 
> > As an example: we took a simple one connection ping pong network load.
> > (client sends packet to server and waits until packet from server
> >  returns and game starts again). It uses more cpu cycles on 2.6.27.
> > 
> > Using ftrace we figured out that on 2.6.27 it takes more than 500 function
> > calls on the client for a single roundtrip while on 2.6.16 it took only
> > ~230 calls.
> 
> Hmm. But the spinlock part of this seems to not really have changed.
> 
> Removing everything but the actual callchain info, and then doing a "diff" 
> between your two roundtrips, and they look very different, and basically 
> none of the difference seems to be due to spinlocks.
> 
> It seems to be due to a number of different things, but the bulk of the 
> new costs seem to be in networking (and to some degree scheduling). There 
> are smaller differences elsewhere, but the networking code _really_ blows 
> up.
> 
> I don't know how much of some of these are "real" kernel changes, and how 
> much of it is less inlining, but spinlocks don't seem to be the issue. 
> Here's a quick walk-through (some of these get repeated a couple of times 
> in your traces)

That's true. I just wanted to give some background on how all of this started.
Spinlocks are equally good or bad for s390 on both kernel versions.

However, quite a few of the additional function calls in the networking code
come from uninlining:

c2aa270a [NET]: uninline skb_push, de-bloats a lot
6be8ac2f [NET]: uninline skb_pull, de-bloats a lot
419ae74e [NET]: uninline skb_trim, de-bloats

Christian Ehrhardt identified these and more, but hasn't posted patches
yet. Reverting these patches does increase performance.
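
To illustrate the trade-off behind those commits, here is a minimal
user-space sketch, not the kernel code. struct toy_buf and both toy_push_*
helpers are hypothetical stand-ins for struct sk_buff and skb_push(), and
the noinline attribute assumes GCC or Clang. Both variants do the same work;
the only difference is whether each use costs a real function call (which is
what a function tracer counts) or is expanded at the call site (which grows
the code size).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; not the kernel's layout. */
struct toy_buf {
	unsigned char *head;	/* start of the allocated area */
	unsigned char *data;	/* start of the valid payload */
	size_t len;		/* bytes of valid payload */
};

/* Inlined variant: the body is expanded at each call site. */
static inline unsigned char *toy_push_inline(struct toy_buf *b, size_t n)
{
	assert((size_t)(b->data - b->head) >= n);	/* enough headroom? */
	b->data -= n;
	b->len += n;
	return b->data;
}

/*
 * Out-of-line variant: one shared copy of the body, and every use is a
 * real function call.
 */
__attribute__((noinline))
unsigned char *toy_push_outofline(struct toy_buf *b, size_t n)
{
	assert((size_t)(b->data - b->head) >= n);	/* enough headroom? */
	b->data -= n;
	b->len += n;
	return b->data;
}
```

For helpers this small the call overhead can be comparable to the work
itself, which is presumably why the reverts show up in the measurements.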

That's when I started wondering what we would gain if we inlined the
spinlock code again.

>  * the timer code doing more:
> 
> 	__timer_stats_timer_set_start_info <-mod_timer
> 	__mod_timer <-mod_timer
> 	__timer_stats_timer_set_start_info <-__mod_timer
> 	lock_timer_base <-__mod_timer
> 	_spin_lock_irqsave <-lock_timer_base
> 	internal_add_timer <-__mod_timer
> 	_spin_unlock_irqrestore <-__mod_timer
> 	qdio_perf_stat_inc <-qdio_outbound_processing
> 	account_system_vtime <-__do_softirq

This should be better now:

507e1231 timer stats: Optimize by adding quick check to avoid function calls
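
The general idea of such a quick check can be sketched as below. This is a
user-space illustration under stated assumptions, not the code from that
commit: toy_stats_active, toy_stats_record() and toy_stats_record_slow() are
hypothetical names, and the noinline attribute assumes GCC or Clang. The
inline wrapper tests a flag at the call site, so the out-of-line function is
only called when statistics are actually being collected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool toy_stats_active;		/* hypothetical global on/off switch */
static unsigned long toy_slow_calls;	/* counts out-of-line calls made */

/* Out-of-line slow path: only worth calling when stats are enabled. */
__attribute__((noinline))
void toy_stats_record_slow(const void *timer)
{
	assert(timer != NULL);
	toy_slow_calls++;
}

/*
 * Inline wrapper: the flag test is expanded at the call site, so the
 * common case (stats disabled) costs a load and a branch, not a call.
 */
static inline void toy_stats_record(const void *timer)
{
	if (!toy_stats_active)
		return;
	toy_stats_record_slow(timer);
}
```

With the flag cleared, a call to toy_stats_record() never reaches the slow
path, so it would no longer appear as a separate entry in a function trace.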

We also addressed our device driver with several patches.