On Fri, Aug 14, 2009 at 09:04:49AM -0700, Linus Torvalds wrote:
> On Fri, 14 Aug 2009, Heiko Carstens wrote:
> >
> > All of this started when we compared a 2.6.27 based distro kernel
> > to a 2.6.16 based one. It turned out that 2.6.27 burns a lot more cpu
> > cycles than 2.6.16 does for the same workloads.
> >
> > As an example: we took a simple one connection ping pong network load.
> > (client sends packet to server and waits until packet from server
> > returns and game starts again). It uses more cpu cycles on 2.6.27.
> >
> > Using ftrace we figured out that on 2.6.27 it takes more than 500 function
> > calls on the client for a single roundtrip while on 2.6.16 it took only
> > ~230 calls.
>
> Hmm. But the spinlock part of this seems to not really have changed.
>
> Removing everything but the actual callchain info, and then doing a "diff"
> between your two roundtrips, and they look very different, and basically
> none of the difference seems to be due to spinlocks.
>
> It seems to be due to a number of different things, but the bulk of the
> new costs seem to be in networking (and to some degree scheduling). There
> are smaller differences elsewhere, but the networking code _really_ blows
> up.
>
> I don't know how much of some of these are "real" kernel changes, and how
> much of it is less inlining, but spinlocks don't seem to be the issue.
> Here's a quick walk-through (some of these get repeated a couple of times
> in your traces)

That's true. I just wanted to give some background how all of this started.
Spinlocks are equally good or bad for s390 on both kernel versions.

However quite a few of the additional function calls in networking code
come from uninlining:

c2aa270a [NET]: uninline skb_push, de-bloats a lot
6be8ac2f [NET]: uninline skb_pull, de-bloats a lot
419ae74e [NET]: uninline skb_trim, de-bloats

Christian Ehrhardt identified these and more, but hasn't posted patches yet.
Reverting these patches does increase performance.
That's when I started wondering what we would gain when we would inline
spinlock code again.

> * the timer code doing more:
>
>    __timer_stats_timer_set_start_info <-mod_timer
>    __mod_timer <-mod_timer
>    __timer_stats_timer_set_start_info <-__mod_timer
>    lock_timer_base <-__mod_timer
>    _spin_lock_irqsave <-lock_timer_base
>    internal_add_timer <-__mod_timer
>    _spin_unlock_irqrestore <-__mod_timer
>    qdio_perf_stat_inc <-qdio_outbound_processing
>    account_system_vtime <-__do_softirq

This should be better now:

507e1231 timer stats: Optimize by adding quick check to avoid function calls

We did also address our device driver with several patches.
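
For anyone who has not looked at 507e1231: the idea named in its subject line (a quick check that avoids the function call) can be illustrated roughly as below. This is only a minimal sketch and not the patch itself. All names (stats_active, record_start_info, record_start_info_slow, arm_timer_n_times) are made up for illustration, and the noinline attribute assumes gcc or clang.

```c
#include <stdbool.h>

/* Hypothetical flag: true only while statistics collection is enabled. */
static bool stats_active = false;

/* Hypothetical counter so the effect of the quick check can be observed. */
static unsigned long slow_path_calls = 0;

/* Out-of-line slow path; stands in for the real bookkeeping work.
 * noinline (gcc/clang) keeps it a genuine function call. */
static void __attribute__((noinline)) record_start_info_slow(const void *addr)
{
	(void)addr;
	slow_path_calls++;
}

/* Inline wrapper: with statistics off, the caller pays for one test
 * and never enters the out-of-line function. */
static inline void record_start_info(const void *addr)
{
	if (!stats_active)
		return;
	record_start_info_slow(addr);
}

/* Hypothetical hot path, e.g. something that arms a timer n times.
 * Returns the total number of slow-path calls made so far. */
static unsigned long arm_timer_n_times(int n)
{
	int dummy = 0;

	for (int i = 0; i < n; i++)
		record_start_info(&dummy);
	return slow_path_calls;
}
```

With the flag off, the hot path makes no call into the bookkeeping code, so those entries would no longer show up in a function trace. The trade-off is the usual one for inlining: one extra test at every call site.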