* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Mon, 17 Nov 2008, Ingo Molnar wrote:
> >
> > this function _really_ hurts from a 16-bit op:
> >
> > ffffffff8048943e:     6503 66 c7 83 a8 00 00 00  movw   $0x0,0xa8(%rbx)
> > ffffffff80489445:        0 00 00
> > ffffffff80489447:   174101 5b                    pop    %rbx
>
> I don't think that is it, actually. The 16-bit store just before it
> had a zero count, even though anything that executes the second one
> will always execute the first one too.

yeah - look at the followup bits that identify the likely real source
of that overhead:

>> _But_, the real overhead probably comes from:
>>
>>   ffffffff804b7210:    10867 48 8b 54 24 58  mov    0x58(%rsp),%rdx
>>
>> which is the next line, the ttl field:
>>
>>   373   iph->ttl = ip_select_ttl(inet, &rt->u.dst);
>>
>> this shows that we are doing a hard cachemiss on the net-localhost
>> route dst structure cacheline. We do a plain load instruction from
>> it here and get a hefty cachemiss. (because 16 CPUs are banging on
>> that single route)
>>
>> And let's make sure we see this in perspective as well: that single
>> cachemiss is _1.0 percent_ of the total tbench cost. (!) We could
>> make the scheduler 10% slower straight away and it would have less
>> of a real-life effect than this single iph->ttl field setting.
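
To make the contended-cacheline point above concrete, here is a minimal layout sketch. It is not the kernel's dst structure: the struct names, field names and the 64-byte line size are all hypothetical, and the message does not say which field the other CPUs are writing. The sketch only assumes that some frequently written field (a reference count, say) lives on the same line as the field being loaded, so that a plain load keeps missing while 16 CPUs bounce that line.

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas */
#include <stdatomic.h>
#include <stddef.h>     /* offsetof, size_t */

#define CACHELINE 64    /* assumed line size, typical for x86-64 */

/* Hypothetical layout: a counter written by every CPU shares a
 * cacheline with a read-mostly field. */
struct route_shared {
    atomic_int refcnt;    /* written on every use of the route */
    int        ttl_hint;  /* only read, but on the same line */
};

/* Same fields, with the read-mostly one pushed onto its own line. */
struct route_split {
    atomic_int refcnt;
    alignas(CACHELINE) int ttl_hint;
};

/* Cacheline index of a byte offset, counted from the start of the
 * struct (assumes the struct itself starts on a line boundary). */
static size_t line_of(size_t off)
{
    return off / CACHELINE;
}

/* Nonzero if the two offsets fall on the same cacheline. */
static int shares_line(size_t a, size_t b)
{
    return line_of(a) == line_of(b);
}

/* The split layout really does start ttl_hint on a fresh line. */
static_assert(offsetof(struct route_split, ttl_hint) == CACHELINE,
              "ttl_hint should start the second cacheline");
```

In the first layout every write to refcnt invalidates the line holding ttl_hint in the other CPUs' caches, so readers of ttl_hint miss even though that field never changes. The second layout costs extra space per object in exchange for keeping the read-mostly field off the contended line.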