* Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:

>> It all looks like pure old-fashioned straight overhead in the
>> networking layer to me. Do we still touch the same global cacheline
>> for every localhost packet we process? Anything like that would
>> show up big time.
>
> Yes we do, I find it strange that we don't see dst_release() in your
> NMI profile.
>
> I posted a patch (commit 5635c10d976716ef47ae441998aeae144c7e7387
> net: make sure struct dst_entry refcount is aligned on 64 bytes) (in
> net-next-2.6 tree) to properly align struct dst_entry refcounter and
> got 4% speedup on tbench on my machine.

Ouch, +4% from a one-liner networking change? That's a _huge_ speedup
compared to the things we were after in scheduler land. A lot of
scheduler folks worked hard to squeeze the last 1-2% out of the
scheduler fastpath (which was not trivial at all). The _full_ scheduler
accounts for only about 7% of the total system overhead here on a
16-way box...

So why should we be handling this as anything but a plain networking
performance regression/weakness? The localhost scalability bottleneck
was reported a _long_ time ago.

	Ingo
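For readers unfamiliar with the alignment fix mentioned above, here is a minimal user-space sketch of the general idea: keep a frequently written reference counter on a cache-line boundary so that it does not share a line with read-mostly fields. This is not the kernel patch itself. The struct `route_entry`, its fields, the helpers `route_hold()` and `route_release()`, and the 64-byte `CACHE_LINE` value are all hypothetical names and assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed cache-line size in bytes */

/* Hypothetical stand-in for a route cache entry; NOT struct dst_entry. */
struct route_entry {
    /* Read-mostly fields consulted for every packet. */
    void *dev;
    void *ops;
    unsigned long expires;

    /*
     * Written on every hold/release. Forcing it onto a cache-line
     * boundary keeps those writes from invalidating the line that
     * holds the read-mostly fields above.
     */
    alignas(CACHE_LINE) atomic_int refcnt;
};

/* Fail the build if a later layout change breaks the alignment. */
static_assert(offsetof(struct route_entry, refcnt) % CACHE_LINE == 0,
              "refcnt must start on a cache-line boundary");

/* Take a reference (hypothetical helper). */
static inline void route_hold(struct route_entry *e)
{
    atomic_fetch_add(&e->refcnt, 1);
}

/* Drop a reference (hypothetical helper). */
static inline void route_release(struct route_entry *e)
{
    atomic_fetch_sub(&e->refcnt, 1);
}
```

The compile-time check is the useful part of the sketch: the layout requirement is enforced at build time rather than left to a comment. Alignment only separates the counter from neighbouring fields; every CPU that takes or drops a reference still writes the same line, so the counter itself remains a shared hot spot.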