Ingo Molnar wrote:
> * David Miller <davem@xxxxxxxxxxxxx> wrote:
>
>> From: Ingo Molnar <mingo@xxxxxxx>
>> Date: Mon, 17 Nov 2008 22:26:57 +0100
>>
>>> eth->h_proto access.
>>
>> Yes, this is the first time a packet is touched on receive.
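
(For reference, that first touch is the eth->h_proto load in eth_type_trans(),
which the receive path runs on every packet. A minimal sketch of that access,
simplified from the real function -- the function name below is made up and
the multicast/other-host classification is omitted:)

#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/if_ether.h>

/*
 * Simplified sketch of the first packet-data access on receive,
 * modelled on eth_type_trans().  The eth->h_proto load dereferences
 * skb->data, i.e. the packet contents last written by the sending
 * side, which is why it shows up as a cache miss in the profile.
 */
static __be16 first_rx_touch(struct sk_buff *skb, struct net_device *dev)
{
	struct ethhdr *eth;

	skb->dev = dev;
	skb_reset_mac_header(skb);	/* mac header = current skb->data */
	eth = eth_hdr(skb);		/* pointer into the packet data */
	skb_pull(skb, ETH_HLEN);

	return eth->h_proto;		/* first read of the packet contents */
}
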
>>> Given that this workload does localhost networking, my guess would be
>>> that eth->h_proto is bouncing around between 16 CPUs? At minimum this
>>> read-mostly field should be separated from the bouncing bits.
>>
>> It's the packet contents, there is no way to "separate it".
>>
>> And it is unlikely to be bouncing on your system under tbench; the
>> senders and receivers should hang out on the same cpu unless
>> something completely stupid is happening.
>>
>> That's why I like running tbench with a num_threads command line
>> argument equal to the number of cpus, so every cpu gets the two
>> threads talking to each other over the TCP socket.
>
> yeah - and i posted the numbers for that too - it's the same
> throughput, within ~1% of noise.

Thinking once again about the loopback driver, I recall a previous attempt
to call netif_receive_skb() instead of netif_rx() and pay the price
of cache line ping-pongs between cpus:

http://kerneltrap.org/mailarchive/linux-netdev/2008/2/21/939644
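
(The experiment in that thread boils down to the following change in the
loopback xmit path -- a rough sketch of the idea, not the actual patch;
the function name is made up and the per-cpu stats updates are omitted:)

#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

/*
 * Rough sketch, loosely based on drivers/net/loopback.c.  netif_rx()
 * only queues the skb and lets the NET_RX softirq on this cpu do the
 * real work later; netif_receive_skb() processes the packet
 * synchronously, right here in the sender's context.
 */
static int loopback_xmit_sketch(struct sk_buff *skb, struct net_device *dev)
{
	skb_orphan(skb);
	skb->protocol = eth_type_trans(skb, dev);

	/* current driver: netif_rx(skb); */
	netif_receive_skb(skb);		/* the experiment discussed above */

	return 0;
}
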
Maybe we could do that, with a temporary percpu stack, like we do in softirq
when CONFIG_4KSTACKS=y (arch/x86/kernel/irq_32.c : call_on_stack(func, stack)),
and do this only if the current cpu doesn't already use its softirq_stack
(think about loopback re-entering loopback xmit because of a TCP ACK, for example).
Oh well... black magic, you are going to kill me :)
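
(To make the hand-waving a bit more concrete, a purely illustrative sketch --
none of these per-cpu names exist in the tree, and the commented stack switch
would need a call_on_stack()-style asm helper rather than a plain call:)

#include <linux/percpu.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <asm/thread_info.h>

/* invented names: a scratch stack per cpu plus a recursion guard */
static DEFINE_PER_CPU(char, loopback_rx_stack[THREAD_SIZE]);
static DEFINE_PER_CPU(int, loopback_rx_stack_busy);

static void loopback_deliver_sketch(struct sk_buff *skb)
{
	if (__get_cpu_var(loopback_rx_stack_busy)) {
		/* loopback re-entered (e.g. TCP ACK sent while we are
		 * processing a packet): fall back to the queued path */
		netif_rx(skb);
		return;
	}

	__get_cpu_var(loopback_rx_stack_busy) = 1;
	/* hypothetical: run this call on this cpu's loopback_rx_stack
	 * via a call_on_stack()-like helper; shown as a plain call here */
	netif_receive_skb(skb);
	__get_cpu_var(loopback_rx_stack_busy) = 0;
}
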