Re: gigabit Ethernet with slow PowerPC VME processor

Fred Gray <fegray@socrates.berkeley.edu> · Mon, 28 Apr 2003 00:48:22 -0700

On Sun, Apr 27, 2003 at 08:35:01PM -0700, Feldman, Scott wrote:
> > Also, I wasn't able to find anything relating to transmit 
> > checksum offloading under any condition in the e1000 driver 
> > source code.  Is it there somewhere, 
> > but just hiding? 
> 
> See NETIF_F_HW_CSUM.  This is where we advertise Tx checksum offload
> support.  Also see e1000_tx_csum().  This routine checks skb->ip_summed
> to see if we need to schedule the hardware to do the checksum calcs.

(Apologies to those of you who are also on linuxppc-dev and are seeing these
results twice.)

Yes, thanks, I understand this path now.  Also, some profiling measurements
made it clear that the checksum is really being offloaded.  Using ordinary
socket calls, these are the leading entries:

  5838 total                                      0.0059
  3263 ppc_irq_dispatch_handler                   5.7855
  1645 csum_partial_copy_generic                  7.4773
   133 e1000_intr                                 0.8750
    89 do_softirq                                 0.3477
    69 tcp_sendmsg                                0.0149

In zero-copy mode, where the data is written to a temporary file and
transmitted with sendfile(), csum_partial_copy_generic drops out of the
leading entries, so the copy and checksum have been successfully offloaded
to the interface:

  5983 total                                      0.0061
  4740 ppc_irq_dispatch_handler                   8.4043
   614 e1000_intr                                 4.0395
    61 e1000_clean_tx_irq                         0.1113
    52 do_tcp_sendpages                           0.0179
    51 do_softirq                                 0.1992

The two morals of this story are that the checksum is not the main problem
and that interrupt handling is.  Moreover, it's not the e1000 interrupt
handler that takes the time; it's the generic PowerPC interrupt handling.
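
To put numbers on that, the short sketch below turns the two profiles above
into percentages of the "total" row.  It assumes the first column is a tick
count per function, which is how the output reads; the helper name
tick_shares is made up for illustration.

```python
def tick_shares(profile):
    """Return each function's share of the profile's total ticks, in percent."""
    total = profile["total"]
    return {
        name: 100.0 * ticks / total
        for name, ticks in profile.items()
        if name != "total"
    }


# Tick counts copied from the two profiles in this message.
socket_calls = {
    "total": 5838,
    "ppc_irq_dispatch_handler": 3263,
    "csum_partial_copy_generic": 1645,
    "e1000_intr": 133,
    "do_softirq": 89,
    "tcp_sendmsg": 69,
}
zero_copy = {
    "total": 5983,
    "ppc_irq_dispatch_handler": 4740,
    "e1000_intr": 614,
    "e1000_clean_tx_irq": 61,
    "do_tcp_sendpages": 52,
    "do_softirq": 51,
}

for label, profile in (("socket calls", socket_calls), ("zero-copy", zero_copy)):
    print(label)
    for name, pct in tick_shares(profile).items():
        print(f"  {name:28s} {pct:5.1f}%")
```

On these figures ppc_irq_dispatch_handler accounts for roughly 56% of the
ticks with ordinary socket calls and roughly 79% in zero-copy mode, while
csum_partial_copy_generic is about 28% in the first case.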

To add to the confusion, I tried increasing the interrupt throttle rate by a 
factor of 4.  Throughput improved for small MTUs, but there was no change
in throughput for an MTU of 16000.  For this large MTU, there were only about
1250 interrupts per second arriving, and the profile traces didn't look any
different.  I really don't understand this.  I've asked about it on a
PowerPC-specific list, since it doesn't necessarily seem to be a networking
issue per se.  However, any suggestions about what to check next would
certainly be welcome.
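
One way to state the puzzle is as a back-of-the-envelope calculation.  The
sketch below assumes that the zero-copy profile covers the full wall-clock
time of the run on a single CPU and that its proportions also hold at an
MTU of 16000.  Neither is established by the measurements; the second
rests only on the traces not looking any different.  The function names
are made up for illustration.

```python
INTERRUPTS_PER_SECOND = 1250   # approximate rate reported for MTU 16000
IRQ_DISPATCH_TICKS = 4740      # ppc_irq_dispatch_handler, zero-copy profile
TOTAL_TICKS = 5983             # "total" row of the same profile


def interval_us(rate):
    """Average time between interrupts, in microseconds."""
    return 1e6 / rate


def implied_us_per_interrupt(rate, ticks, total):
    """CPU time attributed to each interrupt, in microseconds, if the
    profile share (ticks / total) is spread evenly over all interrupts.
    This is an assumption, not a measurement."""
    return 1e6 * (ticks / total) / rate


print(f"{interval_us(INTERRUPTS_PER_SECOND):.0f} us between interrupts")
per_irq = implied_us_per_interrupt(
    INTERRUPTS_PER_SECOND, IRQ_DISPATCH_TICKS, TOTAL_TICKS
)
print(f"{per_irq:.0f} us attributed to dispatch per interrupt")
```

Under those assumptions the interrupts arrive about 800 microseconds apart
and each one is charged with well over half of that interval.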

Thanks again for your help, and especially many thanks to John Heffner for 
extensive off-list guidance,

-- Fred

-- Fred Gray / Visiting Postdoctoral Researcher                         --
-- Department of Physics / University of California, Berkeley           --
-- fegray@socrates.berkeley.edu / phone 510-642-4057 / fax 510-642-9811 --