Re: How do I find the cause of these errors?

Donald Becker <becker@scyld.com> · Sun, 9 Dec 2001 12:27:09 -0500 (EST)

On Fri, 7 Dec 2001, Kenneth Stephen wrote:

> Subject: How do I find the cause of these errors?
> 	Here is the output of my ifconfig :
> 
> eth0      Link encap:Ethernet  HWaddr 00:20:78:05:EA:0B
>           RX packets:9162662 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:11040532 errors:1 dropped:0 overruns:1 carrier:0
>           collisions:0 txqueuelen:100

> 	The kernel used is 2.2.20 and the driver is (for both cards) the
> tulip driver (Comet chip).

The Tx error could be a transient error caused by the driver not
initializing the chip correctly, but it's more likely that the driver is
correctly reporting a Tx FIFO underrun, and has taken action to prevent
a future occurrence.
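
As a rough check of that reading, the TX counter line quoted above can be
split into its fields.  This is only a sketch: it assumes the "overruns"
field on the TX line is where the FIFO error is counted, and that an error
not matched by the FIFO or carrier counters would point to some other
cause.  The function name is made up for the example.

```python
import re

# The TX counter line from the quoted ifconfig output.
TX_LINE = "TX packets:11040532 errors:1 dropped:0 overruns:1 carrier:0"

def parse_counters(line):
    """Return the name:value pairs of an ifconfig counter line as ints."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}

tx = parse_counters(TX_LINE)

# Assumption: errors not covered by the FIFO or carrier counters
# would need a different explanation.
unexplained = tx["errors"] - tx["overruns"] - tx["carrier"]
print(tx)
print("errors not explained by FIFO/carrier counters:", unexplained)
```

Here the single Tx error lines up with the single FIFO count, which is
consistent with one underrun and nothing else.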

Where the hardware supports it, my drivers have dynamic Tx FIFO tuning
code.  A typical Ethernet chip has a Tx FIFO that holds data from the
bus before it is transmitted on the wire.  The way this FIFO is
controlled is important for performance.

Ideally you would like to start transmitting as soon as the first Tx
packet data arrives at the chip.  The "Tx FIFO threshold" is a parameter
that specifies "start transmitting when N bytes have arrived at the NIC
chip".  This parameter is initially set for a typical configuration.  But
if a video card or SCSI controller is doing long PCI bursts, the NIC
chip will run out of buffered data before it can get access to the bus
again.  This causes a FIFO underrun.
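
The effect can be shown with a toy model, written in Python rather than as
driver code.  Everything in it is an assumption made for illustration: the
bus delivers a fixed number of bytes per tick, the wire drains one byte per
tick, and another device holds the bus for one contiguous stall.

```python
def transmit(packet_len, threshold, stall_start, stall_len, bus_rate=4):
    """Toy model: return True if the packet goes out without a FIFO underrun.

    Each tick the bus delivers bus_rate bytes into the FIFO, except during
    the stall, and the wire drains one byte once transmission has started.
    """
    fetched = 0       # bytes moved from memory into the FIFO so far
    sent = 0          # bytes already put on the wire
    started = False
    tick = 0
    while sent < packet_len:
        stalled = stall_start <= tick < stall_start + stall_len
        if not stalled:
            fetched = min(packet_len, fetched + bus_rate)
        # Start once the threshold is reached or the whole packet is buffered.
        if not started and fetched >= min(threshold, packet_len):
            started = True
        if started:
            if sent == fetched:
                return False    # FIFO ran dry mid-packet: underrun
            sent += 1
        tick += 1
    return True

# A low threshold starts early and is caught by a long stall.
print("threshold 64:  ", transmit(1500, 64, stall_start=100, stall_len=400))
# Buffering the whole packet first cannot underrun in this model.
print("threshold 1500:", transmit(1500, 1500, stall_start=100, stall_len=400))
```

The low threshold gives the best latency when the bus is free, which is why
it is the preferred starting point despite the risk.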

The driver responds to the FIFO underrun by changing the Tx FIFO
threshold to a higher value.  If this happens often enough, the chip
will eventually end up in store-and-forward mode, where it doesn't
start transmitting until the whole packet has been transferred.
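
The escalation can be sketched as a small state machine.  This is a Python
illustration, not code from any driver: the threshold values, the class
name and the method name are all hypothetical.

```python
# Hypothetical threshold steps in bytes; None stands for store-and-forward.
THRESHOLDS = [72, 96, 128, 160, None]

class TxTuner:
    """Tracks the current Tx FIFO threshold and raises it on each underrun."""

    def __init__(self):
        self.level = 0        # index into THRESHOLDS
        self.underruns = 0    # how many underruns have been reported

    @property
    def threshold(self):
        return THRESHOLDS[self.level]

    def on_underrun(self):
        """Count the error and move to the next higher setting, if any."""
        self.underruns += 1
        if self.level < len(THRESHOLDS) - 1:
            self.level += 1
        return self.threshold

tuner = TxTuner()
for _ in range(5):
    print("underrun -> new threshold:", tuner.on_underrun())
```

Once the last entry is reached the setting stays there, since
store-and-forward cannot underrun.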

Some designs, such as the Adaptec Starfire, go one step further and
provide an indication that the FIFO almost ran out of data.  This allows
the driver to tune the setting without risking a Tx error.

It should be rare to see more than one or two Tx FIFO underruns: either
the chip has very coarse Tx threshold settings, or the driver increases
the setting in large chunks to keep the PCI bursts on natural boundaries,
so only a few adjustments are needed before the threshold is high enough.
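
A quick bit of arithmetic shows why coarse steps keep the count low.  The
starting threshold, step sizes and frame length below are assumed values,
and the result is only a worst-case bound for a threshold that grows by a
fixed step on every underrun.

```python
import math

def max_underruns(start, step, packet_len=1514):
    """Worst-case underruns before the threshold covers a whole packet."""
    return math.ceil((packet_len - start) / step)

# Coarse steps reach a full-packet threshold after a handful of errors.
print("step 512:", max_underruns(start=64, step=512))
# Fine-grained steps could in principle report many more.
print("step 16: ", max_underruns(start=64, step=16))
```

In practice the tuning usually stops well before the bound, as soon as the
threshold is large enough to ride out the longest bus stall.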

> Packet forwarding takes place between eth0 and
> eth1. Given that the one error seems to be on the TX line for both
> interfaces, it seems that the errors originated from this machine. Are
> there any tools I could use / parameters for the driver which I could set
> to allow me to track down what is the cause of these errors?

Depending on the driver, you might be able to set the debug level to
report transmit errors.  This will be easier in the future with the
netif_msg_level setting, which will allow enabling just Tx error messages.
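
The idea of enabling only one class of message, instead of raising a single
verbosity level, can be sketched with a bitmask.  The class names and bit
values here are hypothetical and chosen for the example; they are not taken
from any kernel header.

```python
from enum import IntFlag

class Msg(IntFlag):
    # Hypothetical message classes and bit values, for illustration only.
    LINK = 0x1
    RX_ERR = 0x2
    TX_ERR = 0x4
    INTR = 0x8

def log(enabled, kind, text, out):
    """Append text to out only if its message class is enabled."""
    if enabled & kind:
        out.append(text)

messages = []
enabled = Msg.TX_ERR    # ask for Tx error reports and nothing else
log(enabled, Msg.INTR, "interrupt, status 0x1234", messages)
log(enabled, Msg.TX_ERR, "Tx FIFO underrun, raising threshold", messages)
log(enabled, Msg.LINK, "link up", messages)
print(messages)
```

With a single numeric debug level, seeing the Tx error would typically mean
also seeing every message at or below that level.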

Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993
