Intel PRO/1000 GT (e1000 driver) and "Detected Tx Unit Hang" error with 4GB RAM

Steven Alexson <steve@xxxxxxxxxxx> · Fri, 09 Nov 2007 09:57:55 -0500

My system configuration:
ASUS M2A-VM motherboard
AMD Athlon 64 X2 4200+ 2.2 GHz
4x A-DATA 1GB DDR2 800 memory
2x Intel 10/100/1000 Pro/1000 GT Desktop Network Adapter
2x Seagate Barracuda 250GB HD (RAID 1, software RAID)
CentOS 5 x86_64; Kernel 2.6.23 (custom built); Version 7.6.9.2 e1000 driver

The symptoms of this problem are outlined at:

http://e1000.sourceforge.net/wiki/index.php/Issues
http://e1000.sourceforge.net/wiki/in...p/Tx_Unit_Hang

Last night I started experiencing the "Detected Tx Unit Hang" problem
with the Intel e1000 NIC. This happened after I upgraded my system to
4GB RAM (previously 2GB). I have 2 of these cards in the system. I
updated the Linux kernel to 2.6.23, then downloaded from SourceForge and
installed the most recent stable version of the e1000 driver for Linux,
version 7.6.9.2. I am still getting the "Detected Tx Unit Hang" message.
I had to recompile the kernel because upgrading to 4GB with the current
kernel for CentOS 5 (2.6.18.8-1) causes an error, "ata1: softreset
failed (1st FIS failed)", which results in a kernel panic. Upgrading the
kernel to 2.6.23 fixed that problem, but now I have a problem with my
network cards.

Searching around, I found posts saying that disabling ACPI with the
kernel options "acpi=off noacpi" would fix it, but it did not. I tried
adding explicit modprobe options for the driver in /etc/modprobe.conf
(options e1000 XsumRX=0 Speed=1000 Duplex=2 InterruptThrottleRate=0
FlowControl=3 RxDescriptors=4096 TxDescriptors=4096 RxIntDelay=0
TxIntDelay=0). Still no change; I am still experiencing the problem.
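
For reference, the e1000 driver README describes its module parameters
as comma-separated lists with one value per adapter, so with two cards a
single value appears to reach only the first one. A minimal Python
sketch that builds such a line from the options quoted above, assuming
two adapters (the helper name `e1000_options_line` is made up for
illustration and is not part of any tool):

```python
# Hypothetical helper: repeat each e1000 module parameter once per adapter.
# Parameter names and values are the ones quoted in the post; the helper
# itself is illustrative only.

def e1000_options_line(params, adapters):
    """Return an 'options e1000 ...' line with one value per adapter."""
    parts = []
    for name, value in params:
        per_adapter = ",".join([str(value)] * adapters)
        parts.append(f"{name}={per_adapter}")
    return "options e1000 " + " ".join(parts)


PARAMS = [
    ("XsumRX", 0),
    ("Speed", 1000),
    ("Duplex", 2),
    ("InterruptThrottleRate", 0),
    ("FlowControl", 3),
    ("RxDescriptors", 4096),
    ("TxDescriptors", 4096),
    ("RxIntDelay", 0),
    ("TxIntDelay", 0),
]

if __name__ == "__main__":
    # Two adapters, as in this system.
    print(e1000_options_line(PARAMS, adapters=2))
```

The printed line would go in /etc/modprobe.conf in place of the
single-value version; whether it changes the hang is untested here.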

I then tried another suggestion I found in a forum discussion,
`ethtool -K eth0 tso=off`. It seems to have had no effect on the
problem.
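
Note that the ethtool man page gives the offload syntax as
`ethtool -K eth0 tso off` (no equals sign), and `ethtool -k eth0` (lower
case) reports the current state. A small Python sketch for checking that
the setting actually took effect; the sample text is illustrative, not
captured from this system, and the function name is made up:

```python
import re


def tso_enabled(ethtool_k_output):
    """Return True/False for the TSO line of `ethtool -k`, or None if absent."""
    # Older ethtool prints "tcp segmentation offload: on"; newer versions
    # print "tcp-segmentation-offload: on". Accept either spelling.
    match = re.search(
        r"^\s*tcp[ -]segmentation[ -]offload:\s*(on|off)\b",
        ethtool_k_output,
        flags=re.MULTILINE | re.IGNORECASE,
    )
    if match is None:
        return None
    return match.group(1).lower() == "on"


# Illustrative output only (assumed layout of an older ethtool).
SAMPLE = """\
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
"""

if __name__ == "__main__":
    print("TSO enabled:", tso_enabled(SAMPLE))
```

To use it on a live system, the text would come from running
`ethtool -k eth0` and passing its output to `tso_enabled`.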

This problem occurs immediately when the system tries to bring the
device up. I cannot even get to the point of sending traffic over the
network interface, because it never obtains an IP address from DHCP. If
I specify a static IP address, the address is assigned, but I still
experience the problem, and I cannot even ping another host.

Now, if I reduce the amount of RAM to 3GB or less, everything works
fine! So, this leads me to believe that my kernel and driver are
configured, compiled, and functioning correctly. It also leads me to
believe that there are no problems with the network cards. So, I thought
perhaps a bad memory module, but no matter which 3 of the 4 modules I
leave in, I get the same results. Everything works fine until I add the
4th module.
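
One difference between the 3GB and 4GB configurations that can be
checked directly is whether any RAM ends up mapped above the 4 GiB
physical address boundary (boards commonly remap part of a full 4GB
above it to make room for PCI address space). Whether that relates to
the Tx hang is not established here. A minimal Python sketch that sums
such ranges from the text of /proc/iomem; the sample map is invented for
illustration:

```python
FOUR_GIB = 1 << 32


def ram_above_4gib(iomem_text):
    """Sum the bytes of 'System RAM' ranges that lie above the 4 GiB mark."""
    total = 0
    for line in iomem_text.splitlines():
        if not line.strip().endswith(": System RAM"):
            continue  # skip devices, reserved areas, and nested entries
        # Entries look like "100000000-12fffffff : System RAM" (hex, inclusive).
        span = line.split(":", 1)[0].strip()
        start_hex, end_hex = span.split("-")
        start, end = int(start_hex, 16), int(end_hex, 16)
        if end >= FOUR_GIB:
            total += end - max(start, FOUR_GIB) + 1
    return total


# Invented example map, not taken from the system in this post.
SAMPLE = """\
00000000-0009fbff : System RAM
00100000-cfeeffff : System RAM
  00200000-004fffff : Kernel code
d0000000-dfffffff : PCI Bus #01
100000000-12fffffff : System RAM
"""

if __name__ == "__main__":
    mib = ram_above_4gib(SAMPLE) // (1 << 20)
    print(f"System RAM above 4 GiB: {mib} MiB")
```

Comparing the result with 3 modules and with 4 modules installed would
show whether the extra module is what pushes RAM above that boundary.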

Then I found an article on Intel's site saying that some older EEPROMs
have the power management option turned on, and that this could cause
the problem. So, I downloaded the script that fixes the bit in the
EEPROM (turning off power management). The script says that it does not
apply to my version of the EEPROM. When I run `ethtool -e (eth0|eth1)`,
I do not see the value "de" on the 0x0010 line, so I have to believe the
script is correct in determining that it does not apply to my NICs.
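
The manual check of the `ethtool -e` dump can be repeated mechanically.
The Python sketch below parses a dump into a byte map and reports
whether a given value appears anywhere on one 16-byte row. Which byte
the Intel script actually inspects is not stated here, so scanning the
whole 0x0010 row is an assumption, and the sample dump is invented:

```python
def eeprom_bytes(ethtool_e_output):
    """Map byte offset -> value for every data row of an `ethtool -e` dump."""
    data = {}
    for line in ethtool_e_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].lower().startswith("0x"):
            continue  # header, separator, or blank line
        # Row label may or may not carry a trailing colon.
        base = int(fields[0].rstrip(":"), 16)
        for i, byte in enumerate(fields[1:]):
            data[base + i] = int(byte, 16)
    return data


def row_has_value(data, row_base, value, width=16):
    """True if any byte in the row starting at row_base equals value."""
    return any(data.get(row_base + i) == value for i in range(width))


# Invented dump for illustration; not the contents of a real adapter.
SAMPLE = """\
Offset		Values
------		------
0x0000		00 11 22 33 44 55 30 05 ff ff ff ff ff ff ff ff
0x0010		ff ff ff ff 6b 02 00 00 86 80 7c 10 86 80 ff de
"""

if __name__ == "__main__":
    dump = eeprom_bytes(SAMPLE)
    print("0xde on row 0x0010:", row_has_value(dump, 0x0010, 0xDE))
```

Running it against the real output for eth0 and eth1 would confirm the
visual check for both cards.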

So, I thought that perhaps my power supply could be the problem. Perhaps
the PSU doesn't supply enough power to run everything when I add the 4th
memory module. It is just a generic 300W PSU that came with the case (I
have new 500W PSUs on the way). So, I pulled out one of the NICs and
disconnected the DVD drive. That is about all I can eliminate. Reducing
the hardware installed made no difference.

I am running the 64-bit kernel, so I should have no trouble supporting
4GB of RAM, correct?

Now, I am out of ideas, and I seem to have hit a brick wall. One of the
things that disturbs me is that all of the articles I have found
concerning this problem are dated 1-2 years ago.

Can anyone offer me any assistance?

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos