Re: 2.6.19 tg3 Broadcom 5704 problems/questions

Hi,

With the MCEs, it's looking very, very likely to be a power issue to me, guys.
Try a much beefier PSU in one of the machines causing trouble. Or, of
course, it could be an underlying problem with power distribution
and/or decoupling on the motherboard. Check the power/decoupling
capacitors for "bulging". Let me know how you get on.

Dan...

Wednesday, March 28, 2007, 5:59:15 AM, you wrote:

> Hi Brecht,

> On Wed, 28 Mar 2007, Brecht Vermeulen wrote:

>> we are running multiple systems with the same motherboard and NICs and get
>> the same problems under heavy load, e.g. with rsync and network block
>> device. I have already debugged the hell out of it with all options of the
>> NICs (offloading on/off, ASF, jumbo frames and normal frames, ...), 32-bit
>> and 64-bit, but could always get network lockups, sometimes only after 4
>> hours of heavy load. I also sometimes got MCE errors, but machines
>> without those errors locked up too.

> Sorry to hear you're in the same boat.

> We're still having the problem, but haven't had a chance recently to take
> another look.  I'm hoping to before the end of this week.

> Were you able to notice any difference between having ASF enabled vs. ASF
> disabled?  We noticed that the driver could reset the adapter with ASF
> disabled (I don't know how consistently this could happen), but seemed
> NOT to be able to reset it with ASF enabled.

> We're also having trouble with MCEs on other systems (bad memory), after
> which our compute nodes (also H8SSL-i's) start spraying invalid crap onto
> the network (after a crash, we attach another system with a crossover cable
> and watch from that machine: the byte counters increase, but the packet
> counters do not).

>> So, I guess there is something wrong with that motherboard (not sure if
>> it's only the NICs, only the motherboard, or the combination of both).

> I'll bring you into a conversation I'm having with someone from SuperMicro
> in another email thread.

>> For one of our production servers, we've put a 32-bit Intel NIC in a PCI
>> slot and it is stable now (although 1Gb/s is out of sight :-( ).

> We're trying to avoid having to do this.

> I'll send the other email shortly.

> Thanks!
> Paul



>> Paul Armor wrote:
>>> Hi,
>>>
>>> On Tue, 13 Mar 2007, Neil Horman wrote:
>>>>> I'll summarize what our problems and configs are.
>>>>>
>>>>> Problems - lockups on ethernet controllers under heavy NFS loads
>>>>>          (sometimes driver can/will reset, sometimes not)
>>>>>        systems completely lock up
>>>>> Hardware - Supermicro H8SSL-i with onboard Broadcom 5704's (both clients
>>>>>          and servers)
>>>>> Server config - 2.6.19 kernel (thus tg3 ver 3.69)
>>>>>        nfs-utils-1.0.7-13 FC4
>>>>>        NIC running at 4500 MTU
>>>> What on earth is that?  I assume you are configured for jumbo frames
>>>> through your whole network, but why not bump your mtu all the way up
>>>> to 9000 then?
>>>
>>> Yes, we're configured to allow up to 9000 MTU, but we're using 4500 as
>>> that was the intersection of performance with regard to switch topology
>>> (don't ask), CPU overhead with the tg3 driver (in 2.6.11, at least), and
>>> throughput (using a variety of canned benchmarky things).
>>>
>>>> Does the problem persist if you only use a 1500 byte MTU?
>>>
>>> Don't know; we're theoretically in production mode (when the machines
>>> are all up at the same time).
>>>
>>>>> Failure caused by users building software in automounted FS's.
>>>> Can you get a sysrq-t when the system locks up?
>>>
>>> Will try the next time it craps out, and I can still get console access.
>>>
>>> Thanks,
>>> Paul
>>>
>>> -
>>> To unsubscribe from this list: send the line "unsubscribe linux-net" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>



--

Dan Searle
Adelix Ltd
dan.searle@xxxxxxxxxx web: www.adelix.com
tel: 0845 230 9590 / fax: 0845 230 9591 / support: 0845 230 9592
snail: The Old Post Office, Bristol Rd, Hambrook, Bristol BS16 1RY. UK.

Adelix Ltd is a registered company in England & Wales No. 4232156
VAT registration number 779 4232 91
Adelix Ltd is BS EN ISO 9001:2000 Certified (No. GB 12763)

Any views expressed in this email communication are those
of the individual sender, except where the sender specifically states
them to be the views of a member of Adelix Ltd.  Adelix Ltd. does not
represent, warrant or guarantee that the integrity of this communication
has been maintained nor that the communication is free of errors or
interference.

