Re: CentOS based router dropping connections




To reply to myself: I'm pulling my hair out over this one. Here's some more information:

I've simplified the problem down to just downloading files from the server at the hosting facility: no iptables, no port forwarding, just downloading a file through Apache directly from the server. I was still getting errors doing even that from the Dell 860 server, which (along with everything else I tested and read about) made me think the problem was that server (well, the driver on that server).

So yesterday, I built a simple/cheap replacement server to stand in while I fix this one, went to the hosting facility, pulled out the "problem" server, and brought it back to the office. Everything seemed to work fine with the replacement server, confirming my suspicion that it was the tg3 driver... but only for a couple of hours. Now I'm right back to square one, dropping connections! The replacement server is having the exact same problems! Arg!

The problem only seems to exhibit itself when the server is "busy" (which is most of the time, so it's hard to diagnose). Right after I'd replaced the "problem" server, the site stayed non-busy for a few hours, and everything seemed to work just fine. Just FYI, it's a 10 Mbit drop from the hosting facility, and during the daytime we're at around 100% use from about 10AM to 8PM.

So basically, from all of the evidence so far, I figure the problem is either:
the default network configuration in CentOS isn't suitable for what I'm doing (it can't handle the traffic or the number of connections). I get a decent amount of traffic, maxing out a 10 Mbit connection all day long. I don't know exactly where to look to confirm whether this is the case, though. Can anybody point me to where I can find things like the system's network usage (memory, buffers, number of connections, etc.)? The things I know to check look normal, but that's basically just ifconfig and the standard /var/log/messages and dmesg logs.
or:
the network drop from the hosting facility is "bad" somehow: either the cable is physically faulty, or there is a problem with the way they are limiting me to 10 Mbit.
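
As a starting point for the question above about where to look, here is a minimal Python sketch that reads two standard Linux sources: /proc/net/dev (per-interface error, drop and FIFO counters) and /proc/net/tcp (one row per IPv4 TCP socket, with its state). The function names are made up for illustration, and the sketch assumes the usual layout of those two files; it only reports numbers and does not say which values are "too high".

```python
import os
from collections import Counter

# Names for the 16 counters on each /proc/net/dev line (8 receive, 8 transmit).
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def parse_net_dev(text):
    """Return {interface: {"rx": {...}, "tx": {...}}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        # Split on the first colon so "eth0:5000 ..." (no space) also parses.
        name, counters = line.split(":", 1)
        values = [int(v) for v in counters.split()]
        if len(values) < 16:
            continue
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, values[:8])),
            "tx": dict(zip(TX_FIELDS, values[8:16])),
        }
    return stats


def count_tcp_states(text):
    """Count sockets per TCP state from /proc/net/tcp text."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) > 3:
            code = fields[3].upper()
            counts[TCP_STATES.get(code, code)] += 1
    return counts


def read_text(path):
    with open(path) as handle:
        return handle.read()


def report():
    """Print error/drop counters per interface and socket counts per state."""
    for iface, s in sorted(parse_net_dev(read_text("/proc/net/dev")).items()):
        print(iface,
              "rx errs/drop/fifo:", s["rx"]["errs"], s["rx"]["drop"], s["rx"]["fifo"],
              "| tx errs/drop/colls:", s["tx"]["errs"], s["tx"]["drop"], s["tx"]["colls"])
    for state, n in count_tcp_states(read_text("/proc/net/tcp")).most_common():
        print(state, n)


if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    report()
```

Running it a few times while the link is saturated and comparing the output should show whether the drop, errs or fifo counters climb along with the dropped connections, and how many sockets sit in each state at that moment.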

Any ideas?

Thanks for all your help, and any help in advance,
-Jesse

Jesse Cantara wrote:
Actually, I spoke too soon.

Setting the NIC to 100 Mbit did not fix the issue. I misdiagnosed it as fixed because it seemed to be working for quite some time, but the old problems are back.

Basically, I'm at my wits' end right now. I'm going to go down to the colocation and see if they can test the network drop into our cabinet. If it's not that, then I'm convinced it's the tg3 driver.

-Jesse

Jesse Cantara wrote:
The problem ended up being the "tg3" Broadcom NIC kernel module driver. It doesn't work properly at Gigabit speeds. Turning it down to 100 Megabit fixed the issue. Does anybody know where I should report this bug?

Thanks for all your help,
-Jesse

William L. Maltby wrote:
On Fri, 2007-07-20 at 12:29 -0400, Jesse Cantara wrote:
Hi Bob,

<snip>

The issue I'm having is that external traffic is being forwarded properly, BUT the connection occasionally drops. It's not consistent (maybe 2 out of 5 downloads from the internet through the router to the webserver will drop), and the connections are being made, so it's not a fundamental configuration issue. It's something more sneaky. I'm thinking that there's something in the kernel or network driver that isn't functioning properly, or maybe a buffer that fills up and causes the connection to be abandoned?

<snip>

-Jesse

Bob Chiodini wrote:

Jesse Cantara wrote:
Hello,

I am trying to figure out a problem I'm having using CentOS on a machine as a router. The short story is: any traffic routed through the router seems to occasionally get disconnected at random.

<snip>

Someone recently posted a thread about a similar complaint to the lists.
IIRC, the [SOLVED] post mentioned a problem with the MTU being smaller
than some of the packets received at one point, causing fragmentation,
and the next step not being able to reassemble the packet because of a
certain flag being set.

I don't remember which bit the flag was and know little about this, but I
remember the general gist.

Maybe your problem is similar?
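
The post above does not say which flag was involved. One common scenario that matches the general gist is the IPv4 Don't Fragment (DF) bit: a packet larger than the next hop's MTU is fragmented if DF is clear, but if DF is set the router drops it and sends back an ICMP "fragmentation needed" message (type 3, code 4). If that ICMP message is filtered somewhere, large transfers can stall while small ones work. The Python sketch below only models that decision under this assumption; the function names are hypothetical and it does not inspect real traffic.

```python
# ICMP type 3 (destination unreachable), code 4 (fragmentation needed and DF set).
ICMP_FRAG_NEEDED = (3, 4)


def forward_decision(packet_len, next_hop_mtu, dont_fragment):
    """Return what an IPv4 router does with a packet of packet_len bytes."""
    if packet_len <= next_hop_mtu:
        return "forward"
    if dont_fragment:
        # The packet is discarded and ICMP_FRAG_NEEDED goes back to the sender.
        return "drop+icmp"
    return "fragment"


def fragment_sizes(packet_len, next_hop_mtu, header_len=20):
    """Total size in bytes of each fragment of a packet that exceeds the MTU.

    Assumes a fixed IPv4 header of header_len bytes with no options.
    """
    # Every fragment except the last carries a payload that is a multiple of 8 bytes.
    per_fragment = (next_hop_mtu - header_len) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    payload = packet_len - header_len
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(chunk + header_len)
        payload -= chunk
    return sizes


if __name__ == "__main__":
    # A full-size 1500-byte Ethernet packet meeting a 1492-byte (PPPoE-style) MTU.
    print(forward_decision(1500, 1492, dont_fragment=True))
    print(forward_decision(1500, 1492, dont_fragment=False))
    print(fragment_sizes(1500, 1492))
```

Whether this applies here depends on the MTU of every hop between the client and the webserver, which the thread does not give.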

HTH
--
Bill

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos


