Weird network outages

Hello everyone,
a few weeks ago, I replaced an old server machine (network gateway and
mail server for a small office network) with a new one, installing Linux
from scratch (via RedHat 8.0). The machine is a 1300 MHz Duron on a
Gigabyte mainboard with 512 MB of DDR RAM. I can dig up the model number
on request - I don't have the manual here.

Since then, we have been experiencing weird network outages on the local
network interface only. The problem comes out of the blue and sometimes
disappears a few hours later if we don't reboot the machine before
that. Rebooting always helps for the moment, but after one to five days
of uptime, the problem resurfaces.

I have trouble pinpointing what is actually happening when the outage
occurs. The interface is still up, the link LEDs on the switch are on,
and the kernel reports no errors whatsoever - it's just that there are
suddenly no incoming packets anymore, and any outgoing packets remain
unanswered. mii-tool, route and ifconfig all report that nothing has
changed.
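
One way to double-check the "no incoming packets" observation is to take
two snapshots of the per-interface counters in /proc/net/dev and compare
them. What follows is only a minimal sketch in Python, assuming the usual
Linux /proc/net/dev layout (RX packets in the second column after the
colon, TX packets in the tenth). The function names and the interface
name eth0 are illustrative assumptions, not something taken from the
setup described here.

```python
def parse_proc_net_dev(text):
    """Return {interface: (rx_packets, tx_packets)} parsed from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        # Split on the colon first: some kernels print "eth0:123456" without a space.
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 10:
            continue  # not a counter line in the expected layout
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def read_counters(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_proc_net_dev(handle.read())


def rx_stalled(before, after, iface):
    """True if the interface kept transmitting but received nothing in between."""
    rx_before, tx_before = before[iface]
    rx_after, tx_after = after[iface]
    return tx_after > tx_before and rx_after == rx_before
```

Calling read_counters() twice, some seconds apart, and passing both
results to rx_stalled(before, after, "eth0") would show whether the
receive counter has really stopped while transmits continue. If it has
not stopped, the packets are arriving and being lost further up the
stack.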

We've tried recompiling the kernel so that the network interface drivers
are built-in rather than built as modules. We replaced the RedHat 2.4.18
kernel with a vanilla 2.4.19 (patched by the netfilter patch-o-matic to
include the most current set of netfilter patches). We even replaced the
no-name Realtek card ("Realtek Semiconductor Co., Ltd. RTL-8139/8139C
(rev 16)" in /proc/pci) with the very same card that worked in the old
server for three years without problems ("Intel Corp. 82557/8/9
[Ethernet Pro 100] (rev 8)"). All those changes had no measurable effect
whatsoever.

The problems appear at random; we have yet to find any pattern. The
interface seems to prefer to go down at night when no one's in the
office, but there are at least two machines behind the gateway that run
24/7 and regularly contact the gateway - in fact, every minute I ping
one of those machines with 55 packets so that I am notified when the
link goes down. Also, the problems only appear on the internal
interface - the external one works like a charm, even though that is a
Realtek network card, the same type that was first used for the internal
interface before we replaced it with the Intel one.

Oh yes, the problem always seems to start close to the top of the
minute. As I've said, I start a script every minute to ping an internal
machine, and this is how the most recent outage began:

|Sun Apr  6 05:50:01 CEST 2003
|PING 192.168.168.4 (192.168.168.4) from 192.168.168.200 : 56(84) bytes of data.
|64 bytes from 192.168.168.4: icmp_seq=1 ttl=128 time=1.14 ms
|64 bytes from 192.168.168.4: icmp_seq=2 ttl=128 time=0.283 ms
|64 bytes from 192.168.168.4: icmp_seq=3 ttl=128 time=0.197 ms
|64 bytes from 192.168.168.4: icmp_seq=4 ttl=128 time=0.197 ms
|64 bytes from 192.168.168.4: icmp_seq=5 ttl=128 time=0.191 ms
|64 bytes from 192.168.168.4: icmp_seq=6 ttl=128 time=0.270 ms
|64 bytes from 192.168.168.4: icmp_seq=7 ttl=128 time=0.271 ms
|64 bytes from 192.168.168.4: icmp_seq=8 ttl=128 time=0.271 ms
|64 bytes from 192.168.168.4: icmp_seq=9 ttl=128 time=0.267 ms
.
.
.
|64 bytes from 192.168.168.4: icmp_seq=53 ttl=128 time=0.256 ms
|64 bytes from 192.168.168.4: icmp_seq=54 ttl=128 time=0.264 ms
|64 bytes from 192.168.168.4: icmp_seq=55 ttl=128 time=0.268 ms
|
|--- 192.168.168.4 ping statistics ---
|55 packets transmitted, 55 received, 0% loss, time 54013ms
|rtt min/avg/max/mdev = 0.191/0.275/1.143/0.120 ms
|Sun Apr  6 05:51:00 CEST 2003
|Sun Apr  6 05:52:00 CEST 2003
|PING 192.168.168.4 (192.168.168.4) from 192.168.168.200 : 56(84) bytes of data.
|From 192.168.168.200 icmp_seq=15 Destination Host Unreachable
|From 192.168.168.200 icmp_seq=16 Destination Host Unreachable
|From 192.168.168.200 icmp_seq=17 Destination Host Unreachable
|From 192.168.168.200 icmp_seq=18 Destination Host Unreachable
|From 192.168.168.200 icmp_seq=19 Destination Host Unreachable


So all packets from the 05:50 run came through (which took 54 seconds),
but none from the 05:51, 05:52 or any later runs.
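
For longer logs, the first failing run can be picked out mechanically
instead of by eye. This is a rough sketch in Python, assuming the log
looks like the excerpt above: a `date` line, then the ping output ending
in an "N packets transmitted, M received" summary. The names split_runs
and first_failed_run are made up for illustration, and the leading "|"
is stripped only because the excerpt is quoted that way here.

```python
import re

# A `date`-style line such as "Sun Apr  6 05:50:01 CEST 2003".
STAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ \d{4}$"
)
# The ping summary line, e.g. "55 packets transmitted, 55 received, ...".
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def split_runs(log_text):
    """Return [(timestamp, sent, received)]; sent/received are None without a summary."""
    runs = []
    for raw in log_text.splitlines():
        line = raw.lstrip("|").rstrip()
        if STAMP.match(line):
            runs.append([line, None, None])  # a new run starts at each timestamp
        elif runs:
            match = SUMMARY.search(line)
            if match:
                runs[-1][1] = int(match.group(1))
                runs[-1][2] = int(match.group(2))
    return [tuple(run) for run in runs]


def first_failed_run(runs):
    """Return the timestamp of the first run that lost packets or has no summary."""
    for stamp, sent, received in runs:
        # A missing summary can also mean the log was cut off mid-run.
        if sent is None or received != sent:
            return stamp
    return None
```

Run against the excerpt above, this should flag the 05:51:00 run as the
first one without a complete summary, which matches the reading that the
outage began somewhere between 05:50:55 and 05:51:00.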

I've run out of ideas now - I honestly don't know what to try, what to
change or what to replace anymore, as nothing has had the slightest
effect whatsoever. So, I would be immensely grateful for any insight
into the matter that could help me get rid of the outages, a problem
I've never encountered before in six years of Linux administration.
I've tried to find anything on this in the archives or with Google, but
either no one else has ever seen problems like these or I haven't used
the right search words.

Thanks very much in advance for any help.

Greetings
Karim Senoucci
