Hi,

I have a problem with ARP on Linux 2.4.20 (Red Hat 2.4.20-18.8, if it matters) which I believe to be a bug. I'm willing to upgrade the kernel, but the problem appears to be generic rather than specific to this version.

Our web servers are load-balanced by a Foundry ServerIron using DSR (Direct Server Return), which means the return packets don't pass back through the ServerIron. To make this work, each Linux server has the ServerIron's virtual IP address configured on a loopback alias, and the ServerIron forwards packets to the servers instead of doing the usual address rewriting. The relevant interfaces look like this:

eth0      Link encap:Ethernet  HWaddr 00:04:75:CA:C4:EF
          inet addr:10.10.10.14  Bcast:10.10.10.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1623551911 errors:0 dropped:0 overruns:1 frame:0
          TX packets:1575017402 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:2905003530 (2770.4 Mb)  TX bytes:3337437145 (3182.8 Mb)
          Interrupt:10 Base address:0x8400

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:355748 errors:0 dropped:0 overruns:0 frame:0
          TX packets:355748 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:237452671 (226.4 Mb)  TX bytes:237452671 (226.4 Mb)

lo:0      Link encap:Local Loopback
          inet addr:212.xxx.yyy.9  Mask:255.255.255.255
          UP LOOPBACK RUNNING  MTU:16436  Metric:1

The default gateway is 10.10.10.1.

All of this works very well, except for ARP. After the web server has been shut down for a while, the load balancer sees it come back up, but the web server can't route any packets outbound. The following demonstrates the problem:

# arp -d 10.10.10.1
# ping -I 212.xxx.yyy.9 eff.org
PING eff.org (209.237.229.14) from 212.xxx.yyy.9 : 56(84) bytes of data.
^C
# arp -a | grep 10.10.10.1
? (10.10.10.1) at <incomplete> on eth0

On eth0, we see:

11:23:55.650514 0:4:75:ca:c4:ef Broadcast arp 42: arp who-has 10.10.10.1 tell 212.xxx.yyy.9
                         0001 0800 0604 0001 0004 75ca c4ef d4xx
                         yy09 0000 0000 0000 0a0a 0a01

The <incomplete> ARP entry remains, blocking all access via the default gateway. If I leave off the -I 212.xxx.yyy.9, the ARP request is sent from 10.10.10.14 instead and everything works fine.

The problem only occurs after a period of inactivity, and only if the first ARP request is triggered by traffic to the 212.xxx.yyy.9 address (that is, by our replies, which are sent from that address). Because the incomplete ARP entry remains, traffic that would normally cause a valid ARP request doesn't generate a new one, so we lose connectivity completely.

As I understand it, sending an ARP request whose sender (reply) address isn't on the local subnet simply doesn't make sense. Section A.3 of RFC 985 also suggests that such packets should be dropped by the next hop.

As a temporary workaround I'll add static ARP entries for the next hop. However, I believe this is a bug in the Linux ARP implementation and should be fixed.

Thanks,
Richard
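For reference, the 28-byte ARP payload in the capture above can be checked mechanically. The C sketch below is hypothetical (struct arp_fields, parse_arp() and sender_off_subnet() are illustrative names, not kernel or libpcap APIs): it decodes an Ethernet/IPv4 ARP payload using the RFC 826 field layout and flags a request whose sender ("tell") address is not on the same subnet as the target ("who-has") address, which is the condition described above. The mask is assumed to be eth0's 255.255.255.0. Because the capture redacts the middle octets of 212.xxx.yyy.9, any concrete value used for them in testing is only a placeholder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper struct: the fixed fields of an Ethernet/IPv4 ARP
 * payload (RFC 826 layout). Multi-byte values are stored in host order. */
struct arp_fields {
    uint16_t htype;   /* hardware type, 1 = Ethernet */
    uint16_t ptype;   /* protocol type, 0x0800 = IPv4 */
    uint8_t  hlen;    /* hardware address length, 6 */
    uint8_t  plen;    /* protocol address length, 4 */
    uint16_t oper;    /* 1 = request, 2 = reply */
    uint8_t  sha[6];  /* sender hardware address */
    uint32_t spa;     /* sender IP ("tell" address) */
    uint8_t  tha[6];  /* target hardware address */
    uint32_t tpa;     /* target IP ("who-has" address) */
};

/* Read big-endian (network order) integers from a byte buffer. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a 28-byte ARP payload (no Ethernet header). Returns false if the
 * buffer is too short or the header is not Ethernet/IPv4. */
static bool parse_arp(const uint8_t *buf, size_t len, struct arp_fields *out)
{
    if (len < 28)
        return false;
    out->htype = get_be16(buf);
    out->ptype = get_be16(buf + 2);
    out->hlen  = buf[4];
    out->plen  = buf[5];
    out->oper  = get_be16(buf + 6);
    memcpy(out->sha, buf + 8, 6);
    out->spa   = get_be32(buf + 14);
    memcpy(out->tha, buf + 18, 6);
    out->tpa   = get_be32(buf + 24);
    return out->htype == 1 && out->ptype == 0x0800 &&
           out->hlen == 6 && out->plen == 4;
}

/* True if this is a request whose sender IP lies outside the target's
 * subnet; mask is in host order, e.g. 0xFFFFFF00 for 255.255.255.0. */
static bool sender_off_subnet(const struct arp_fields *a, uint32_t mask)
{
    return a->oper == 1 && (a->spa & mask) != (a->tpa & mask);
}
```

Fed the captured bytes (with placeholder octets for xx and yy), sender_off_subnet() returns true for the request that says "tell 212.xxx.yyy.9", and false once the sender address is changed to 10.10.10.14, matching the working case without -I.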