Hi linux-net,

I experienced some strange behaviour with Linux and the ARP protocol.

1.) Kernel version: 2.6.11.7 plus grsec patches

2.) Setup:

            +--------+
            | Router |
            +---+----+
                |
                |
         +------+-------+
         |              |   Transitnet for
         |              |   Cluster/Router
    +----+-----+  +-----+------+
    | Primary  |  | Secondary  |
    +----+-----+  +-----+------+
         |              |
         |              |   LAN
         +--------------+

Router:    C2600 router from the ISP
Primary:   first (active) Linux router
Secondary: second (standby) Linux router

Primary and Secondary are configured as a cluster with the heartbeat
package. The cluster shares an IP alias in the transit net and many IPs
in the LAN segments. Each IP alias is bound to only one node at a time.

The following IPs and MACs are used for this example:

transit-net:
  Router:     10.0.0.1/24   | 00:10:F3:09:10:70
  Primary:    10.0.0.10/24  | 00:10:F3:09:11:71
  Secondary:  10.0.0.11/24  | 00:10:F3:09:12:72
  IP-Alias:   10.0.0.20/24  | depends on where it is bound

lan:
  Primary:    10.1.0.10/24  | 00:10:F3:10:11:71
  Secondary:  10.1.0.11/24  | 00:10:F3:10:12:72
  IP-Alias:   10.1.0.20/24  | depends on where it is bound

3.) The Problem

At first everything works fine. If I fail the primary node, the
secondary takes over. The ARP entries change to the MAC of the
secondary, and everything is fine.

Now, if you ping/ssh/whatever the shared IP alias in the LAN from the
networks behind the C2600, the trouble begins:

I.  The C2600 is able to deliver the IP packet to the node because it
    has a valid ARP entry.

II. The Linux machine (secondary) does not have any ARP entries
    (because it was inactive for a while), so it has to do ARP before
    it can deliver the reply packet.

Then IT HAPPENS: the Linux box asks in the transit net:

  0.000000 10.1.0.20 -> Broadcast ARP Who has 10.0.0.1? Tell 10.1.0.20

Why does Linux send ARP requests with a source IP from a different
subnet? This can't be the expected behaviour...
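To illustrate where the 10.1.0.20 comes from, here is a rough Python sketch of the source selection as it appears to work in arp_solicit() (net/ipv4/arp.c). It is a simplified model, not the kernel code: the function name select_arp_source and its parameters are made up for this example, the primary/secondary address distinction is ignored, and the fallback only approximates inet_select_addr(). The interface addresses are those of the secondary node after a takeover.

```python
from ipaddress import ip_address, ip_interface

# Addresses on the secondary node after it has taken over the aliases.
TRANSIT = ["10.0.0.11/24", "10.0.0.20/24"]   # outgoing interface
LAN = ["10.1.0.11/24", "10.1.0.20/24"]       # other interface


def select_arp_source(mode, packet_src, target, out_if_addrs, all_local_addrs):
    """Return the sender IP an ARP request would carry (simplified model).

    mode            -- arp_announce value (0, 1 or 2)
    packet_src      -- source IP of the packet waiting for ARP resolution
    target          -- IP address being resolved
    out_if_addrs    -- addresses (with prefix) on the outgoing interface
    all_local_addrs -- every address (with prefix) configured on the host
    """
    src = ip_address(packet_src)
    dst = ip_address(target)
    ifaces = [ip_interface(a) for a in out_if_addrs]
    local = {ip_interface(a).ip for a in all_local_addrs}

    chosen = None
    if mode == 0:
        # Announce any local IP: reuse the packet's source address as is.
        if src in local:
            chosen = src
    elif mode == 1:
        # Reuse the packet's source only if it shares a subnet with the
        # target on the outgoing interface.
        if src in local and any(
            src in i.network and dst in i.network for i in ifaces
        ):
            chosen = src
    # mode 2: never reuse the packet's source address.

    if chosen is None:
        # Fallback (assumes at least one address on the interface): first
        # address whose subnet contains the target, else the first address.
        chosen = next((i.ip for i in ifaces if dst in i.network), ifaces[0].ip)
    return str(chosen)


if __name__ == "__main__":
    # Reply from the LAN alias 10.1.0.20, next hop is the C2600 (10.0.0.1).
    for mode in (0, 1, 2):
        print(mode, select_arp_source(mode, "10.1.0.20", "10.0.0.1",
                                      TRANSIT, TRANSIT + LAN))
```

In this model, mode 0 takes the sender IP from the packet that triggered the ARP request, here the reply sourced from 10.1.0.20, while modes 1 and 2 fall back to 10.0.0.11. The modes correspond to the arp_announce sysctl (net.ipv4.conf.<interface>.arp_announce), so setting it to 1 or 2 may be worth trying before patching the kernel.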
BTW: The C2600 is "smart" enough to put an entry
"10.1.0.20 -> 00:10:F3:09:12:72" into its ARP cache, based on this
single ARP broadcast from 10.1.0.20, and after a failback to the
primary nobody can reach 10.1.0.20 any more... :-)

4.) Solution: Dirty Userspace Fix

Ping the C2600 from the primary/secondary continuously, so that the
ARP entry for the router never expires. A ping group in heartbeat does
the same. This can't be the real solution... ;-)

5.) Solution: Dirty Kernel Patch

With my skillful hands I wrote a dirty hack. It comments out the body
of "case 0", so that the default falls through to the "case 1"
handling:

<patch>
--- arp.c       Fri Jan 13 16:44:06 2006
+++ arp.c.new   Fri Jan 13 16:43:52 2006
@@ -342,9 +342,9 @@
        switch (IN_DEV_ARP_ANNOUNCE(in_dev)) {
        default:
        case 0:         /* By default announce any local IP */
-               if (skb && inet_addr_type(skb->nh.iph->saddr) == RTN_LOCAL)
+               /* if (skb && inet_addr_type(skb->nh.iph->saddr) == RTN_LOCAL)
                        saddr = skb->nh.iph->saddr;
-               break;
+               break; */
        case 1:         /* Restrict announcements of saddr in same subnet */
                if (!skb)
                        break;
</patch>

6.) Solution: Clean Kernel Patch

Can anybody turn the patch above into a clean one, so that it finds
its way into the vanilla kernel?

bye
richard

-- 
Richard Müller
Geschäftsführer Technik
team(ix) GmbH
Powering Enterprise Linux Networks
Südwestpark 35
90449 Nürnberg
fon:  +49 (911) 30999- 0
fax:  +49 (911) 30999-99
mail: rm@xxxxxxxxx
web:  http://www.teamix.de
vcf:  http://www.teamix.de/vcf/rm.vcf
gpg:  296C 0BAF 8FC8 DCE2 99BD 5777 FA73 ECDC F9F1 8FF7