The configuration of ipvs at Facebook is relatively straightforward. All ipvs instances advertise the same set of VIPs via BGP, and the network routes to the nearest instance, or uses ECMP to choose among instances in the event of a tie. For the uninitiated, ECMP load balances deterministically and statelessly by hashing each packet (usually over the 5-tuple of protocol, saddr, daddr, sport, and dport) and using the result as an index into the set of next hops (basic hash table logic).

The problem is that ICMP packets (which carry really important information, like whether a packet exceeded a link's MTU) hash to a different value than the flow they refer to, and so may end up at a different ipvs instance. That instance has no information about where to route them, so they are dropped, creating ICMP black holes and breaking Path MTU Discovery. Suddenly, my mom's pictures can't load and I'm fielding midday calls that I want nothing to do with.

To address this, this patch set introduces the ability to schedule ICMP packets, gated by a new sysctl, net.ipv4.vs.schedule_icmp. If it is set to 0, the old behavior is maintained; otherwise, ICMP packets are scheduled.
Alex Gartrell (12):
  ipvs: pull out ip_vs_try_to_schedule function
  ipvs: replace ip_vs_fill_ip4hdr with ip_vs_fill_iph_skb_off
  ipvs: Add hdr_flags to iphdr
  ipvs: drop inverse argument to conn_{in,out}_get
  ipvs: Make ip_vs_schedule aware of inverse iph'es
  ipvs: add schedule_icmp sysctl
  ipvs: Use outer header in ip_vs_bypass_xmit_v6
  ipvs: attempt to schedule icmp packets
  ipvs: ensure that ICMP cannot be sent in reply to ICMP
  ipvs: support scheduling inverse and icmp TCP packets
  ipvs: support scheduling inverse and icmp UDP packets
  ipvs: support scheduling inverse and icmp SCTP packets

 include/net/ip_vs.h                     | 101 ++++++++++++-----
 net/netfilter/ipvs/ip_vs_conn.c         |  12 +-
 net/netfilter/ipvs/ip_vs_core.c         | 190 +++++++++++++++++++-------------
 net/netfilter/ipvs/ip_vs_ctl.c          |   8 +-
 net/netfilter/ipvs/ip_vs_proto_ah_esp.c |  17 ++-
 net/netfilter/ipvs/ip_vs_proto_sctp.c   |  35 ++++--
 net/netfilter/ipvs/ip_vs_proto_tcp.c    |  37 +++++--
 net/netfilter/ipvs/ip_vs_proto_udp.c    |  26 ++++-
 net/netfilter/ipvs/ip_vs_xmit.c         |   9 +-
 9 files changed, 287 insertions(+), 148 deletions(-)

--
Alex Gartrell <agartrell@xxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html