On Thu, Jan 10, 2008 at 03:50:37PM +0100, Raphael Vallazza wrote: > Hello, > >>> i wrote a very simple patch for ipvs that enables a kernel config >>> option that allows to choose where IPVS intercepts incoming >>> connections. These are the options: >>> - LOCAL_IN (default: works as usual) >>> - PRE_ROUTING (puts LVS input right after the mangle PREROUTING and >>> before the nat PREROUTING chain) >> >> neat. I thought it was hard enough to move that it wouldn't be just an >> option :-) > > Hehe, yes, it was pretty easy ;) First up, thanks for the patch. Its clean and looks obviously correct. I think we should aim to get this in 2.4.25 (I think its a bit late for 2.4.24, but perhaps it can go in there). >> what we'd really like is ipvs hooked into the FORWARD chain. Can you >> do this too? > > To be honest i don't understand the reason for hooking LVS into the > FORWARD chain, because this way it would not get the LOCAL_IN traffic > and at the same time it would have the same NAT problem as with the > LOCAL_IN hook. Maybe i'm missing something, but it seems that PREROUTING > is the best point for LVS to act like a real router, because it gets > packets that haven't been NATed yet. > The only negative thing is that traffic can't be filtered in a regular > way, but using fwmark and the mangle table the user can select the > traffic that has to be handled by LVS. > > If you like i can also add this option, but i'm not sure if it's really > useful :) I had always thought in terms of FORWARD being a good place as a) I'm not particularly interested in LVS handling LOCAL_IN traffic, even though that it currently all it can do and b) I hadn't considered NAT. But you do make some good points there, and I'm begining to think that PREROUTING is correct. The main thing that I would like to see is that LVS can load-balance traffic without needing the VIP to be a local address. PREROUTING seems to allow this, so it does look like a good idea to me. >>> I tried it on some test boxes and it seems to work pretty well, i'll >>> do some stress testing in the next few days. I could send you a setup >>> example if you like... Great. My other techincal question was do you think it would be possible/desirable to make it a run-time option. I'm particularly thinking of people with distro kernels and an adversion to recompiling, as is the case for enterprise-types. Specifically about the patch. Below is a rediffed version that addresses a few cosmetic problems. * Rediffed for net-2.6.25, available from git.kernel.org. This or its cousin net-2.6 are the trees that the patch is likely to go through to get into the kernel. - As a breif explanation. net-2.6.X is for new code. net-2.6 is for bug features and orthoganal features. Your patch could probably go into either tree. * Whitespace - There seem to be a few places where 8 spaces have been used instead of a tab - There seems to be a few places where a line ends in trailing whitespace - And there were a few places where random leading whistepace was present in the diff context. - Perhaps your local version of the code has mangled whitespace, perhaps your mail-reader mangled it on sending, or perhaps I did something silly at my end. * The commenting style in LVS often doesn't follow the prefered style for linux networking code. I believe that for multi-line comments that is: /* some * text */ Could you provide a sign-off line as described in section 5 of http://linux.yyz.us/patch-format.html ? That would be great. Lastly, if you run your patches through ./scripts/checkpatch.pl it will pick up a lot of trivial style errors. Which is good because its really easy to forget something simple. Thanks again. -- Horms Index: net-2.6.25/net/ipv4/ipvs/ip_vs_core.c =================================================================== --- net-2.6.25.orig/net/ipv4/ipvs/ip_vs_core.c 2008-01-11 12:57:55.000000000 +0900 +++ net-2.6.25/net/ipv4/ipvs/ip_vs_core.c 2008-01-11 13:00:09.000000000 +0900 @@ -1026,6 +1026,7 @@ ip_vs_forward_icmp(unsigned int hooknum, static struct nf_hook_ops ip_vs_ops[] __read_mostly = { +#ifdef CONFIG_IP_VS_INPUT_LOCAL_IN /* After packet filtering, forward packet through VS/DR, VS/TUN, * or VS/NAT(change destination), so that filtering rules can be * applied to IPVS. */ @@ -1036,6 +1037,20 @@ static struct nf_hook_ops ip_vs_ops[] __ .hooknum = NF_INET_LOCAL_IN, .priority = 100, }, +#endif +#ifdef CONFIG_IP_VS_INPUT_PRE_ROUTING + /* Intercept incoming connections before DNAT and input filtering + * has been applied, this enables ransparent proxying on realnodes + * and localnode. Hook right after MANGLE and before NAT_DST. + */ + static struct nf_hook_ops ip_vs_in_ops = { + .hook = ip_vs_in, + .owner = THIS_MODULE, + .pf = PF_INET, + .hooknum = NF_IP_PRE_ROUTING, + .priority = NF_IP_PRI_NAT_DST - 1, + }, +#endif /* After packet filtering, change source only for VS/NAT */ { .hook = ip_vs_out, Index: net-2.6.25/net/ipv4/ipvs/Kconfig =================================================================== --- net-2.6.25.orig/net/ipv4/ipvs/Kconfig 2008-01-02 16:46:53.000000000 +0900 +++ net-2.6.25/net/ipv4/ipvs/Kconfig 2008-01-11 13:02:40.000000000 +0900 @@ -24,6 +24,34 @@ menuconfig IP_VS if IP_VS +choice + prompt "IPVS incoming connection interception" + default IP_VS_INPUT_LOCAL_IN + help + This option sets the position at which IPVS intercepts incoming + connections from Netfilter. If in doubt select 'LOCAL_IN'. + +config IP_VS_INPUT_LOCAL_IN + bool "LOCAL_IN" + ---help--- + After packet filtering, forward packet through VS/DR, VS/TUN, + or VS/NAT(change destination), so that filtering rules can be + applied to IPVS + +config IP_VS_INPUT_PRE_ROUTING + bool "PRE_ROUTING" + ---help--- + Intercept incoming connections before DNAT and input filtering + has been applied, this enables ransparent proxying on realnodes + and localnode. + + Incoming connections are intercepted right after the mangle + PREROUTING table and before the nat PREROUTING table. This way + packets are intercepted without any modifications by netfilter + and NAT (if required) can be done on the realservers. + +endchoice + config IP_VS_DEBUG bool "IP virtual server debugging" ---help--- - To unsubscribe from this list: send the line "unsubscribe lvs-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html