On Mon, 2010-10-04 at 11:44 +0200, Simon Horman wrote:
> On Mon, Oct 04, 2010 at 08:34:59AM +0200, Hans Schillstrom wrote:
> > Hi
> >
> > On Sat, 2010-10-02 at 10:30 +0200, Simon Horman wrote:
> > > On Wed, Sep 29, 2010 at 01:01:37AM +0300, Julian Anastasov wrote:
> > >
> > > Hi Julian, Hi all,
> > >
> > > >
> > > > Hello,
> > > >
> > > > From the recent discussion about loaded backup server
> > > > it looks like we do not properly assign forwarding method
> > > > to connections in backup server. If backup is used in master
> > > > as real server, eg. DR, then backup should use LOCALNODE
> > > > for its IP. May be ip_vs_find_dest should allow real server
> > > > with port 0 to be used as default server? And if real server
> > > > is found its forwarding method should be used for the
> > > > connection? So, backup should have the same IP and Port but
> > > > it can choose to use different forwarding method? For example,
> > > > master uses DR but backup TUN for the same real server.
> > > >
> > > > Because now when server is added its method can
> > > > be converted to LOCALNODE but when such connections
> > > > are created in backup server we should use DR or NAT
> > > > or whatever the method is configured there. The same is
> > > > when backup is added as DR server in master but the
> > > > connections should be LOCALNODE when created in backup.
> > > >
> > > > If we still allow DR/NAT/TUN connections in backup
> > > > to work without real server then all such xmitters should
> > > > check RTCF_LOCAL and assume LOCALNODE if needed. This is
> > > > needed for the case when we do not know the fwmark used
> > > > by connection and we can not find the virtual service.
> > > >
> > > > Then __ip_vs_update_dest should not replace the
> > > > configured forwarding method with IP_VS_CONN_F_LOCALNODE
> > > > to allow backup to see this method in fwmark connections.
> > > > If needed, we can remember that it is local in some
> > > > new dest flag, eg. IP_VS_DEST_F_LOCAL.
> > > > But better to show it as it was configured?
> > > >
> > > > So, how to fix these problems? May be:
> > > >
> > > > - ip_vs_find_dest to find svc and dest in more complex way
> > > >
> > > > - if backup has dest it should assign its forwarding method
> > > > to the connection (ip_vs_bind_dest)
> > > >
> > > > - allow some transmitters to deliver traffic locally to support
> > > > fwmark setups, eg. when no dest is assigned to connection
> > >
> > > This seems rather tricky to say the least.
> > > I prefer the 2nd version of struct ip_vs_sync_conn option...
> > >
> > > > There is also an option to create 2nd version
> > > > of struct ip_vs_sync_conn. For example, size in
> > > > struct ip_vs_sync_mesg can be moved after new field
> > > > version which will be in place of size. Old backups will
> > > > think the small version number as some short size and will
> > > > ignore the message. New backup servers can support both
> > > > formats. The new format can add new fields for fwmark,
> > > > IPv6 addresses, 1 byte af (AF_INET/AF_INET6), 1 byte len
> > > > for easy skipping of messages if af or protocol are not
> > > > supported.
> >
> > From my narrow view of the LVS:
> > If you use Network name spaces there is no need of LOCAL NODE since the
> > entire LVS could be placed in its own netns....
> > (I know people will use what they always have been using.)
>
> I'm not quite sure what you are getting at there.
>
> LOCAL NODE is basically an optimisation in the transmit path for
> the case where the real-server is the local host. But I think
> that most of the problem with it relates to it being determined
> at the time that a real-server is added.
>
> I'm unclear about how name spaces can help here,
> but I'm certainly very happy to learn.
>

If the LVS runs in its own network name-space on a real-server there is
no need for LOCAL_NODE. From the LVS point of view it's running on
"another machine" (i.e. a netns).

> > > It's funny that you should mention that.
> > > I need to extend the synchronisation
> > > protocol to allow the synchronisation of persistence engine data. And I
> > > came up with more or less the same scheme for extending the protocol
> > > without breaking old implementations - set the current size field to 0 (or
> > > any other value that doesn't match the packet length), add a new size field
> > > and a version field.
> >
> > Why not change port?
>
> I considered that too. But I think changing the protocol is easy enough.
> And in any case new kernels will need to understand both the new and
> old ways of doing things.
>
> > > Let's spend a bit of time thinking out a v2 of the protocol that solves the
> > > outstanding problems that we have.
> > >
> > > * No version field
> > > * Only 16 bits of flags
> > > * No space for IPv6 addresses
> > > * No space for fwmarks
> > > (* No space for persistence engine data)
> > >
> >
> > I have started to implement IPv6 backup using IPv6 multicast.
> > My idea was to keep the IPv4 and IPv6 separated, i.e. send IPv4 over its
> > own socket and IPv6 over another just to keep IPv4 untouched.
> > If there is a need for changes I vote for - "keep them together".
> >
> > I think a version 2 would be nice, where IPv6 is a part.
> >
> > Needed new fields
> > * Version must be there
> > * next field (offset to next field, IPv4, fwmark, IPv6)
> > * flags/type field
> >
> >
> > Divide the messages into required no of fields ex.
> > IPv4
> > fwmark
> > IPv6
>
> Perhaps we just need an addrlen field somewhere.
> Or if we wanted to save space, an addr type field.
>
> If you have some firm ideas perhaps you could send
> them here, perhaps in the form of a C structure or a diagram?
>

These are the structures that I work with right now
(have a look at them and see them as a source for discussion).
The connection is only modified in the IP address, i.e.
IPv6

struct ip_vs_sync_conn_v6 {
        __u8                    reserved;

        /* Protocol, addresses and port numbers */
        __u8                    protocol;       /* Which protocol (TCP/UDP) */
        __be16                  cport;
        __be16                  vport;
        __be16                  dport;
        struct in6_addr         caddr;          /* client address */
        struct in6_addr         vaddr;          /* virtual address */
        struct in6_addr         daddr;          /* destination address */

        /* Flags and state transition */
        __be16                  flags;          /* status flags */
        __be16                  state;          /* state info */

        /* The sequence options start here */
};

struct ipvs_synchdr {
        __u8    version;
        __u8    type;
        __u8    nexthdr;
        __u8    size;
};

New
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Version    |     type      |   next hdr    |     size      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|               Payload data ex IPv4 Connections                |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Version    |  next header  |  header len   |     type      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|               Payload data ex IPv6 Connections                |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Old
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Count Conns  |    SyncID     |            Size               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                   IPVS Sync Connection (1)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               .                               |
|                               .                               |
|                               .                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                   IPVS Sync Connection (n)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

> > > Individually those problems don't seem to warrant a new protocol.
> > > But when combined it seems worthwhile to me.
> > >
> > > > Simon, may be now ip_vs_nat_xmit should see
> > > > RTCF_LOCAL flag and we should check all NAT handlers
> > > > to support the LOCALNODE fallback where the port can
> > > > be changed too.
> > >
> > > I'm not quite sure what you are describing there.
> > >
> > > Is the idea that if the forwarding mechanism is NAT
> > > then packets will always go via ip_vs_nat_xmit, even if
> > > the IP is local (at config time). And that ip_vs_nat_xmit()
> > > will use local xmit if RTCF_LOCAL is set?
> >
> > IPv6 also has a number of other issues not related to the backup
> > protocol, like usage of IPv6 or IPv4 multicast address etc.
>
> Could you elaborate?

If I think about it, they shrink to nothing if a common solution is used.
My first approach was a separate sync thread for IPv6 with its own socket,
leaving the IPv4 part untouched. If that approach is used, new sysctls are
needed, along with new switches to ipvsadm.

>
> --
> To unsubscribe from this list: send the line "unsubscribe lvs-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
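
As a rough illustration of the "New" layout discussed in this message, here is a sketch in C of how a sender could chain sections behind a struct ipvs_synchdr and how a receiver could skip section types it does not support. Several things here are assumptions made for the example and were not settled in the thread: that size counts the section length in 32-bit words with the header included, that nexthdr 0 ends the chain, and all of the SYNC_* values and helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header as posted in the message, using userspace fixed-width types. */
struct ipvs_synchdr {
	uint8_t version;
	uint8_t type;
	uint8_t nexthdr;
	uint8_t size;
};

/* Hypothetical values for illustration only; not defined in the thread. */
enum { SYNC_V2 = 2 };
enum { SYNC_T_NONE = 0, SYNC_T_IPV4 = 1, SYNC_T_FWMARK = 2, SYNC_T_IPV6 = 3 };

/*
 * Sender side: append one section (header plus zeroed payload) at offset off.
 * Assumption: size is the section length in 32-bit words, header included.
 * Returns the offset just past the new section.
 */
static size_t put_section(uint8_t *buf, size_t off, uint8_t type,
			  uint8_t nexthdr, size_t payload_len)
{
	size_t sec_len = sizeof(struct ipvs_synchdr) + payload_len;
	struct ipvs_synchdr h;

	assert(sec_len % 4 == 0 && sec_len / 4 <= UINT8_MAX);
	h.version = SYNC_V2;
	h.type = type;
	h.nexthdr = nexthdr;
	h.size = (uint8_t)(sec_len / 4);
	memcpy(buf + off, &h, sizeof(h));
	memset(buf + off + sizeof(h), 0, payload_len);
	return off + sec_len;
}

/*
 * Receiver side: walk the chain and count sections of type want,
 * skipping every other type by its size field.
 * Returns -1 if a header is truncated, carries another version, or
 * claims a length that does not fit in the remaining buffer.
 */
static int count_sections(const uint8_t *buf, size_t len, uint8_t want)
{
	size_t off = 0;
	int found = 0;

	while (off < len) {
		struct ipvs_synchdr h;
		size_t sec_len;

		if (len - off < sizeof(h))
			return -1;
		memcpy(&h, buf + off, sizeof(h));
		sec_len = (size_t)h.size * 4;
		if (h.version != SYNC_V2 || sec_len < sizeof(h) ||
		    sec_len > len - off)
			return -1;
		if (h.type == want)
			found++;
		off += sec_len;
		if (h.nexthdr == SYNC_T_NONE)
			break;
	}
	return found;
}
```

A backup that only understands IPv4 would call count_sections() (or a real handler in its place) with SYNC_T_IPV4 and step over IPv6 and fwmark sections without parsing them, which is the "easy skipping" property asked for earlier in the thread. With a one-byte size counted in words, a section is limited to 1020 bytes, so a real design would probably need a wider length field or a per-connection header.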