On Thu, Jul 09, 2015 at 09:55:19PM +0200, Michael Tuexen wrote: > > On 09 Jul 2015, at 18:54, Marcelo Ricardo Leitner <mleitner@xxxxxxxxxx> wrote: > > > > Cc'ing Michael too. > I'm not familiar with the Linux kernel code, so I can't comment on it. > But making sure to use a source address belonging to the emitting > interface makes sense for me. > > Best regards > Michael That's pretty much what I was looking for, in case I missed something on SCTP RFCs. Thanks Michael! > > On Tue, Jul 07, 2015 at 02:42:25PM -0300, Marcelo Ricardo Leitner wrote: > >> Hi folks, > >> > >> This is an attempt to better choose a src address for sctp packets as > >> peers with rp_filter could be dropping our packets in some situations. > >> With this patch, we try to respect and use a src address that belongs to > >> the interface we are putting the packet out. > >> > >> I have that feeling that there is be a better way to do this, but I > >> just couldn't see it. > >> > >> This patch has been tested with and without gateways between the peers > >> and also just two peers connected via two subnets and results were > >> pretty good. > >> > >> One could think that this limits the address combination we can use, but > >> such combinations probably are just bogus anyway. Like, if you have an > >> host with address A1 and B1 and another with A2 and B2, you cannot > >> expect that A can use A1 to reach B2 through subnet B, because the > >> return path would be via the other link which, when this switch happens, > >> we are thinking it's broken. > >> > >> Thanks, > >> Marcelo > >> > >> ---8<--- > >> > >> In short, sctp is likely to incorrectly choose src address if socket is > >> bound to secondary addresses. This patch fixes it by adding a new check > >> that tries to anticipate if the src address would be expected by the > >> next hop/peer on this interface by doing reverse routing. > >> > >> Also took the shot to reduce the indentation level on this code. > >> > >> Details: > >> > >> Currently, sctp will do a routing attempt without specifying the src > >> address and compare the returned value (preferred source) with the > >> addresses that the socket is bound to. When using secondary addresses, > >> this will not match. > >> > >> Then it will try specifying each of the addresses that the socket is > >> bound to and re-routing, checking if that address is valid as src for > >> that dst. Thing is, this check alone is weak: > >> > >> # ip r l > >> 192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.149 > >> 192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.147 > >> > >> # ip a l > >> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default > >> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > >> inet 127.0.0.1/8 scope host lo > >> valid_lft forever preferred_lft forever > >> inet6 ::1/128 scope host > >> valid_lft forever preferred_lft forever > >> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 > >> link/ether 52:54:00:15:18:6a brd ff:ff:ff:ff:ff:ff > >> inet 192.168.122.147/24 brd 192.168.122.255 scope global dynamic eth0 > >> valid_lft 2160sec preferred_lft 2160sec > >> inet 192.168.122.148/24 scope global secondary eth0 > >> valid_lft forever preferred_lft forever > >> inet6 fe80::5054:ff:fe15:186a/64 scope link > >> valid_lft forever preferred_lft forever > >> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 > >> link/ether 52:54:00:b3:91:46 brd ff:ff:ff:ff:ff:ff > >> inet 192.168.100.149/24 brd 192.168.100.255 scope global dynamic eth1 > >> valid_lft 2162sec preferred_lft 2162sec > >> inet 192.168.100.148/24 scope global secondary eth1 > >> valid_lft forever preferred_lft forever > >> inet6 fe80::5054:ff:feb3:9146/64 scope link > >> valid_lft forever preferred_lft forever > >> 4: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 > >> link/ether 52:54:00:05:47:ee brd ff:ff:ff:ff:ff:ff > >> inet6 fe80::5054:ff:fe05:47ee/64 scope link > >> valid_lft forever preferred_lft forever > >> > >> # ip r g 192.168.100.193 from 192.168.122.148 > >> 192.168.100.193 from 192.168.122.148 dev eth1 > >> cache > >> > >> Even if you specify an interface: > >> > >> # ip r g 192.168.100.193 from 192.168.122.148 oif eth1 > >> 192.168.100.193 from 192.168.122.148 dev eth1 > >> cache > >> > >> Although this would be valid, peers using rp_filter will drop such > >> packets as their src doesn't match the routes for that interface. > >> > >> So we fix this by adding an extra check, we try to do the reverse > >> routing and check if the interface used would be the same. If not, we > >> skip such address. If yes, we use it. > >> > >> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx> > >> --- > >> net/sctp/protocol.c | 55 +++++++++++++++++++++++++++++++++++++++-------------- > >> 1 file changed, 41 insertions(+), 14 deletions(-) > >> > >> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c > >> index 59e80356672bdf89777265ae1f8c384792dfb98c..e52fd6f77963426a7cf3e83ca01a9cdae1cb2c01 100644 > >> --- a/net/sctp/protocol.c > >> +++ b/net/sctp/protocol.c > >> @@ -53,6 +53,7 @@ > >> #include <net/net_namespace.h> > >> #include <net/protocol.h> > >> #include <net/ip.h> > >> +#include <net/ip_fib.h> > >> #include <net/ipv6.h> > >> #include <net/route.h> > >> #include <net/sctp/sctp.h> > >> @@ -487,23 +488,49 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union sctp_addr *saddr, > >> */ > >> rcu_read_lock(); > >> list_for_each_entry_rcu(laddr, &bp->address_list, list) { > >> + struct flowi4 in; > >> + struct fib_result res; > >> + > >> if (!laddr->valid) > >> continue; > >> - if ((laddr->state == SCTP_ADDR_SRC) && > >> - (AF_INET == laddr->a.sa.sa_family)) { > >> - fl4->fl4_sport = laddr->a.v4.sin_port; > >> - flowi4_update_output(fl4, > >> - asoc->base.sk->sk_bound_dev_if, > >> - RT_CONN_FLAGS(asoc->base.sk), > >> - daddr->v4.sin_addr.s_addr, > >> - laddr->a.v4.sin_addr.s_addr); > >> - > >> - rt = ip_route_output_key(sock_net(sk), fl4); > >> - if (!IS_ERR(rt)) { > >> - dst = &rt->dst; > >> - goto out_unlock; > >> - } > >> + if (laddr->state != SCTP_ADDR_SRC || > >> + AF_INET != laddr->a.sa.sa_family) > >> + continue; > >> + > >> + fl4->fl4_sport = laddr->a.v4.sin_port; > >> + flowi4_update_output(fl4, > >> + asoc->base.sk->sk_bound_dev_if, > >> + RT_CONN_FLAGS(asoc->base.sk), > >> + daddr->v4.sin_addr.s_addr, > >> + laddr->a.v4.sin_addr.s_addr); > >> + > >> + rt = ip_route_output_key(sock_net(sk), fl4); > >> + if (IS_ERR(rt)) > >> + continue; > >> + > >> + dst = &rt->dst; > >> + > >> + /* Double check if this is really expected. We simulate > >> + * rp_filter by swapping src and dst. If interfaces are > >> + * different, means that this src wouldn't be expected > >> + * by the other host on this interface. > >> + */ > >> + memcpy(&in, fl4, sizeof(in)); > >> + in.daddr = fl4->saddr; > >> + in.saddr = fl4->daddr; > >> + in.flowi4_iif = fl4->flowi4_oif; > >> + in.flowi4_oif = 0; > >> + > >> + if (fib_lookup(sock_net(sk), &in, &res) || > >> + res.type != RTN_LOCAL || > >> + fl4->flowi4_oif != FIB_RES_DEV(res)->ifindex) { > >> + /* Failed, so this was a false hit */ > >> + dst_release(dst); > >> + dst = NULL; > >> + continue; > >> } > >> + > >> + break; > >> } > >> > >> out_unlock: > >> -- > >> 2.4.1 > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > >> the body of a message to majordomo@xxxxxxxxxxxxxxx > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-sctp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html