Please apply the following networking bug fixes to 3.2, 3.4, 3.10, and 3.14 -stable, respectively. Thanks!
>From 6f3b21b447ac665867e5f78ecc31845560c66372 Mon Sep 17 00:00:00 2001 From: Dmitry Petukhov <dmgenp@xxxxxxxxx> Date: Wed, 9 Apr 2014 02:23:20 +0600 Subject: [PATCH 01/20] l2tp: take PMTU from tunnel UDP socket [ Upstream commit f34c4a35d87949fbb0e0f31eba3c054e9f8199ba ] When l2tp driver tries to get PMTU for the tunnel destination, it uses the pointer to struct sock that represents PPPoX socket, while it should use the pointer that represents UDP socket of the tunnel. Signed-off-by: Dmitry Petukhov <dmgenp@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/l2tp/l2tp_ppp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index 969cd3e..e0f0934 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -772,9 +772,9 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, session->deref = pppol2tp_session_sock_put; /* If PMTU discovery was enabled, use the MTU that was discovered */ - dst = sk_dst_get(sk); + dst = sk_dst_get(tunnel->sock); if (dst != NULL) { - u32 pmtu = dst_mtu(__sk_dst_get(sk)); + u32 pmtu = dst_mtu(__sk_dst_get(tunnel->sock)); if (pmtu != 0) session->mtu = session->mru = pmtu - PPPOL2TP_HEADER_OVERHEAD; -- 1.9.3 >From 829994ab6e69fed3362c067dde92616a8dc0c0d6 Mon Sep 17 00:00:00 2001 From: Florian Westphal <fw@xxxxxxxxx> Date: Wed, 9 Apr 2014 10:28:50 +0200 Subject: [PATCH 02/20] net: core: don't account for udp header size when computing seglen [ Upstream commit 6d39d589bb76ee8a1c6cde6822006ae0053decff ] In case of tcp, gso_size contains the tcpmss. For UFO (udp fragmentation offloading) skbs, gso_size is the fragment payload size, i.e. we must not account for udp header size. Otherwise, when using virtio drivers, a to-be-forwarded UFO GSO packet will be needlessly fragmented in the forward path, because we think its individual segments are too large for the outgoing link. Fixes: fe6cc55f3a9a053 ("net: ip, ipv6: handle gso skbs in forwarding path") Cc: Eric Dumazet <eric.dumazet@xxxxxxxxx> Reported-by: Tobias Brunner <tobias@xxxxxxxxxxxxxx> Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/skbuff.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 8ac4a0f..8ae2e43 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3197,12 +3197,14 @@ EXPORT_SYMBOL(__skb_warn_lro_forwarding); unsigned int skb_gso_transport_seglen(const struct sk_buff *skb) { const struct skb_shared_info *shinfo = skb_shinfo(skb); - unsigned int hdr_len; if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) - hdr_len = tcp_hdrlen(skb); - else - hdr_len = sizeof(struct udphdr); - return hdr_len + shinfo->gso_size; + return tcp_hdrlen(skb) + shinfo->gso_size; + + /* UFO sets gso_size to the size of the fragmentation + * payload, i.e. the size of the L4 (UDP) header is already + * accounted for. + */ + return shinfo->gso_size; } EXPORT_SYMBOL_GPL(skb_gso_transport_seglen); -- 1.9.3 >From 6d9777e618d1b1f04faca56cf7de780bf155a23a Mon Sep 17 00:00:00 2001 From: Thomas Richter <tmricht@xxxxxxxxxxxxxxxxxx> Date: Wed, 9 Apr 2014 12:52:59 +0200 Subject: [PATCH 03/20] bonding: Remove debug_fs files when module init fails [ Upstream commit db29868653394937037d71dc3545768302dda643 ] Remove the bonding debug_fs entries when the module initialization fails. The debug_fs entries should be removed together with all other already allocated resources. Signed-off-by: Thomas Richter <tmricht@xxxxxxxxxxxxxxxxxx> Signed-off-by: Jay Vosburgh <j.vosburgh@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/bonding/bond_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 1bf36ac..5af2a8f 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4914,6 +4914,7 @@ static int __init bonding_init(void) out: return res; err: + bond_destroy_debugfs(); rtnl_link_unregister(&bond_link_ops); err_link: unregister_pernet_subsys(&bond_net_ops); -- 1.9.3 >From 82f3308b31828377d918fca9498f44ddadce9f82 Mon Sep 17 00:00:00 2001 From: Eric Dumazet <edumazet@xxxxxxxxxx> Date: Thu, 10 Apr 2014 21:23:36 -0700 Subject: [PATCH 04/20] ipv6: Limit mtu to 65575 bytes [ Upstream commit 30f78d8ebf7f514801e71b88a10c948275168518 ] Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@xxxxxxxxx> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Acked-by: YOSHIFUJI Hideaki <yoshfuji@xxxxxxxxxxxxxx> Acked-by: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/net/ip6_route.h | 5 +++++ net/ipv6/route.c | 5 +++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 5e91b72..4913dac 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -34,6 +34,11 @@ struct route_info { #define RT6_LOOKUP_F_SRCPREF_PUBLIC 0x00000010 #define RT6_LOOKUP_F_SRCPREF_COA 0x00000020 +/* We do not (yet ?) support IPv6 jumbograms (RFC 2675) + * Unlike IPv4, hdr->seg_len doesn't include the IPv6 header + */ +#define IP6_MAX_MTU (0xFFFF + sizeof(struct ipv6hdr)) + /* * rt6_srcprefs2flags() and rt6_flags2srcprefs() translate * between IPV6_ADDR_PREFERENCES socket option values diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 39e11f9..782f67a 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1056,7 +1056,7 @@ static unsigned int ip6_mtu(const struct dst_entry *dst) unsigned int mtu = dst_metric_raw(dst, RTAX_MTU); if (mtu) - return mtu; + goto out; mtu = IPV6_MIN_MTU; @@ -1066,7 +1066,8 @@ static unsigned int ip6_mtu(const struct dst_entry *dst) mtu = idev->cnf.mtu6; rcu_read_unlock(); - return mtu; +out: + return min_t(unsigned int, mtu, IP6_MAX_MTU); } static struct dst_entry *icmp6_dst_gc_list; -- 1.9.3 >From f2f60a690675d3cc0a1c15aa73a1ee29fdb3a043 Mon Sep 17 00:00:00 2001 From: "Wang, Xiaoming" <xiaoming.wang@xxxxxxxxx> Date: Mon, 14 Apr 2014 12:30:45 -0400 Subject: [PATCH 05/20] net: ipv4: current group_info should be put after using. [ Upstream commit b04c46190219a4f845e46a459e3102137b7f6cac ] Plug a group_info refcount leak in ping_init. group_info is only needed during initialization and the code failed to release the reference on exit. While here move grabbing the reference to a place where it is actually needed. Signed-off-by: Chuansheng Liu <chuansheng.liu@xxxxxxxxx> Signed-off-by: Zhang Dongxing <dongxing.zhang@xxxxxxxxx> Signed-off-by: xiaoming wang <xiaoming.wang@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ping.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 00975b6..d495d4b 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -203,26 +203,33 @@ static int ping_init_sock(struct sock *sk) struct net *net = sock_net(sk); gid_t group = current_egid(); gid_t range[2]; - struct group_info *group_info = get_current_groups(); - int i, j, count = group_info->ngroups; + struct group_info *group_info; + int i, j, count; + int ret = 0; inet_get_ping_group_range_net(net, range, range+1); if (range[0] <= group && group <= range[1]) return 0; + group_info = get_current_groups(); + count = group_info->ngroups; for (i = 0; i < group_info->nblocks; i++) { int cp_count = min_t(int, NGROUPS_PER_BLOCK, count); for (j = 0; j < cp_count; j++) { group = group_info->blocks[i][j]; if (range[0] <= group && group <= range[1]) - return 0; + goto out_release_group; } count -= cp_count; } - return -EACCES; + ret = -EACCES; + +out_release_group: + put_group_info(group_info); + return ret; } static void ping_close(struct sock *sk, long timeout) -- 1.9.3 >From bf49a7227231431c0763e24a5589b8b515e20083 Mon Sep 17 00:00:00 2001 From: Mathias Krause <minipli@xxxxxxxxxxxxxx> Date: Sun, 13 Apr 2014 18:23:33 +0200 Subject: [PATCH 06/20] filter: prevent nla extensions to peek beyond the end of the message [ Upstream commit 05ab8f2647e4221cbdb3856dd7d32bd5407316b3 ] The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check for a minimal message length before testing the supplied offset to be within the bounds of the message. This allows the subtraction of the nla header to underflow and therefore -- as the data type is unsigned -- allowing far to big offset and length values for the search of the netlink attribute. The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is also wrong. It has the minuend and subtrahend mixed up, therefore calculates a huge length value, allowing to overrun the end of the message while looking for the netlink attribute. The following three BPF snippets will trigger the bugs when attached to a UNIX datagram socket and parsing a message with length 1, 2 or 3. ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]-- | ld #0x87654321 | ldx #42 | ld #nla | ret a `--- ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]-- | ld #0x87654321 | ldx #42 | ld #nlan | ret a `--- ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]-- | ; (needs a fake netlink header at offset 0) | ld #0 | ldx #42 | ld #nlan | ret a `--- Fix the first issue by ensuring the message length fulfills the minimal size constrains of a nla header. Fix the second bug by getting the math for the remainder calculation right. Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction") Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..") Cc: Patrick McHardy <kaber@xxxxxxxxx> Cc: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> Signed-off-by: Mathias Krause <minipli@xxxxxxxxxxxxxx> Acked-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/filter.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/core/filter.c b/net/core/filter.c index 5dea452..5b4d7ec 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -336,11 +336,15 @@ load_b: if (skb_is_nonlinear(skb)) return 0; + if (skb->len < sizeof(struct nlattr)) + return 0; + if (skb->len < sizeof(struct nlattr)) + return 0; if (A > skb->len - sizeof(struct nlattr)) return 0; nla = (struct nlattr *)&skb->data[A]; - if (nla->nla_len > A - skb->len) + if (nla->nla_len > skb->len - A) return 0; nla = nla_find_nested(nla, X); -- 1.9.3 >From b6b52c9dc7c4883caee480ccf3619a8407c0298c Mon Sep 17 00:00:00 2001 From: Ivan Vecera <ivecera@xxxxxxxxxx> Date: Thu, 17 Apr 2014 14:51:08 +0200 Subject: [PATCH 07/20] tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled The patch fixes a problem with dropped jumbo frames after usage of 'ethtool -G ... rx'. Scenario: 1. ip link set eth0 up 2. ethtool -G eth0 rx N # <- This zeroes rx-jumbo 3. ip link set mtu 9000 dev eth0 The ethtool command set rx_jumbo_pending to zero so any received jumbo packets are dropped and you need to use 'ethtool -G eth0 rx-jumbo N' to workaround the issue. The patch changes the logic so rx_jumbo_pending value is changed only if jumbo frames are enabled (MTU > 1500). Signed-off-by: Ivan Vecera <ivecera@xxxxxxxxxx> Acked-by: Michael Chan <mchan@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/ethernet/broadcom/tg3.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index c77c462..2615433 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -10656,7 +10656,9 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e if (tg3_flag(tp, MAX_RXPEND_64) && tp->rx_pending > 63) tp->rx_pending = 63; - tp->rx_jumbo_pending = ering->rx_jumbo_pending; + + if (tg3_flag(tp, JUMBO_RING_ENABLE)) + tp->rx_jumbo_pending = ering->rx_jumbo_pending; for (i = 0; i < tp->irq_max; i++) tp->napi[i].tx_pending = ering->tx_pending; -- 1.9.3 >From 05b8be9b9e0badb364ae0710858171fe2cf59b27 Mon Sep 17 00:00:00 2001 From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Date: Thu, 24 Apr 2014 10:22:35 +1000 Subject: [PATCH 08/20] rtnetlink: Warn when interface's information won't fit in our packet [ Upstream commit 973462bbde79bb827824c73b59027a0aed5c9ca6 ] Without IFLA_EXT_MASK specified, the information reported for a single interface in response to RTM_GETLINK is expected to fit within a netlink packet of NLMSG_GOODSIZE. If it doesn't, however, things will go badly wrong, When listing all interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first message in a packet as the end of the listing and omit information for that interface and all subsequent ones. This can cause getifaddrs(3) to enter an infinite loop. This patch won't fix the problem, but it will WARN_ON() making it easier to track down what's going wrong. Signed-off-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Reviewed-by: Jiri Pirko <jpirko@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/rtnetlink.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 5b7d5f2..978a367 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1057,6 +1057,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) struct hlist_node *node; struct nlattr *tb[IFLA_MAX+1]; u32 ext_filter_mask = 0; + int err; s_h = cb->args[0]; s_idx = cb->args[1]; @@ -1077,11 +1078,17 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) hlist_for_each_entry_rcu(dev, node, head, index_hlist) { if (idx < s_idx) goto cont; - if (rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, - NETLINK_CB(cb->skb).pid, - cb->nlh->nlmsg_seq, 0, - NLM_F_MULTI, - ext_filter_mask) <= 0) + err = rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, + NETLINK_CB(cb->skb).pid, + cb->nlh->nlmsg_seq, 0, + NLM_F_MULTI, + ext_filter_mask); + /* If we ran out of room on the first message, + * we're in trouble + */ + WARN_ON((err == -EMSGSIZE) && (skb->len == 0)); + + if (err <= 0) goto out; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); -- 1.9.3 >From bd381617bec67d6281bfe32d71fd2816bc48cac5 Mon Sep 17 00:00:00 2001 From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Date: Thu, 24 Apr 2014 10:22:36 +1000 Subject: [PATCH 09/20] rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is set [ Upstream commit c53864fd60227de025cb79e05493b13f69843971 ] Since 115c9b81928360d769a76c632bae62d15206a94a (rtnetlink: Fix problem with buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST attribute if they were solicited by a GETLINK message containing an IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag. That was done because some user programs broke when they received more data than expected - because IFLA_VFINFO_LIST contains information for each VF it can become large if there are many VFs. However, the IFLA_VF_PORTS attribute, supplied for devices which implement ndo_get_vf_port (currently the 'enic' driver only), has the same problem. It supplies per-VF information and can therefore become large, but it is not currently conditional on the IFLA_EXT_MASK value. Worse, it interacts badly with the existing EXT_MASK handling. When IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at NLMSG_GOODSIZE. If the information for IFLA_VF_PORTS exceeds this, then rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet. netlink_dump() will misinterpret this as having finished the listing and omit data for this interface and all subsequent ones. That can cause getifaddrs(3) to enter an infinite loop. This patch addresses the problem by only supplying IFLA_VF_PORTS when IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set. Signed-off-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Reviewed-by: Jiri Pirko <jiri@xxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/rtnetlink.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 978a367..7beaf10 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -744,7 +744,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev, return 0; } -static size_t rtnl_port_size(const struct net_device *dev) +static size_t rtnl_port_size(const struct net_device *dev, + u32 ext_filter_mask) { size_t port_size = nla_total_size(4) /* PORT_VF */ + nla_total_size(PORT_PROFILE_MAX) /* PORT_PROFILE */ @@ -760,7 +761,8 @@ static size_t rtnl_port_size(const struct net_device *dev) size_t port_self_size = nla_total_size(sizeof(struct nlattr)) + port_size; - if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent) + if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent || + !(ext_filter_mask & RTEXT_FILTER_VF)) return 0; if (dev_num_vf(dev->dev.parent)) return port_self_size + vf_ports_size + @@ -791,7 +793,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, + nla_total_size(ext_filter_mask & RTEXT_FILTER_VF ? 4 : 0) /* IFLA_NUM_VF */ + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */ - + rtnl_port_size(dev) /* IFLA_VF_PORTS + IFLA_PORT_SELF */ + + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */ + rtnl_link_get_size(dev) /* IFLA_LINKINFO */ + rtnl_link_get_af_size(dev); /* IFLA_AF_SPEC */ } @@ -851,11 +853,13 @@ static int rtnl_port_self_fill(struct sk_buff *skb, struct net_device *dev) return 0; } -static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev) +static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev, + u32 ext_filter_mask) { int err; - if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent) + if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent || + !(ext_filter_mask & RTEXT_FILTER_VF)) return 0; err = rtnl_port_self_fill(skb, dev); @@ -1002,7 +1006,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev, nla_nest_end(skb, vfinfo); } - if (rtnl_port_fill(skb, dev)) + if (rtnl_port_fill(skb, dev, ext_filter_mask)) goto nla_put_failure; if (dev->rtnl_link_ops) { -- 1.9.3 >From e452dd5f6e6ae15c855c7197adde797871fd303a Mon Sep 17 00:00:00 2001 From: Toshiaki Makita <makita.toshiaki@xxxxxxxxxxxxx> Date: Fri, 25 Apr 2014 17:01:18 +0900 Subject: [PATCH 10/20] bridge: Handle IFLA_ADDRESS correctly when creating bridge device [ Upstream commit 30313a3d5794472c3548d7288e306a5492030370 ] When bridge device is created with IFLA_ADDRESS, we are not calling br_stp_change_bridge_id(), which leads to incorrect local fdb management and bridge id calculation, and prevents us from receiving frames on the bridge device. Reported-by: Tom Gundersen <teg@xxxxxxx> Signed-off-by: Toshiaki Makita <makita.toshiaki@xxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/bridge/br_netlink.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index cbf9ccd..99a48a3 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -211,11 +211,26 @@ static int br_validate(struct nlattr *tb[], struct nlattr *data[]) return 0; } +static int br_dev_newlink(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) +{ + struct net_bridge *br = netdev_priv(dev); + + if (tb[IFLA_ADDRESS]) { + spin_lock_bh(&br->lock); + br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS])); + spin_unlock_bh(&br->lock); + } + + return register_netdevice(dev); +} + struct rtnl_link_ops br_link_ops __read_mostly = { .kind = "bridge", .priv_size = sizeof(struct net_bridge), .setup = br_dev_setup, .validate = br_validate, + .newlink = br_dev_newlink, .dellink = br_dev_delete, }; -- 1.9.3 >From d43a0f1c48bc56b7f43943f075afed5c78507d66 Mon Sep 17 00:00:00 2001 From: Xufeng Zhang <xufeng.zhang@xxxxxxxxxxxxx> Date: Fri, 25 Apr 2014 16:55:41 +0800 Subject: [PATCH 11/20] sctp: reset flowi4_oif parameter on route lookup [ Upstream commit 85350871317a5adb35519d9dc6fc9e80809d42ad ] commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output route lookups.) introduces another regression which is very similar to the problem of commit e6b45241c (ipv4: reset flowi parameters on route connect) wants to fix: Before we call ip_route_output_key() in sctp_v4_get_dst() to get a dst that matches a bind address as the source address, we have already called this function previously and the flowi parameters have been initialized including flowi4_oif, so when we call this function again, the process in __ip_route_output_key() will be different because of the setting of flowi4_oif, and we'll get a networking device which corresponds to the inputted flowi4_oif as the output device, this is wrong because we'll never hit this place if the previously returned source address of dst match one of the bound addresses. To reproduce this problem, a vlan setting is enough: # ifconfig eth0 up # route del default # vconfig add eth0 2 # vconfig add eth0 3 # ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0 # route add default gw 10.0.1.254 dev eth0.2 # ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0 # ip rule add from 10.0.0.14 table 4 # ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3 # sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I You'll detect that all the flow are routed to eth0.2(10.0.1.254). Signed-off-by: Xufeng Zhang <xufeng.zhang@xxxxxxxxxxxxx> Signed-off-by: Julian Anastasov <ja@xxxxxx> Acked-by: Vlad Yasevich <vyasevich@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sctp/protocol.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 6f6ad86..de35e01 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -528,8 +528,13 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union sctp_addr *saddr, continue; if ((laddr->state == SCTP_ADDR_SRC) && (AF_INET == laddr->a.sa.sa_family)) { - fl4->saddr = laddr->a.v4.sin_addr.s_addr; fl4->fl4_sport = laddr->a.v4.sin_port; + flowi4_update_output(fl4, + asoc->base.sk->sk_bound_dev_if, + RT_CONN_FLAGS(asoc->base.sk), + daddr->v4.sin_addr.s_addr, + laddr->a.v4.sin_addr.s_addr); + rt = ip_route_output_key(&init_net, fl4); if (!IS_ERR(rt)) { dst = &rt->dst; -- 1.9.3 >From d9b5e6a4c92d74af24501a39dd2a000e64a64cf6 Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Tue, 29 Apr 2014 10:09:51 -0400 Subject: [PATCH 12/20] Revert "macvlan : fix checksums error when we are in bridge mode" [ Upstream commit f114890cdf84d753f6b41cd0cc44ba51d16313da ] This reverts commit 12a2856b604476c27d85a5f9a57ae1661fc46019. The commit above doesn't appear to be necessary any more as the checksums appear to be correctly computed/validated. Additionally the above commit breaks kvm configurations where one VM is using a device that support checksum offload (virtio) and the other VM does not. In this case, packets leaving virtio device will have CHECKSUM_PARTIAL set. The packets is forwarded to a macvtap that has offload features turned off. Since we use CHECKSUM_UNNECESSARY, the host does does not update the checksum and thus a bad checksum is passed up to the guest. CC: Daniel Lezcano <daniel.lezcano@xxxxxxx> CC: Patrick McHardy <kaber@xxxxxxxxx> CC: Andrian Nord <nightnord@xxxxxxxxx> CC: Eric Dumazet <eric.dumazet@xxxxxxxxx> CC: Michael S. Tsirkin <mst@xxxxxxxxxx> CC: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx> Acked-by: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/macvlan.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 301b39e..ab5786b 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -236,11 +236,9 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) const struct macvlan_dev *vlan = netdev_priv(dev); const struct macvlan_port *port = vlan->port; const struct macvlan_dev *dest; - __u8 ip_summed = skb->ip_summed; if (vlan->mode == MACVLAN_MODE_BRIDGE) { const struct ethhdr *eth = (void *)skb->data; - skb->ip_summed = CHECKSUM_UNNECESSARY; /* send to other bridge ports directly */ if (is_multicast_ether_addr(eth->h_dest)) { @@ -258,7 +256,6 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) } xmit_world: - skb->ip_summed = ip_summed; skb->dev = vlan->lowerdev; return dev_queue_xmit(skb); } -- 1.9.3 >From 87cbbdab1302f90f1ae59ccf05fdbd1f64c8377d Mon Sep 17 00:00:00 2001 From: Liu Yu <allanyuliu@xxxxxxxxxxx> Date: Wed, 30 Apr 2014 17:34:09 +0800 Subject: [PATCH 13/20] tcp_cubic: fix the range of delayed_ack [ Upstream commit 0cda345d1b2201dd15591b163e3c92bad5191745 ] commit b9f47a3aaeab (tcp_cubic: limit delayed_ack ratio to prevent divide error) try to prevent divide error, but there is still a little chance that delayed_ack can reach zero. In case the param cnt get negative value, then ratio+cnt would overflow and may happen to be zero. As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero. In some old kernels, such as 2.6.32, there is a bug that would pass negative param, which then ultimately leads to this divide error. commit 5b35e1e6e9c (tcp: fix tcp_trim_head() to adjust segment count with skb MSS) fixed the negative param issue. However, it's safe that we fix the range of delayed_ack as well, to make sure we do not hit a divide by zero. CC: Stephen Hemminger <shemminger@xxxxxxxxxx> Signed-off-by: Liu Yu <allanyuliu@xxxxxxxxxxx> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Acked-by: Neal Cardwell <ncardwell@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/tcp_cubic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index b78eac2..ed3d6d4 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -406,7 +406,7 @@ static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us) ratio -= ca->delayed_ack >> ACK_RATIO_SHIFT; ratio += cnt; - ca->delayed_ack = min(ratio, ACK_RATIO_LIMIT); + ca->delayed_ack = clamp(ratio, 1U, ACK_RATIO_LIMIT); } /* Some calls are for duplicates without timetamps */ -- 1.9.3 >From 06437df12b86b8cb01bd9f7f357a614114f5a65d Mon Sep 17 00:00:00 2001 From: Florian Westphal <fw@xxxxxxxxx> Date: Sun, 4 May 2014 23:24:31 +0200 Subject: [PATCH 14/20] net: ipv4: ip_forward: fix inverted local_df test [ Upstream commit ca6c5d4ad216d5942ae544bbf02503041bd802aa ] local_df means 'ignore DF bit if set', so if its set we're allowed to perform ip fragmentation. This wasn't noticed earlier because the output path also drops such skbs (and emits needed icmp error) and because netfilter ip defrag did not set local_df until couple of days ago. Only difference is that DF-packets-larger-than MTU now discarded earlier (f.e. we avoid pointless netfilter postrouting trip). While at it, drop the repeated test ip_exceeds_mtu, checking it once is enough... Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ip_forward.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c index e0d9f02..7593f3a 100644 --- a/net/ipv4/ip_forward.c +++ b/net/ipv4/ip_forward.c @@ -42,12 +42,12 @@ static bool ip_may_fragment(const struct sk_buff *skb) { return unlikely((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0) || - !skb->local_df; + skb->local_df; } static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu) { - if (skb->len <= mtu || skb->local_df) + if (skb->len <= mtu) return false; if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu) -- 1.9.3 >From 2cc44568cbbf1d1a675dd1ec317b1ec3be1470cd Mon Sep 17 00:00:00 2001 From: Sergey Popovich <popovich_sergei@xxxxxxx> Date: Tue, 6 May 2014 18:23:08 +0300 Subject: [PATCH 15/20] ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation [ Upstream commit aeefa1ecfc799b0ea2c4979617f14cecd5cccbfd ] Increment fib_info_cnt in fib_create_info() right after successfuly alllocating fib_info structure, overwise fib_metrics allocation failure leads to fib_info_cnt incorrectly decremented in free_fib_info(), called on error path from fib_create_info(). Signed-off-by: Sergey Popovich <popovich_sergei@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/fib_semantics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index d01f9c6..76da979 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -752,13 +752,13 @@ struct fib_info *fib_create_info(struct fib_config *cfg) fi = kzalloc(sizeof(*fi)+nhs*sizeof(struct fib_nh), GFP_KERNEL); if (fi == NULL) goto failure; + fib_info_cnt++; if (cfg->fc_mx) { fi->fib_metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL); if (!fi->fib_metrics) goto failure; } else fi->fib_metrics = (u32 *) dst_default_metrics; - fib_info_cnt++; fi->fib_net = hold_net(net); fi->fib_protocol = cfg->fc_protocol; -- 1.9.3 >From a786ad8181da8415d2e45070a572618d1c435221 Mon Sep 17 00:00:00 2001 From: Peter Christensen <pch@xxxxxxxxxxxx> Date: Thu, 8 May 2014 11:15:37 +0200 Subject: [PATCH 16/20] macvlan: Don't propagate IFF_ALLMULTI changes on down interfaces. [ Upstream commit bbeb0eadcf9fe74fb2b9b1a6fea82cd538b1e556 ] Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti overflow on the underlying interface. Attempting the set IFF_ALLMULTI on the underlying interface would cause an error and the log message: "allmulti touches root, set allmulti failed." Signed-off-by: Peter Christensen <pch@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/macvlan.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index ab5786b..b74cdf6 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -391,8 +391,10 @@ static void macvlan_change_rx_flags(struct net_device *dev, int change) struct macvlan_dev *vlan = netdev_priv(dev); struct net_device *lowerdev = vlan->lowerdev; - if (change & IFF_ALLMULTI) - dev_set_allmulti(lowerdev, dev->flags & IFF_ALLMULTI ? 1 : -1); + if (dev->flags & IFF_UP) { + if (change & IFF_ALLMULTI) + dev_set_allmulti(lowerdev, dev->flags & IFF_ALLMULTI ? 1 : -1); + } } static void macvlan_set_multicast_list(struct net_device *dev) -- 1.9.3 >From 7bcc4524eb294b5cb0e75ccb44b46c2e9becf3fc Mon Sep 17 00:00:00 2001 From: Jason Wang <jasowang@xxxxxxxxxx> Date: Wed, 15 Aug 2012 20:44:27 +0000 Subject: [PATCH 17/20] act_mirred: do not drop packets when fails to mirror it [ Upstream commit 16c0b164bd24d44db137693a36b428ba28970c62 ] We drop packet unconditionally when we fail to mirror it. This is not intended in some cases. Consdier for kvm guest, we may mirror the traffic of the bridge to a tap device used by a VM. When kernel fails to mirror the packet in conditions such as when qemu crashes or stop polling the tap, it's hard for the management software to detect such condition and clean the the mirroring before. This would lead all packets to the bridge to be dropped and break the netowrk of other virtual machines. To solve the issue, the patch does not drop packets when kernel fails to mirror it, and only drop the redirected packets. Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: Jamal Hadi Salim <jhs@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sched/act_mirred.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index e051398..d067ed1 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -201,13 +201,12 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, out: if (err) { m->tcf_qstats.overlimits++; - /* should we be asking for packet to be dropped? - * may make sense for redirect case only - */ - retval = TC_ACT_SHOT; - } else { + if (m->tcfm_eaction != TCA_EGRESS_MIRROR) + retval = TC_ACT_SHOT; + else + retval = m->tcf_action; + } else retval = m->tcf_action; - } spin_unlock(&m->tcf_lock); return retval; -- 1.9.3 >From 0a3c38f0bc6a43402a53d38a11c66d5e47b4f7e6 Mon Sep 17 00:00:00 2001 From: Li RongQing <roy.qing.li@xxxxxxxxx> Date: Thu, 22 May 2014 16:36:55 +0800 Subject: [PATCH 18/20] ipv4: initialise the itag variable in __mkroute_input [ Upstream commit fbdc0ad095c0a299e9abf5d8ac8f58374951149a ] the value of itag is a random value from stack, and may not be initiated by fib_validate_source, which called fib_combine_itag if CONFIG_IP_ROUTE_CLASSID is not set This will make the cached dst uncertainty Signed-off-by: Li RongQing <roy.qing.li@xxxxxxxxx> Acked-by: Alexei Starovoitov <ast@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 6768ce2..6526110 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2142,7 +2142,7 @@ static int __mkroute_input(struct sk_buff *skb, struct in_device *out_dev; unsigned int flags = 0; __be32 spec_dst; - u32 itag; + u32 itag = 0; /* get a working reference to the output device */ out_dev = __in_dev_get_rcu(FIB_RES_DEV(*res)); -- 1.9.3 >From a8562d3ee753fb67e8e13ecd73fb716bcb82497d Mon Sep 17 00:00:00 2001 From: Alexander Duyck <alexander.h.duyck@xxxxxxxxx> Date: Fri, 4 May 2012 14:26:56 +0000 Subject: [PATCH 19/20] skb: Add inline helper for getting the skb end offset from head [ Upstream commit ec47ea82477404631d49b8e568c71826c9b663ac ] With the recent changes for how we compute the skb truesize it occurs to me we are probably going to have a lot of calls to skb_end_pointer - skb->head. Instead of running all over the place doing that it would make more sense to just make it a separate inline skb_end_offset(skb) that way we can return the correct value without having gcc having to do all the optimization to cancel out skb->head - skb->head. Signed-off-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxx> Acked-by: Eric Dumazet <edumazet@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/atm/ambassador.c | 2 +- drivers/atm/idt77252.c | 2 +- drivers/net/wimax/i2400m/usb-rx.c | 2 +- drivers/staging/octeon/ethernet-tx.c | 2 +- include/linux/skbuff.h | 12 +++++++++++- net/core/skbuff.c | 9 ++++----- 6 files changed, 19 insertions(+), 10 deletions(-) diff --git a/drivers/atm/ambassador.c b/drivers/atm/ambassador.c index f8f41e0..89b30f3 100644 --- a/drivers/atm/ambassador.c +++ b/drivers/atm/ambassador.c @@ -802,7 +802,7 @@ static void fill_rx_pool (amb_dev * dev, unsigned char pool, } // cast needed as there is no %? for pointer differences PRINTD (DBG_SKB, "allocated skb at %p, head %p, area %li", - skb, skb->head, (long) (skb_end_pointer(skb) - skb->head)); + skb, skb->head, (long) skb_end_offset(skb)); rx.handle = virt_to_bus (skb); rx.host_address = cpu_to_be32 (virt_to_bus (skb->data)); if (rx_give (dev, &rx, pool)) diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c index b0e75ce..81845fa 100644 --- a/drivers/atm/idt77252.c +++ b/drivers/atm/idt77252.c @@ -1258,7 +1258,7 @@ idt77252_rx_raw(struct idt77252_dev *card) tail = readl(SAR_REG_RAWCT); pci_dma_sync_single_for_cpu(card->pcidev, IDT77252_PRV_PADDR(queue), - skb_end_pointer(queue) - queue->head - 16, + skb_end_offset(queue) - 16, PCI_DMA_FROMDEVICE); while (head != tail) { diff --git a/drivers/net/wimax/i2400m/usb-rx.c b/drivers/net/wimax/i2400m/usb-rx.c index e325768..b78ee67 100644 --- a/drivers/net/wimax/i2400m/usb-rx.c +++ b/drivers/net/wimax/i2400m/usb-rx.c @@ -277,7 +277,7 @@ retry: d_printf(1, dev, "RX: size changed to %d, received %d, " "copied %d, capacity %ld\n", rx_size, read_size, rx_skb->len, - (long) (skb_end_pointer(new_skb) - new_skb->head)); + (long) skb_end_offset(new_skb)); goto retry; } /* In most cases, it happens due to the hardware scheduling a diff --git a/drivers/staging/octeon/ethernet-tx.c b/drivers/staging/octeon/ethernet-tx.c index 2542c37..c5da0d2 100644 --- a/drivers/staging/octeon/ethernet-tx.c +++ b/drivers/staging/octeon/ethernet-tx.c @@ -344,7 +344,7 @@ int cvm_oct_xmit(struct sk_buff *skb, struct net_device *dev) } if (unlikely (skb->truesize != - sizeof(*skb) + skb_end_pointer(skb) - skb->head)) { + sizeof(*skb) + skb_end_offset(skb))) { /* printk("TX buffer truesize has been changed\n"); */ diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 13bd6d0..c445e52 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -617,11 +617,21 @@ static inline unsigned char *skb_end_pointer(const struct sk_buff *skb) { return skb->head + skb->end; } + +static inline unsigned int skb_end_offset(const struct sk_buff *skb) +{ + return skb->end; +} #else static inline unsigned char *skb_end_pointer(const struct sk_buff *skb) { return skb->end; } + +static inline unsigned int skb_end_offset(const struct sk_buff *skb) +{ + return skb->end - skb->head; +} #endif /* Internal */ @@ -2549,7 +2559,7 @@ static inline bool skb_is_recycleable(const struct sk_buff *skb, int skb_size) return false; skb_size = SKB_DATA_ALIGN(skb_size + NET_SKB_PAD); - if (skb_end_pointer(skb) - skb->head < skb_size) + if (skb_end_offset(skb) < skb_size) return false; if (skb_shared(skb) || skb_cloned(skb)) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 8ae2e43..9204d9b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -743,7 +743,7 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old) struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask) { int headerlen = skb_headroom(skb); - unsigned int size = (skb_end_pointer(skb) - skb->head) + skb->data_len; + unsigned int size = skb_end_offset(skb) + skb->data_len; struct sk_buff *n = alloc_skb(size, gfp_mask); if (!n) @@ -843,7 +843,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, { int i; u8 *data; - int size = nhead + (skb_end_pointer(skb) - skb->head) + ntail; + int size = nhead + skb_end_offset(skb) + ntail; long off; bool fastpath; @@ -2642,14 +2642,13 @@ struct sk_buff *skb_segment(struct sk_buff *skb, u32 features) if (unlikely(!nskb)) goto err; - hsize = skb_end_pointer(nskb) - nskb->head; + hsize = skb_end_offset(nskb); if (skb_cow_head(nskb, doffset + headroom)) { kfree_skb(nskb); goto err; } - nskb->truesize += skb_end_pointer(nskb) - nskb->head - - hsize; + nskb->truesize += skb_end_offset(nskb) - hsize; skb_release_head_state(nskb); __skb_push(nskb, doffset); } else { -- 1.9.3 >From 73be3acd680265f85617730eeea467b87bf867f9 Mon Sep 17 00:00:00 2001 From: Eric Dumazet <edumazet@xxxxxxxxxx> Date: Thu, 3 Apr 2014 09:28:10 -0700 Subject: [PATCH 20/20] net-gro: reset skb->truesize in napi_reuse_skb() [ Upstream commit e33d0ba8047b049c9262fdb1fcafb93cb52ceceb ] Recycling skb always had been very tough... This time it appears GRO layer can accumulate skb->truesize adjustments made by drivers when they attach a fragment to skb. skb_gro_receive() can only subtract from skb->truesize the used part of a fragment. I spotted this problem seeing TcpExtPruneCalled and TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where TCP receive window should be sized properly to accept traffic coming from a driver not overshooting skb->truesize. Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/dev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/core/dev.c b/net/core/dev.c index 7bcf37d..854da15 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3648,6 +3648,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) skb->vlan_tci = 0; skb->dev = napi->dev; skb->skb_iif = 0; + skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); napi->skb = skb; } -- 1.9.3
>From 0b767767ac3ebb6032c410095601b64a374b0595 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov <oleg@xxxxxxxxxx> Date: Tue, 12 Nov 2013 15:10:01 -0800 Subject: [PATCH 01/20] list: introduce list_next_entry() and list_prev_entry() [ Upstream commit 008208c6b26f21c2648c250a09c55e737c02c5f8 ] Add two trivial helpers list_next_entry() and list_prev_entry(), they can have a lot of users including list.h itself. In fact the 1st one is already defined in events/core.c and bnx2x_sp.c, so the patch simply moves the definition to list.h. Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx> Cc: Eilon Greenstein <eilong@xxxxxxxxxxxx> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 3 --- include/linux/list.h | 16 ++++++++++++++++ kernel/events/core.c | 3 --- 3 files changed, 16 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c index 5135733..05ec7f1 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c @@ -1030,9 +1030,6 @@ static void bnx2x_set_one_vlan_mac_e1h(struct bnx2x *bp, ETH_VLAN_FILTER_CLASSIFY, config); } -#define list_next_entry(pos, member) \ - list_entry((pos)->member.next, typeof(*(pos)), member) - /** * bnx2x_vlan_mac_restore - reconfigure next MAC/VLAN/VLAN-MAC element * diff --git a/include/linux/list.h b/include/linux/list.h index cc6d2aa..1712c7e 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -362,6 +362,22 @@ static inline void list_splice_tail_init(struct list_head *list, list_entry((ptr)->next, type, member) /** + * list_next_entry - get the next element in list + * @pos: the type * to cursor + * @member: the name of the list_struct within the struct. + */ +#define list_next_entry(pos, member) \ + list_entry((pos)->member.next, typeof(*(pos)), member) + +/** + * list_prev_entry - get the prev element in list + * @pos: the type * to cursor + * @member: the name of the list_struct within the struct. + */ +#define list_prev_entry(pos, member) \ + list_entry((pos)->member.prev, typeof(*(pos)), member) + +/** * list_for_each - iterate over a list * @pos: the &struct list_head to use as a loop cursor. * @head: the head for your list. diff --git a/kernel/events/core.c b/kernel/events/core.c index eba82e2..16cba0d 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1973,9 +1973,6 @@ static void __perf_event_sync_stat(struct perf_event *event, perf_event_update_userpage(next_event); } -#define list_next_entry(pos, member) \ - list_entry(pos->member.next, typeof(*pos), member) - static void perf_event_sync_stat(struct perf_event_context *ctx, struct perf_event_context *next_ctx) { -- 1.9.3 >From aef2a2604cd2d67e93063c858e9fbba364fb43ec Mon Sep 17 00:00:00 2001 From: Daniel Borkmann <dborkman@xxxxxxxxxx> Date: Tue, 8 Apr 2014 17:26:13 +0200 Subject: [PATCH 02/20] net: sctp: wake up all assocs if sndbuf policy is per socket [ Upstream commit 52c35befb69b005c3fc5afdaae3a5717ad013411 ] SCTP charges chunks for wmem accounting via skb->truesize in sctp_set_owner_w(), and sctp_wfree() respectively as the reverse operation. If a sender runs out of wmem, it needs to wait via sctp_wait_for_sndbuf(), and gets woken up by a call to __sctp_write_space() mostly via sctp_wfree(). __sctp_write_space() is being called per association. Although we assign sk->sk_write_space() to sctp_write_space(), which is then being done per socket, it is only used if send space is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE is set and therefore not invoked in sock_wfree(). Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf again when transmitting packet") fixed an issue where in case sctp_packet_transmit() manages to queue up more than sndbuf bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is interrupted by a signal. However, a still remaining issue is that if net.sctp.sndbuf_policy=0, that is accounting per socket, and one-to-many sockets are in use, the reclaimed write space from sctp_wfree() is 'unfairly' handed back on the server to the association that is the lucky one to be woken up again via __sctp_write_space(), while the remaining associations are never be woken up again (unless by a signal). The effect disappears with net.sctp.sndbuf_policy=1, that is wmem accounting per association, as it guarantees a fair share of wmem among associations. Therefore, if we have reclaimed memory in case of per socket accounting, wake all related associations to a socket in a fair manner, that is, traverse the socket association list starting from the current neighbour of the association and issue a __sctp_write_space() to everyone until we end up waking ourselves. This guarantees that no association is preferred over another and even if more associations are taken into the one-to-many session, all receivers will get messages from the server and are not stalled forever on high load. This setting still leaves the advantage of per socket accounting in touch as an association can still use up global limits if unused by others. Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.") Signed-off-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Cc: Thomas Graf <tgraf@xxxxxxx> Cc: Neil Horman <nhorman@xxxxxxxxxxxxx> Cc: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Neil Horman <nhorman@xxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sctp/socket.c | 36 +++++++++++++++++++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 0ed156b..34d330c 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -6369,6 +6369,40 @@ static void __sctp_write_space(struct sctp_association *asoc) } } +static void sctp_wake_up_waiters(struct sock *sk, + struct sctp_association *asoc) +{ + struct sctp_association *tmp = asoc; + + /* We do accounting for the sndbuf space per association, + * so we only need to wake our own association. + */ + if (asoc->ep->sndbuf_policy) + return __sctp_write_space(asoc); + + /* Accounting for the sndbuf space is per socket, so we + * need to wake up others, try to be fair and in case of + * other associations, let them have a go first instead + * of just doing a sctp_write_space() call. + * + * Note that we reach sctp_wake_up_waiters() only when + * associations free up queued chunks, thus we are under + * lock and the list of associations on a socket is + * guaranteed not to change. + */ + for (tmp = list_next_entry(tmp, asocs); 1; + tmp = list_next_entry(tmp, asocs)) { + /* Manually skip the head element. */ + if (&tmp->asocs == &((sctp_sk(sk))->ep->asocs)) + continue; + /* Wake up association. */ + __sctp_write_space(tmp); + /* We've reached the end. */ + if (tmp == asoc) + break; + } +} + /* Do accounting for the sndbuf space. * Decrement the used sndbuf space of the corresponding association by the * data size which was just transmitted(freed). @@ -6396,7 +6430,7 @@ static void sctp_wfree(struct sk_buff *skb) sk_mem_uncharge(sk, skb->truesize); sock_wfree(skb); - __sctp_write_space(asoc); + sctp_wake_up_waiters(sk, asoc); sctp_association_put(asoc); } -- 1.9.3 >From e3b82c55cebd001bd0f690b7236c663040be1d53 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann <dborkman@xxxxxxxxxx> Date: Wed, 9 Apr 2014 16:10:20 +0200 Subject: [PATCH 03/20] net: sctp: test if association is dead in sctp_wake_up_waiters [ Upstream commit 1e1cdf8ac78793e0875465e98a648df64694a8d0 ] In function sctp_wake_up_waiters(), we need to involve a test if the association is declared dead. If so, we don't have any reference to a possible sibling association anymore and need to invoke sctp_write_space() instead, and normally walk the socket's associations and notify them of new wmem space. The reason for special casing is that otherwise, we could run into the following issue when a sctp_primitive_SEND() call from sctp_sendmsg() fails, and tries to flush an association's outq, i.e. in the following way: sctp_association_free() `-> list_del(&asoc->asocs) <-- poisons list pointer asoc->base.dead = true sctp_outq_free(&asoc->outqueue) `-> __sctp_outq_teardown() `-> sctp_chunk_free() `-> consume_skb() `-> sctp_wfree() `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers if asoc->ep->sndbuf_policy=0 Therefore, only walk the list in an 'optimized' way if we find that the current association is still active. We could also use list_del_init() in addition when we call sctp_association_free(), but as Vlad suggests, we want to trap such bugs and thus leave it poisoned as is. Why is it safe to resolve the issue by testing for asoc->base.dead? Parallel calls to sctp_sendmsg() are protected under socket lock, that is lock_sock()/release_sock(). Only within that path under lock held, we're setting skb/chunk owner via sctp_set_owner_w(). Eventually, chunks are freed directly by an association still under that lock. So when traversing association list on destruction time from sctp_wake_up_waiters() via sctp_wfree(), a different CPU can't be running sctp_wfree() while another one calls sctp_association_free() as both happens under the same lock. Therefore, this can also not race with setting/testing against asoc->base.dead as we are guaranteed for this to happen in order, under lock. Further, Vlad says: the times we check asoc->base.dead is when we've cached an association pointer for later processing. In between cache and processing, the association may have been freed and is simply still around due to reference counts. We check asoc->base.dead under a lock, so it should always be safe to check and not race against sctp_association_free(). Stress-testing seems fine now, too. Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket") Signed-off-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Cc: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Neil Horman <nhorman@xxxxxxxxxxxxx> Acked-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sctp/socket.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 34d330c..0c0bd2f 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -6380,6 +6380,12 @@ static void sctp_wake_up_waiters(struct sock *sk, if (asoc->ep->sndbuf_policy) return __sctp_write_space(asoc); + /* If association goes down and is just flushing its + * outq, then just normally notify others. + */ + if (asoc->base.dead) + return sctp_write_space(sk); + /* Accounting for the sndbuf space is per socket, so we * need to wake up others, try to be fair and in case of * other associations, let them have a go first instead -- 1.9.3 >From 2f76f6842ea77f1a3990ad349c0cecfb2adfb541 Mon Sep 17 00:00:00 2001 From: Dmitry Petukhov <dmgenp@xxxxxxxxx> Date: Wed, 9 Apr 2014 02:23:20 +0600 Subject: [PATCH 04/20] l2tp: take PMTU from tunnel UDP socket [ Upstream commit f34c4a35d87949fbb0e0f31eba3c054e9f8199ba ] When l2tp driver tries to get PMTU for the tunnel destination, it uses the pointer to struct sock that represents PPPoX socket, while it should use the pointer that represents UDP socket of the tunnel. Signed-off-by: Dmitry Petukhov <dmgenp@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/l2tp/l2tp_ppp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index 2211275..4e38a81 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -772,9 +772,9 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, session->deref = pppol2tp_session_sock_put; /* If PMTU discovery was enabled, use the MTU that was discovered */ - dst = sk_dst_get(sk); + dst = sk_dst_get(tunnel->sock); if (dst != NULL) { - u32 pmtu = dst_mtu(__sk_dst_get(sk)); + u32 pmtu = dst_mtu(__sk_dst_get(tunnel->sock)); if (pmtu != 0) session->mtu = session->mru = pmtu - PPPOL2TP_HEADER_OVERHEAD; -- 1.9.3 >From 839142ed7af387888f25ac52b3797d21cdcc994a Mon Sep 17 00:00:00 2001 From: Florian Westphal <fw@xxxxxxxxx> Date: Wed, 9 Apr 2014 10:28:50 +0200 Subject: [PATCH 05/20] net: core: don't account for udp header size when computing seglen [ Upstream commit 6d39d589bb76ee8a1c6cde6822006ae0053decff ] In case of tcp, gso_size contains the tcpmss. For UFO (udp fragmentation offloading) skbs, gso_size is the fragment payload size, i.e. we must not account for udp header size. Otherwise, when using virtio drivers, a to-be-forwarded UFO GSO packet will be needlessly fragmented in the forward path, because we think its individual segments are too large for the outgoing link. Fixes: fe6cc55f3a9a053 ("net: ip, ipv6: handle gso skbs in forwarding path") Cc: Eric Dumazet <eric.dumazet@xxxxxxxxx> Reported-by: Tobias Brunner <tobias@xxxxxxxxxxxxxx> Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/skbuff.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 7a597d4..0f6c1a8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3297,12 +3297,14 @@ EXPORT_SYMBOL(__skb_warn_lro_forwarding); unsigned int skb_gso_transport_seglen(const struct sk_buff *skb) { const struct skb_shared_info *shinfo = skb_shinfo(skb); - unsigned int hdr_len; if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) - hdr_len = tcp_hdrlen(skb); - else - hdr_len = sizeof(struct udphdr); - return hdr_len + shinfo->gso_size; + return tcp_hdrlen(skb) + shinfo->gso_size; + + /* UFO sets gso_size to the size of the fragmentation + * payload, i.e. the size of the L4 (UDP) header is already + * accounted for. + */ + return shinfo->gso_size; } EXPORT_SYMBOL_GPL(skb_gso_transport_seglen); -- 1.9.3 >From fcd9930259aa6f1b382a47dde2752e2f93212f7f Mon Sep 17 00:00:00 2001 From: Thomas Richter <tmricht@xxxxxxxxxxxxxxxxxx> Date: Wed, 9 Apr 2014 12:52:59 +0200 Subject: [PATCH 06/20] bonding: Remove debug_fs files when module init fails [ Upstream commit db29868653394937037d71dc3545768302dda643 ] Remove the bonding debug_fs entries when the module initialization fails. The debug_fs entries should be removed together with all other already allocated resources. Signed-off-by: Thomas Richter <tmricht@xxxxxxxxxxxxxxxxxx> Signed-off-by: Jay Vosburgh <j.vosburgh@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/bonding/bond_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 098581a..2402af3 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4930,6 +4930,7 @@ static int __init bonding_init(void) out: return res; err: + bond_destroy_debugfs(); rtnl_link_unregister(&bond_link_ops); err_link: unregister_pernet_subsys(&bond_net_ops); -- 1.9.3 >From b9655e106cc4c1fc67196af561805081145047f2 Mon Sep 17 00:00:00 2001 From: Eric Dumazet <edumazet@xxxxxxxxxx> Date: Thu, 10 Apr 2014 21:23:36 -0700 Subject: [PATCH 07/20] ipv6: Limit mtu to 65575 bytes [ Upstream commit 30f78d8ebf7f514801e71b88a10c948275168518 ] Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@xxxxxxxxx> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Acked-by: YOSHIFUJI Hideaki <yoshfuji@xxxxxxxxxxxxxx> Acked-by: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/net/ip6_route.h | 5 +++++ net/ipv6/route.c | 5 +++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 2ad92ca..7cd3caa 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -34,6 +34,11 @@ struct route_info { #define RT6_LOOKUP_F_SRCPREF_PUBLIC 0x00000010 #define RT6_LOOKUP_F_SRCPREF_COA 0x00000020 +/* We do not (yet ?) support IPv6 jumbograms (RFC 2675) + * Unlike IPv4, hdr->seg_len doesn't include the IPv6 header + */ +#define IP6_MAX_MTU (0xFFFF + sizeof(struct ipv6hdr)) + /* * rt6_srcprefs2flags() and rt6_flags2srcprefs() translate * between IPV6_ADDR_PREFERENCES socket option values diff --git a/net/ipv6/route.c b/net/ipv6/route.c index b685f99..c8643a3 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1092,7 +1092,7 @@ static unsigned int ip6_mtu(const struct dst_entry *dst) unsigned int mtu = dst_metric_raw(dst, RTAX_MTU); if (mtu) - return mtu; + goto out; mtu = IPV6_MIN_MTU; @@ -1102,7 +1102,8 @@ static unsigned int ip6_mtu(const struct dst_entry *dst) mtu = idev->cnf.mtu6; rcu_read_unlock(); - return mtu; +out: + return min_t(unsigned int, mtu, IP6_MAX_MTU); } static struct dst_entry *icmp6_dst_gc_list; -- 1.9.3 >From 08de8fbd4cd9918b88eb52dbed58113b99a12724 Mon Sep 17 00:00:00 2001 From: "Wang, Xiaoming" <xiaoming.wang@xxxxxxxxx> Date: Mon, 14 Apr 2014 12:30:45 -0400 Subject: [PATCH 08/20] net: ipv4: current group_info should be put after using. [ Upstream commit b04c46190219a4f845e46a459e3102137b7f6cac ] Plug a group_info refcount leak in ping_init. group_info is only needed during initialization and the code failed to release the reference on exit. While here move grabbing the reference to a place where it is actually needed. Signed-off-by: Chuansheng Liu <chuansheng.liu@xxxxxxxxx> Signed-off-by: Zhang Dongxing <dongxing.zhang@xxxxxxxxx> Signed-off-by: xiaoming wang <xiaoming.wang@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ping.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index e80db1e..cb90852 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -203,26 +203,33 @@ static int ping_init_sock(struct sock *sk) struct net *net = sock_net(sk); gid_t group = current_egid(); gid_t range[2]; - struct group_info *group_info = get_current_groups(); - int i, j, count = group_info->ngroups; + struct group_info *group_info; + int i, j, count; + int ret = 0; inet_get_ping_group_range_net(net, range, range+1); if (range[0] <= group && group <= range[1]) return 0; + group_info = get_current_groups(); + count = group_info->ngroups; for (i = 0; i < group_info->nblocks; i++) { int cp_count = min_t(int, NGROUPS_PER_BLOCK, count); for (j = 0; j < cp_count; j++) { group = group_info->blocks[i][j]; if (range[0] <= group && group <= range[1]) - return 0; + goto out_release_group; } count -= cp_count; } - return -EACCES; + ret = -EACCES; + +out_release_group: + put_group_info(group_info); + return ret; } static void ping_close(struct sock *sk, long timeout) -- 1.9.3 >From b13a8aff68d6439e8f7651dd46f0e0e02446bea6 Mon Sep 17 00:00:00 2001 From: Mathias Krause <minipli@xxxxxxxxxxxxxx> Date: Sun, 13 Apr 2014 18:23:33 +0200 Subject: [PATCH 09/20] filter: prevent nla extensions to peek beyond the end of the message [ Upstream commit 05ab8f2647e4221cbdb3856dd7d32bd5407316b3 ] The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check for a minimal message length before testing the supplied offset to be within the bounds of the message. This allows the subtraction of the nla header to underflow and therefore -- as the data type is unsigned -- allowing far to big offset and length values for the search of the netlink attribute. The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is also wrong. It has the minuend and subtrahend mixed up, therefore calculates a huge length value, allowing to overrun the end of the message while looking for the netlink attribute. The following three BPF snippets will trigger the bugs when attached to a UNIX datagram socket and parsing a message with length 1, 2 or 3. ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]-- | ld #0x87654321 | ldx #42 | ld #nla | ret a `--- ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]-- | ld #0x87654321 | ldx #42 | ld #nlan | ret a `--- ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]-- | ; (needs a fake netlink header at offset 0) | ld #0 | ldx #42 | ld #nlan | ret a `--- Fix the first issue by ensuring the message length fulfills the minimal size constrains of a nla header. Fix the second bug by getting the math for the remainder calculation right. Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction") Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..") Cc: Patrick McHardy <kaber@xxxxxxxxx> Cc: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> Signed-off-by: Mathias Krause <minipli@xxxxxxxxxxxxxx> Acked-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/filter.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/core/filter.c b/net/core/filter.c index 6f755cc..4d7cfc0 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -338,11 +338,15 @@ load_b: if (skb_is_nonlinear(skb)) return 0; + if (skb->len < sizeof(struct nlattr)) + return 0; + if (skb->len < sizeof(struct nlattr)) + return 0; if (A > skb->len - sizeof(struct nlattr)) return 0; nla = (struct nlattr *)&skb->data[A]; - if (nla->nla_len > A - skb->len) + if (nla->nla_len > skb->len - A) return 0; nla = nla_find_nested(nla, X); -- 1.9.3 >From 63ddd023247b5b491dbdecc7b80f24ad84bd2c97 Mon Sep 17 00:00:00 2001 From: Ivan Vecera <ivecera@xxxxxxxxxx> Date: Thu, 17 Apr 2014 14:51:08 +0200 Subject: [PATCH 10/20] tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled The patch fixes a problem with dropped jumbo frames after usage of 'ethtool -G ... rx'. Scenario: 1. ip link set eth0 up 2. ethtool -G eth0 rx N # <- This zeroes rx-jumbo 3. ip link set mtu 9000 dev eth0 The ethtool command set rx_jumbo_pending to zero so any received jumbo packets are dropped and you need to use 'ethtool -G eth0 rx-jumbo N' to workaround the issue. The patch changes the logic so rx_jumbo_pending value is changed only if jumbo frames are enabled (MTU > 1500). Signed-off-by: Ivan Vecera <ivecera@xxxxxxxxxx> Acked-by: Michael Chan <mchan@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/ethernet/broadcom/tg3.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 237e2a4..cbc6a62 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -10861,7 +10861,9 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e if (tg3_flag(tp, MAX_RXPEND_64) && tp->rx_pending > 63) tp->rx_pending = 63; - tp->rx_jumbo_pending = ering->rx_jumbo_pending; + + if (tg3_flag(tp, JUMBO_RING_ENABLE)) + tp->rx_jumbo_pending = ering->rx_jumbo_pending; for (i = 0; i < tp->irq_max; i++) tp->napi[i].tx_pending = ering->tx_pending; -- 1.9.3 >From 67c712a2e42c93dfbf2f3a49fe58542b3b7cdfe8 Mon Sep 17 00:00:00 2001 From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Date: Thu, 24 Apr 2014 10:22:35 +1000 Subject: [PATCH 11/20] rtnetlink: Warn when interface's information won't fit in our packet [ Upstream commit 973462bbde79bb827824c73b59027a0aed5c9ca6 ] Without IFLA_EXT_MASK specified, the information reported for a single interface in response to RTM_GETLINK is expected to fit within a netlink packet of NLMSG_GOODSIZE. If it doesn't, however, things will go badly wrong, When listing all interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first message in a packet as the end of the listing and omit information for that interface and all subsequent ones. This can cause getifaddrs(3) to enter an infinite loop. This patch won't fix the problem, but it will WARN_ON() making it easier to track down what's going wrong. Signed-off-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Reviewed-by: Jiri Pirko <jpirko@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/rtnetlink.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index a133427..f7c2554 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1059,6 +1059,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) struct hlist_node *node; struct nlattr *tb[IFLA_MAX+1]; u32 ext_filter_mask = 0; + int err; s_h = cb->args[0]; s_idx = cb->args[1]; @@ -1079,11 +1080,17 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) hlist_for_each_entry_rcu(dev, node, head, index_hlist) { if (idx < s_idx) goto cont; - if (rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, - NETLINK_CB(cb->skb).pid, - cb->nlh->nlmsg_seq, 0, - NLM_F_MULTI, - ext_filter_mask) <= 0) + err = rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, + NETLINK_CB(cb->skb).pid, + cb->nlh->nlmsg_seq, 0, + NLM_F_MULTI, + ext_filter_mask); + /* If we ran out of room on the first message, + * we're in trouble + */ + WARN_ON((err == -EMSGSIZE) && (skb->len == 0)); + + if (err <= 0) goto out; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); -- 1.9.3 >From ddb48302a2318f1401252a9e2dfed5feb1329e84 Mon Sep 17 00:00:00 2001 From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Date: Thu, 24 Apr 2014 10:22:36 +1000 Subject: [PATCH 12/20] rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is set [ Upstream commit c53864fd60227de025cb79e05493b13f69843971 ] Since 115c9b81928360d769a76c632bae62d15206a94a (rtnetlink: Fix problem with buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST attribute if they were solicited by a GETLINK message containing an IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag. That was done because some user programs broke when they received more data than expected - because IFLA_VFINFO_LIST contains information for each VF it can become large if there are many VFs. However, the IFLA_VF_PORTS attribute, supplied for devices which implement ndo_get_vf_port (currently the 'enic' driver only), has the same problem. It supplies per-VF information and can therefore become large, but it is not currently conditional on the IFLA_EXT_MASK value. Worse, it interacts badly with the existing EXT_MASK handling. When IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at NLMSG_GOODSIZE. If the information for IFLA_VF_PORTS exceeds this, then rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet. netlink_dump() will misinterpret this as having finished the listing and omit data for this interface and all subsequent ones. That can cause getifaddrs(3) to enter an infinite loop. This patch addresses the problem by only supplying IFLA_VF_PORTS when IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set. Signed-off-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Reviewed-by: Jiri Pirko <jiri@xxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/rtnetlink.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index f7c2554..3cd37e9 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -746,7 +746,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev, return 0; } -static size_t rtnl_port_size(const struct net_device *dev) +static size_t rtnl_port_size(const struct net_device *dev, + u32 ext_filter_mask) { size_t port_size = nla_total_size(4) /* PORT_VF */ + nla_total_size(PORT_PROFILE_MAX) /* PORT_PROFILE */ @@ -762,7 +763,8 @@ static size_t rtnl_port_size(const struct net_device *dev) size_t port_self_size = nla_total_size(sizeof(struct nlattr)) + port_size; - if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent) + if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent || + !(ext_filter_mask & RTEXT_FILTER_VF)) return 0; if (dev_num_vf(dev->dev.parent)) return port_self_size + vf_ports_size + @@ -793,7 +795,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, + nla_total_size(ext_filter_mask & RTEXT_FILTER_VF ? 4 : 0) /* IFLA_NUM_VF */ + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */ - + rtnl_port_size(dev) /* IFLA_VF_PORTS + IFLA_PORT_SELF */ + + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */ + rtnl_link_get_size(dev) /* IFLA_LINKINFO */ + rtnl_link_get_af_size(dev); /* IFLA_AF_SPEC */ } @@ -853,11 +855,13 @@ static int rtnl_port_self_fill(struct sk_buff *skb, struct net_device *dev) return 0; } -static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev) +static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev, + u32 ext_filter_mask) { int err; - if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent) + if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent || + !(ext_filter_mask & RTEXT_FILTER_VF)) return 0; err = rtnl_port_self_fill(skb, dev); @@ -1004,7 +1008,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev, nla_nest_end(skb, vfinfo); } - if (rtnl_port_fill(skb, dev)) + if (rtnl_port_fill(skb, dev, ext_filter_mask)) goto nla_put_failure; if (dev->rtnl_link_ops) { -- 1.9.3 >From 8a7366f9dcda2aca25b6dc3a267a4400b33ba5eb Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Tue, 29 Apr 2014 10:09:51 -0400 Subject: [PATCH 13/20] Revert "macvlan : fix checksums error when we are in bridge mode" [ Upstream commit f114890cdf84d753f6b41cd0cc44ba51d16313da ] This reverts commit 12a2856b604476c27d85a5f9a57ae1661fc46019. The commit above doesn't appear to be necessary any more as the checksums appear to be correctly computed/validated. Additionally the above commit breaks kvm configurations where one VM is using a device that support checksum offload (virtio) and the other VM does not. In this case, packets leaving virtio device will have CHECKSUM_PARTIAL set. The packets is forwarded to a macvtap that has offload features turned off. Since we use CHECKSUM_UNNECESSARY, the host does does not update the checksum and thus a bad checksum is passed up to the guest. CC: Daniel Lezcano <daniel.lezcano@xxxxxxx> CC: Patrick McHardy <kaber@xxxxxxxxx> CC: Andrian Nord <nightnord@xxxxxxxxx> CC: Eric Dumazet <eric.dumazet@xxxxxxxxx> CC: Michael S. Tsirkin <mst@xxxxxxxxxx> CC: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx> Acked-by: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/macvlan.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 7160523..c4bb95b 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -237,11 +237,9 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) const struct macvlan_dev *vlan = netdev_priv(dev); const struct macvlan_port *port = vlan->port; const struct macvlan_dev *dest; - __u8 ip_summed = skb->ip_summed; if (vlan->mode == MACVLAN_MODE_BRIDGE) { const struct ethhdr *eth = (void *)skb->data; - skb->ip_summed = CHECKSUM_UNNECESSARY; /* send to other bridge ports directly */ if (is_multicast_ether_addr(eth->h_dest)) { @@ -259,7 +257,6 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) } xmit_world: - skb->ip_summed = ip_summed; skb->dev = vlan->lowerdev; return dev_queue_xmit(skb); } -- 1.9.3 >From 644ecaeb67bb606cbe53ada792693c83d49151ea Mon Sep 17 00:00:00 2001 From: Liu Yu <allanyuliu@xxxxxxxxxxx> Date: Wed, 30 Apr 2014 17:34:09 +0800 Subject: [PATCH 14/20] tcp_cubic: fix the range of delayed_ack [ Upstream commit 0cda345d1b2201dd15591b163e3c92bad5191745 ] commit b9f47a3aaeab (tcp_cubic: limit delayed_ack ratio to prevent divide error) try to prevent divide error, but there is still a little chance that delayed_ack can reach zero. In case the param cnt get negative value, then ratio+cnt would overflow and may happen to be zero. As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero. In some old kernels, such as 2.6.32, there is a bug that would pass negative param, which then ultimately leads to this divide error. commit 5b35e1e6e9c (tcp: fix tcp_trim_head() to adjust segment count with skb MSS) fixed the negative param issue. However, it's safe that we fix the range of delayed_ack as well, to make sure we do not hit a divide by zero. CC: Stephen Hemminger <shemminger@xxxxxxxxxx> Signed-off-by: Liu Yu <allanyuliu@xxxxxxxxxxx> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Acked-by: Neal Cardwell <ncardwell@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/tcp_cubic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index b6ae92a..894b7ce 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -408,7 +408,7 @@ static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us) ratio -= ca->delayed_ack >> ACK_RATIO_SHIFT; ratio += cnt; - ca->delayed_ack = min(ratio, ACK_RATIO_LIMIT); + ca->delayed_ack = clamp(ratio, 1U, ACK_RATIO_LIMIT); } /* Some calls are for duplicates without timetamps */ -- 1.9.3 >From 2027289edec93549a2f7fe19daba5d66f1f57efb Mon Sep 17 00:00:00 2001 From: Florian Westphal <fw@xxxxxxxxx> Date: Sun, 4 May 2014 23:24:31 +0200 Subject: [PATCH 15/20] net: ipv4: ip_forward: fix inverted local_df test [ Upstream commit ca6c5d4ad216d5942ae544bbf02503041bd802aa ] local_df means 'ignore DF bit if set', so if its set we're allowed to perform ip fragmentation. This wasn't noticed earlier because the output path also drops such skbs (and emits needed icmp error) and because netfilter ip defrag did not set local_df until couple of days ago. Only difference is that DF-packets-larger-than MTU now discarded earlier (f.e. we avoid pointless netfilter postrouting trip). While at it, drop the repeated test ip_exceeds_mtu, checking it once is enough... Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ip_forward.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c index e0d9f02..7593f3a 100644 --- a/net/ipv4/ip_forward.c +++ b/net/ipv4/ip_forward.c @@ -42,12 +42,12 @@ static bool ip_may_fragment(const struct sk_buff *skb) { return unlikely((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0) || - !skb->local_df; + skb->local_df; } static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu) { - if (skb->len <= mtu || skb->local_df) + if (skb->len <= mtu) return false; if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu) -- 1.9.3 >From 1dfe560062636d8d79b46ca62c1ad159a2971859 Mon Sep 17 00:00:00 2001 From: Sergey Popovich <popovich_sergei@xxxxxxx> Date: Tue, 6 May 2014 18:23:08 +0300 Subject: [PATCH 16/20] ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation [ Upstream commit aeefa1ecfc799b0ea2c4979617f14cecd5cccbfd ] Increment fib_info_cnt in fib_create_info() right after successfuly alllocating fib_info structure, overwise fib_metrics allocation failure leads to fib_info_cnt incorrectly decremented in free_fib_info(), called on error path from fib_create_info(). Signed-off-by: Sergey Popovich <popovich_sergei@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/fib_semantics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 8861f91..8d244ea 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -751,13 +751,13 @@ struct fib_info *fib_create_info(struct fib_config *cfg) fi = kzalloc(sizeof(*fi)+nhs*sizeof(struct fib_nh), GFP_KERNEL); if (fi == NULL) goto failure; + fib_info_cnt++; if (cfg->fc_mx) { fi->fib_metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL); if (!fi->fib_metrics) goto failure; } else fi->fib_metrics = (u32 *) dst_default_metrics; - fib_info_cnt++; fi->fib_net = hold_net(net); fi->fib_protocol = cfg->fc_protocol; -- 1.9.3 >From 28909174fab7aa22017f2743489aa70955c0847e Mon Sep 17 00:00:00 2001 From: Jason Wang <jasowang@xxxxxxxxxx> Date: Wed, 15 Aug 2012 20:44:27 +0000 Subject: [PATCH 17/20] act_mirred: do not drop packets when fails to mirror it [ Upstream commit 16c0b164bd24d44db137693a36b428ba28970c62 ] We drop packet unconditionally when we fail to mirror it. This is not intended in some cases. Consdier for kvm guest, we may mirror the traffic of the bridge to a tap device used by a VM. When kernel fails to mirror the packet in conditions such as when qemu crashes or stop polling the tap, it's hard for the management software to detect such condition and clean the the mirroring before. This would lead all packets to the bridge to be dropped and break the netowrk of other virtual machines. To solve the issue, the patch does not drop packets when kernel fails to mirror it, and only drop the redirected packets. Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: Jamal Hadi Salim <jhs@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sched/act_mirred.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index e051398..d067ed1 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -201,13 +201,12 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, out: if (err) { m->tcf_qstats.overlimits++; - /* should we be asking for packet to be dropped? - * may make sense for redirect case only - */ - retval = TC_ACT_SHOT; - } else { + if (m->tcfm_eaction != TCA_EGRESS_MIRROR) + retval = TC_ACT_SHOT; + else + retval = m->tcf_action; + } else retval = m->tcf_action; - } spin_unlock(&m->tcf_lock); return retval; -- 1.9.3 >From 906b6573c865730470cc00ffa37ab5325d08f7fb Mon Sep 17 00:00:00 2001 From: Li RongQing <roy.qing.li@xxxxxxxxx> Date: Thu, 22 May 2014 16:36:55 +0800 Subject: [PATCH 18/20] ipv4: initialise the itag variable in __mkroute_input [ Upstream commit fbdc0ad095c0a299e9abf5d8ac8f58374951149a ] the value of itag is a random value from stack, and may not be initiated by fib_validate_source, which called fib_combine_itag if CONFIG_IP_ROUTE_CLASSID is not set This will make the cached dst uncertainty Signed-off-by: Li RongQing <roy.qing.li@xxxxxxxxx> Acked-by: Alexei Starovoitov <ast@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 108c73d..90bc88b 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2129,7 +2129,7 @@ static int __mkroute_input(struct sk_buff *skb, struct in_device *out_dev; unsigned int flags = 0; __be32 spec_dst; - u32 itag; + u32 itag = 0; /* get a working reference to the output device */ out_dev = __in_dev_get_rcu(FIB_RES_DEV(*res)); -- 1.9.3 >From 7e9f307cf3c9d05d18174fab85f62046acc54b11 Mon Sep 17 00:00:00 2001 From: Alexander Duyck <alexander.h.duyck@xxxxxxxxx> Date: Fri, 4 May 2012 14:26:56 +0000 Subject: [PATCH 19/20] skb: Add inline helper for getting the skb end offset from head [ Upstream commit ec47ea82477404631d49b8e568c71826c9b663ac ] With the recent changes for how we compute the skb truesize it occurs to me we are probably going to have a lot of calls to skb_end_pointer - skb->head. Instead of running all over the place doing that it would make more sense to just make it a separate inline skb_end_offset(skb) that way we can return the correct value without having gcc having to do all the optimization to cancel out skb->head - skb->head. Signed-off-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxx> Acked-by: Eric Dumazet <edumazet@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/atm/ambassador.c | 2 +- drivers/atm/idt77252.c | 2 +- drivers/net/wimax/i2400m/usb-rx.c | 2 +- drivers/staging/octeon/ethernet-tx.c | 2 +- include/linux/skbuff.h | 12 +++++++++++- net/core/skbuff.c | 9 ++++----- 6 files changed, 19 insertions(+), 10 deletions(-) diff --git a/drivers/atm/ambassador.c b/drivers/atm/ambassador.c index f8f41e0..89b30f3 100644 --- a/drivers/atm/ambassador.c +++ b/drivers/atm/ambassador.c @@ -802,7 +802,7 @@ static void fill_rx_pool (amb_dev * dev, unsigned char pool, } // cast needed as there is no %? for pointer differences PRINTD (DBG_SKB, "allocated skb at %p, head %p, area %li", - skb, skb->head, (long) (skb_end_pointer(skb) - skb->head)); + skb, skb->head, (long) skb_end_offset(skb)); rx.handle = virt_to_bus (skb); rx.host_address = cpu_to_be32 (virt_to_bus (skb->data)); if (rx_give (dev, &rx, pool)) diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c index b0e75ce..81845fa 100644 --- a/drivers/atm/idt77252.c +++ b/drivers/atm/idt77252.c @@ -1258,7 +1258,7 @@ idt77252_rx_raw(struct idt77252_dev *card) tail = readl(SAR_REG_RAWCT); pci_dma_sync_single_for_cpu(card->pcidev, IDT77252_PRV_PADDR(queue), - skb_end_pointer(queue) - queue->head - 16, + skb_end_offset(queue) - 16, PCI_DMA_FROMDEVICE); while (head != tail) { diff --git a/drivers/net/wimax/i2400m/usb-rx.c b/drivers/net/wimax/i2400m/usb-rx.c index e325768..b78ee67 100644 --- a/drivers/net/wimax/i2400m/usb-rx.c +++ b/drivers/net/wimax/i2400m/usb-rx.c @@ -277,7 +277,7 @@ retry: d_printf(1, dev, "RX: size changed to %d, received %d, " "copied %d, capacity %ld\n", rx_size, read_size, rx_skb->len, - (long) (skb_end_pointer(new_skb) - new_skb->head)); + (long) skb_end_offset(new_skb)); goto retry; } /* In most cases, it happens due to the hardware scheduling a diff --git a/drivers/staging/octeon/ethernet-tx.c b/drivers/staging/octeon/ethernet-tx.c index 91a97b3..5877b2c 100644 --- a/drivers/staging/octeon/ethernet-tx.c +++ b/drivers/staging/octeon/ethernet-tx.c @@ -345,7 +345,7 @@ int cvm_oct_xmit(struct sk_buff *skb, struct net_device *dev) } if (unlikely (skb->truesize != - sizeof(*skb) + skb_end_pointer(skb) - skb->head)) { + sizeof(*skb) + skb_end_offset(skb))) { /* printk("TX buffer truesize has been changed\n"); */ diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 3a7b87e..0884db3 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -640,11 +640,21 @@ static inline unsigned char *skb_end_pointer(const struct sk_buff *skb) { return skb->head + skb->end; } + +static inline unsigned int skb_end_offset(const struct sk_buff *skb) +{ + return skb->end; +} #else static inline unsigned char *skb_end_pointer(const struct sk_buff *skb) { return skb->end; } + +static inline unsigned int skb_end_offset(const struct sk_buff *skb) +{ + return skb->end - skb->head; +} #endif /* Internal */ @@ -2574,7 +2584,7 @@ static inline bool skb_is_recycleable(const struct sk_buff *skb, int skb_size) return false; skb_size = SKB_DATA_ALIGN(skb_size + NET_SKB_PAD); - if (skb_end_pointer(skb) - skb->head < skb_size) + if (skb_end_offset(skb) < skb_size) return false; if (skb_shared(skb) || skb_cloned(skb)) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 0f6c1a8..fe42834 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -821,7 +821,7 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old) struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask) { int headerlen = skb_headroom(skb); - unsigned int size = (skb_end_pointer(skb) - skb->head) + skb->data_len; + unsigned int size = skb_end_offset(skb) + skb->data_len; struct sk_buff *n = alloc_skb(size, gfp_mask); if (!n) @@ -922,7 +922,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, { int i; u8 *data; - int size = nhead + (skb_end_pointer(skb) - skb->head) + ntail; + int size = nhead + skb_end_offset(skb) + ntail; long off; bool fastpath; @@ -2721,14 +2721,13 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features) if (unlikely(!nskb)) goto err; - hsize = skb_end_pointer(nskb) - nskb->head; + hsize = skb_end_offset(nskb); if (skb_cow_head(nskb, doffset + headroom)) { kfree_skb(nskb); goto err; } - nskb->truesize += skb_end_pointer(nskb) - nskb->head - - hsize; + nskb->truesize += skb_end_offset(nskb) - hsize; skb_release_head_state(nskb); __skb_push(nskb, doffset); } else { -- 1.9.3 >From 80b7f162878f018119aefb699dc7e4ab4b8dcfdb Mon Sep 17 00:00:00 2001 From: Eric Dumazet <edumazet@xxxxxxxxxx> Date: Thu, 3 Apr 2014 09:28:10 -0700 Subject: [PATCH 20/20] net-gro: reset skb->truesize in napi_reuse_skb() [ Upstream commit e33d0ba8047b049c9262fdb1fcafb93cb52ceceb ] Recycling skb always had been very tough... This time it appears GRO layer can accumulate skb->truesize adjustments made by drivers when they attach a fragment to skb. skb_gro_receive() can only subtract from skb->truesize the used part of a fragment. I spotted this problem seeing TcpExtPruneCalled and TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where TCP receive window should be sized properly to accept traffic coming from a driver not overshooting skb->truesize. Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/dev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/core/dev.c b/net/core/dev.c index cebdc15..b47375d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3574,6 +3574,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) skb->vlan_tci = 0; skb->dev = napi->dev; skb->skb_iif = 0; + skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); napi->skb = skb; } -- 1.9.3
From b373174e9889834c9c9bc143ca1dec4ff4bc480c Mon Sep 17 00:00:00 2001 From: Oleg Nesterov <oleg@xxxxxxxxxx> Date: Tue, 12 Nov 2013 15:10:01 -0800 Subject: [PATCH 01/49] list: introduce list_next_entry() and list_prev_entry() [ Upstream commit 008208c6b26f21c2648c250a09c55e737c02c5f8 ] Add two trivial helpers list_next_entry() and list_prev_entry(), they can have a lot of users including list.h itself. In fact the 1st one is already defined in events/core.c and bnx2x_sp.c, so the patch simply moves the definition to list.h. Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx> Cc: Eilon Greenstein <eilong@xxxxxxxxxxxx> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 3 --- include/linux/list.h | 16 ++++++++++++++++ kernel/events/core.c | 3 --- 3 files changed, 16 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c index 32a9609..fd50781 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c @@ -1038,9 +1038,6 @@ static void bnx2x_set_one_vlan_mac_e1h(struct bnx2x *bp, ETH_VLAN_FILTER_CLASSIFY, config); } -#define list_next_entry(pos, member) \ - list_entry((pos)->member.next, typeof(*(pos)), member) - /** * bnx2x_vlan_mac_restore - reconfigure next MAC/VLAN/VLAN-MAC element * diff --git a/include/linux/list.h b/include/linux/list.h index b83e565..83a9576 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -373,6 +373,22 @@ static inline void list_splice_tail_init(struct list_head *list, (!list_empty(ptr) ? list_first_entry(ptr, type, member) : NULL) /** + * list_next_entry - get the next element in list + * @pos: the type * to cursor + * @member: the name of the list_struct within the struct. + */ +#define list_next_entry(pos, member) \ + list_entry((pos)->member.next, typeof(*(pos)), member) + +/** + * list_prev_entry - get the prev element in list + * @pos: the type * to cursor + * @member: the name of the list_struct within the struct. + */ +#define list_prev_entry(pos, member) \ + list_entry((pos)->member.prev, typeof(*(pos)), member) + +/** * list_for_each - iterate over a list * @pos: the &struct list_head to use as a loop cursor. * @head: the head for your list. diff --git a/kernel/events/core.c b/kernel/events/core.c index f8eb2b1..ac9b8cc 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2016,9 +2016,6 @@ static void __perf_event_sync_stat(struct perf_event *event, perf_event_update_userpage(next_event); } -#define list_next_entry(pos, member) \ - list_entry(pos->member.next, typeof(*pos), member) - static void perf_event_sync_stat(struct perf_event_context *ctx, struct perf_event_context *next_ctx) { -- 1.9.3 From 18a014e91aaa659bb33689da08df35a83b4db933 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann <dborkman@xxxxxxxxxx> Date: Tue, 8 Apr 2014 17:26:13 +0200 Subject: [PATCH 02/49] net: sctp: wake up all assocs if sndbuf policy is per socket [ Upstream commit 52c35befb69b005c3fc5afdaae3a5717ad013411 ] SCTP charges chunks for wmem accounting via skb->truesize in sctp_set_owner_w(), and sctp_wfree() respectively as the reverse operation. If a sender runs out of wmem, it needs to wait via sctp_wait_for_sndbuf(), and gets woken up by a call to __sctp_write_space() mostly via sctp_wfree(). __sctp_write_space() is being called per association. Although we assign sk->sk_write_space() to sctp_write_space(), which is then being done per socket, it is only used if send space is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE is set and therefore not invoked in sock_wfree(). Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf again when transmitting packet") fixed an issue where in case sctp_packet_transmit() manages to queue up more than sndbuf bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is interrupted by a signal. However, a still remaining issue is that if net.sctp.sndbuf_policy=0, that is accounting per socket, and one-to-many sockets are in use, the reclaimed write space from sctp_wfree() is 'unfairly' handed back on the server to the association that is the lucky one to be woken up again via __sctp_write_space(), while the remaining associations are never be woken up again (unless by a signal). The effect disappears with net.sctp.sndbuf_policy=1, that is wmem accounting per association, as it guarantees a fair share of wmem among associations. Therefore, if we have reclaimed memory in case of per socket accounting, wake all related associations to a socket in a fair manner, that is, traverse the socket association list starting from the current neighbour of the association and issue a __sctp_write_space() to everyone until we end up waking ourselves. This guarantees that no association is preferred over another and even if more associations are taken into the one-to-many session, all receivers will get messages from the server and are not stalled forever on high load. This setting still leaves the advantage of per socket accounting in touch as an association can still use up global limits if unused by others. Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.") Signed-off-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Cc: Thomas Graf <tgraf@xxxxxxx> Cc: Neil Horman <nhorman@xxxxxxxxxxxxx> Cc: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Neil Horman <nhorman@xxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sctp/socket.c | 36 +++++++++++++++++++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 8554e5e..4b0fe74 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -6582,6 +6582,40 @@ static void __sctp_write_space(struct sctp_association *asoc) } } +static void sctp_wake_up_waiters(struct sock *sk, + struct sctp_association *asoc) +{ + struct sctp_association *tmp = asoc; + + /* We do accounting for the sndbuf space per association, + * so we only need to wake our own association. + */ + if (asoc->ep->sndbuf_policy) + return __sctp_write_space(asoc); + + /* Accounting for the sndbuf space is per socket, so we + * need to wake up others, try to be fair and in case of + * other associations, let them have a go first instead + * of just doing a sctp_write_space() call. + * + * Note that we reach sctp_wake_up_waiters() only when + * associations free up queued chunks, thus we are under + * lock and the list of associations on a socket is + * guaranteed not to change. + */ + for (tmp = list_next_entry(tmp, asocs); 1; + tmp = list_next_entry(tmp, asocs)) { + /* Manually skip the head element. */ + if (&tmp->asocs == &((sctp_sk(sk))->ep->asocs)) + continue; + /* Wake up association. */ + __sctp_write_space(tmp); + /* We've reached the end. */ + if (tmp == asoc) + break; + } +} + /* Do accounting for the sndbuf space. * Decrement the used sndbuf space of the corresponding association by the * data size which was just transmitted(freed). @@ -6609,7 +6643,7 @@ static void sctp_wfree(struct sk_buff *skb) sk_mem_uncharge(sk, skb->truesize); sock_wfree(skb); - __sctp_write_space(asoc); + sctp_wake_up_waiters(sk, asoc); sctp_association_put(asoc); } -- 1.9.3 From b70bea0872966996b0b1fc7a6e1e0873e08b8ccf Mon Sep 17 00:00:00 2001 From: Daniel Borkmann <dborkman@xxxxxxxxxx> Date: Wed, 9 Apr 2014 16:10:20 +0200 Subject: [PATCH 03/49] net: sctp: test if association is dead in sctp_wake_up_waiters [ Upstream commit 1e1cdf8ac78793e0875465e98a648df64694a8d0 ] In function sctp_wake_up_waiters(), we need to involve a test if the association is declared dead. If so, we don't have any reference to a possible sibling association anymore and need to invoke sctp_write_space() instead, and normally walk the socket's associations and notify them of new wmem space. The reason for special casing is that otherwise, we could run into the following issue when a sctp_primitive_SEND() call from sctp_sendmsg() fails, and tries to flush an association's outq, i.e. in the following way: sctp_association_free() `-> list_del(&asoc->asocs) <-- poisons list pointer asoc->base.dead = true sctp_outq_free(&asoc->outqueue) `-> __sctp_outq_teardown() `-> sctp_chunk_free() `-> consume_skb() `-> sctp_wfree() `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers if asoc->ep->sndbuf_policy=0 Therefore, only walk the list in an 'optimized' way if we find that the current association is still active. We could also use list_del_init() in addition when we call sctp_association_free(), but as Vlad suggests, we want to trap such bugs and thus leave it poisoned as is. Why is it safe to resolve the issue by testing for asoc->base.dead? Parallel calls to sctp_sendmsg() are protected under socket lock, that is lock_sock()/release_sock(). Only within that path under lock held, we're setting skb/chunk owner via sctp_set_owner_w(). Eventually, chunks are freed directly by an association still under that lock. So when traversing association list on destruction time from sctp_wake_up_waiters() via sctp_wfree(), a different CPU can't be running sctp_wfree() while another one calls sctp_association_free() as both happens under the same lock. Therefore, this can also not race with setting/testing against asoc->base.dead as we are guaranteed for this to happen in order, under lock. Further, Vlad says: the times we check asoc->base.dead is when we've cached an association pointer for later processing. In between cache and processing, the association may have been freed and is simply still around due to reference counts. We check asoc->base.dead under a lock, so it should always be safe to check and not race against sctp_association_free(). Stress-testing seems fine now, too. Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket") Signed-off-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Cc: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Neil Horman <nhorman@xxxxxxxxxxxxx> Acked-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sctp/socket.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 4b0fe74..9713641 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -6593,6 +6593,12 @@ static void sctp_wake_up_waiters(struct sock *sk, if (asoc->ep->sndbuf_policy) return __sctp_write_space(asoc); + /* If association goes down and is just flushing its + * outq, then just normally notify others. + */ + if (asoc->base.dead) + return sctp_write_space(sk); + /* Accounting for the sndbuf space is per socket, so we * need to wake up others, try to be fair and in case of * other associations, let them have a go first instead -- 1.9.3 From c83aff941c9e9e51a29a314f67b9cfb7df8025ce Mon Sep 17 00:00:00 2001 From: Dmitry Petukhov <dmgenp@xxxxxxxxx> Date: Wed, 9 Apr 2014 02:23:20 +0600 Subject: [PATCH 04/49] l2tp: take PMTU from tunnel UDP socket [ Upstream commit f34c4a35d87949fbb0e0f31eba3c054e9f8199ba ] When l2tp driver tries to get PMTU for the tunnel destination, it uses the pointer to struct sock that represents PPPoX socket, while it should use the pointer that represents UDP socket of the tunnel. Signed-off-by: Dmitry Petukhov <dmgenp@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/l2tp/l2tp_ppp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index 44441c0..9a0e587 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -754,9 +754,9 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, session->deref = pppol2tp_session_sock_put; /* If PMTU discovery was enabled, use the MTU that was discovered */ - dst = sk_dst_get(sk); + dst = sk_dst_get(tunnel->sock); if (dst != NULL) { - u32 pmtu = dst_mtu(__sk_dst_get(sk)); + u32 pmtu = dst_mtu(__sk_dst_get(tunnel->sock)); if (pmtu != 0) session->mtu = session->mru = pmtu - PPPOL2TP_HEADER_OVERHEAD; -- 1.9.3 From 2f40f11e9c8b9d0451c8a91792ad8375efbdb909 Mon Sep 17 00:00:00 2001 From: Florian Westphal <fw@xxxxxxxxx> Date: Wed, 9 Apr 2014 10:28:50 +0200 Subject: [PATCH 05/49] net: core: don't account for udp header size when computing seglen [ Upstream commit 6d39d589bb76ee8a1c6cde6822006ae0053decff ] In case of tcp, gso_size contains the tcpmss. For UFO (udp fragmentation offloading) skbs, gso_size is the fragment payload size, i.e. we must not account for udp header size. Otherwise, when using virtio drivers, a to-be-forwarded UFO GSO packet will be needlessly fragmented in the forward path, because we think its individual segments are too large for the outgoing link. Fixes: fe6cc55f3a9a053 ("net: ip, ipv6: handle gso skbs in forwarding path") Cc: Eric Dumazet <eric.dumazet@xxxxxxxxx> Reported-by: Tobias Brunner <tobias@xxxxxxxxxxxxxx> Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/skbuff.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 79143b7..66f722b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3487,12 +3487,14 @@ EXPORT_SYMBOL(skb_try_coalesce); unsigned int skb_gso_transport_seglen(const struct sk_buff *skb) { const struct skb_shared_info *shinfo = skb_shinfo(skb); - unsigned int hdr_len; if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) - hdr_len = tcp_hdrlen(skb); - else - hdr_len = sizeof(struct udphdr); - return hdr_len + shinfo->gso_size; + return tcp_hdrlen(skb) + shinfo->gso_size; + + /* UFO sets gso_size to the size of the fragmentation + * payload, i.e. the size of the L4 (UDP) header is already + * accounted for. + */ + return shinfo->gso_size; } EXPORT_SYMBOL_GPL(skb_gso_transport_seglen); -- 1.9.3 From 1c766e5066499e1156e2de0e8b9bb2dc0a6564d0 Mon Sep 17 00:00:00 2001 From: Thomas Richter <tmricht@xxxxxxxxxxxxxxxxxx> Date: Wed, 9 Apr 2014 12:52:59 +0200 Subject: [PATCH 06/49] bonding: Remove debug_fs files when module init fails [ Upstream commit db29868653394937037d71dc3545768302dda643 ] Remove the bonding debug_fs entries when the module initialization fails. The debug_fs entries should be removed together with all other already allocated resources. Signed-off-by: Thomas Richter <tmricht@xxxxxxxxxxxxxxxxxx> Signed-off-by: Jay Vosburgh <j.vosburgh@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/bonding/bond_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 8395b09..b143ce9 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4995,6 +4995,7 @@ static int __init bonding_init(void) out: return res; err: + bond_destroy_debugfs(); rtnl_link_unregister(&bond_link_ops); err_link: unregister_pernet_subsys(&bond_net_ops); -- 1.9.3 From 92b562fab478d6ed3fd09f90d764963197d4e6fd Mon Sep 17 00:00:00 2001 From: Toshiaki Makita <makita.toshiaki@xxxxxxxxxxxxx> Date: Wed, 9 Apr 2014 17:00:30 +0900 Subject: [PATCH 07/49] bridge: Fix double free and memory leak around br_allowed_ingress [ Upstream commit eb7076182d1ae4bc4641534134ed707100d76acc ] br_allowed_ingress() has two problems. 1. If br_allowed_ingress() is called by br_handle_frame_finish() and vlan_untag() in br_allowed_ingress() fails, skb will be freed by both vlan_untag() and br_handle_frame_finish(). 2. If br_allowed_ingress() is called by br_dev_xmit() and br_allowed_ingress() fails, the skb will not be freed. Fix these two problems by freeing the skb in br_allowed_ingress() if it fails. Signed-off-by: Toshiaki Makita <makita.toshiaki@xxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/bridge/br_input.c | 2 +- net/bridge/br_vlan.c | 7 ++++--- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index 828e2bc..0a3bc82 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -71,7 +71,7 @@ int br_handle_frame_finish(struct sk_buff *skb) goto drop; if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid)) - goto drop; + goto out; /* insert into forwarding database after filtering to avoid spoofing */ br = p->br; diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c index 9a9ffe7..d8deb8b 100644 --- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@ -202,7 +202,7 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v, * rejected. */ if (!v) - return false; + goto drop; if (br_vlan_get_tag(skb, vid)) { u16 pvid = br_get_pvid(v); @@ -212,7 +212,7 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v, * traffic belongs to. */ if (pvid == VLAN_N_VID) - return false; + goto drop; /* PVID is set on this port. Any untagged ingress * frame is considered to belong to this vlan. @@ -224,7 +224,8 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v, /* Frame had a valid vlan tag. See if vlan is allowed */ if (test_bit(*vid, v->vlan_bitmap)) return true; - +drop: + kfree_skb(skb); return false; } -- 1.9.3 From 923526dd310da109f40e2388ebf45927b921da06 Mon Sep 17 00:00:00 2001 From: Eric Dumazet <edumazet@xxxxxxxxxx> Date: Thu, 10 Apr 2014 21:23:36 -0700 Subject: [PATCH 08/49] ipv6: Limit mtu to 65575 bytes [ Upstream commit 30f78d8ebf7f514801e71b88a10c948275168518 ] Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@xxxxxxxxx> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Acked-by: YOSHIFUJI Hideaki <yoshfuji@xxxxxxxxxxxxxx> Acked-by: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/net/ip6_route.h | 5 +++++ net/ipv6/route.c | 5 +++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index b906f4a..8d977b3 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -32,6 +32,11 @@ struct route_info { #define RT6_LOOKUP_F_SRCPREF_PUBLIC 0x00000010 #define RT6_LOOKUP_F_SRCPREF_COA 0x00000020 +/* We do not (yet ?) support IPv6 jumbograms (RFC 2675) + * Unlike IPv4, hdr->seg_len doesn't include the IPv6 header + */ +#define IP6_MAX_MTU (0xFFFF + sizeof(struct ipv6hdr)) + /* * rt6_srcprefs2flags() and rt6_flags2srcprefs() translate * between IPV6_ADDR_PREFERENCES socket option values diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 3fde3e9..b2614b2 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1236,7 +1236,7 @@ static unsigned int ip6_mtu(const struct dst_entry *dst) unsigned int mtu = dst_metric_raw(dst, RTAX_MTU); if (mtu) - return mtu; + goto out; mtu = IPV6_MIN_MTU; @@ -1246,7 +1246,8 @@ static unsigned int ip6_mtu(const struct dst_entry *dst) mtu = idev->cnf.mtu6; rcu_read_unlock(); - return mtu; +out: + return min_t(unsigned int, mtu, IP6_MAX_MTU); } static struct dst_entry *icmp6_dst_gc_list; -- 1.9.3 From 59f19546c0ea2b0b87ebe585d7df1714c8c70b46 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Date: Fri, 11 Apr 2014 15:51:18 +0200 Subject: [PATCH 09/49] gre: don't allow to add the same tunnel twice [ Upstream commit 5a4552752d8f7f4cef1d98775ece7adb7616fde2 ] Before the patch, it was possible to add two times the same tunnel: ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249 ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249 It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the argument dev->type, which was set only later (when calling ndo_init handler in register_netdevice()). Let's set this type in the setup handler, which is called before newlink handler. Introduced by commit c54419321455 ("GRE: Refactor GRE tunneling code."). CC: Pravin B Shelar <pshelar@xxxxxxxxxx> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ip_gre.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 828b2e8..fae5a84 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -652,6 +652,7 @@ static const struct net_device_ops ipgre_netdev_ops = { static void ipgre_tunnel_setup(struct net_device *dev) { dev->netdev_ops = &ipgre_netdev_ops; + dev->type = ARPHRD_IPGRE; ip_tunnel_setup(dev, ipgre_net_id); } @@ -690,7 +691,6 @@ static int ipgre_tunnel_init(struct net_device *dev) memcpy(dev->dev_addr, &iph->saddr, 4); memcpy(dev->broadcast, &iph->daddr, 4); - dev->type = ARPHRD_IPGRE; dev->flags = IFF_NOARP; dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; dev->addr_len = 4; -- 1.9.3 From b51f0fb1b365b770d2bbba7d167cad64ac46a979 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Date: Fri, 11 Apr 2014 15:51:19 +0200 Subject: [PATCH 10/49] vti: don't allow to add the same tunnel twice [ Upstream commit 8d89dcdf80d88007647945a753821a06eb6cc5a5 ] Before the patch, it was possible to add two times the same tunnel: ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41 ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41 It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the argument dev->type, which was set only later (when calling ndo_init handler in register_netdevice()). Let's set this type in the setup handler, which is called before newlink handler. Introduced by commit b9959fd3b0fa ("vti: switch to new ip tunnel code"). CC: Cong Wang <amwang@xxxxxxxxxx> CC: Steffen Klassert <steffen.klassert@xxxxxxxxxxx> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ip_vti.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c index feb19db..4ec3427 100644 --- a/net/ipv4/ip_vti.c +++ b/net/ipv4/ip_vti.c @@ -579,9 +579,9 @@ static void vti_dev_free(struct net_device *dev) static void vti_tunnel_setup(struct net_device *dev) { dev->netdev_ops = &vti_netdev_ops; + dev->type = ARPHRD_TUNNEL; dev->destructor = vti_dev_free; - dev->type = ARPHRD_TUNNEL; dev->hard_header_len = LL_MAX_HEADER + sizeof(struct iphdr); dev->mtu = ETH_DATA_LEN; dev->flags = IFF_NOARP; -- 1.9.3 From 92282568f14434330224bb4780d8ad229893bbbc Mon Sep 17 00:00:00 2001 From: "Wang, Xiaoming" <xiaoming.wang@xxxxxxxxx> Date: Mon, 14 Apr 2014 12:30:45 -0400 Subject: [PATCH 11/49] net: ipv4: current group_info should be put after using. [ Upstream commit b04c46190219a4f845e46a459e3102137b7f6cac ] Plug a group_info refcount leak in ping_init. group_info is only needed during initialization and the code failed to release the reference on exit. While here move grabbing the reference to a place where it is actually needed. Signed-off-by: Chuansheng Liu <chuansheng.liu@xxxxxxxxx> Signed-off-by: Zhang Dongxing <dongxing.zhang@xxxxxxxxx> Signed-off-by: xiaoming wang <xiaoming.wang@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ping.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 8cae28f..aa857a4 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -204,26 +204,33 @@ static int ping_init_sock(struct sock *sk) { struct net *net = sock_net(sk); kgid_t group = current_egid(); - struct group_info *group_info = get_current_groups(); - int i, j, count = group_info->ngroups; + struct group_info *group_info; + int i, j, count; kgid_t low, high; + int ret = 0; inet_get_ping_group_range_net(net, &low, &high); if (gid_lte(low, group) && gid_lte(group, high)) return 0; + group_info = get_current_groups(); + count = group_info->ngroups; for (i = 0; i < group_info->nblocks; i++) { int cp_count = min_t(int, NGROUPS_PER_BLOCK, count); for (j = 0; j < cp_count; j++) { kgid_t gid = group_info->blocks[i][j]; if (gid_lte(low, gid) && gid_lte(gid, high)) - return 0; + goto out_release_group; } count -= cp_count; } - return -EACCES; + ret = -EACCES; + +out_release_group: + put_group_info(group_info); + return ret; } static void ping_close(struct sock *sk, long timeout) -- 1.9.3 From 4581e8b1659582495ed0077c71c9c7a3ec2653dd Mon Sep 17 00:00:00 2001 From: Julian Anastasov <ja@xxxxxx> Date: Sun, 13 Apr 2014 18:08:02 +0300 Subject: [PATCH 12/49] ipv4: return valid RTA_IIF on ip route get [ Upstream commit 91146153da2feab18efab2e13b0945b6bb704ded ] Extend commit 13378cad02afc2adc6c0e07fca03903c7ada0b37 ("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0 which is displayed as 'iif *'. inet_iif is not appropriate to use because skb_iif is not set. Use the skb->dev->ifindex instead. Signed-off-by: Julian Anastasov <ja@xxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 1a362f3..c850647 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2306,7 +2306,7 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src, } } else #endif - if (nla_put_u32(skb, RTA_IIF, rt->rt_iif)) + if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex)) goto nla_put_failure; } -- 1.9.3 From c1270ea6fda29a719b44a35e5529e6a33a85c309 Mon Sep 17 00:00:00 2001 From: Mathias Krause <minipli@xxxxxxxxxxxxxx> Date: Sun, 13 Apr 2014 18:23:33 +0200 Subject: [PATCH 13/49] filter: prevent nla extensions to peek beyond the end of the message [ Upstream commit 05ab8f2647e4221cbdb3856dd7d32bd5407316b3 ] The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check for a minimal message length before testing the supplied offset to be within the bounds of the message. This allows the subtraction of the nla header to underflow and therefore -- as the data type is unsigned -- allowing far to big offset and length values for the search of the netlink attribute. The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is also wrong. It has the minuend and subtrahend mixed up, therefore calculates a huge length value, allowing to overrun the end of the message while looking for the netlink attribute. The following three BPF snippets will trigger the bugs when attached to a UNIX datagram socket and parsing a message with length 1, 2 or 3. ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]-- | ld #0x87654321 | ldx #42 | ld #nla | ret a `--- ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]-- | ld #0x87654321 | ldx #42 | ld #nlan | ret a `--- ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]-- | ; (needs a fake netlink header at offset 0) | ld #0 | ldx #42 | ld #nlan | ret a `--- Fix the first issue by ensuring the message length fulfills the minimal size constrains of a nla header. Fix the second bug by getting the math for the remainder calculation right. Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction") Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..") Cc: Patrick McHardy <kaber@xxxxxxxxx> Cc: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> Signed-off-by: Mathias Krause <minipli@xxxxxxxxxxxxxx> Acked-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/filter.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/core/filter.c b/net/core/filter.c index 52f0122..c6c18d8 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -355,6 +355,8 @@ load_b: if (skb_is_nonlinear(skb)) return 0; + if (skb->len < sizeof(struct nlattr)) + return 0; if (A > skb->len - sizeof(struct nlattr)) return 0; @@ -371,11 +373,13 @@ load_b: if (skb_is_nonlinear(skb)) return 0; + if (skb->len < sizeof(struct nlattr)) + return 0; if (A > skb->len - sizeof(struct nlattr)) return 0; nla = (struct nlattr *)&skb->data[A]; - if (nla->nla_len > A - skb->len) + if (nla->nla_len > skb->len - A) return 0; nla = nla_find_nested(nla, X); -- 1.9.3 From 963c02d7d468536e189d06956eb1bc654863ee27 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Date: Mon, 14 Apr 2014 17:11:38 +0200 Subject: [PATCH 14/49] ip6_gre: don't allow to remove the fb_tunnel_dev [ Upstream commit 54d63f787b652755e66eb4dd8892ee6d3f5197fc ] It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but this is unsafe, the module always supposes that this device exists. For example, ip6gre_tunnel_lookup() may use it unconditionally. Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we let ip6gre_destroy_tunnels() do the job). Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6"). CC: Dmitry Kozlov <xeb@xxxxxxx> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv6/ip6_gre.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 1f9a1a5..7dca7c4 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -1549,6 +1549,15 @@ static int ip6gre_changelink(struct net_device *dev, struct nlattr *tb[], return 0; } +static void ip6gre_dellink(struct net_device *dev, struct list_head *head) +{ + struct net *net = dev_net(dev); + struct ip6gre_net *ign = net_generic(net, ip6gre_net_id); + + if (dev != ign->fb_tunnel_dev) + unregister_netdevice_queue(dev, head); +} + static size_t ip6gre_get_size(const struct net_device *dev) { return @@ -1626,6 +1635,7 @@ static struct rtnl_link_ops ip6gre_link_ops __read_mostly = { .validate = ip6gre_tunnel_validate, .newlink = ip6gre_newlink, .changelink = ip6gre_changelink, + .dellink = ip6gre_dellink, .get_size = ip6gre_get_size, .fill_info = ip6gre_fill_info, }; -- 1.9.3 From 0fb613ccdce5a8b5f355e5ff6a342fbd75f117c5 Mon Sep 17 00:00:00 2001 From: dingtianhong <dingtianhong@xxxxxxxxxx> Date: Thu, 17 Apr 2014 18:40:36 +0800 Subject: [PATCH 15/49] vlan: Fix lockdep warning when vlan dev handle notification [ Upstream commit dc8eaaa006350d24030502a4521542e74b5cb39f ] When I open the LOCKDEP config and run these steps: modprobe 8021q vconfig add eth2 20 vconfig add eth2.20 30 ifconfig eth2 xx.xx.xx.xx then the Call Trace happened: [32524.386288] ============================================= [32524.386293] [ INFO: possible recursive locking detected ] [32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G O [32524.386302] --------------------------------------------- [32524.386306] ifconfig/3103 is trying to acquire lock: [32524.386310] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0 [32524.386326] [32524.386326] but task is already holding lock: [32524.386330] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40 [32524.386341] [32524.386341] other info that might help us debug this: [32524.386345] Possible unsafe locking scenario: [32524.386345] [32524.386350] CPU0 [32524.386352] ---- [32524.386354] lock(&vlan_netdev_addr_lock_key/1); [32524.386359] lock(&vlan_netdev_addr_lock_key/1); [32524.386364] [32524.386364] *** DEADLOCK *** [32524.386364] [32524.386368] May be due to missing lock nesting notation [32524.386368] [32524.386373] 2 locks held by ifconfig/3103: [32524.386376] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81431d42>] rtnl_lock+0x12/0x20 [32524.386387] #1: (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40 [32524.386398] [32524.386398] stack backtrace: [32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G O 3.14.0-rc2-0.7-default+ #35 [32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [32524.386414] ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8 [32524.386421] ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0 [32524.386428] 000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000 [32524.386435] Call Trace: [32524.386441] [<ffffffff814f68a2>] dump_stack+0x6a/0x78 [32524.386448] [<ffffffff810a35fb>] __lock_acquire+0x7ab/0x1940 [32524.386454] [<ffffffff810a323a>] ? __lock_acquire+0x3ea/0x1940 [32524.386459] [<ffffffff810a4874>] lock_acquire+0xe4/0x110 [32524.386464] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0 [32524.386471] [<ffffffff814fc07a>] _raw_spin_lock_nested+0x2a/0x40 [32524.386476] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0 [32524.386481] [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0 [32524.386489] [<ffffffffa0500cab>] vlan_dev_set_rx_mode+0x2b/0x50 [8021q] [32524.386495] [<ffffffff8141addf>] __dev_set_rx_mode+0x5f/0xb0 [32524.386500] [<ffffffff8141af8b>] dev_set_rx_mode+0x2b/0x40 [32524.386506] [<ffffffff8141b3cf>] __dev_open+0xef/0x150 [32524.386511] [<ffffffff8141b177>] __dev_change_flags+0xa7/0x190 [32524.386516] [<ffffffff8141b292>] dev_change_flags+0x32/0x80 [32524.386524] [<ffffffff8149ca56>] devinet_ioctl+0x7d6/0x830 [32524.386532] [<ffffffff81437b0b>] ? dev_ioctl+0x34b/0x660 [32524.386540] [<ffffffff814a05b0>] inet_ioctl+0x80/0xa0 [32524.386550] [<ffffffff8140199d>] sock_do_ioctl+0x2d/0x60 [32524.386558] [<ffffffff81401a52>] sock_ioctl+0x82/0x2a0 [32524.386568] [<ffffffff811a7123>] do_vfs_ioctl+0x93/0x590 [32524.386578] [<ffffffff811b2705>] ? rcu_read_lock_held+0x45/0x50 [32524.386586] [<ffffffff811b39e5>] ? __fget_light+0x105/0x110 [32524.386594] [<ffffffff811a76b1>] SyS_ioctl+0x91/0xb0 [32524.386604] [<ffffffff815057e2>] system_call_fastpath+0x16/0x1b ======================================================================== The reason is that all of the addr_lock_key for vlan dev have the same class, so if we change the status for vlan dev, the vlan dev and its real dev will hold the same class of addr_lock_key together, so the warning happened. we should distinguish the lock depth for vlan dev and its real dev. v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which could support to add 8 vlan id on a same vlan dev, I think it is enough for current scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id, and the vlan dev would not meet the same class key with its real dev. The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan dev could get a suitable class key. v2->v3: According David's suggestion, I use the subclass to distinguish the lock key for vlan dev and its real dev, but it make no sense, because the difference for subclass in the lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth to distinguish the different depth for every vlan dev, the same depth of the vlan dev could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h, I think it is enough here, the lockdep should never exceed that value. v3->v4: Add a huge array of locking keys will waste static kernel memory and is not a appropriate method, we could use _nested() variants to fix the problem, calculate the depth for every vlan dev, and use the depth as the subclass for addr_lock_key. Signed-off-by: Ding Tianhong <dingtianhong@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/8021q/vlan_dev.c | 46 +++++++++++++++++++++++++++++++++++++++++----- net/core/dev.c | 1 + 2 files changed, 42 insertions(+), 5 deletions(-) diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index cf35f38..698e922 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -512,10 +512,48 @@ static void vlan_dev_change_rx_flags(struct net_device *dev, int change) } } +static int vlan_calculate_locking_subclass(struct net_device *real_dev) +{ + int subclass = 0; + + while (is_vlan_dev(real_dev)) { + subclass++; + real_dev = vlan_dev_priv(real_dev)->real_dev; + } + + return subclass; +} + +static void vlan_dev_mc_sync(struct net_device *to, struct net_device *from) +{ + int err = 0, subclass; + + subclass = vlan_calculate_locking_subclass(to); + + spin_lock_nested(&to->addr_list_lock, subclass); + err = __hw_addr_sync(&to->mc, &from->mc, to->addr_len); + if (!err) + __dev_set_rx_mode(to); + spin_unlock(&to->addr_list_lock); +} + +static void vlan_dev_uc_sync(struct net_device *to, struct net_device *from) +{ + int err = 0, subclass; + + subclass = vlan_calculate_locking_subclass(to); + + spin_lock_nested(&to->addr_list_lock, subclass); + err = __hw_addr_sync(&to->uc, &from->uc, to->addr_len); + if (!err) + __dev_set_rx_mode(to); + spin_unlock(&to->addr_list_lock); +} + static void vlan_dev_set_rx_mode(struct net_device *vlan_dev) { - dev_mc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); - dev_uc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); + vlan_dev_mc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); + vlan_dev_uc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); } /* @@ -624,9 +662,7 @@ static int vlan_dev_init(struct net_device *dev) SET_NETDEV_DEVTYPE(dev, &vlan_type); - if (is_vlan_dev(real_dev)) - subclass = 1; - + subclass = vlan_calculate_locking_subclass(dev); vlan_dev_set_lockdep_class(dev, subclass); vlan_dev_priv(dev)->vlan_pcpu_stats = alloc_percpu(struct vlan_pcpu_stats); diff --git a/net/core/dev.c b/net/core/dev.c index a0e55ff..5bf26c4 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4634,6 +4634,7 @@ void __dev_set_rx_mode(struct net_device *dev) if (ops->ndo_set_rx_mode) ops->ndo_set_rx_mode(dev); } +EXPORT_SYMBOL(__dev_set_rx_mode); void dev_set_rx_mode(struct net_device *dev) { -- 1.9.3 From caf9d8a80b31f6356fcdcad77e3fd060338ebde7 Mon Sep 17 00:00:00 2001 From: Ivan Vecera <ivecera@xxxxxxxxxx> Date: Thu, 17 Apr 2014 14:51:08 +0200 Subject: [PATCH 16/49] tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled The patch fixes a problem with dropped jumbo frames after usage of 'ethtool -G ... rx'. Scenario: 1. ip link set eth0 up 2. ethtool -G eth0 rx N # <- This zeroes rx-jumbo 3. ip link set mtu 9000 dev eth0 The ethtool command set rx_jumbo_pending to zero so any received jumbo packets are dropped and you need to use 'ethtool -G eth0 rx-jumbo N' to workaround the issue. The patch changes the logic so rx_jumbo_pending value is changed only if jumbo frames are enabled (MTU > 1500). Signed-off-by: Ivan Vecera <ivecera@xxxxxxxxxx> Acked-by: Michael Chan <mchan@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/ethernet/broadcom/tg3.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 4a0db61..4942ddf 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -12073,7 +12073,9 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e if (tg3_flag(tp, MAX_RXPEND_64) && tp->rx_pending > 63) tp->rx_pending = 63; - tp->rx_jumbo_pending = ering->rx_jumbo_pending; + + if (tg3_flag(tp, JUMBO_RING_ENABLE)) + tp->rx_jumbo_pending = ering->rx_jumbo_pending; for (i = 0; i < tp->irq_max; i++) tp->napi[i].tx_pending = ering->tx_pending; -- 1.9.3 From 0ddf22c1f30eae02c2b88d9030386aa27063eddd Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Thu, 17 Apr 2014 17:26:50 +0200 Subject: [PATCH 17/49] net: sctp: cache auth_enable per endpoint [ Upstream commit b14878ccb7fac0242db82720b784ab62c467c0dc ] Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] Call Trace: [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80 [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4 [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8 [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214 [<ffffffff8043af68>] sctp_rcv+0x588/0x630 [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24 [<ffffffff803acb50>] ip6_input+0x2c0/0x440 [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564 [<ffffffff80310650>] process_backlog+0xb4/0x18c [<ffffffff80313cbc>] net_rx_action+0x12c/0x210 [<ffffffff80034254>] __do_softirq+0x17c/0x2ac [<ffffffff800345e0>] irq_exit+0x54/0xb0 [<ffffffff800075a4>] ret_from_irq+0x0/0x4 [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48 [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148 [<ffffffff805a88b0>] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by: Joshua Kinard <kumba@xxxxxxxxxx> Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Signed-off-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Acked-by: Neil Horman <nhorman@xxxxxxxxxxxxx> Tested-by: Joshua Kinard <kumba@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/net/sctp/structs.h | 4 +++- net/sctp/auth.c | 17 ++++++--------- net/sctp/endpointola.c | 3 ++- net/sctp/sm_make_chunk.c | 32 ++++++++++++++------------- net/sctp/sm_statefuns.c | 6 +++--- net/sctp/socket.c | 54 ++++++++++++++++++++++------------------------ net/sctp/sysctl.c | 38 ++++++++++++++++++++++++++++++-- 7 files changed, 93 insertions(+), 61 deletions(-) diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 1bd4c41..da6b9a0 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -1252,6 +1252,7 @@ struct sctp_endpoint { /* SCTP-AUTH: endpoint shared keys */ struct list_head endpoint_shared_keys; __u16 active_key_id; + __u8 auth_enable; }; /* Recover the outter endpoint structure. */ @@ -1280,7 +1281,8 @@ struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *, int sctp_has_association(struct net *net, const union sctp_addr *laddr, const union sctp_addr *paddr); -int sctp_verify_init(struct net *net, const struct sctp_association *asoc, +int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep, + const struct sctp_association *asoc, sctp_cid_t, sctp_init_chunk_t *peer_init, struct sctp_chunk *chunk, struct sctp_chunk **err_chunk); int sctp_process_init(struct sctp_association *, struct sctp_chunk *chunk, diff --git a/net/sctp/auth.c b/net/sctp/auth.c index ba1dfc3..7a19117 100644 --- a/net/sctp/auth.c +++ b/net/sctp/auth.c @@ -393,14 +393,13 @@ nomem: */ int sctp_auth_asoc_init_active_key(struct sctp_association *asoc, gfp_t gfp) { - struct net *net = sock_net(asoc->base.sk); struct sctp_auth_bytes *secret; struct sctp_shared_key *ep_key; /* If we don't support AUTH, or peer is not capable * we don't need to do anything. */ - if (!net->sctp.auth_enable || !asoc->peer.auth_capable) + if (!asoc->ep->auth_enable || !asoc->peer.auth_capable) return 0; /* If the key_id is non-zero and we couldn't find an @@ -447,16 +446,16 @@ struct sctp_shared_key *sctp_auth_get_shkey( */ int sctp_auth_init_hmacs(struct sctp_endpoint *ep, gfp_t gfp) { - struct net *net = sock_net(ep->base.sk); struct crypto_hash *tfm = NULL; __u16 id; - /* if the transforms are already allocted, we are done */ - if (!net->sctp.auth_enable) { + /* If AUTH extension is disabled, we are done */ + if (!ep->auth_enable) { ep->auth_hmacs = NULL; return 0; } + /* If the transforms are already allocated, we are done */ if (ep->auth_hmacs) return 0; @@ -677,12 +676,10 @@ static int __sctp_auth_cid(sctp_cid_t chunk, struct sctp_chunks_param *param) /* Check if peer requested that this chunk is authenticated */ int sctp_auth_send_cid(sctp_cid_t chunk, const struct sctp_association *asoc) { - struct net *net; if (!asoc) return 0; - net = sock_net(asoc->base.sk); - if (!net->sctp.auth_enable || !asoc->peer.auth_capable) + if (!asoc->ep->auth_enable || !asoc->peer.auth_capable) return 0; return __sctp_auth_cid(chunk, asoc->peer.peer_chunks); @@ -691,12 +688,10 @@ int sctp_auth_send_cid(sctp_cid_t chunk, const struct sctp_association *asoc) /* Check if we requested that peer authenticate this chunk. */ int sctp_auth_recv_cid(sctp_cid_t chunk, const struct sctp_association *asoc) { - struct net *net; if (!asoc) return 0; - net = sock_net(asoc->base.sk); - if (!net->sctp.auth_enable) + if (!asoc->ep->auth_enable) return 0; return __sctp_auth_cid(chunk, diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c index 5fbd7bc..e09f906 100644 --- a/net/sctp/endpointola.c +++ b/net/sctp/endpointola.c @@ -75,7 +75,8 @@ static struct sctp_endpoint *sctp_endpoint_init(struct sctp_endpoint *ep, if (!ep->digest) return NULL; - if (net->sctp.auth_enable) { + ep->auth_enable = net->sctp.auth_enable; + if (ep->auth_enable) { /* Allocate space for HMACS and CHUNKS authentication * variables. There are arrays that we encode directly * into parameters to make the rest of the operations easier. diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index 673921c..87e244b 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -199,6 +199,7 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc, gfp_t gfp, int vparam_len) { struct net *net = sock_net(asoc->base.sk); + struct sctp_endpoint *ep = asoc->ep; sctp_inithdr_t init; union sctp_params addrs; size_t chunksize; @@ -258,7 +259,7 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc, chunksize += vparam_len; /* Account for AUTH related parameters */ - if (net->sctp.auth_enable) { + if (ep->auth_enable) { /* Add random parameter length*/ chunksize += sizeof(asoc->c.auth_random); @@ -343,7 +344,7 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc, } /* Add SCTP-AUTH chunks to the parameter list */ - if (net->sctp.auth_enable) { + if (ep->auth_enable) { sctp_addto_chunk(retval, sizeof(asoc->c.auth_random), asoc->c.auth_random); if (auth_hmacs) @@ -1995,7 +1996,7 @@ static void sctp_process_ext_param(struct sctp_association *asoc, /* if the peer reports AUTH, assume that he * supports AUTH. */ - if (net->sctp.auth_enable) + if (asoc->ep->auth_enable) asoc->peer.auth_capable = 1; break; case SCTP_CID_ASCONF: @@ -2087,6 +2088,7 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc, * SCTP_IERROR_NO_ERROR - continue with the chunk */ static sctp_ierror_t sctp_verify_param(struct net *net, + const struct sctp_endpoint *ep, const struct sctp_association *asoc, union sctp_params param, sctp_cid_t cid, @@ -2137,7 +2139,7 @@ static sctp_ierror_t sctp_verify_param(struct net *net, goto fallthrough; case SCTP_PARAM_RANDOM: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fallthrough; /* SCTP-AUTH: Secion 6.1 @@ -2154,7 +2156,7 @@ static sctp_ierror_t sctp_verify_param(struct net *net, break; case SCTP_PARAM_CHUNKS: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fallthrough; /* SCTP-AUTH: Section 3.2 @@ -2170,7 +2172,7 @@ static sctp_ierror_t sctp_verify_param(struct net *net, break; case SCTP_PARAM_HMAC_ALGO: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fallthrough; hmacs = (struct sctp_hmac_algo_param *)param.p; @@ -2204,10 +2206,9 @@ fallthrough: } /* Verify the INIT packet before we process it. */ -int sctp_verify_init(struct net *net, const struct sctp_association *asoc, - sctp_cid_t cid, - sctp_init_chunk_t *peer_init, - struct sctp_chunk *chunk, +int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep, + const struct sctp_association *asoc, sctp_cid_t cid, + sctp_init_chunk_t *peer_init, struct sctp_chunk *chunk, struct sctp_chunk **errp) { union sctp_params param; @@ -2250,8 +2251,8 @@ int sctp_verify_init(struct net *net, const struct sctp_association *asoc, /* Verify all the variable length parameters */ sctp_walk_params(param, peer_init, init_hdr.params) { - - result = sctp_verify_param(net, asoc, param, cid, chunk, errp); + result = sctp_verify_param(net, ep, asoc, param, cid, + chunk, errp); switch (result) { case SCTP_IERROR_ABORT: case SCTP_IERROR_NOMEM: @@ -2483,6 +2484,7 @@ static int sctp_process_param(struct sctp_association *asoc, struct sctp_af *af; union sctp_addr_param *addr_param; struct sctp_transport *t; + struct sctp_endpoint *ep = asoc->ep; /* We maintain all INIT parameters in network byte order all the * time. This allows us to not worry about whether the parameters @@ -2623,7 +2625,7 @@ do_addr_param: goto fall_through; case SCTP_PARAM_RANDOM: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fall_through; /* Save peer's random parameter */ @@ -2636,7 +2638,7 @@ do_addr_param: break; case SCTP_PARAM_HMAC_ALGO: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fall_through; /* Save peer's HMAC list */ @@ -2652,7 +2654,7 @@ do_addr_param: break; case SCTP_PARAM_CHUNKS: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fall_through; asoc->peer.peer_chunks = kmemdup(param.p, diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index 9973079..6eb2640 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -364,7 +364,7 @@ sctp_disposition_t sctp_sf_do_5_1B_init(struct net *net, /* Verify the INIT chunk before processing it. */ err_chunk = NULL; - if (!sctp_verify_init(net, asoc, chunk->chunk_hdr->type, + if (!sctp_verify_init(net, ep, asoc, chunk->chunk_hdr->type, (sctp_init_chunk_t *)chunk->chunk_hdr, chunk, &err_chunk)) { /* This chunk contains fatal error. It is to be discarded. @@ -531,7 +531,7 @@ sctp_disposition_t sctp_sf_do_5_1C_ack(struct net *net, /* Verify the INIT chunk before processing it. */ err_chunk = NULL; - if (!sctp_verify_init(net, asoc, chunk->chunk_hdr->type, + if (!sctp_verify_init(net, ep, asoc, chunk->chunk_hdr->type, (sctp_init_chunk_t *)chunk->chunk_hdr, chunk, &err_chunk)) { @@ -1437,7 +1437,7 @@ static sctp_disposition_t sctp_sf_do_unexpected_init( /* Verify the INIT chunk before processing it. */ err_chunk = NULL; - if (!sctp_verify_init(net, asoc, chunk->chunk_hdr->type, + if (!sctp_verify_init(net, ep, asoc, chunk->chunk_hdr->type, (sctp_init_chunk_t *)chunk->chunk_hdr, chunk, &err_chunk)) { /* This chunk contains fatal error. It is to be discarded. diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 9713641..dfb9b13 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3318,10 +3318,10 @@ static int sctp_setsockopt_auth_chunk(struct sock *sk, char __user *optval, unsigned int optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authchunk val; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (optlen != sizeof(struct sctp_authchunk)) @@ -3338,7 +3338,7 @@ static int sctp_setsockopt_auth_chunk(struct sock *sk, } /* add this chunk id to the endpoint */ - return sctp_auth_ep_add_chunkid(sctp_sk(sk)->ep, val.sauth_chunk); + return sctp_auth_ep_add_chunkid(ep, val.sauth_chunk); } /* @@ -3351,12 +3351,12 @@ static int sctp_setsockopt_hmac_ident(struct sock *sk, char __user *optval, unsigned int optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_hmacalgo *hmacs; u32 idents; int err; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (optlen < sizeof(struct sctp_hmacalgo)) @@ -3373,7 +3373,7 @@ static int sctp_setsockopt_hmac_ident(struct sock *sk, goto out; } - err = sctp_auth_ep_set_hmacs(sctp_sk(sk)->ep, hmacs); + err = sctp_auth_ep_set_hmacs(ep, hmacs); out: kfree(hmacs); return err; @@ -3389,12 +3389,12 @@ static int sctp_setsockopt_auth_key(struct sock *sk, char __user *optval, unsigned int optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authkey *authkey; struct sctp_association *asoc; int ret; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (optlen <= sizeof(struct sctp_authkey)) @@ -3415,7 +3415,7 @@ static int sctp_setsockopt_auth_key(struct sock *sk, goto out; } - ret = sctp_auth_set_key(sctp_sk(sk)->ep, asoc, authkey); + ret = sctp_auth_set_key(ep, asoc, authkey); out: kzfree(authkey); return ret; @@ -3431,11 +3431,11 @@ static int sctp_setsockopt_active_key(struct sock *sk, char __user *optval, unsigned int optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authkeyid val; struct sctp_association *asoc; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (optlen != sizeof(struct sctp_authkeyid)) @@ -3447,8 +3447,7 @@ static int sctp_setsockopt_active_key(struct sock *sk, if (!asoc && val.scact_assoc_id && sctp_style(sk, UDP)) return -EINVAL; - return sctp_auth_set_active_key(sctp_sk(sk)->ep, asoc, - val.scact_keynumber); + return sctp_auth_set_active_key(ep, asoc, val.scact_keynumber); } /* @@ -3460,11 +3459,11 @@ static int sctp_setsockopt_del_key(struct sock *sk, char __user *optval, unsigned int optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authkeyid val; struct sctp_association *asoc; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (optlen != sizeof(struct sctp_authkeyid)) @@ -3476,8 +3475,7 @@ static int sctp_setsockopt_del_key(struct sock *sk, if (!asoc && val.scact_assoc_id && sctp_style(sk, UDP)) return -EINVAL; - return sctp_auth_del_key_id(sctp_sk(sk)->ep, asoc, - val.scact_keynumber); + return sctp_auth_del_key_id(ep, asoc, val.scact_keynumber); } @@ -5368,16 +5366,16 @@ static int sctp_getsockopt_maxburst(struct sock *sk, int len, static int sctp_getsockopt_hmac_ident(struct sock *sk, int len, char __user *optval, int __user *optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_hmacalgo __user *p = (void __user *)optval; struct sctp_hmac_algo_param *hmacs; __u16 data_len = 0; u32 num_idents; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; - hmacs = sctp_sk(sk)->ep->auth_hmacs_list; + hmacs = ep->auth_hmacs_list; data_len = ntohs(hmacs->param_hdr.length) - sizeof(sctp_paramhdr_t); if (len < sizeof(struct sctp_hmacalgo) + data_len) @@ -5398,11 +5396,11 @@ static int sctp_getsockopt_hmac_ident(struct sock *sk, int len, static int sctp_getsockopt_active_key(struct sock *sk, int len, char __user *optval, int __user *optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authkeyid val; struct sctp_association *asoc; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (len < sizeof(struct sctp_authkeyid)) @@ -5417,7 +5415,7 @@ static int sctp_getsockopt_active_key(struct sock *sk, int len, if (asoc) val.scact_keynumber = asoc->active_key_id; else - val.scact_keynumber = sctp_sk(sk)->ep->active_key_id; + val.scact_keynumber = ep->active_key_id; len = sizeof(struct sctp_authkeyid); if (put_user(len, optlen)) @@ -5431,7 +5429,7 @@ static int sctp_getsockopt_active_key(struct sock *sk, int len, static int sctp_getsockopt_peer_auth_chunks(struct sock *sk, int len, char __user *optval, int __user *optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authchunks __user *p = (void __user *)optval; struct sctp_authchunks val; struct sctp_association *asoc; @@ -5439,7 +5437,7 @@ static int sctp_getsockopt_peer_auth_chunks(struct sock *sk, int len, u32 num_chunks = 0; char __user *to; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (len < sizeof(struct sctp_authchunks)) @@ -5475,7 +5473,7 @@ num: static int sctp_getsockopt_local_auth_chunks(struct sock *sk, int len, char __user *optval, int __user *optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authchunks __user *p = (void __user *)optval; struct sctp_authchunks val; struct sctp_association *asoc; @@ -5483,7 +5481,7 @@ static int sctp_getsockopt_local_auth_chunks(struct sock *sk, int len, u32 num_chunks = 0; char __user *to; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (len < sizeof(struct sctp_authchunks)) @@ -5500,7 +5498,7 @@ static int sctp_getsockopt_local_auth_chunks(struct sock *sk, int len, if (asoc) ch = (struct sctp_chunks_param*)asoc->c.auth_chunks; else - ch = sctp_sk(sk)->ep->auth_chunk_list; + ch = ep->auth_chunk_list; if (!ch) goto num; diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index bf3c6e8..fe0ba74 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -65,8 +65,11 @@ extern int sysctl_sctp_wmem[3]; static int proc_sctp_do_hmac_alg(ctl_table *ctl, int write, void __user *buffer, size_t *lenp, - loff_t *ppos); +static int proc_sctp_do_auth(struct ctl_table *ctl, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos); + static ctl_table sctp_table[] = { { .procname = "sctp_mem", @@ -267,7 +270,7 @@ static ctl_table sctp_net_table[] = { .data = &init_net.sctp.auth_enable, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_sctp_do_auth, }, { .procname = "addr_scope_policy", @@ -348,6 +351,37 @@ static int proc_sctp_do_hmac_alg(ctl_table *ctl, return ret; } +static int proc_sctp_do_auth(struct ctl_table *ctl, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos) +{ + struct net *net = current->nsproxy->net_ns; + struct ctl_table tbl; + int new_value, ret; + + memset(&tbl, 0, sizeof(struct ctl_table)); + tbl.maxlen = sizeof(unsigned int); + + if (write) + tbl.data = &new_value; + else + tbl.data = &net->sctp.auth_enable; + + ret = proc_dointvec(&tbl, write, buffer, lenp, ppos); + + if (write) { + struct sock *sk = net->sctp.ctl_sock; + + net->sctp.auth_enable = new_value; + /* Update the value in the control socket */ + lock_sock(sk); + sctp_sk(sk)->ep->auth_enable = new_value; + release_sock(sk); + } + + return ret; +} + int sctp_sysctl_net_register(struct net *net) { struct ctl_table *table; -- 1.9.3 From 820cc7808bab4099ddfb1c883b07819ce55e5082 Mon Sep 17 00:00:00 2001 From: Andrew Lutomirski <luto@xxxxxxxxxxxxxx> Date: Wed, 16 Apr 2014 21:41:34 -0700 Subject: [PATCH 18/49] net: Fix ns_capable check in sock_diag_put_filterinfo [ Upstream commit 78541c1dc60b65ecfce5a6a096fc260219d6784e ] The caller needs capabilities on the namespace being queried, not on their own namespace. This is a security bug, although it likely has only a minor impact. Cc: stable@xxxxxxxxxxxxxxx Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx> Acked-by: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/linux/sock_diag.h | 2 +- net/core/sock_diag.c | 4 ++-- net/packet/diag.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/linux/sock_diag.h b/include/linux/sock_diag.h index 54f91d3..302ab80 100644 --- a/include/linux/sock_diag.h +++ b/include/linux/sock_diag.h @@ -23,7 +23,7 @@ int sock_diag_check_cookie(void *sk, __u32 *cookie); void sock_diag_save_cookie(void *sk, __u32 *cookie); int sock_diag_put_meminfo(struct sock *sk, struct sk_buff *skb, int attr); -int sock_diag_put_filterinfo(struct user_namespace *user_ns, struct sock *sk, +int sock_diag_put_filterinfo(struct sock *sk, struct sk_buff *skb, int attrtype); #endif diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c index a0e9cf6..6a7fae2 100644 --- a/net/core/sock_diag.c +++ b/net/core/sock_diag.c @@ -49,7 +49,7 @@ int sock_diag_put_meminfo(struct sock *sk, struct sk_buff *skb, int attrtype) } EXPORT_SYMBOL_GPL(sock_diag_put_meminfo); -int sock_diag_put_filterinfo(struct user_namespace *user_ns, struct sock *sk, +int sock_diag_put_filterinfo(struct sock *sk, struct sk_buff *skb, int attrtype) { struct nlattr *attr; @@ -57,7 +57,7 @@ int sock_diag_put_filterinfo(struct user_namespace *user_ns, struct sock *sk, unsigned int len; int err = 0; - if (!ns_capable(user_ns, CAP_NET_ADMIN)) { + if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { nla_reserve(skb, attrtype, 0); return 0; } diff --git a/net/packet/diag.c b/net/packet/diag.c index a9584a2..ec8b6e8 100644 --- a/net/packet/diag.c +++ b/net/packet/diag.c @@ -171,7 +171,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, goto out_nlmsg_trim; if ((req->pdiag_show & PACKET_SHOW_FILTER) && - sock_diag_put_filterinfo(user_ns, sk, skb, PACKET_DIAG_FILTER)) + sock_diag_put_filterinfo(sk, skb, PACKET_DIAG_FILTER)) goto out_nlmsg_trim; return nlmsg_end(skb, nlh); -- 1.9.3 From b6ef1fc450b69e270e782f6e313ec33d40a3def2 Mon Sep 17 00:00:00 2001 From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Date: Thu, 24 Apr 2014 10:22:35 +1000 Subject: [PATCH 19/49] rtnetlink: Warn when interface's information won't fit in our packet [ Upstream commit 973462bbde79bb827824c73b59027a0aed5c9ca6 ] Without IFLA_EXT_MASK specified, the information reported for a single interface in response to RTM_GETLINK is expected to fit within a netlink packet of NLMSG_GOODSIZE. If it doesn't, however, things will go badly wrong, When listing all interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first message in a packet as the end of the listing and omit information for that interface and all subsequent ones. This can cause getifaddrs(3) to enter an infinite loop. This patch won't fix the problem, but it will WARN_ON() making it easier to track down what's going wrong. Signed-off-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Reviewed-by: Jiri Pirko <jpirko@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/rtnetlink.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 4c3087d..5bfec28 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1039,6 +1039,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) struct hlist_head *head; struct nlattr *tb[IFLA_MAX+1]; u32 ext_filter_mask = 0; + int err; s_h = cb->args[0]; s_idx = cb->args[1]; @@ -1059,11 +1060,17 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) hlist_for_each_entry_rcu(dev, head, index_hlist) { if (idx < s_idx) goto cont; - if (rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, 0, - NLM_F_MULTI, - ext_filter_mask) <= 0) + err = rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, 0, + NLM_F_MULTI, + ext_filter_mask); + /* If we ran out of room on the first message, + * we're in trouble + */ + WARN_ON((err == -EMSGSIZE) && (skb->len == 0)); + + if (err <= 0) goto out; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); -- 1.9.3 From efc39fa5d0cb6c4fe703f9522c9c8d5e6351eb4e Mon Sep 17 00:00:00 2001 From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Date: Thu, 24 Apr 2014 10:22:36 +1000 Subject: [PATCH 20/49] rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is set [ Upstream commit c53864fd60227de025cb79e05493b13f69843971 ] Since 115c9b81928360d769a76c632bae62d15206a94a (rtnetlink: Fix problem with buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST attribute if they were solicited by a GETLINK message containing an IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag. That was done because some user programs broke when they received more data than expected - because IFLA_VFINFO_LIST contains information for each VF it can become large if there are many VFs. However, the IFLA_VF_PORTS attribute, supplied for devices which implement ndo_get_vf_port (currently the 'enic' driver only), has the same problem. It supplies per-VF information and can therefore become large, but it is not currently conditional on the IFLA_EXT_MASK value. Worse, it interacts badly with the existing EXT_MASK handling. When IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at NLMSG_GOODSIZE. If the information for IFLA_VF_PORTS exceeds this, then rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet. netlink_dump() will misinterpret this as having finished the listing and omit data for this interface and all subsequent ones. That can cause getifaddrs(3) to enter an infinite loop. This patch addresses the problem by only supplying IFLA_VF_PORTS when IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set. Signed-off-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Reviewed-by: Jiri Pirko <jiri@xxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/rtnetlink.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 5bfec28..87ec574 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -714,7 +714,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev, return 0; } -static size_t rtnl_port_size(const struct net_device *dev) +static size_t rtnl_port_size(const struct net_device *dev, + u32 ext_filter_mask) { size_t port_size = nla_total_size(4) /* PORT_VF */ + nla_total_size(PORT_PROFILE_MAX) /* PORT_PROFILE */ @@ -730,7 +731,8 @@ static size_t rtnl_port_size(const struct net_device *dev) size_t port_self_size = nla_total_size(sizeof(struct nlattr)) + port_size; - if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent) + if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent || + !(ext_filter_mask & RTEXT_FILTER_VF)) return 0; if (dev_num_vf(dev->dev.parent)) return port_self_size + vf_ports_size + @@ -765,7 +767,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, + nla_total_size(ext_filter_mask & RTEXT_FILTER_VF ? 4 : 0) /* IFLA_NUM_VF */ + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */ - + rtnl_port_size(dev) /* IFLA_VF_PORTS + IFLA_PORT_SELF */ + + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */ + rtnl_link_get_size(dev) /* IFLA_LINKINFO */ + rtnl_link_get_af_size(dev); /* IFLA_AF_SPEC */ } @@ -826,11 +828,13 @@ static int rtnl_port_self_fill(struct sk_buff *skb, struct net_device *dev) return 0; } -static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev) +static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev, + u32 ext_filter_mask) { int err; - if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent) + if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent || + !(ext_filter_mask & RTEXT_FILTER_VF)) return 0; err = rtnl_port_self_fill(skb, dev); @@ -985,7 +989,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev, nla_nest_end(skb, vfinfo); } - if (rtnl_port_fill(skb, dev)) + if (rtnl_port_fill(skb, dev, ext_filter_mask)) goto nla_put_failure; if (dev->rtnl_link_ops) { -- 1.9.3 From 05867665b1d4d1376b83c36b1f836d5e75dda742 Mon Sep 17 00:00:00 2001 From: Kumar Sundararajan <kumar@xxxxxx> Date: Thu, 24 Apr 2014 09:48:53 -0400 Subject: [PATCH 21/49] ipv6: fib: fix fib dump restart [ Upstream commit 1c2658545816088477e91860c3a645053719cb54 ] When the ipv6 fib changes during a table dump, the walk is restarted and the number of nodes dumped are skipped. But the existing code doesn't advance to the next node after a node is skipped. This can cause the dump to loop or produce lots of duplicates when the fib is modified during the dump. This change advances the walk to the next node if the current node is skipped after a restart. Signed-off-by: Kumar Sundararajan <kumar@xxxxxx> Signed-off-by: Chris Mason <clm@xxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv6/ip6_fib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 9c06ecb..009c962 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1418,7 +1418,7 @@ static int fib6_walk_continue(struct fib6_walker_t *w) if (w->skip) { w->skip--; - continue; + goto skip; } err = w->func(w); @@ -1428,6 +1428,7 @@ static int fib6_walk_continue(struct fib6_walker_t *w) w->count++; continue; } +skip: w->state = FWS_U; case FWS_U: if (fn == w->root) -- 1.9.3 From 794bbcedf3bd98bce4232310cfb4349239e7c32d Mon Sep 17 00:00:00 2001 From: Toshiaki Makita <makita.toshiaki@xxxxxxxxxxxxx> Date: Fri, 25 Apr 2014 17:01:18 +0900 Subject: [PATCH 22/49] bridge: Handle IFLA_ADDRESS correctly when creating bridge device [ Upstream commit 30313a3d5794472c3548d7288e306a5492030370 ] When bridge device is created with IFLA_ADDRESS, we are not calling br_stp_change_bridge_id(), which leads to incorrect local fdb management and bridge id calculation, and prevents us from receiving frames on the bridge device. Reported-by: Tom Gundersen <teg@xxxxxxx> Signed-off-by: Toshiaki Makita <makita.toshiaki@xxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/bridge/br_netlink.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 06873e8..f16e9e4 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -438,6 +438,20 @@ static int br_validate(struct nlattr *tb[], struct nlattr *data[]) return 0; } +static int br_dev_newlink(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) +{ + struct net_bridge *br = netdev_priv(dev); + + if (tb[IFLA_ADDRESS]) { + spin_lock_bh(&br->lock); + br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS])); + spin_unlock_bh(&br->lock); + } + + return register_netdevice(dev); +} + static size_t br_get_link_af_size(const struct net_device *dev) { struct net_port_vlans *pv; @@ -466,6 +480,7 @@ struct rtnl_link_ops br_link_ops __read_mostly = { .priv_size = sizeof(struct net_bridge), .setup = br_dev_setup, .validate = br_validate, + .newlink = br_dev_newlink, .dellink = br_dev_delete, }; -- 1.9.3 From e1e3e63c3a6da0f29eb7a74fea004c385d7c6f70 Mon Sep 17 00:00:00 2001 From: Xufeng Zhang <xufeng.zhang@xxxxxxxxxxxxx> Date: Fri, 25 Apr 2014 16:55:41 +0800 Subject: [PATCH 23/49] sctp: reset flowi4_oif parameter on route lookup [ Upstream commit 85350871317a5adb35519d9dc6fc9e80809d42ad ] commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output route lookups.) introduces another regression which is very similar to the problem of commit e6b45241c (ipv4: reset flowi parameters on route connect) wants to fix: Before we call ip_route_output_key() in sctp_v4_get_dst() to get a dst that matches a bind address as the source address, we have already called this function previously and the flowi parameters have been initialized including flowi4_oif, so when we call this function again, the process in __ip_route_output_key() will be different because of the setting of flowi4_oif, and we'll get a networking device which corresponds to the inputted flowi4_oif as the output device, this is wrong because we'll never hit this place if the previously returned source address of dst match one of the bound addresses. To reproduce this problem, a vlan setting is enough: # ifconfig eth0 up # route del default # vconfig add eth0 2 # vconfig add eth0 3 # ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0 # route add default gw 10.0.1.254 dev eth0.2 # ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0 # ip rule add from 10.0.0.14 table 4 # ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3 # sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I You'll detect that all the flow are routed to eth0.2(10.0.1.254). Signed-off-by: Xufeng Zhang <xufeng.zhang@xxxxxxxxxxxxx> Signed-off-by: Julian Anastasov <ja@xxxxxx> Acked-by: Vlad Yasevich <vyasevich@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sctp/protocol.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index eaee00c..5a3c1c0 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -498,8 +498,13 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union sctp_addr *saddr, continue; if ((laddr->state == SCTP_ADDR_SRC) && (AF_INET == laddr->a.sa.sa_family)) { - fl4->saddr = laddr->a.v4.sin_addr.s_addr; fl4->fl4_sport = laddr->a.v4.sin_port; + flowi4_update_output(fl4, + asoc->base.sk->sk_bound_dev_if, + RT_CONN_FLAGS(asoc->base.sk), + daddr->v4.sin_addr.s_addr, + laddr->a.v4.sin_addr.s_addr); + rt = ip_route_output_key(sock_net(sk), fl4); if (!IS_ERR(rt)) { dst = &rt->dst; -- 1.9.3 From 849be8d1250497c7a98d34a687b4960fba64d33c Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Tue, 29 Apr 2014 10:09:51 -0400 Subject: [PATCH 24/49] Revert "macvlan : fix checksums error when we are in bridge mode" [ Upstream commit f114890cdf84d753f6b41cd0cc44ba51d16313da ] This reverts commit 12a2856b604476c27d85a5f9a57ae1661fc46019. The commit above doesn't appear to be necessary any more as the checksums appear to be correctly computed/validated. Additionally the above commit breaks kvm configurations where one VM is using a device that support checksum offload (virtio) and the other VM does not. In this case, packets leaving virtio device will have CHECKSUM_PARTIAL set. The packets is forwarded to a macvtap that has offload features turned off. Since we use CHECKSUM_UNNECESSARY, the host does does not update the checksum and thus a bad checksum is passed up to the guest. CC: Daniel Lezcano <daniel.lezcano@xxxxxxx> CC: Patrick McHardy <kaber@xxxxxxxxx> CC: Andrian Nord <nightnord@xxxxxxxxx> CC: Eric Dumazet <eric.dumazet@xxxxxxxxx> CC: Michael S. Tsirkin <mst@xxxxxxxxxx> CC: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx> Acked-by: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/macvlan.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 06eba6e..6ce640d 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -261,11 +261,9 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) const struct macvlan_dev *vlan = netdev_priv(dev); const struct macvlan_port *port = vlan->port; const struct macvlan_dev *dest; - __u8 ip_summed = skb->ip_summed; if (vlan->mode == MACVLAN_MODE_BRIDGE) { const struct ethhdr *eth = (void *)skb->data; - skb->ip_summed = CHECKSUM_UNNECESSARY; /* send to other bridge ports directly */ if (is_multicast_ether_addr(eth->h_dest)) { @@ -283,7 +281,6 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) } xmit_world: - skb->ip_summed = ip_summed; skb->dev = vlan->lowerdev; return dev_queue_xmit(skb); } -- 1.9.3 From 688d0b36bc66d0eeed823ab028a24bbc453e7152 Mon Sep 17 00:00:00 2001 From: Liu Yu <allanyuliu@xxxxxxxxxxx> Date: Wed, 30 Apr 2014 17:34:09 +0800 Subject: [PATCH 25/49] tcp_cubic: fix the range of delayed_ack [ Upstream commit 0cda345d1b2201dd15591b163e3c92bad5191745 ] commit b9f47a3aaeab (tcp_cubic: limit delayed_ack ratio to prevent divide error) try to prevent divide error, but there is still a little chance that delayed_ack can reach zero. In case the param cnt get negative value, then ratio+cnt would overflow and may happen to be zero. As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero. In some old kernels, such as 2.6.32, there is a bug that would pass negative param, which then ultimately leads to this divide error. commit 5b35e1e6e9c (tcp: fix tcp_trim_head() to adjust segment count with skb MSS) fixed the negative param issue. However, it's safe that we fix the range of delayed_ack as well, to make sure we do not hit a divide by zero. CC: Stephen Hemminger <shemminger@xxxxxxxxxx> Signed-off-by: Liu Yu <allanyuliu@xxxxxxxxxxx> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Acked-by: Neal Cardwell <ncardwell@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/tcp_cubic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index b6ae92a..894b7ce 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -408,7 +408,7 @@ static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us) ratio -= ca->delayed_ack >> ACK_RATIO_SHIFT; ratio += cnt; - ca->delayed_ack = min(ratio, ACK_RATIO_LIMIT); + ca->delayed_ack = clamp(ratio, 1U, ACK_RATIO_LIMIT); } /* Some calls are for duplicates without timetamps */ -- 1.9.3 From 95e1cbf02a4beccb1a1fd6baa4e73939a8778b96 Mon Sep 17 00:00:00 2001 From: Florian Westphal <fw@xxxxxxxxx> Date: Sun, 4 May 2014 23:24:31 +0200 Subject: [PATCH 26/49] net: ipv4: ip_forward: fix inverted local_df test [ Upstream commit ca6c5d4ad216d5942ae544bbf02503041bd802aa ] local_df means 'ignore DF bit if set', so if its set we're allowed to perform ip fragmentation. This wasn't noticed earlier because the output path also drops such skbs (and emits needed icmp error) and because netfilter ip defrag did not set local_df until couple of days ago. Only difference is that DF-packets-larger-than MTU now discarded earlier (f.e. we avoid pointless netfilter postrouting trip). While at it, drop the repeated test ip_exceeds_mtu, checking it once is enough... Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ip_forward.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c index 98d7e53..bd1c5ba 100644 --- a/net/ipv4/ip_forward.c +++ b/net/ipv4/ip_forward.c @@ -42,12 +42,12 @@ static bool ip_may_fragment(const struct sk_buff *skb) { return unlikely((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0) || - !skb->local_df; + skb->local_df; } static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu) { - if (skb->len <= mtu || skb->local_df) + if (skb->len <= mtu) return false; if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu) -- 1.9.3 From 40f82fb235ff778a7800fe8996d4697887242a7b Mon Sep 17 00:00:00 2001 From: Florian Westphal <fw@xxxxxxxxx> Date: Mon, 5 May 2014 00:03:34 +0200 Subject: [PATCH 27/49] net: ipv6: send pkttoobig immediately if orig frag size > mtu [ Upstream commit 418a31561d594a2b636c1e2fa94ecd9e1245abb1 ] If conntrack defragments incoming ipv6 frags it stores largest original frag size in ip6cb and sets ->local_df. We must thus first test the largest original frag size vs. mtu, and not vice versa. Without this patch PKTTOOBIG is still generated in ip6_fragment() later in the stack, but 1) IPSTATS_MIB_INTOOBIGERRORS won't increment 2) packet did (needlessly) traverse netfilter postrouting hook. Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv6/ip6_output.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 32fb114..ffa8d29 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -347,12 +347,16 @@ static inline int ip6_forward_finish(struct sk_buff *skb) static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu) { - if (skb->len <= mtu || skb->local_df) + if (skb->len <= mtu) return false; + /* ipv6 conntrack defrag sets max_frag_size + local_df */ if (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu) return true; + if (skb->local_df) + return false; + if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu) return false; -- 1.9.3 From 0e2f0b16c09d4e96fc141acf78103a79bdaca21e Mon Sep 17 00:00:00 2001 From: Sergey Popovich <popovich_sergei@xxxxxxx> Date: Tue, 6 May 2014 18:23:08 +0300 Subject: [PATCH 28/49] ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation [ Upstream commit aeefa1ecfc799b0ea2c4979617f14cecd5cccbfd ] Increment fib_info_cnt in fib_create_info() right after successfuly alllocating fib_info structure, overwise fib_metrics allocation failure leads to fib_info_cnt incorrectly decremented in free_fib_info(), called on error path from fib_create_info(). Signed-off-by: Sergey Popovich <popovich_sergei@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/fib_semantics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 8f6cb7a..9c3979a5 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -818,13 +818,13 @@ struct fib_info *fib_create_info(struct fib_config *cfg) fi = kzalloc(sizeof(*fi)+nhs*sizeof(struct fib_nh), GFP_KERNEL); if (fi == NULL) goto failure; + fib_info_cnt++; if (cfg->fc_mx) { fi->fib_metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL); if (!fi->fib_metrics) goto failure; } else fi->fib_metrics = (u32 *) dst_default_metrics; - fib_info_cnt++; fi->fib_net = hold_net(net); fi->fib_protocol = cfg->fc_protocol; -- 1.9.3 From e974bc7ffa69ff752c10dcfd2e3acb5da5a27d17 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 9 May 2014 14:45:00 +0200 Subject: [PATCH 29/49] net: cdc_mbim: handle unaccelerated VLAN tagged frames MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 6b5eeb7f874b689403e52a646e485d0191ab9507 ] This driver maps 802.1q VLANs to MBIM sessions. The mapping is based on a bogus assumption that all tagged frames will use the acceleration API because we enable NETIF_F_HW_VLAN_CTAG_TX. This fails for e.g. frames tagged in userspace using packet sockets. Such frames will erroneously be considered as untagged and silently dropped based on not being IP. Fix by falling back to looking into the ethernet header for a tag if no accelerated tag was found. Fixes: a82c7ce5bc5b ("net: cdc_ncm: map MBIM IPS SessionID to VLAN ID") Cc: Greg Suarez <gsuarez@xxxxxxxxxxxxxx> Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/cdc_mbim.c | 39 ++++++++++++++++++++++++++++----------- 1 file changed, 28 insertions(+), 11 deletions(-) diff --git a/drivers/net/usb/cdc_mbim.c b/drivers/net/usb/cdc_mbim.c index 25ba7ec..7cabe45 100644 --- a/drivers/net/usb/cdc_mbim.c +++ b/drivers/net/usb/cdc_mbim.c @@ -120,6 +120,16 @@ static void cdc_mbim_unbind(struct usbnet *dev, struct usb_interface *intf) cdc_ncm_unbind(dev, intf); } +/* verify that the ethernet protocol is IPv4 or IPv6 */ +static bool is_ip_proto(__be16 proto) +{ + switch (proto) { + case htons(ETH_P_IP): + case htons(ETH_P_IPV6): + return true; + } + return false; +} static struct sk_buff *cdc_mbim_tx_fixup(struct usbnet *dev, struct sk_buff *skb, gfp_t flags) { @@ -128,6 +138,7 @@ static struct sk_buff *cdc_mbim_tx_fixup(struct usbnet *dev, struct sk_buff *skb struct cdc_ncm_ctx *ctx = info->ctx; __le32 sign = cpu_to_le32(USB_CDC_MBIM_NDP16_IPS_SIGN); u16 tci = 0; + bool is_ip; u8 *c; if (!ctx) @@ -137,25 +148,32 @@ static struct sk_buff *cdc_mbim_tx_fixup(struct usbnet *dev, struct sk_buff *skb if (skb->len <= ETH_HLEN) goto error; + /* Some applications using e.g. packet sockets will + * bypass the VLAN acceleration and create tagged + * ethernet frames directly. We primarily look for + * the accelerated out-of-band tag, but fall back if + * required + */ + skb_reset_mac_header(skb); + if (vlan_get_tag(skb, &tci) < 0 && skb->len > VLAN_ETH_HLEN && + __vlan_get_tag(skb, &tci) == 0) { + is_ip = is_ip_proto(vlan_eth_hdr(skb)->h_vlan_encapsulated_proto); + skb_pull(skb, VLAN_ETH_HLEN); + } else { + is_ip = is_ip_proto(eth_hdr(skb)->h_proto); + skb_pull(skb, ETH_HLEN); + } + /* mapping VLANs to MBIM sessions: * no tag => IPS session <0> * 1 - 255 => IPS session <vlanid> * 256 - 511 => DSS session <vlanid - 256> * 512 - 4095 => unsupported, drop */ - vlan_get_tag(skb, &tci); - switch (tci & 0x0f00) { case 0x0000: /* VLAN ID 0 - 255 */ - /* verify that datagram is IPv4 or IPv6 */ - skb_reset_mac_header(skb); - switch (eth_hdr(skb)->h_proto) { - case htons(ETH_P_IP): - case htons(ETH_P_IPV6): - break; - default: + if (!is_ip) goto error; - } c = (u8 *)&sign; c[3] = tci; break; @@ -169,7 +187,6 @@ static struct sk_buff *cdc_mbim_tx_fixup(struct usbnet *dev, struct sk_buff *skb "unsupported tci=0x%04x\n", tci); goto error; } - skb_pull(skb, ETH_HLEN); } spin_lock_bh(&ctx->mtx); -- 1.9.3 From bec04625882a5c060454431eed0bd260f455f314 Mon Sep 17 00:00:00 2001 From: Peter Christensen <pch@xxxxxxxxxxxx> Date: Thu, 8 May 2014 11:15:37 +0200 Subject: [PATCH 30/49] macvlan: Don't propagate IFF_ALLMULTI changes on down interfaces. [ Upstream commit bbeb0eadcf9fe74fb2b9b1a6fea82cd538b1e556 ] Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti overflow on the underlying interface. Attempting the set IFF_ALLMULTI on the underlying interface would cause an error and the log message: "allmulti touches root, set allmulti failed." Signed-off-by: Peter Christensen <pch@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/macvlan.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 6ce640d..c12aeae 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -420,8 +420,10 @@ static void macvlan_change_rx_flags(struct net_device *dev, int change) struct macvlan_dev *vlan = netdev_priv(dev); struct net_device *lowerdev = vlan->lowerdev; - if (change & IFF_ALLMULTI) - dev_set_allmulti(lowerdev, dev->flags & IFF_ALLMULTI ? 1 : -1); + if (dev->flags & IFF_UP) { + if (change & IFF_ALLMULTI) + dev_set_allmulti(lowerdev, dev->flags & IFF_ALLMULTI ? 1 : -1); + } } static void macvlan_set_mac_lists(struct net_device *dev) -- 1.9.3 From 0db80db9845ef03ef9c51e4996701b8f2b88922f Mon Sep 17 00:00:00 2001 From: Susant Sahani <susant@xxxxxxxxxx> Date: Sat, 10 May 2014 00:11:32 +0530 Subject: [PATCH 31/49] ip6_tunnel: fix potential NULL pointer dereference [ Upstream commit c8965932a2e3b70197ec02c6741c29460279e2a8 ] The function ip6_tnl_validate assumes that the rtnl attribute IFLA_IPTUN_PROTO always be filled . If this attribute is not filled by the userspace application kernel get crashed with NULL pointer dereference. This patch fixes the potential kernel crash when IFLA_IPTUN_PROTO is missing . Signed-off-by: Susant Sahani <susant@xxxxxxxxxx> Acked-by: Thomas Graf <tgraf@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv6/ip6_tunnel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index f21cf47..73d7f68 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1531,7 +1531,7 @@ static int ip6_tnl_validate(struct nlattr *tb[], struct nlattr *data[]) { u8 proto; - if (!data) + if (!data || !data[IFLA_IPTUN_PROTO]) return 0; proto = nla_get_u8(data[IFLA_IPTUN_PROTO]); -- 1.9.3 From 2863fa707db6cdd8a75ba359ddd30b2e7e6970b9 Mon Sep 17 00:00:00 2001 From: Li RongQing <roy.qing.li@xxxxxxxxx> Date: Thu, 22 May 2014 16:36:55 +0800 Subject: [PATCH 32/49] ipv4: initialise the itag variable in __mkroute_input [ Upstream commit fbdc0ad095c0a299e9abf5d8ac8f58374951149a ] the value of itag is a random value from stack, and may not be initiated by fib_validate_source, which called fib_combine_itag if CONFIG_IP_ROUTE_CLASSID is not set This will make the cached dst uncertainty Signed-off-by: Li RongQing <roy.qing.li@xxxxxxxxx> Acked-by: Alexei Starovoitov <ast@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index c850647..7256eef 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1478,7 +1478,7 @@ static int __mkroute_input(struct sk_buff *skb, struct in_device *out_dev; unsigned int flags = 0; bool do_cache; - u32 itag; + u32 itag = 0; /* get a working reference to the output device */ out_dev = __in_dev_get_rcu(FIB_RES_DEV(*res)); -- 1.9.3 From 25f298d376c13ce1bd09e68d7382a68e3c457de2 Mon Sep 17 00:00:00 2001 From: Eric Dumazet <edumazet@xxxxxxxxxx> Date: Thu, 3 Apr 2014 09:28:10 -0700 Subject: [PATCH 33/49] net-gro: reset skb->truesize in napi_reuse_skb() [ Upstream commit e33d0ba8047b049c9262fdb1fcafb93cb52ceceb ] Recycling skb always had been very tough... This time it appears GRO layer can accumulate skb->truesize adjustments made by drivers when they attach a fragment to skb. skb_gro_receive() can only subtract from skb->truesize the used part of a fragment. I spotted this problem seeing TcpExtPruneCalled and TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where TCP receive window should be sized properly to accept traffic coming from a driver not overshooting skb->truesize. Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/dev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/core/dev.c b/net/core/dev.c index 5bf26c4..56383a3 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3898,6 +3898,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) skb->vlan_tci = 0; skb->dev = napi->dev; skb->skb_iif = 0; + skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); napi->skb = skb; } -- 1.9.3 From ba3840894e0aa0d58735bc64ee0cdb49908430fa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 28 Jun 2013 17:17:49 +0200 Subject: [PATCH 34/49] net: qmi_wwan: fixup Sierra Wireless MC8305 entry MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 5a008ffa73b4401251d548c10cadac6f8a67cfb5 ] The MC8305 module got an additional entry added based solely on information from a Windows driver *.inf file. We now have the actual descriptor layout from one of these modules, and it consists of two alternate configurations where cfg #1 is a normal Gobi 2k layout and cfg #2 is MBIM only, using interface numbers 5 and 6 for MBIM control and data. The extra Windows driver entry for interface number 5 was most likely a bug. Deleting the bogus entry to avoid unnecessary qmi_wwan probe failures when using the MBIM configuration. Reported-by: Lana Black <sickmind@xxxxxxxxxxx> Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 37d9785..8bccb58 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -760,7 +760,6 @@ static const struct usb_device_id products[] = { {QMI_GOBI_DEVICE(0x1199, 0x9009)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {QMI_GOBI_DEVICE(0x1199, 0x900a)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ {QMI_GOBI_DEVICE(0x1199, 0x9011)}, /* Sierra Wireless Gobi 2000 Modem device (MC8305) */ - {QMI_FIXED_INTF(0x1199, 0x9011, 5)}, /* alternate interface number!? */ {QMI_GOBI_DEVICE(0x16d8, 0x8002)}, /* CMDTech Gobi 2000 Modem device (VU922) */ {QMI_GOBI_DEVICE(0x05c6, 0x9205)}, /* Gobi 2000 Modem device */ {QMI_GOBI_DEVICE(0x1199, 0x9013)}, /* Sierra Wireless Gobi 3000 Modem device (MC8355) */ -- 1.9.3 From 1ab6866fd0eeb3dfa230e37a7792467a766a582f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 28 Jun 2013 17:17:50 +0200 Subject: [PATCH 35/49] net: qmi_wwan: add Option GTM681W MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit aa3aba1cbcbe823bba623c7cab33d84ddf0fb6cd ] A standard Gobi 3000 reference design module. Reported-by: Richard Weinberger <richard@xxxxxx> Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 8bccb58..d35bd1d 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -747,6 +747,7 @@ static const struct usb_device_id products[] = { {QMI_GOBI_DEVICE(0x05c6, 0x9265)}, /* Asus Gobi 2000 Modem device (VR305) */ {QMI_GOBI_DEVICE(0x05c6, 0x9235)}, /* Top Global Gobi 2000 Modem device (VR306) */ {QMI_GOBI_DEVICE(0x05c6, 0x9275)}, /* iRex Technologies Gobi 2000 Modem device (VR307) */ + {QMI_GOBI_DEVICE(0x0af0, 0x8120)}, /* Option GTM681W */ {QMI_GOBI_DEVICE(0x1199, 0x68a5)}, /* Sierra Wireless Modem */ {QMI_GOBI_DEVICE(0x1199, 0x68a9)}, /* Sierra Wireless Modem */ {QMI_GOBI_DEVICE(0x1199, 0x9001)}, /* Sierra Wireless Gobi 2000 Modem device (VT773) */ -- 1.9.3 From 89c0430d044d3050d939c21143e9385a35ca760a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 28 Jun 2013 17:17:51 +0200 Subject: [PATCH 36/49] net: qmi_wwan: add TP-LINK MA260 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit d0b5e516298fba30774e2df22cfbd00ecb09c298 ] Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index d35bd1d..6d51dac 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -712,6 +712,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1199, 0x9051, 8)}, /* Netgear AirCard 340U */ {QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */ {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */ + {QMI_FIXED_INTF(0x2357, 0x9000, 4)}, /* TP-LINK MA260 */ {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */ {QMI_FIXED_INTF(0x1e2d, 0x12d1, 4)}, /* Cinterion PLxx */ -- 1.9.3 From 1463a68799bec39c01acef8a3a9198ff756ded99 Mon Sep 17 00:00:00 2001 From: Enrico Mioso <mrkiko.rs@xxxxxxxxx> Date: Sat, 29 Jun 2013 15:39:19 +0200 Subject: [PATCH 37/49] qmi_wwan: add ONDA MT689DC device ID (fwd) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit d8eb8f9963a55ccf6ebafa4bfdb9f70c17067825 ] Another QMI-speaking device by ZTE, re-branded by ONDA! I'm connected ovr this device's QMI interface right now, so I can say I tested it! :) Note: a follow-up patch was posted to the linux-usb mailing list, to prevent the option driver from binding to the device's QMI interface, making it unusable. Signed-off-by: Enrico Mioso <mrkiko.rs@xxxxxxxxx> Acked-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 6d51dac..7a38ea6 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -652,6 +652,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x19d2, 0x0002, 1)}, {QMI_FIXED_INTF(0x19d2, 0x0012, 1)}, {QMI_FIXED_INTF(0x19d2, 0x0017, 3)}, + {QMI_FIXED_INTF(0x19d2, 0x0019, 3)}, /* ONDA MT689DC */ {QMI_FIXED_INTF(0x19d2, 0x0021, 4)}, {QMI_FIXED_INTF(0x19d2, 0x0025, 1)}, {QMI_FIXED_INTF(0x19d2, 0x0031, 4)}, -- 1.9.3 From c70d7f1a8c23fb10f45f5e9b8970d23c52f86823 Mon Sep 17 00:00:00 2001 From: Fabio Porcedda <fabio.porcedda@xxxxxxxxx> Date: Wed, 25 Sep 2013 11:21:25 +0200 Subject: [PATCH 38/49] net: qmi_wwan: add Telit LE920 newer firmware support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 905468fa4d54c3e572ed3045cd47cce37780716e ] Newer firmware use a new pid and a different interface. Signed-off-by: Fabio Porcedda <fabio.porcedda@xxxxxxxxx> Acked-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 7a38ea6..189e450 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -715,6 +715,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */ {QMI_FIXED_INTF(0x2357, 0x9000, 4)}, /* TP-LINK MA260 */ {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */ + {QMI_FIXED_INTF(0x1bc7, 0x1201, 2)}, /* Telit LE920 */ {QMI_FIXED_INTF(0x1e2d, 0x12d1, 4)}, /* Cinterion PLxx */ /* 4. Gobi 1000 devices */ -- 1.9.3 From a857613a6e91d43f24929a23145ece4bda14b804 Mon Sep 17 00:00:00 2001 From: Aleksander Morgado <aleksander@xxxxxxxxxx> Date: Wed, 25 Sep 2013 17:02:36 +0200 Subject: [PATCH 39/49] net: qmi_wwan: fix Cinterion PLXX product ID MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 2d77f343343c4f38b8f94be1964bbbc6456a147f ] Cinterion PLXX LTE devices have a 0x0060 product ID, not 0x12d1. The blacklisting in the serial/option driver does actually use the correct PID, as per commit 8ff10bdb14a52e3f25d4ce09e0582a8684c1a6db ('USB: Blacklisted Cinterion's PLxx WWAN Interface'). CC: Hans-Christoph Schemmel <hans-christoph.schemmel@xxxxxxxxxxx> CC: Christian Schmiedl <christian.schmiedl@xxxxxxxxxxx> CC: Nicolaus Colberg <nicolaus.colberg@xxxxxxxxxxx> Signed-off-by: Aleksander Morgado <aleksander@xxxxxxxxxx> Acked-by: Bjørn Mork <bjorn@xxxxxxx> Acked-by: Christian Schmiedl <christian.schmiedl@xxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 189e450..24a0e9f 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -716,7 +716,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x2357, 0x9000, 4)}, /* TP-LINK MA260 */ {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */ {QMI_FIXED_INTF(0x1bc7, 0x1201, 2)}, /* Telit LE920 */ - {QMI_FIXED_INTF(0x1e2d, 0x12d1, 4)}, /* Cinterion PLxx */ + {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)}, /* Cinterion PLxx */ /* 4. Gobi 1000 devices */ {QMI_GOBI1K_DEVICE(0x05c6, 0x9212)}, /* Acer Gobi Modem Device */ -- 1.9.3 From b3295c89865023e6ab7339a1f8631db367feb465 Mon Sep 17 00:00:00 2001 From: Enrico Mioso <mrkiko.rs@xxxxxxxxx> Date: Tue, 15 Oct 2013 15:06:48 +0200 Subject: [PATCH 40/49] net: qmi_wwan: Olivetti Olicard 200 support [ Upstream commit ce97fef4235378108ed3bd96e1b3eab8fd0a1fbd ] This is a QMI device, manufactured by TCT Mobile Phones. A companion patch blacklisting this device's QMI interface in the option.c driver has been sent. Signed-off-by: Enrico Mioso <mrkiko.rs@xxxxxxxxx> Signed-off-by: Antonella Pellizzari <anto.pellizzari83@xxxxxxxxx> Tested-by: Dan Williams <dcbw@xxxxxxxxxx> Acked-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 24a0e9f..d6ef9b5 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -716,6 +716,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x2357, 0x9000, 4)}, /* TP-LINK MA260 */ {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */ {QMI_FIXED_INTF(0x1bc7, 0x1201, 2)}, /* Telit LE920 */ + {QMI_FIXED_INTF(0x0b3c, 0xc005, 6)}, /* Olivetti Olicard 200 */ {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)}, /* Cinterion PLxx */ /* 4. Gobi 1000 devices */ -- 1.9.3 From 9ae6c13addb0665f9009f5b24262a72f4342e3f0 Mon Sep 17 00:00:00 2001 From: Raymond Wanyoike <raymond.wanyoike@xxxxxxxxx> Date: Sun, 9 Feb 2014 00:01:02 +0300 Subject: [PATCH 41/49] net: qmi_wwan: add ZTE MF667 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 7653aabfbdc73c1567e29a9790701f5898ba1420 ] The driver description files give these descriptions to the vendor specific ports on this modem: VID_19D2&PID_1270&MI_00: "ZTE MF667 Diagnostics Port" VID_19D2&PID_1270&MI_01: "ZTE MF667 AT Port" VID_19D2&PID_1270&MI_02: "ZTE MF667 ATExt2 Port" VID_19D2&PID_1270&MI_03: "ZTE MF667 ATExt Port" VID_19D2&PID_1270&MI_04: "ZTE MF667 USB Modem" VID_19D2&PID_1270&MI_05: "ZTE MF667 Network Adapter" Signed-off-by: Raymond Wanyoike <raymond.wanyoike@xxxxxxxxx> Acked-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index d6ef9b5..e059192 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -699,6 +699,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x19d2, 0x1255, 3)}, {QMI_FIXED_INTF(0x19d2, 0x1255, 4)}, {QMI_FIXED_INTF(0x19d2, 0x1256, 4)}, + {QMI_FIXED_INTF(0x19d2, 0x1270, 5)}, /* ZTE MF667 */ {QMI_FIXED_INTF(0x19d2, 0x1401, 2)}, {QMI_FIXED_INTF(0x19d2, 0x1402, 2)}, /* ZTE MF60 */ {QMI_FIXED_INTF(0x19d2, 0x1424, 2)}, -- 1.9.3 From c64ee109235a2ec3029e92ec2a36563cf3304faa Mon Sep 17 00:00:00 2001 From: Aleksander Morgado <aleksander@xxxxxxxxxxxxx> Date: Wed, 12 Feb 2014 15:55:14 +0100 Subject: [PATCH 42/49] net: qmi_wwan: add support for Cinterion PXS8 and PHS8 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 9b2b6a2d669c909dd0b125fc834da94bcfc0aee7 ] When the PXS8 and PHS8 devices show up with PID 0x0053 they will expose both a QMI port and a WWAN interface. CC: Hans-Christoph Schemmel <hans-christoph.schemmel@xxxxxxxxxxx> CC: Christian Schmiedl <christian.schmiedl@xxxxxxxxxxx> CC: Nicolaus Colberg <nicolaus.colberg@xxxxxxxxxxx> CC: David McCullough <david.mccullough@xxxxxxxxxxxxx> Signed-off-by: Aleksander Morgado <aleksander@xxxxxxxxxxxxx> Acked-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index e059192..5aad568 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -719,6 +719,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1bc7, 0x1201, 2)}, /* Telit LE920 */ {QMI_FIXED_INTF(0x0b3c, 0xc005, 6)}, /* Olivetti Olicard 200 */ {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)}, /* Cinterion PLxx */ + {QMI_FIXED_INTF(0x1e2d, 0x0053, 4)}, /* Cinterion PHxx,PXxx */ /* 4. Gobi 1000 devices */ {QMI_GOBI1K_DEVICE(0x05c6, 0x9212)}, /* Acer Gobi Modem Device */ -- 1.9.3 From 4ae7cf4f1ab9094f820f467be8659bee2ca255f1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:28 +0200 Subject: [PATCH 43/49] net: qmi_wwan: add Sierra Wireless EM7355 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit b85f5deaf052340021d025e120a9858f084a1d79 ] Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 5aad568..16ab16f 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -711,6 +711,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1199, 0x68a2, 8)}, /* Sierra Wireless MC7710 in QMI mode */ {QMI_FIXED_INTF(0x1199, 0x68a2, 19)}, /* Sierra Wireless MC7710 in QMI mode */ {QMI_FIXED_INTF(0x1199, 0x901c, 8)}, /* Sierra Wireless EM7700 */ + {QMI_FIXED_INTF(0x1199, 0x901f, 8)}, /* Sierra Wireless EM7355 */ {QMI_FIXED_INTF(0x1199, 0x9051, 8)}, /* Netgear AirCard 340U */ {QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */ {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */ -- 1.9.3 From 4c647357e139edfc68d280e84ebe5cc74a47c225 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:29 +0200 Subject: [PATCH 44/49] net: qmi_wwan: add Sierra Wireless MC73xx MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 1c138607a7be64074d7fba68d0d533ec38f9d17b ] Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 16ab16f..dc365e8 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -710,6 +710,9 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x114f, 0x68a2, 8)}, /* Sierra Wireless MC7750 */ {QMI_FIXED_INTF(0x1199, 0x68a2, 8)}, /* Sierra Wireless MC7710 in QMI mode */ {QMI_FIXED_INTF(0x1199, 0x68a2, 19)}, /* Sierra Wireless MC7710 in QMI mode */ + {QMI_FIXED_INTF(0x1199, 0x68c0, 8)}, /* Sierra Wireless MC73xx */ + {QMI_FIXED_INTF(0x1199, 0x68c0, 10)}, /* Sierra Wireless MC73xx */ + {QMI_FIXED_INTF(0x1199, 0x68c0, 11)}, /* Sierra Wireless MC73xx */ {QMI_FIXED_INTF(0x1199, 0x901c, 8)}, /* Sierra Wireless EM7700 */ {QMI_FIXED_INTF(0x1199, 0x901f, 8)}, /* Sierra Wireless EM7355 */ {QMI_FIXED_INTF(0x1199, 0x9051, 8)}, /* Netgear AirCard 340U */ -- 1.9.3 From 6d36829e165e35fb84618c6a7f8b3f79dadf2cb0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:30 +0200 Subject: [PATCH 45/49] net: qmi_wwan: add Sierra Wireless MC7305/MC7355 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 9214224e43e4264b02686ea8b455f310935607b5 ] Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index dc365e8..8a2fbdb 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -715,6 +715,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1199, 0x68c0, 11)}, /* Sierra Wireless MC73xx */ {QMI_FIXED_INTF(0x1199, 0x901c, 8)}, /* Sierra Wireless EM7700 */ {QMI_FIXED_INTF(0x1199, 0x901f, 8)}, /* Sierra Wireless EM7355 */ + {QMI_FIXED_INTF(0x1199, 0x9041, 8)}, /* Sierra Wireless MC7305/MC7355 */ {QMI_FIXED_INTF(0x1199, 0x9051, 8)}, /* Netgear AirCard 340U */ {QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */ {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */ -- 1.9.3 From 2d480ec1bfa1e73be6fcdd5d197e2051b24e1f2f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:31 +0200 Subject: [PATCH 46/49] net: qmi_wwan: add Olivetti Olicard 500 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit efc0b25c3add97717ece57bf5319792ca98f348e ] Device interface layout: 0: ff/ff/ff - serial 1: ff/ff/ff - serial AT+PPP 2: 08/06/50 - storage 3: ff/ff/ff - serial 4: ff/ff/ff - QMI/wwan Reported-by: Julio Araujo <julio.araujo@xxxxxxxxxxxxxx> Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 8a2fbdb..39aa428 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -723,6 +723,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */ {QMI_FIXED_INTF(0x1bc7, 0x1201, 2)}, /* Telit LE920 */ {QMI_FIXED_INTF(0x0b3c, 0xc005, 6)}, /* Olivetti Olicard 200 */ + {QMI_FIXED_INTF(0x0b3c, 0xc00b, 4)}, /* Olivetti Olicard 500 */ {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)}, /* Cinterion PLxx */ {QMI_FIXED_INTF(0x1e2d, 0x0053, 4)}, /* Cinterion PHxx,PXxx */ -- 1.9.3 From 56463029d63ab7f610776797c5a6ad7514812fb4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:32 +0200 Subject: [PATCH 47/49] net: qmi_wwan: add Alcatel L800MA MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 75573660c47a0db7cc931dcf154945610e02130a ] Device interface layout: 0: ff/ff/ff - serial 1: ff/00/00 - serial AT+PPP 2: ff/ff/ff - QMI/wwan 3: 08/06/50 - storage Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 39aa428..283ad0c 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -718,6 +718,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1199, 0x9041, 8)}, /* Sierra Wireless MC7305/MC7355 */ {QMI_FIXED_INTF(0x1199, 0x9051, 8)}, /* Netgear AirCard 340U */ {QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */ + {QMI_FIXED_INTF(0x1bbb, 0x0203, 2)}, /* Alcatel L800MA */ {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */ {QMI_FIXED_INTF(0x2357, 0x9000, 4)}, /* TP-LINK MA260 */ {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */ -- 1.9.3 From f46640a59823bf56f2e33c15a8339c53385bdd66 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:33 +0200 Subject: [PATCH 48/49] net: qmi_wwan: add a number of CMOTech devices MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 41be7d90993b1502d445bfc59e58348c258ce66a ] A number of older CMOTech modems are based on Qualcomm chips and exporting a QMI/wwan function. Reported-by: Lars Melin <larsm17@xxxxxxxxx> Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 283ad0c..bf25998 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -649,6 +649,22 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x05c6, 0x920d, 5)}, {QMI_FIXED_INTF(0x12d1, 0x140c, 1)}, /* Huawei E173 */ {QMI_FIXED_INTF(0x12d1, 0x14ac, 1)}, /* Huawei E1820 */ + {QMI_FIXED_INTF(0x16d8, 0x6003, 0)}, /* CMOTech 6003 */ + {QMI_FIXED_INTF(0x16d8, 0x6007, 0)}, /* CMOTech CHE-628S */ + {QMI_FIXED_INTF(0x16d8, 0x6008, 0)}, /* CMOTech CMU-301 */ + {QMI_FIXED_INTF(0x16d8, 0x6280, 0)}, /* CMOTech CHU-628 */ + {QMI_FIXED_INTF(0x16d8, 0x7001, 0)}, /* CMOTech CHU-720S */ + {QMI_FIXED_INTF(0x16d8, 0x7002, 0)}, /* CMOTech 7002 */ + {QMI_FIXED_INTF(0x16d8, 0x7003, 4)}, /* CMOTech CHU-629K */ + {QMI_FIXED_INTF(0x16d8, 0x7004, 3)}, /* CMOTech 7004 */ + {QMI_FIXED_INTF(0x16d8, 0x7006, 5)}, /* CMOTech CGU-629 */ + {QMI_FIXED_INTF(0x16d8, 0x700a, 4)}, /* CMOTech CHU-629S */ + {QMI_FIXED_INTF(0x16d8, 0x7211, 0)}, /* CMOTech CHU-720I */ + {QMI_FIXED_INTF(0x16d8, 0x7212, 0)}, /* CMOTech 7212 */ + {QMI_FIXED_INTF(0x16d8, 0x7213, 0)}, /* CMOTech 7213 */ + {QMI_FIXED_INTF(0x16d8, 0x7251, 1)}, /* CMOTech 7251 */ + {QMI_FIXED_INTF(0x16d8, 0x7252, 1)}, /* CMOTech 7252 */ + {QMI_FIXED_INTF(0x16d8, 0x7253, 1)}, /* CMOTech 7253 */ {QMI_FIXED_INTF(0x19d2, 0x0002, 1)}, {QMI_FIXED_INTF(0x19d2, 0x0012, 1)}, {QMI_FIXED_INTF(0x19d2, 0x0017, 3)}, -- 1.9.3 From ce11a902b87f1174cec953856a09d9133986214f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:34 +0200 Subject: [PATCH 49/49] net: qmi_wwan: add a number of Dell devices MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 6f10c5d1b1aeddb63d33070abb8bc5a177beeb1f ] Dan writes: "The Dell drivers use the same configuration for PIDs: 81A2: Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card 81A3: Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card 81A4: Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card 81A8: Dell Wireless 5808 Gobi(TM) 4G LTE Mobile Broadband Card 81A9: Dell Wireless 5808e Gobi(TM) 4G LTE Mobile Broadband Card These devices are all clearly Sierra devices, but are also definitely Gobi-based. The A8 might be the MC7700/7710 and A9 is likely a MC7750. >From DellGobi5kSetup.exe from the Dell drivers: usbif0: serial/firmware loader? usbif2: nmea usbif3: modem/ppp usbif8: net/QMI" Reported-by: AceLan Kao <acelan.kao@xxxxxxxxxxxxx> Reported-by: Dan Williams <dcbw@xxxxxxxxxx> Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index bf25998..7be4860 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -743,6 +743,11 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x0b3c, 0xc00b, 4)}, /* Olivetti Olicard 500 */ {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)}, /* Cinterion PLxx */ {QMI_FIXED_INTF(0x1e2d, 0x0053, 4)}, /* Cinterion PHxx,PXxx */ + {QMI_FIXED_INTF(0x413c, 0x81a2, 8)}, /* Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card */ + {QMI_FIXED_INTF(0x413c, 0x81a3, 8)}, /* Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card */ + {QMI_FIXED_INTF(0x413c, 0x81a4, 8)}, /* Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card */ + {QMI_FIXED_INTF(0x413c, 0x81a8, 8)}, /* Dell Wireless 5808 Gobi(TM) 4G LTE Mobile Broadband Card */ + {QMI_FIXED_INTF(0x413c, 0x81a9, 8)}, /* Dell Wireless 5808e Gobi(TM) 4G LTE Mobile Broadband Card */ /* 4. Gobi 1000 devices */ {QMI_GOBI1K_DEVICE(0x05c6, 0x9212)}, /* Acer Gobi Modem Device */ -- 1.9.3
From ccc7d03d13b5f6d3ba7a9803adbbe0da60ef337d Mon Sep 17 00:00:00 2001 From: Daniel Borkmann <dborkman@xxxxxxxxxx> Date: Tue, 8 Apr 2014 17:26:13 +0200 Subject: [PATCH 01/76] net: sctp: wake up all assocs if sndbuf policy is per socket [ Upstream commit 52c35befb69b005c3fc5afdaae3a5717ad013411 ] SCTP charges chunks for wmem accounting via skb->truesize in sctp_set_owner_w(), and sctp_wfree() respectively as the reverse operation. If a sender runs out of wmem, it needs to wait via sctp_wait_for_sndbuf(), and gets woken up by a call to __sctp_write_space() mostly via sctp_wfree(). __sctp_write_space() is being called per association. Although we assign sk->sk_write_space() to sctp_write_space(), which is then being done per socket, it is only used if send space is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE is set and therefore not invoked in sock_wfree(). Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf again when transmitting packet") fixed an issue where in case sctp_packet_transmit() manages to queue up more than sndbuf bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is interrupted by a signal. However, a still remaining issue is that if net.sctp.sndbuf_policy=0, that is accounting per socket, and one-to-many sockets are in use, the reclaimed write space from sctp_wfree() is 'unfairly' handed back on the server to the association that is the lucky one to be woken up again via __sctp_write_space(), while the remaining associations are never be woken up again (unless by a signal). The effect disappears with net.sctp.sndbuf_policy=1, that is wmem accounting per association, as it guarantees a fair share of wmem among associations. Therefore, if we have reclaimed memory in case of per socket accounting, wake all related associations to a socket in a fair manner, that is, traverse the socket association list starting from the current neighbour of the association and issue a __sctp_write_space() to everyone until we end up waking ourselves. This guarantees that no association is preferred over another and even if more associations are taken into the one-to-many session, all receivers will get messages from the server and are not stalled forever on high load. This setting still leaves the advantage of per socket accounting in touch as an association can still use up global limits if unused by others. Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.") Signed-off-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Cc: Thomas Graf <tgraf@xxxxxxx> Cc: Neil Horman <nhorman@xxxxxxxxxxxxx> Cc: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Neil Horman <nhorman@xxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sctp/socket.c | 36 +++++++++++++++++++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 981aaf8..5f83a6a 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -6593,6 +6593,40 @@ static void __sctp_write_space(struct sctp_association *asoc) } } +static void sctp_wake_up_waiters(struct sock *sk, + struct sctp_association *asoc) +{ + struct sctp_association *tmp = asoc; + + /* We do accounting for the sndbuf space per association, + * so we only need to wake our own association. + */ + if (asoc->ep->sndbuf_policy) + return __sctp_write_space(asoc); + + /* Accounting for the sndbuf space is per socket, so we + * need to wake up others, try to be fair and in case of + * other associations, let them have a go first instead + * of just doing a sctp_write_space() call. + * + * Note that we reach sctp_wake_up_waiters() only when + * associations free up queued chunks, thus we are under + * lock and the list of associations on a socket is + * guaranteed not to change. + */ + for (tmp = list_next_entry(tmp, asocs); 1; + tmp = list_next_entry(tmp, asocs)) { + /* Manually skip the head element. */ + if (&tmp->asocs == &((sctp_sk(sk))->ep->asocs)) + continue; + /* Wake up association. */ + __sctp_write_space(tmp); + /* We've reached the end. */ + if (tmp == asoc) + break; + } +} + /* Do accounting for the sndbuf space. * Decrement the used sndbuf space of the corresponding association by the * data size which was just transmitted(freed). @@ -6620,7 +6654,7 @@ static void sctp_wfree(struct sk_buff *skb) sk_mem_uncharge(sk, skb->truesize); sock_wfree(skb); - __sctp_write_space(asoc); + sctp_wake_up_waiters(sk, asoc); sctp_association_put(asoc); } -- 1.7.11.7 From 402ed7d4dd449434bc1ed49739db7fa909f8f0f3 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann <dborkman@xxxxxxxxxx> Date: Wed, 9 Apr 2014 16:10:20 +0200 Subject: [PATCH 02/76] net: sctp: test if association is dead in sctp_wake_up_waiters [ Upstream commit 1e1cdf8ac78793e0875465e98a648df64694a8d0 ] In function sctp_wake_up_waiters(), we need to involve a test if the association is declared dead. If so, we don't have any reference to a possible sibling association anymore and need to invoke sctp_write_space() instead, and normally walk the socket's associations and notify them of new wmem space. The reason for special casing is that otherwise, we could run into the following issue when a sctp_primitive_SEND() call from sctp_sendmsg() fails, and tries to flush an association's outq, i.e. in the following way: sctp_association_free() `-> list_del(&asoc->asocs) <-- poisons list pointer asoc->base.dead = true sctp_outq_free(&asoc->outqueue) `-> __sctp_outq_teardown() `-> sctp_chunk_free() `-> consume_skb() `-> sctp_wfree() `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers if asoc->ep->sndbuf_policy=0 Therefore, only walk the list in an 'optimized' way if we find that the current association is still active. We could also use list_del_init() in addition when we call sctp_association_free(), but as Vlad suggests, we want to trap such bugs and thus leave it poisoned as is. Why is it safe to resolve the issue by testing for asoc->base.dead? Parallel calls to sctp_sendmsg() are protected under socket lock, that is lock_sock()/release_sock(). Only within that path under lock held, we're setting skb/chunk owner via sctp_set_owner_w(). Eventually, chunks are freed directly by an association still under that lock. So when traversing association list on destruction time from sctp_wake_up_waiters() via sctp_wfree(), a different CPU can't be running sctp_wfree() while another one calls sctp_association_free() as both happens under the same lock. Therefore, this can also not race with setting/testing against asoc->base.dead as we are guaranteed for this to happen in order, under lock. Further, Vlad says: the times we check asoc->base.dead is when we've cached an association pointer for later processing. In between cache and processing, the association may have been freed and is simply still around due to reference counts. We check asoc->base.dead under a lock, so it should always be safe to check and not race against sctp_association_free(). Stress-testing seems fine now, too. Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket") Signed-off-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Cc: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Neil Horman <nhorman@xxxxxxxxxxxxx> Acked-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sctp/socket.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 5f83a6a..270d5bd 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -6604,6 +6604,12 @@ static void sctp_wake_up_waiters(struct sock *sk, if (asoc->ep->sndbuf_policy) return __sctp_write_space(asoc); + /* If association goes down and is just flushing its + * outq, then just normally notify others. + */ + if (asoc->base.dead) + return sctp_write_space(sk); + /* Accounting for the sndbuf space is per socket, so we * need to wake up others, try to be fair and in case of * other associations, let them have a go first instead -- 1.7.11.7 From 58ce70cd1ce754079651197e0c25316da5188e6d Mon Sep 17 00:00:00 2001 From: Dmitry Petukhov <dmgenp@xxxxxxxxx> Date: Wed, 9 Apr 2014 02:23:20 +0600 Subject: [PATCH 03/76] l2tp: take PMTU from tunnel UDP socket [ Upstream commit f34c4a35d87949fbb0e0f31eba3c054e9f8199ba ] When l2tp driver tries to get PMTU for the tunnel destination, it uses the pointer to struct sock that represents PPPoX socket, while it should use the pointer that represents UDP socket of the tunnel. Signed-off-by: Dmitry Petukhov <dmgenp@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/l2tp/l2tp_ppp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index 5990919..ec66063 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -756,9 +756,9 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, session->deref = pppol2tp_session_sock_put; /* If PMTU discovery was enabled, use the MTU that was discovered */ - dst = sk_dst_get(sk); + dst = sk_dst_get(tunnel->sock); if (dst != NULL) { - u32 pmtu = dst_mtu(__sk_dst_get(sk)); + u32 pmtu = dst_mtu(__sk_dst_get(tunnel->sock)); if (pmtu != 0) session->mtu = session->mru = pmtu - PPPOL2TP_HEADER_OVERHEAD; -- 1.7.11.7 From 3d68ab189f4f5f44f482774eef66b0f68e83116f Mon Sep 17 00:00:00 2001 From: Florian Westphal <fw@xxxxxxxxx> Date: Wed, 9 Apr 2014 10:28:50 +0200 Subject: [PATCH 04/76] net: core: don't account for udp header size when computing seglen [ Upstream commit 6d39d589bb76ee8a1c6cde6822006ae0053decff ] In case of tcp, gso_size contains the tcpmss. For UFO (udp fragmentation offloading) skbs, gso_size is the fragment payload size, i.e. we must not account for udp header size. Otherwise, when using virtio drivers, a to-be-forwarded UFO GSO packet will be needlessly fragmented in the forward path, because we think its individual segments are too large for the outgoing link. Fixes: fe6cc55f3a9a053 ("net: ip, ipv6: handle gso skbs in forwarding path") Cc: Eric Dumazet <eric.dumazet@xxxxxxxxx> Reported-by: Tobias Brunner <tobias@xxxxxxxxxxxxxx> Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/skbuff.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 90b96a1..31d7cd7 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3951,12 +3951,14 @@ EXPORT_SYMBOL_GPL(skb_scrub_packet); unsigned int skb_gso_transport_seglen(const struct sk_buff *skb) { const struct skb_shared_info *shinfo = skb_shinfo(skb); - unsigned int hdr_len; if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) - hdr_len = tcp_hdrlen(skb); - else - hdr_len = sizeof(struct udphdr); - return hdr_len + shinfo->gso_size; + return tcp_hdrlen(skb) + shinfo->gso_size; + + /* UFO sets gso_size to the size of the fragmentation + * payload, i.e. the size of the L4 (UDP) header is already + * accounted for. + */ + return shinfo->gso_size; } EXPORT_SYMBOL_GPL(skb_gso_transport_seglen); -- 1.7.11.7 From 536dc48515be23fbe083a3ed67b6a318c762f654 Mon Sep 17 00:00:00 2001 From: Thomas Richter <tmricht@xxxxxxxxxxxxxxxxxx> Date: Wed, 9 Apr 2014 12:52:59 +0200 Subject: [PATCH 05/76] bonding: Remove debug_fs files when module init fails [ Upstream commit db29868653394937037d71dc3545768302dda643 ] Remove the bonding debug_fs entries when the module initialization fails. The debug_fs entries should be removed together with all other already allocated resources. Signed-off-by: Thomas Richter <tmricht@xxxxxxxxxxxxxxxxxx> Signed-off-by: Jay Vosburgh <j.vosburgh@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/bonding/bond_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index e5628fc..91ec8cd 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4536,6 +4536,7 @@ static int __init bonding_init(void) out: return res; err: + bond_destroy_debugfs(); bond_netlink_fini(); err_link: unregister_pernet_subsys(&bond_net_ops); -- 1.7.11.7 From 3fe697d2141a9f225cd80e522065f9709c1399a8 Mon Sep 17 00:00:00 2001 From: Toshiaki Makita <makita.toshiaki@xxxxxxxxxxxxx> Date: Wed, 9 Apr 2014 17:00:30 +0900 Subject: [PATCH 06/76] bridge: Fix double free and memory leak around br_allowed_ingress [ Upstream commit eb7076182d1ae4bc4641534134ed707100d76acc ] br_allowed_ingress() has two problems. 1. If br_allowed_ingress() is called by br_handle_frame_finish() and vlan_untag() in br_allowed_ingress() fails, skb will be freed by both vlan_untag() and br_handle_frame_finish(). 2. If br_allowed_ingress() is called by br_dev_xmit() and br_allowed_ingress() fails, the skb will not be freed. Fix these two problems by freeing the skb in br_allowed_ingress() if it fails. Signed-off-by: Toshiaki Makita <makita.toshiaki@xxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/bridge/br_input.c | 2 +- net/bridge/br_vlan.c | 7 ++++--- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index d0cca3c..7985dea 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -73,7 +73,7 @@ int br_handle_frame_finish(struct sk_buff *skb) goto drop; if (!br_allowed_ingress(p->br, nbp_get_vlan_info(p), skb, &vid)) - goto drop; + goto out; /* insert into forwarding database after filtering to avoid spoofing */ br = p->br; diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c index f23c74b..ba7757b 100644 --- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@ -170,7 +170,7 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v, * rejected. */ if (!v) - return false; + goto drop; /* If vlan tx offload is disabled on bridge device and frame was * sent from vlan device on the bridge device, it does not have @@ -193,7 +193,7 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v, * vlan untagged or priority-tagged traffic belongs to. */ if (pvid == VLAN_N_VID) - return false; + goto drop; /* PVID is set on this port. Any untagged or priority-tagged * ingress frame is considered to belong to this vlan. @@ -216,7 +216,8 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v, /* Frame had a valid vlan tag. See if vlan is allowed */ if (test_bit(*vid, v->vlan_bitmap)) return true; - +drop: + kfree_skb(skb); return false; } -- 1.7.11.7 From 3459dac58d2b2893586f99070becab52b32bc9e8 Mon Sep 17 00:00:00 2001 From: Eric Dumazet <edumazet@xxxxxxxxxx> Date: Thu, 10 Apr 2014 21:23:36 -0700 Subject: [PATCH 07/76] ipv6: Limit mtu to 65575 bytes [ Upstream commit 30f78d8ebf7f514801e71b88a10c948275168518 ] Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@xxxxxxxxx> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Acked-by: YOSHIFUJI Hideaki <yoshfuji@xxxxxxxxxxxxxx> Acked-by: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/net/ip6_route.h | 5 +++++ net/ipv6/route.c | 5 +++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 017badb..2e74c6c 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -32,6 +32,11 @@ struct route_info { #define RT6_LOOKUP_F_SRCPREF_PUBLIC 0x00000010 #define RT6_LOOKUP_F_SRCPREF_COA 0x00000020 +/* We do not (yet ?) support IPv6 jumbograms (RFC 2675) + * Unlike IPv4, hdr->seg_len doesn't include the IPv6 header + */ +#define IP6_MAX_MTU (0xFFFF + sizeof(struct ipv6hdr)) + /* * rt6_srcprefs2flags() and rt6_flags2srcprefs() translate * between IPV6_ADDR_PREFERENCES socket option values diff --git a/net/ipv6/route.c b/net/ipv6/route.c index fba54a4..7cc1102 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1342,7 +1342,7 @@ static unsigned int ip6_mtu(const struct dst_entry *dst) unsigned int mtu = dst_metric_raw(dst, RTAX_MTU); if (mtu) - return mtu; + goto out; mtu = IPV6_MIN_MTU; @@ -1352,7 +1352,8 @@ static unsigned int ip6_mtu(const struct dst_entry *dst) mtu = idev->cnf.mtu6; rcu_read_unlock(); - return mtu; +out: + return min_t(unsigned int, mtu, IP6_MAX_MTU); } static struct dst_entry *icmp6_dst_gc_list; -- 1.7.11.7 From ffae4d6d57936b95c26b1b9fa5ed2aba5e1aa6c1 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Date: Fri, 11 Apr 2014 15:51:18 +0200 Subject: [PATCH 08/76] gre: don't allow to add the same tunnel twice [ Upstream commit 5a4552752d8f7f4cef1d98775ece7adb7616fde2 ] Before the patch, it was possible to add two times the same tunnel: ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249 ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249 It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the argument dev->type, which was set only later (when calling ndo_init handler in register_netdevice()). Let's set this type in the setup handler, which is called before newlink handler. Introduced by commit c54419321455 ("GRE: Refactor GRE tunneling code."). CC: Pravin B Shelar <pshelar@xxxxxxxxxx> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ip_gre.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index ec4f762..94213c8 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -463,6 +463,7 @@ static const struct net_device_ops ipgre_netdev_ops = { static void ipgre_tunnel_setup(struct net_device *dev) { dev->netdev_ops = &ipgre_netdev_ops; + dev->type = ARPHRD_IPGRE; ip_tunnel_setup(dev, ipgre_net_id); } @@ -501,7 +502,6 @@ static int ipgre_tunnel_init(struct net_device *dev) memcpy(dev->dev_addr, &iph->saddr, 4); memcpy(dev->broadcast, &iph->daddr, 4); - dev->type = ARPHRD_IPGRE; dev->flags = IFF_NOARP; dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; dev->addr_len = 4; -- 1.7.11.7 From 50850fda5f7a99c37131da0abe173314371dff51 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Date: Fri, 11 Apr 2014 15:51:19 +0200 Subject: [PATCH 09/76] vti: don't allow to add the same tunnel twice [ Upstream commit 8d89dcdf80d88007647945a753821a06eb6cc5a5 ] Before the patch, it was possible to add two times the same tunnel: ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41 ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41 It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the argument dev->type, which was set only later (when calling ndo_init handler in register_netdevice()). Let's set this type in the setup handler, which is called before newlink handler. Introduced by commit b9959fd3b0fa ("vti: switch to new ip tunnel code"). CC: Cong Wang <amwang@xxxxxxxxxx> CC: Steffen Klassert <steffen.klassert@xxxxxxxxxxx> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ip_vti.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c index 48eafae..e4a8f76 100644 --- a/net/ipv4/ip_vti.c +++ b/net/ipv4/ip_vti.c @@ -207,6 +207,7 @@ static const struct net_device_ops vti_netdev_ops = { static void vti_tunnel_setup(struct net_device *dev) { dev->netdev_ops = &vti_netdev_ops; + dev->type = ARPHRD_TUNNEL; ip_tunnel_setup(dev, vti_net_id); } @@ -218,7 +219,6 @@ static int vti_tunnel_init(struct net_device *dev) memcpy(dev->dev_addr, &iph->saddr, 4); memcpy(dev->broadcast, &iph->daddr, 4); - dev->type = ARPHRD_TUNNEL; dev->hard_header_len = LL_MAX_HEADER + sizeof(struct iphdr); dev->mtu = ETH_DATA_LEN; dev->flags = IFF_NOARP; -- 1.7.11.7 From 97985b7a91273963ac93935e7a9b1068ecaf54e9 Mon Sep 17 00:00:00 2001 From: "Wang, Xiaoming" <xiaoming.wang@xxxxxxxxx> Date: Mon, 14 Apr 2014 12:30:45 -0400 Subject: [PATCH 10/76] net: ipv4: current group_info should be put after using. [ Upstream commit b04c46190219a4f845e46a459e3102137b7f6cac ] Plug a group_info refcount leak in ping_init. group_info is only needed during initialization and the code failed to release the reference on exit. While here move grabbing the reference to a place where it is actually needed. Signed-off-by: Chuansheng Liu <chuansheng.liu@xxxxxxxxx> Signed-off-by: Zhang Dongxing <dongxing.zhang@xxxxxxxxx> Signed-off-by: xiaoming wang <xiaoming.wang@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ping.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 2d11c09..e21934b 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -252,26 +252,33 @@ int ping_init_sock(struct sock *sk) { struct net *net = sock_net(sk); kgid_t group = current_egid(); - struct group_info *group_info = get_current_groups(); - int i, j, count = group_info->ngroups; + struct group_info *group_info; + int i, j, count; kgid_t low, high; + int ret = 0; inet_get_ping_group_range_net(net, &low, &high); if (gid_lte(low, group) && gid_lte(group, high)) return 0; + group_info = get_current_groups(); + count = group_info->ngroups; for (i = 0; i < group_info->nblocks; i++) { int cp_count = min_t(int, NGROUPS_PER_BLOCK, count); for (j = 0; j < cp_count; j++) { kgid_t gid = group_info->blocks[i][j]; if (gid_lte(low, gid) && gid_lte(gid, high)) - return 0; + goto out_release_group; } count -= cp_count; } - return -EACCES; + ret = -EACCES; + +out_release_group: + put_group_info(group_info); + return ret; } EXPORT_SYMBOL_GPL(ping_init_sock); -- 1.7.11.7 From 400cc8c5b84f8641484d5e2dfcff7a375c98fd58 Mon Sep 17 00:00:00 2001 From: Julian Anastasov <ja@xxxxxx> Date: Sun, 13 Apr 2014 18:08:02 +0300 Subject: [PATCH 11/76] ipv4: return valid RTA_IIF on ip route get [ Upstream commit 91146153da2feab18efab2e13b0945b6bb704ded ] Extend commit 13378cad02afc2adc6c0e07fca03903c7ada0b37 ("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0 which is displayed as 'iif *'. inet_iif is not appropriate to use because skb_iif is not set. Use the skb->dev->ifindex instead. Signed-off-by: Julian Anastasov <ja@xxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 4c011ec..db66057 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2364,7 +2364,7 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src, } } else #endif - if (nla_put_u32(skb, RTA_IIF, rt->rt_iif)) + if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex)) goto nla_put_failure; } -- 1.7.11.7 From 96a094467c7711bebdab97b78cd02fa3e696de59 Mon Sep 17 00:00:00 2001 From: Mathias Krause <minipli@xxxxxxxxxxxxxx> Date: Sun, 13 Apr 2014 18:23:33 +0200 Subject: [PATCH 12/76] filter: prevent nla extensions to peek beyond the end of the message [ Upstream commit 05ab8f2647e4221cbdb3856dd7d32bd5407316b3 ] The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check for a minimal message length before testing the supplied offset to be within the bounds of the message. This allows the subtraction of the nla header to underflow and therefore -- as the data type is unsigned -- allowing far to big offset and length values for the search of the netlink attribute. The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is also wrong. It has the minuend and subtrahend mixed up, therefore calculates a huge length value, allowing to overrun the end of the message while looking for the netlink attribute. The following three BPF snippets will trigger the bugs when attached to a UNIX datagram socket and parsing a message with length 1, 2 or 3. ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]-- | ld #0x87654321 | ldx #42 | ld #nla | ret a `--- ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]-- | ld #0x87654321 | ldx #42 | ld #nlan | ret a `--- ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]-- | ; (needs a fake netlink header at offset 0) | ld #0 | ldx #42 | ld #nlan | ret a `--- Fix the first issue by ensuring the message length fulfills the minimal size constrains of a nla header. Fix the second bug by getting the math for the remainder calculation right. Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction") Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..") Cc: Patrick McHardy <kaber@xxxxxxxxx> Cc: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> Signed-off-by: Mathias Krause <minipli@xxxxxxxxxxxxxx> Acked-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/filter.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/core/filter.c b/net/core/filter.c index ad30d62..ebce437 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -355,6 +355,8 @@ load_b: if (skb_is_nonlinear(skb)) return 0; + if (skb->len < sizeof(struct nlattr)) + return 0; if (A > skb->len - sizeof(struct nlattr)) return 0; @@ -371,11 +373,13 @@ load_b: if (skb_is_nonlinear(skb)) return 0; + if (skb->len < sizeof(struct nlattr)) + return 0; if (A > skb->len - sizeof(struct nlattr)) return 0; nla = (struct nlattr *)&skb->data[A]; - if (nla->nla_len > A - skb->len) + if (nla->nla_len > skb->len - A) return 0; nla = nla_find_nested(nla, X); -- 1.7.11.7 From 333479ca47c708a7fabf0d18ba3325b120468af2 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann <dborkman@xxxxxxxxxx> Date: Mon, 14 Apr 2014 21:45:17 +0200 Subject: [PATCH 13/76] Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer" [ Upstream commit 362d52040c71f6e8d8158be48c812d7729cb8df1 ] This reverts commit ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") as it introduced a serious performance regression on SCTP over IPv4 and IPv6, though a not as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs. Current state: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 17:56:21 GMT Connecting to host 192.168.241.3, port 5201 Cookie: Lab200slot2.1397238981.812898.548918 [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec (etc) [root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 19:08:41 GMT Connecting to host 2001:db8:0:f101::1, port 5201 Cookie: Lab200slot2.1397243321.714295.2b3f7c [ 4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201 Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 169 MBytes 1.42 Gbits/sec [ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec [ 4] 2.00-3.00 sec 188 MBytes 1.58 Gbits/sec [ 4] 3.00-4.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 4.00-5.00 sec 165 MBytes 1.39 Gbits/sec [ 4] 5.00-6.00 sec 199 MBytes 1.67 Gbits/sec [ 4] 6.00-7.00 sec 163 MBytes 1.36 Gbits/sec [ 4] 7.00-8.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 8.00-9.00 sec 193 MBytes 1.62 Gbits/sec [ 4] 9.00-10.00 sec 196 MBytes 1.65 Gbits/sec [ 4] 10.00-11.00 sec 157 MBytes 1.31 Gbits/sec [ 4] 11.00-12.00 sec 175 MBytes 1.47 Gbits/sec [ 4] 12.00-13.00 sec 192 MBytes 1.61 Gbits/sec [ 4] 13.00-14.00 sec 199 MBytes 1.67 Gbits/sec (etc) After patch: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64 Time: Mon, 14 Apr 2014 16:40:48 GMT Connecting to host 192.168.240.3, port 5201 Cookie: Lab200slot2.1397493648.413274.65e131 [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec With the reverted patch applied, the SCTP/IPv4 performance is back to normal on latest upstream for IPv4 and IPv6 and has same throughput as 3.4.2 test kernel, steady and interval reports are smooth again. Fixes: ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") Reported-by: Peter Butler <pbutler@xxxxxxxxxxxx> Reported-by: Dongsheng Song <dongsheng.song@xxxxxxxxx> Reported-by: Fengguang Wu <fengguang.wu@xxxxxxxxx> Tested-by: Peter Butler <pbutler@xxxxxxxxxxxx> Signed-off-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@xxxxxxx> Cc: Alexander Sverdlin <alexander.sverdlin@xxxxxxx> Cc: Vlad Yasevich <vyasevich@xxxxxxxxx> Acked-by: Vlad Yasevich <vyasevich@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/net/sctp/structs.h | 14 +++++++- net/sctp/associola.c | 82 ++++++++++++++++++++++++++++++++++++---------- net/sctp/sm_statefuns.c | 2 +- net/sctp/socket.c | 6 ++++ net/sctp/ulpevent.c | 8 ++--- 5 files changed, 87 insertions(+), 25 deletions(-) diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 6ee76c8..d992ca3 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -1653,6 +1653,17 @@ struct sctp_association { /* This is the last advertised value of rwnd over a SACK chunk. */ __u32 a_rwnd; + /* Number of bytes by which the rwnd has slopped. The rwnd is allowed + * to slop over a maximum of the association's frag_point. + */ + __u32 rwnd_over; + + /* Keeps treack of rwnd pressure. This happens when we have + * a window, but not recevie buffer (i.e small packets). This one + * is releases slowly (1 PMTU at a time ). + */ + __u32 rwnd_press; + /* This is the sndbuf size in use for the association. * This corresponds to the sndbuf size for the association, * as specified in the sk->sndbuf. @@ -1881,7 +1892,8 @@ void sctp_assoc_update(struct sctp_association *old, __u32 sctp_association_get_next_tsn(struct sctp_association *); void sctp_assoc_sync_pmtu(struct sock *, struct sctp_association *); -void sctp_assoc_rwnd_update(struct sctp_association *, bool); +void sctp_assoc_rwnd_increase(struct sctp_association *, unsigned int); +void sctp_assoc_rwnd_decrease(struct sctp_association *, unsigned int); void sctp_assoc_set_primary(struct sctp_association *, struct sctp_transport *); void sctp_assoc_del_nonprimary_peers(struct sctp_association *, diff --git a/net/sctp/associola.c b/net/sctp/associola.c index ee13d28..878e17a 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -1396,35 +1396,44 @@ static inline bool sctp_peer_needs_update(struct sctp_association *asoc) return false; } -/* Update asoc's rwnd for the approximated state in the buffer, - * and check whether SACK needs to be sent. - */ -void sctp_assoc_rwnd_update(struct sctp_association *asoc, bool update_peer) +/* Increase asoc's rwnd by len and send any window update SACK if needed. */ +void sctp_assoc_rwnd_increase(struct sctp_association *asoc, unsigned int len) { - int rx_count; struct sctp_chunk *sack; struct timer_list *timer; - if (asoc->ep->rcvbuf_policy) - rx_count = atomic_read(&asoc->rmem_alloc); - else - rx_count = atomic_read(&asoc->base.sk->sk_rmem_alloc); + if (asoc->rwnd_over) { + if (asoc->rwnd_over >= len) { + asoc->rwnd_over -= len; + } else { + asoc->rwnd += (len - asoc->rwnd_over); + asoc->rwnd_over = 0; + } + } else { + asoc->rwnd += len; + } - if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0) - asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1; - else - asoc->rwnd = 0; + /* If we had window pressure, start recovering it + * once our rwnd had reached the accumulated pressure + * threshold. The idea is to recover slowly, but up + * to the initial advertised window. + */ + if (asoc->rwnd_press && asoc->rwnd >= asoc->rwnd_press) { + int change = min(asoc->pathmtu, asoc->rwnd_press); + asoc->rwnd += change; + asoc->rwnd_press -= change; + } - pr_debug("%s: asoc:%p rwnd=%u, rx_count=%d, sk_rcvbuf=%d\n", - __func__, asoc, asoc->rwnd, rx_count, - asoc->base.sk->sk_rcvbuf); + pr_debug("%s: asoc:%p rwnd increased by %d to (%u, %u) - %u\n", + __func__, asoc, len, asoc->rwnd, asoc->rwnd_over, + asoc->a_rwnd); /* Send a window update SACK if the rwnd has increased by at least the * minimum of the association's PMTU and half of the receive buffer. * The algorithm used is similar to the one described in * Section 4.2.3.3 of RFC 1122. */ - if (update_peer && sctp_peer_needs_update(asoc)) { + if (sctp_peer_needs_update(asoc)) { asoc->a_rwnd = asoc->rwnd; pr_debug("%s: sending window update SACK- asoc:%p rwnd:%u " @@ -1446,6 +1455,45 @@ void sctp_assoc_rwnd_update(struct sctp_association *asoc, bool update_peer) } } +/* Decrease asoc's rwnd by len. */ +void sctp_assoc_rwnd_decrease(struct sctp_association *asoc, unsigned int len) +{ + int rx_count; + int over = 0; + + if (unlikely(!asoc->rwnd || asoc->rwnd_over)) + pr_debug("%s: association:%p has asoc->rwnd:%u, " + "asoc->rwnd_over:%u!\n", __func__, asoc, + asoc->rwnd, asoc->rwnd_over); + + if (asoc->ep->rcvbuf_policy) + rx_count = atomic_read(&asoc->rmem_alloc); + else + rx_count = atomic_read(&asoc->base.sk->sk_rmem_alloc); + + /* If we've reached or overflowed our receive buffer, announce + * a 0 rwnd if rwnd would still be positive. Store the + * the potential pressure overflow so that the window can be restored + * back to original value. + */ + if (rx_count >= asoc->base.sk->sk_rcvbuf) + over = 1; + + if (asoc->rwnd >= len) { + asoc->rwnd -= len; + if (over) { + asoc->rwnd_press += asoc->rwnd; + asoc->rwnd = 0; + } + } else { + asoc->rwnd_over = len - asoc->rwnd; + asoc->rwnd = 0; + } + + pr_debug("%s: asoc:%p rwnd decreased by %d to (%u, %u, %u)\n", + __func__, asoc, len, asoc->rwnd, asoc->rwnd_over, + asoc->rwnd_press); +} /* Build the bind address list for the association based on info from the * local endpoint and the remote peer. diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index 01e0024..ae9fbeb 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -6178,7 +6178,7 @@ static int sctp_eat_data(const struct sctp_association *asoc, * PMTU. In cases, such as loopback, this might be a rather * large spill over. */ - if ((!chunk->data_accepted) && (!asoc->rwnd || + if ((!chunk->data_accepted) && (!asoc->rwnd || asoc->rwnd_over || (datalen > asoc->rwnd + asoc->frag_point))) { /* If this is the next TSN, consider reneging to make diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 270d5bd..9a5d590 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -2115,6 +2115,12 @@ static int sctp_recvmsg(struct kiocb *iocb, struct sock *sk, sctp_skb_pull(skb, copied); skb_queue_head(&sk->sk_receive_queue, skb); + /* When only partial message is copied to the user, increase + * rwnd by that amount. If all the data in the skb is read, + * rwnd is updated when the event is freed. + */ + if (!sctp_ulpevent_is_notification(event)) + sctp_assoc_rwnd_increase(event->asoc, copied); goto out; } else if ((event->msg_flags & MSG_NOTIFICATION) || (event->msg_flags & MSG_EOR)) diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c index 8d198ae..85c6465 100644 --- a/net/sctp/ulpevent.c +++ b/net/sctp/ulpevent.c @@ -989,7 +989,7 @@ static void sctp_ulpevent_receive_data(struct sctp_ulpevent *event, skb = sctp_event2skb(event); /* Set the owner and charge rwnd for bytes received. */ sctp_ulpevent_set_owner(event, asoc); - sctp_assoc_rwnd_update(asoc, false); + sctp_assoc_rwnd_decrease(asoc, skb_headlen(skb)); if (!skb->data_len) return; @@ -1011,7 +1011,6 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event) { struct sk_buff *skb, *frag; unsigned int len; - struct sctp_association *asoc; /* Current stack structures assume that the rcv buffer is * per socket. For UDP style sockets this is not true as @@ -1036,11 +1035,8 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event) } done: - asoc = event->asoc; - sctp_association_hold(asoc); + sctp_assoc_rwnd_increase(event->asoc, len); sctp_ulpevent_release_owner(event); - sctp_assoc_rwnd_update(asoc, true); - sctp_association_put(asoc); } static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent *event) -- 1.7.11.7 From ec4825cc81e81c9ebe170a6b6697c5b133313481 Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Mon, 14 Apr 2014 17:37:26 -0400 Subject: [PATCH 14/76] net: Start with correct mac_len in skb_network_protocol [ Upstream commit 1e785f48d29a09b6cf96db7b49b6320dada332e1 ] Sometimes, when the packet arrives at skb_mac_gso_segment() its skb->mac_len already accounts for some of the mac lenght headers in the packet. This seems to happen when forwarding through and OpenSSL tunnel. When we start looking for any vlan headers in skb_network_protocol() we seem to ignore any of the already known mac headers and start with an ETH_HLEN. This results in an incorrect offset, dropped TSO frames and general slowness of the connection. We can start counting from the known skb->mac_len and return at least that much if all mac level headers are known and accounted for. Fixes: 53d6471cef17262d3ad1c7ce8982a234244f68ec (net: Account for all vlan headers in skb_mac_gso_segment) CC: Eric Dumazet <eric.dumazet@xxxxxxxxx> CC: Daniel Borkman <dborkman@xxxxxxxxxx> Tested-by: Martin Filip <nexus+kernel@xxxxxxxxxx> Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index 45fa2f1..6088927 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2289,7 +2289,7 @@ EXPORT_SYMBOL(skb_checksum_help); __be16 skb_network_protocol(struct sk_buff *skb, int *depth) { __be16 type = skb->protocol; - int vlan_depth = ETH_HLEN; + int vlan_depth = skb->mac_len; /* Tunnel gso handlers can set protocol to ethernet. */ if (type == htons(ETH_P_TEB)) { -- 1.7.11.7 From a46c7028896af2152a91a3a00fa7348890440d65 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Date: Mon, 14 Apr 2014 17:11:38 +0200 Subject: [PATCH 15/76] ip6_gre: don't allow to remove the fb_tunnel_dev [ Upstream commit 54d63f787b652755e66eb4dd8892ee6d3f5197fc ] It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but this is unsafe, the module always supposes that this device exists. For example, ip6gre_tunnel_lookup() may use it unconditionally. Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we let ip6gre_destroy_tunnels() do the job). Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6"). CC: Dmitry Kozlov <xeb@xxxxxxx> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv6/ip6_gre.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index f3ffb43..2465d18 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -1566,6 +1566,15 @@ static int ip6gre_changelink(struct net_device *dev, struct nlattr *tb[], return 0; } +static void ip6gre_dellink(struct net_device *dev, struct list_head *head) +{ + struct net *net = dev_net(dev); + struct ip6gre_net *ign = net_generic(net, ip6gre_net_id); + + if (dev != ign->fb_tunnel_dev) + unregister_netdevice_queue(dev, head); +} + static size_t ip6gre_get_size(const struct net_device *dev) { return @@ -1643,6 +1652,7 @@ static struct rtnl_link_ops ip6gre_link_ops __read_mostly = { .validate = ip6gre_tunnel_validate, .newlink = ip6gre_newlink, .changelink = ip6gre_changelink, + .dellink = ip6gre_dellink, .get_size = ip6gre_get_size, .fill_info = ip6gre_fill_info, }; -- 1.7.11.7 From 658b0f20f9f4b2aaaa8ba505259259210aa42e5a Mon Sep 17 00:00:00 2001 From: dingtianhong <dingtianhong@xxxxxxxxxx> Date: Thu, 17 Apr 2014 18:40:36 +0800 Subject: [PATCH 16/76] vlan: Fix lockdep warning when vlan dev handle notification [ Upstream commit dc8eaaa006350d24030502a4521542e74b5cb39f ] When I open the LOCKDEP config and run these steps: modprobe 8021q vconfig add eth2 20 vconfig add eth2.20 30 ifconfig eth2 xx.xx.xx.xx then the Call Trace happened: [32524.386288] ============================================= [32524.386293] [ INFO: possible recursive locking detected ] [32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G O [32524.386302] --------------------------------------------- [32524.386306] ifconfig/3103 is trying to acquire lock: [32524.386310] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0 [32524.386326] [32524.386326] but task is already holding lock: [32524.386330] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40 [32524.386341] [32524.386341] other info that might help us debug this: [32524.386345] Possible unsafe locking scenario: [32524.386345] [32524.386350] CPU0 [32524.386352] ---- [32524.386354] lock(&vlan_netdev_addr_lock_key/1); [32524.386359] lock(&vlan_netdev_addr_lock_key/1); [32524.386364] [32524.386364] *** DEADLOCK *** [32524.386364] [32524.386368] May be due to missing lock nesting notation [32524.386368] [32524.386373] 2 locks held by ifconfig/3103: [32524.386376] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81431d42>] rtnl_lock+0x12/0x20 [32524.386387] #1: (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40 [32524.386398] [32524.386398] stack backtrace: [32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G O 3.14.0-rc2-0.7-default+ #35 [32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [32524.386414] ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8 [32524.386421] ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0 [32524.386428] 000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000 [32524.386435] Call Trace: [32524.386441] [<ffffffff814f68a2>] dump_stack+0x6a/0x78 [32524.386448] [<ffffffff810a35fb>] __lock_acquire+0x7ab/0x1940 [32524.386454] [<ffffffff810a323a>] ? __lock_acquire+0x3ea/0x1940 [32524.386459] [<ffffffff810a4874>] lock_acquire+0xe4/0x110 [32524.386464] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0 [32524.386471] [<ffffffff814fc07a>] _raw_spin_lock_nested+0x2a/0x40 [32524.386476] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0 [32524.386481] [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0 [32524.386489] [<ffffffffa0500cab>] vlan_dev_set_rx_mode+0x2b/0x50 [8021q] [32524.386495] [<ffffffff8141addf>] __dev_set_rx_mode+0x5f/0xb0 [32524.386500] [<ffffffff8141af8b>] dev_set_rx_mode+0x2b/0x40 [32524.386506] [<ffffffff8141b3cf>] __dev_open+0xef/0x150 [32524.386511] [<ffffffff8141b177>] __dev_change_flags+0xa7/0x190 [32524.386516] [<ffffffff8141b292>] dev_change_flags+0x32/0x80 [32524.386524] [<ffffffff8149ca56>] devinet_ioctl+0x7d6/0x830 [32524.386532] [<ffffffff81437b0b>] ? dev_ioctl+0x34b/0x660 [32524.386540] [<ffffffff814a05b0>] inet_ioctl+0x80/0xa0 [32524.386550] [<ffffffff8140199d>] sock_do_ioctl+0x2d/0x60 [32524.386558] [<ffffffff81401a52>] sock_ioctl+0x82/0x2a0 [32524.386568] [<ffffffff811a7123>] do_vfs_ioctl+0x93/0x590 [32524.386578] [<ffffffff811b2705>] ? rcu_read_lock_held+0x45/0x50 [32524.386586] [<ffffffff811b39e5>] ? __fget_light+0x105/0x110 [32524.386594] [<ffffffff811a76b1>] SyS_ioctl+0x91/0xb0 [32524.386604] [<ffffffff815057e2>] system_call_fastpath+0x16/0x1b ======================================================================== The reason is that all of the addr_lock_key for vlan dev have the same class, so if we change the status for vlan dev, the vlan dev and its real dev will hold the same class of addr_lock_key together, so the warning happened. we should distinguish the lock depth for vlan dev and its real dev. v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which could support to add 8 vlan id on a same vlan dev, I think it is enough for current scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id, and the vlan dev would not meet the same class key with its real dev. The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan dev could get a suitable class key. v2->v3: According David's suggestion, I use the subclass to distinguish the lock key for vlan dev and its real dev, but it make no sense, because the difference for subclass in the lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth to distinguish the different depth for every vlan dev, the same depth of the vlan dev could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h, I think it is enough here, the lockdep should never exceed that value. v3->v4: Add a huge array of locking keys will waste static kernel memory and is not a appropriate method, we could use _nested() variants to fix the problem, calculate the depth for every vlan dev, and use the depth as the subclass for addr_lock_key. Signed-off-by: Ding Tianhong <dingtianhong@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/8021q/vlan_dev.c | 46 +++++++++++++++++++++++++++++++++++++++++----- net/core/dev.c | 1 + 2 files changed, 42 insertions(+), 5 deletions(-) diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index 27bfe2f..96e8db7 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -493,10 +493,48 @@ static void vlan_dev_change_rx_flags(struct net_device *dev, int change) } } +static int vlan_calculate_locking_subclass(struct net_device *real_dev) +{ + int subclass = 0; + + while (is_vlan_dev(real_dev)) { + subclass++; + real_dev = vlan_dev_priv(real_dev)->real_dev; + } + + return subclass; +} + +static void vlan_dev_mc_sync(struct net_device *to, struct net_device *from) +{ + int err = 0, subclass; + + subclass = vlan_calculate_locking_subclass(to); + + spin_lock_nested(&to->addr_list_lock, subclass); + err = __hw_addr_sync(&to->mc, &from->mc, to->addr_len); + if (!err) + __dev_set_rx_mode(to); + spin_unlock(&to->addr_list_lock); +} + +static void vlan_dev_uc_sync(struct net_device *to, struct net_device *from) +{ + int err = 0, subclass; + + subclass = vlan_calculate_locking_subclass(to); + + spin_lock_nested(&to->addr_list_lock, subclass); + err = __hw_addr_sync(&to->uc, &from->uc, to->addr_len); + if (!err) + __dev_set_rx_mode(to); + spin_unlock(&to->addr_list_lock); +} + static void vlan_dev_set_rx_mode(struct net_device *vlan_dev) { - dev_mc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); - dev_uc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); + vlan_dev_mc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); + vlan_dev_uc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); } /* @@ -608,9 +646,7 @@ static int vlan_dev_init(struct net_device *dev) SET_NETDEV_DEVTYPE(dev, &vlan_type); - if (is_vlan_dev(real_dev)) - subclass = 1; - + subclass = vlan_calculate_locking_subclass(dev); vlan_dev_set_lockdep_class(dev, subclass); vlan_dev_priv(dev)->vlan_pcpu_stats = alloc_percpu(struct vlan_pcpu_stats); diff --git a/net/core/dev.c b/net/core/dev.c index 6088927..01fde89 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5219,6 +5219,7 @@ void __dev_set_rx_mode(struct net_device *dev) if (ops->ndo_set_rx_mode) ops->ndo_set_rx_mode(dev); } +EXPORT_SYMBOL(__dev_set_rx_mode); void dev_set_rx_mode(struct net_device *dev) { -- 1.7.11.7 From 7aefa9f83c960f3af2869c85f7188f433acc43bb Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Fri, 16 May 2014 17:04:53 -0400 Subject: [PATCH 17/76] net: Find the nesting level of a given device by type. [ Upstream commit 4085ebe8c31face855fd01ee40372cb4aab1df3a ] Multiple devices in the kernel can be stacked/nested and they need to know their nesting level for the purposes of lockdep. This patch provides a generic function that determines a nesting level of a particular device by its type (ex: vlan, macvlan, etc). We only care about nesting of the same type of devices. For example: eth0 <- vlan0.10 <- macvlan0 <- vlan1.20 The nesting level of vlan1.20 would be 1, since there is another vlan in the stack under it. Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/linux/netdevice.h | 10 ++++++++++ net/core/dev.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index daafd95..26b30df 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2988,6 +2988,14 @@ void *netdev_lower_get_next_private_rcu(struct net_device *dev, priv; \ priv = netdev_lower_get_next_private_rcu(dev, &(iter))) +void *netdev_lower_get_next(struct net_device *dev, + struct list_head **iter); +#define netdev_for_each_lower_dev(dev, ldev, iter) \ + for (iter = &(dev)->adj_list.lower, \ + ldev = netdev_lower_get_next(dev, &(iter)); \ + ldev; \ + ldev = netdev_lower_get_next(dev, &(iter))) + void *netdev_adjacent_get_private(struct list_head *adj_list); void *netdev_lower_get_first_private_rcu(struct net_device *dev); struct net_device *netdev_master_upper_dev_get(struct net_device *dev); @@ -3003,6 +3011,8 @@ void netdev_upper_dev_unlink(struct net_device *dev, void netdev_adjacent_rename_links(struct net_device *dev, char *oldname); void *netdev_lower_dev_get_private(struct net_device *dev, struct net_device *lower_dev); +int dev_get_nest_level(struct net_device *dev, + bool (*type_check)(struct net_device *dev)); int skb_checksum_help(struct sk_buff *skb); struct sk_buff *__skb_gso_segment(struct sk_buff *skb, netdev_features_t features, bool tx_path); diff --git a/net/core/dev.c b/net/core/dev.c index 01fde89..540386c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4605,6 +4605,32 @@ void *netdev_lower_get_next_private_rcu(struct net_device *dev, EXPORT_SYMBOL(netdev_lower_get_next_private_rcu); /** + * netdev_lower_get_next - Get the next device from the lower neighbour + * list + * @dev: device + * @iter: list_head ** of the current position + * + * Gets the next netdev_adjacent from the dev's lower neighbour + * list, starting from iter position. The caller must hold RTNL lock or + * its own locking that guarantees that the neighbour lower + * list will remain unchainged. + */ +void *netdev_lower_get_next(struct net_device *dev, struct list_head **iter) +{ + struct netdev_adjacent *lower; + + lower = list_entry((*iter)->next, struct netdev_adjacent, list); + + if (&lower->list == &dev->adj_list.lower) + return NULL; + + *iter = &lower->list; + + return lower->dev; +} +EXPORT_SYMBOL(netdev_lower_get_next); + +/** * netdev_lower_get_first_private_rcu - Get the first ->private from the * lower neighbour list, RCU * variant @@ -5054,6 +5080,30 @@ void *netdev_lower_dev_get_private(struct net_device *dev, } EXPORT_SYMBOL(netdev_lower_dev_get_private); + +int dev_get_nest_level(struct net_device *dev, + bool (*type_check)(struct net_device *dev)) +{ + struct net_device *lower = NULL; + struct list_head *iter; + int max_nest = -1; + int nest; + + ASSERT_RTNL(); + + netdev_for_each_lower_dev(dev, lower, iter) { + nest = dev_get_nest_level(lower, type_check); + if (max_nest < nest) + max_nest = nest; + } + + if (type_check(dev)) + max_nest++; + + return max_nest; +} +EXPORT_SYMBOL(dev_get_nest_level); + static void dev_change_rx_flags(struct net_device *dev, int flags) { const struct net_device_ops *ops = dev->netdev_ops; -- 1.7.11.7 From 11b9726e86094fa8248118e91735d5d1b317c5e7 Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Fri, 16 May 2014 17:04:54 -0400 Subject: [PATCH 18/76] net: Allow for more then a single subclass for netif_addr_lock [ Upstream commit 25175ba5c9bff9aaf0229df34bb5d54c81633ec3 ] Currently netif_addr_lock_nested assumes that there can be only a single nesting level between 2 devices. However, if we have multiple devices of the same type stacked, this fails. For example: eth0 <-- vlan0.10 <-- vlan0.10.20 A more complicated configuration may stack more then one type of device in different order. Ex: eth0 <-- vlan0.10 <-- macvlan0 <-- vlan1.10.20 <-- macvlan1 This patch adds an ndo_* function that allows each stackable device to report its nesting level. If the device doesn't provide this function default subclass of 1 is used. Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/linux/netdevice.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 26b30df..911718f 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1145,6 +1145,7 @@ struct net_device_ops { netdev_tx_t (*ndo_dfwd_start_xmit) (struct sk_buff *skb, struct net_device *dev, void *priv); + int (*ndo_get_lock_subclass)(struct net_device *dev); }; /* @@ -2861,7 +2862,12 @@ static inline void netif_addr_lock(struct net_device *dev) static inline void netif_addr_lock_nested(struct net_device *dev) { - spin_lock_nested(&dev->addr_list_lock, SINGLE_DEPTH_NESTING); + int subclass = SINGLE_DEPTH_NESTING; + + if (dev->netdev_ops->ndo_get_lock_subclass) + subclass = dev->netdev_ops->ndo_get_lock_subclass(dev); + + spin_lock_nested(&dev->addr_list_lock, subclass); } static inline void netif_addr_lock_bh(struct net_device *dev) -- 1.7.11.7 From 95229d595f11808c6e20dcb6876e58af38fb4efb Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Fri, 16 May 2014 17:04:55 -0400 Subject: [PATCH 19/76] vlan: Fix lockdep warning with stacked vlan devices. [ Upstream commit d38569ab2bba6e6b3233acfc3a84cdbcfbd1f79f ] This reverts commit dc8eaaa006350d24030502a4521542e74b5cb39f. vlan: Fix lockdep warning when vlan dev handle notification Instead we use the new new API to find the lock subclass of our vlan device. This way we can support configurations where vlans are interspersed with other devices: bond -> vlan -> macvlan -> vlan Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/linux/if_vlan.h | 3 ++- net/8021q/vlan.c | 1 + net/8021q/vlan_dev.c | 53 ++++++++++--------------------------------------- net/core/dev.c | 1 - 4 files changed, 13 insertions(+), 45 deletions(-) diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h index bbedfb5..72ba6f5 100644 --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -73,7 +73,7 @@ static inline struct vlan_ethhdr *vlan_eth_hdr(const struct sk_buff *skb) /* found in socket.c */ extern void vlan_ioctl_set(int (*hook)(struct net *, void __user *)); -static inline int is_vlan_dev(struct net_device *dev) +static inline bool is_vlan_dev(struct net_device *dev) { return dev->priv_flags & IFF_802_1Q_VLAN; } @@ -158,6 +158,7 @@ struct vlan_dev_priv { #ifdef CONFIG_NET_POLL_CONTROLLER struct netpoll *netpoll; #endif + unsigned int nest_level; }; static inline struct vlan_dev_priv *vlan_dev_priv(const struct net_device *dev) diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 175273f..44ebd5c 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -169,6 +169,7 @@ int register_vlan_dev(struct net_device *dev) if (err < 0) goto out_uninit_mvrp; + vlan->nest_level = dev_get_nest_level(real_dev, is_vlan_dev) + 1; err = register_netdevice(dev); if (err < 0) goto out_uninit_mvrp; diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index 96e8db7..cc0d218 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -493,48 +493,10 @@ static void vlan_dev_change_rx_flags(struct net_device *dev, int change) } } -static int vlan_calculate_locking_subclass(struct net_device *real_dev) -{ - int subclass = 0; - - while (is_vlan_dev(real_dev)) { - subclass++; - real_dev = vlan_dev_priv(real_dev)->real_dev; - } - - return subclass; -} - -static void vlan_dev_mc_sync(struct net_device *to, struct net_device *from) -{ - int err = 0, subclass; - - subclass = vlan_calculate_locking_subclass(to); - - spin_lock_nested(&to->addr_list_lock, subclass); - err = __hw_addr_sync(&to->mc, &from->mc, to->addr_len); - if (!err) - __dev_set_rx_mode(to); - spin_unlock(&to->addr_list_lock); -} - -static void vlan_dev_uc_sync(struct net_device *to, struct net_device *from) -{ - int err = 0, subclass; - - subclass = vlan_calculate_locking_subclass(to); - - spin_lock_nested(&to->addr_list_lock, subclass); - err = __hw_addr_sync(&to->uc, &from->uc, to->addr_len); - if (!err) - __dev_set_rx_mode(to); - spin_unlock(&to->addr_list_lock); -} - static void vlan_dev_set_rx_mode(struct net_device *vlan_dev) { - vlan_dev_mc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); - vlan_dev_uc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); + dev_mc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); + dev_uc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev); } /* @@ -562,6 +524,11 @@ static void vlan_dev_set_lockdep_class(struct net_device *dev, int subclass) netdev_for_each_tx_queue(dev, vlan_dev_set_lockdep_one, &subclass); } +static int vlan_dev_get_lock_subclass(struct net_device *dev) +{ + return vlan_dev_priv(dev)->nest_level; +} + static const struct header_ops vlan_header_ops = { .create = vlan_dev_hard_header, .rebuild = vlan_dev_rebuild_header, @@ -597,7 +564,7 @@ static const struct net_device_ops vlan_netdev_ops; static int vlan_dev_init(struct net_device *dev) { struct net_device *real_dev = vlan_dev_priv(dev)->real_dev; - int subclass = 0, i; + int i; netif_carrier_off(dev); @@ -646,8 +613,7 @@ static int vlan_dev_init(struct net_device *dev) SET_NETDEV_DEVTYPE(dev, &vlan_type); - subclass = vlan_calculate_locking_subclass(dev); - vlan_dev_set_lockdep_class(dev, subclass); + vlan_dev_set_lockdep_class(dev, vlan_dev_get_lock_subclass(dev)); vlan_dev_priv(dev)->vlan_pcpu_stats = alloc_percpu(struct vlan_pcpu_stats); if (!vlan_dev_priv(dev)->vlan_pcpu_stats) @@ -827,6 +793,7 @@ static const struct net_device_ops vlan_netdev_ops = { .ndo_netpoll_cleanup = vlan_dev_netpoll_cleanup, #endif .ndo_fix_features = vlan_dev_fix_features, + .ndo_get_lock_subclass = vlan_dev_get_lock_subclass, }; void vlan_setup(struct net_device *dev) diff --git a/net/core/dev.c b/net/core/dev.c index 540386c..7c22974 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5269,7 +5269,6 @@ void __dev_set_rx_mode(struct net_device *dev) if (ops->ndo_set_rx_mode) ops->ndo_set_rx_mode(dev); } -EXPORT_SYMBOL(__dev_set_rx_mode); void dev_set_rx_mode(struct net_device *dev) { -- 1.7.11.7 From 203122112df5e669e906acc8412e30a447392570 Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Fri, 16 May 2014 17:04:56 -0400 Subject: [PATCH 20/76] macvlan: Fix lockdep warnings with stacked macvlan devices [ Upstream commit c674ac30c549596295eb0a5af7f4714c0b905b6f ] Macvlan devices try to avoid stacking, but that's not always successfull or even desired. As an example, the following configuration is perefectly legal and valid: eth0 <--- macvlan0 <---- vlan0.10 <--- macvlan1 However, this configuration produces the following lockdep trace: [ 115.620418] ====================================================== [ 115.620477] [ INFO: possible circular locking dependency detected ] [ 115.620516] 3.15.0-rc1+ #24 Not tainted [ 115.620540] ------------------------------------------------------- [ 115.620577] ip/1704 is trying to acquire lock: [ 115.620604] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.620686] but task is already holding lock: [ 115.620723] (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40 [ 115.620795] which lock already depends on the new lock. [ 115.620853] the existing dependency chain (in reverse order) is: [ 115.620894] -> #1 (&macvlan_netdev_addr_lock_key){+.....}: [ 115.620935] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.620974] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621019] [<ffffffffa07296c3>] vlan_dev_set_rx_mode+0x53/0x110 [8021q] [ 115.621066] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621105] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621143] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b [ 115.621174] -> #0 (&vlan_netdev_addr_lock_key/1){+.....}: [ 115.621174] [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60 [ 115.621174] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.621174] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621174] [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan] [ 115.621174] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621174] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621174] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b [ 115.621174] other info that might help us debug this: [ 115.621174] Possible unsafe locking scenario: [ 115.621174] CPU0 CPU1 [ 115.621174] ---- ---- [ 115.621174] lock(&macvlan_netdev_addr_lock_key); [ 115.621174] lock(&vlan_netdev_addr_lock_key/1); [ 115.621174] lock(&macvlan_netdev_addr_lock_key); [ 115.621174] lock(&vlan_netdev_addr_lock_key/1); [ 115.621174] *** DEADLOCK *** [ 115.621174] 2 locks held by ip/1704: [ 115.621174] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff815e6dbb>] rtnetlink_rcv+0x1b/0x40 [ 115.621174] #1: (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40 [ 115.621174] stack backtrace: [ 115.621174] CPU: 3 PID: 1704 Comm: ip Not tainted 3.15.0-rc1+ #24 [ 115.621174] Hardware name: Hewlett-Packard HP xw8400 Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010 [ 115.621174] ffffffff82339ae0 ffff880465f79568 ffffffff816ee20c ffffffff82339ae0 [ 115.621174] ffff880465f795a8 ffffffff816e9e1b ffff880465f79600 ffff880465b019c8 [ 115.621174] 0000000000000001 0000000000000002 ffff880465b019c8 ffff880465b01230 [ 115.621174] Call Trace: [ 115.621174] [<ffffffff816ee20c>] dump_stack+0x4d/0x66 [ 115.621174] [<ffffffff816e9e1b>] print_circular_bug+0x200/0x20e [ 115.621174] [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60 [ 115.621174] [<ffffffff810d3172>] ? trace_hardirqs_on_caller+0xb2/0x1d0 [ 115.621174] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.621174] [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621174] [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan] [ 115.621174] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621174] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621174] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff811e1db1>] ? mem_cgroup_bad_page_check+0x21/0x30 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff810d394c>] ? __lock_acquire+0x37c/0x1a60 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815ea169>] ? rtnl_newlink+0xe9/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff810d329d>] ? trace_hardirqs_on+0xd/0x10 [ 115.621174] [<ffffffff815e6dbb>] ? rtnetlink_rcv+0x1b/0x40 [ 115.621174] [<ffffffff815e6de0>] ? rtnetlink_rcv+0x40/0x40 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0 [ 115.621174] [<ffffffff8119d4f8>] ? might_fault+0xa8/0xb0 [ 115.621174] [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0 [ 115.621174] [<ffffffff815cb51e>] ? verify_iovec+0x5e/0xe0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff816faa0d>] ? __do_page_fault+0x11d/0x570 [ 115.621174] [<ffffffff810cfe9f>] ? up_read+0x1f/0x40 [ 115.621174] [<ffffffff816fab04>] ? __do_page_fault+0x214/0x570 [ 115.621174] [<ffffffff8120a10b>] ? mntput_no_expire+0x6b/0x1c0 [ 115.621174] [<ffffffff8120a0b7>] ? mntput_no_expire+0x17/0x1c0 [ 115.621174] [<ffffffff8120a284>] ? mntput+0x24/0x40 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b Fix this by correctly providing macvlan lockdep class. Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/macvlan.c | 12 ++++++++++-- include/linux/if_macvlan.h | 1 + 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 1831fb7..9dde3ea 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -518,6 +518,11 @@ static struct lock_class_key macvlan_netdev_addr_lock_key; #define MACVLAN_STATE_MASK \ ((1<<__LINK_STATE_NOCARRIER) | (1<<__LINK_STATE_DORMANT)) +static int macvlan_get_nest_level(struct net_device *dev) +{ + return ((struct macvlan_dev *)netdev_priv(dev))->nest_level; +} + static void macvlan_set_lockdep_class_one(struct net_device *dev, struct netdev_queue *txq, void *_unused) @@ -528,8 +533,9 @@ static void macvlan_set_lockdep_class_one(struct net_device *dev, static void macvlan_set_lockdep_class(struct net_device *dev) { - lockdep_set_class(&dev->addr_list_lock, - &macvlan_netdev_addr_lock_key); + lockdep_set_class_and_subclass(&dev->addr_list_lock, + &macvlan_netdev_addr_lock_key, + macvlan_get_nest_level(dev)); netdev_for_each_tx_queue(dev, macvlan_set_lockdep_class_one, NULL); } @@ -731,6 +737,7 @@ static const struct net_device_ops macvlan_netdev_ops = { .ndo_fdb_add = macvlan_fdb_add, .ndo_fdb_del = macvlan_fdb_del, .ndo_fdb_dump = ndo_dflt_fdb_dump, + .ndo_get_lock_subclass = macvlan_get_nest_level, }; void macvlan_common_setup(struct net_device *dev) @@ -859,6 +866,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev, vlan->dev = dev; vlan->port = port; vlan->set_features = MACVLAN_FEATURES; + vlan->nest_level = dev_get_nest_level(lowerdev, netif_is_macvlan) + 1; vlan->mode = MACVLAN_MODE_VEPA; if (data && data[IFLA_MACVLAN_MODE]) diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h index 7c8b20b..a9a53b1 100644 --- a/include/linux/if_macvlan.h +++ b/include/linux/if_macvlan.h @@ -56,6 +56,7 @@ struct macvlan_dev { int numqueues; netdev_features_t tap_features; int minor; + int nest_level; }; static inline void macvlan_count_rx(const struct macvlan_dev *vlan, -- 1.7.11.7 From fb1f8d6cbcdc3409e1cd61dabe3b4e06f8826710 Mon Sep 17 00:00:00 2001 From: Ivan Vecera <ivecera@xxxxxxxxxx> Date: Thu, 17 Apr 2014 14:51:08 +0200 Subject: [PATCH 21/76] tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled The patch fixes a problem with dropped jumbo frames after usage of 'ethtool -G ... rx'. Scenario: 1. ip link set eth0 up 2. ethtool -G eth0 rx N # <- This zeroes rx-jumbo 3. ip link set mtu 9000 dev eth0 The ethtool command set rx_jumbo_pending to zero so any received jumbo packets are dropped and you need to use 'ethtool -G eth0 rx-jumbo N' to workaround the issue. The patch changes the logic so rx_jumbo_pending value is changed only if jumbo frames are enabled (MTU > 1500). Signed-off-by: Ivan Vecera <ivecera@xxxxxxxxxx> Acked-by: Michael Chan <mchan@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/ethernet/broadcom/tg3.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 70a225c8..a210766 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -12294,7 +12294,9 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e if (tg3_flag(tp, MAX_RXPEND_64) && tp->rx_pending > 63) tp->rx_pending = 63; - tp->rx_jumbo_pending = ering->rx_jumbo_pending; + + if (tg3_flag(tp, JUMBO_RING_ENABLE)) + tp->rx_jumbo_pending = ering->rx_jumbo_pending; for (i = 0; i < tp->irq_max; i++) tp->napi[i].tx_pending = ering->tx_pending; -- 1.7.11.7 From 6e1f2b94713532f619465581684aff6e985687d9 Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Thu, 17 Apr 2014 17:26:50 +0200 Subject: [PATCH 22/76] net: sctp: cache auth_enable per endpoint [ Upstream commit b14878ccb7fac0242db82720b784ab62c467c0dc ] Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] Call Trace: [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80 [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4 [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8 [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214 [<ffffffff8043af68>] sctp_rcv+0x588/0x630 [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24 [<ffffffff803acb50>] ip6_input+0x2c0/0x440 [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564 [<ffffffff80310650>] process_backlog+0xb4/0x18c [<ffffffff80313cbc>] net_rx_action+0x12c/0x210 [<ffffffff80034254>] __do_softirq+0x17c/0x2ac [<ffffffff800345e0>] irq_exit+0x54/0xb0 [<ffffffff800075a4>] ret_from_irq+0x0/0x4 [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48 [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148 [<ffffffff805a88b0>] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by: Joshua Kinard <kumba@xxxxxxxxxx> Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Signed-off-by: Daniel Borkmann <dborkman@xxxxxxxxxx> Acked-by: Neil Horman <nhorman@xxxxxxxxxxxxx> Tested-by: Joshua Kinard <kumba@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/net/sctp/structs.h | 4 +++- net/sctp/auth.c | 17 ++++++--------- net/sctp/endpointola.c | 3 ++- net/sctp/sm_make_chunk.c | 32 ++++++++++++++------------- net/sctp/sm_statefuns.c | 6 +++--- net/sctp/socket.c | 54 ++++++++++++++++++++++------------------------ net/sctp/sysctl.c | 36 ++++++++++++++++++++++++++++++- 7 files changed, 92 insertions(+), 60 deletions(-) diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index d992ca3..0dfcc92 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -1241,6 +1241,7 @@ struct sctp_endpoint { /* SCTP-AUTH: endpoint shared keys */ struct list_head endpoint_shared_keys; __u16 active_key_id; + __u8 auth_enable; }; /* Recover the outter endpoint structure. */ @@ -1269,7 +1270,8 @@ struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *, int sctp_has_association(struct net *net, const union sctp_addr *laddr, const union sctp_addr *paddr); -int sctp_verify_init(struct net *net, const struct sctp_association *asoc, +int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep, + const struct sctp_association *asoc, sctp_cid_t, sctp_init_chunk_t *peer_init, struct sctp_chunk *chunk, struct sctp_chunk **err_chunk); int sctp_process_init(struct sctp_association *, struct sctp_chunk *chunk, diff --git a/net/sctp/auth.c b/net/sctp/auth.c index 683c7d1..0e85291 100644 --- a/net/sctp/auth.c +++ b/net/sctp/auth.c @@ -386,14 +386,13 @@ nomem: */ int sctp_auth_asoc_init_active_key(struct sctp_association *asoc, gfp_t gfp) { - struct net *net = sock_net(asoc->base.sk); struct sctp_auth_bytes *secret; struct sctp_shared_key *ep_key; /* If we don't support AUTH, or peer is not capable * we don't need to do anything. */ - if (!net->sctp.auth_enable || !asoc->peer.auth_capable) + if (!asoc->ep->auth_enable || !asoc->peer.auth_capable) return 0; /* If the key_id is non-zero and we couldn't find an @@ -440,16 +439,16 @@ struct sctp_shared_key *sctp_auth_get_shkey( */ int sctp_auth_init_hmacs(struct sctp_endpoint *ep, gfp_t gfp) { - struct net *net = sock_net(ep->base.sk); struct crypto_hash *tfm = NULL; __u16 id; - /* if the transforms are already allocted, we are done */ - if (!net->sctp.auth_enable) { + /* If AUTH extension is disabled, we are done */ + if (!ep->auth_enable) { ep->auth_hmacs = NULL; return 0; } + /* If the transforms are already allocated, we are done */ if (ep->auth_hmacs) return 0; @@ -665,12 +664,10 @@ static int __sctp_auth_cid(sctp_cid_t chunk, struct sctp_chunks_param *param) /* Check if peer requested that this chunk is authenticated */ int sctp_auth_send_cid(sctp_cid_t chunk, const struct sctp_association *asoc) { - struct net *net; if (!asoc) return 0; - net = sock_net(asoc->base.sk); - if (!net->sctp.auth_enable || !asoc->peer.auth_capable) + if (!asoc->ep->auth_enable || !asoc->peer.auth_capable) return 0; return __sctp_auth_cid(chunk, asoc->peer.peer_chunks); @@ -679,12 +676,10 @@ int sctp_auth_send_cid(sctp_cid_t chunk, const struct sctp_association *asoc) /* Check if we requested that peer authenticate this chunk. */ int sctp_auth_recv_cid(sctp_cid_t chunk, const struct sctp_association *asoc) { - struct net *net; if (!asoc) return 0; - net = sock_net(asoc->base.sk); - if (!net->sctp.auth_enable) + if (!asoc->ep->auth_enable) return 0; return __sctp_auth_cid(chunk, diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c index 8e5fdea..3d9f429 100644 --- a/net/sctp/endpointola.c +++ b/net/sctp/endpointola.c @@ -68,7 +68,8 @@ static struct sctp_endpoint *sctp_endpoint_init(struct sctp_endpoint *ep, if (!ep->digest) return NULL; - if (net->sctp.auth_enable) { + ep->auth_enable = net->sctp.auth_enable; + if (ep->auth_enable) { /* Allocate space for HMACS and CHUNKS authentication * variables. There are arrays that we encode directly * into parameters to make the rest of the operations easier. diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index 3a1767e..fee5552 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -219,6 +219,7 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc, gfp_t gfp, int vparam_len) { struct net *net = sock_net(asoc->base.sk); + struct sctp_endpoint *ep = asoc->ep; sctp_inithdr_t init; union sctp_params addrs; size_t chunksize; @@ -278,7 +279,7 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc, chunksize += vparam_len; /* Account for AUTH related parameters */ - if (net->sctp.auth_enable) { + if (ep->auth_enable) { /* Add random parameter length*/ chunksize += sizeof(asoc->c.auth_random); @@ -363,7 +364,7 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc, } /* Add SCTP-AUTH chunks to the parameter list */ - if (net->sctp.auth_enable) { + if (ep->auth_enable) { sctp_addto_chunk(retval, sizeof(asoc->c.auth_random), asoc->c.auth_random); if (auth_hmacs) @@ -2010,7 +2011,7 @@ static void sctp_process_ext_param(struct sctp_association *asoc, /* if the peer reports AUTH, assume that he * supports AUTH. */ - if (net->sctp.auth_enable) + if (asoc->ep->auth_enable) asoc->peer.auth_capable = 1; break; case SCTP_CID_ASCONF: @@ -2102,6 +2103,7 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc, * SCTP_IERROR_NO_ERROR - continue with the chunk */ static sctp_ierror_t sctp_verify_param(struct net *net, + const struct sctp_endpoint *ep, const struct sctp_association *asoc, union sctp_params param, sctp_cid_t cid, @@ -2152,7 +2154,7 @@ static sctp_ierror_t sctp_verify_param(struct net *net, goto fallthrough; case SCTP_PARAM_RANDOM: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fallthrough; /* SCTP-AUTH: Secion 6.1 @@ -2169,7 +2171,7 @@ static sctp_ierror_t sctp_verify_param(struct net *net, break; case SCTP_PARAM_CHUNKS: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fallthrough; /* SCTP-AUTH: Section 3.2 @@ -2185,7 +2187,7 @@ static sctp_ierror_t sctp_verify_param(struct net *net, break; case SCTP_PARAM_HMAC_ALGO: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fallthrough; hmacs = (struct sctp_hmac_algo_param *)param.p; @@ -2220,10 +2222,9 @@ fallthrough: } /* Verify the INIT packet before we process it. */ -int sctp_verify_init(struct net *net, const struct sctp_association *asoc, - sctp_cid_t cid, - sctp_init_chunk_t *peer_init, - struct sctp_chunk *chunk, +int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep, + const struct sctp_association *asoc, sctp_cid_t cid, + sctp_init_chunk_t *peer_init, struct sctp_chunk *chunk, struct sctp_chunk **errp) { union sctp_params param; @@ -2264,8 +2265,8 @@ int sctp_verify_init(struct net *net, const struct sctp_association *asoc, /* Verify all the variable length parameters */ sctp_walk_params(param, peer_init, init_hdr.params) { - - result = sctp_verify_param(net, asoc, param, cid, chunk, errp); + result = sctp_verify_param(net, ep, asoc, param, cid, + chunk, errp); switch (result) { case SCTP_IERROR_ABORT: case SCTP_IERROR_NOMEM: @@ -2497,6 +2498,7 @@ static int sctp_process_param(struct sctp_association *asoc, struct sctp_af *af; union sctp_addr_param *addr_param; struct sctp_transport *t; + struct sctp_endpoint *ep = asoc->ep; /* We maintain all INIT parameters in network byte order all the * time. This allows us to not worry about whether the parameters @@ -2636,7 +2638,7 @@ do_addr_param: goto fall_through; case SCTP_PARAM_RANDOM: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fall_through; /* Save peer's random parameter */ @@ -2649,7 +2651,7 @@ do_addr_param: break; case SCTP_PARAM_HMAC_ALGO: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fall_through; /* Save peer's HMAC list */ @@ -2665,7 +2667,7 @@ do_addr_param: break; case SCTP_PARAM_CHUNKS: - if (!net->sctp.auth_enable) + if (!ep->auth_enable) goto fall_through; asoc->peer.peer_chunks = kmemdup(param.p, diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index ae9fbeb..5170a1f 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -357,7 +357,7 @@ sctp_disposition_t sctp_sf_do_5_1B_init(struct net *net, /* Verify the INIT chunk before processing it. */ err_chunk = NULL; - if (!sctp_verify_init(net, asoc, chunk->chunk_hdr->type, + if (!sctp_verify_init(net, ep, asoc, chunk->chunk_hdr->type, (sctp_init_chunk_t *)chunk->chunk_hdr, chunk, &err_chunk)) { /* This chunk contains fatal error. It is to be discarded. @@ -524,7 +524,7 @@ sctp_disposition_t sctp_sf_do_5_1C_ack(struct net *net, /* Verify the INIT chunk before processing it. */ err_chunk = NULL; - if (!sctp_verify_init(net, asoc, chunk->chunk_hdr->type, + if (!sctp_verify_init(net, ep, asoc, chunk->chunk_hdr->type, (sctp_init_chunk_t *)chunk->chunk_hdr, chunk, &err_chunk)) { @@ -1430,7 +1430,7 @@ static sctp_disposition_t sctp_sf_do_unexpected_init( /* Verify the INIT chunk before processing it. */ err_chunk = NULL; - if (!sctp_verify_init(net, asoc, chunk->chunk_hdr->type, + if (!sctp_verify_init(net, ep, asoc, chunk->chunk_hdr->type, (sctp_init_chunk_t *)chunk->chunk_hdr, chunk, &err_chunk)) { /* This chunk contains fatal error. It is to be discarded. diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 9a5d590..604a6ac 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3321,10 +3321,10 @@ static int sctp_setsockopt_auth_chunk(struct sock *sk, char __user *optval, unsigned int optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authchunk val; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (optlen != sizeof(struct sctp_authchunk)) @@ -3341,7 +3341,7 @@ static int sctp_setsockopt_auth_chunk(struct sock *sk, } /* add this chunk id to the endpoint */ - return sctp_auth_ep_add_chunkid(sctp_sk(sk)->ep, val.sauth_chunk); + return sctp_auth_ep_add_chunkid(ep, val.sauth_chunk); } /* @@ -3354,12 +3354,12 @@ static int sctp_setsockopt_hmac_ident(struct sock *sk, char __user *optval, unsigned int optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_hmacalgo *hmacs; u32 idents; int err; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (optlen < sizeof(struct sctp_hmacalgo)) @@ -3376,7 +3376,7 @@ static int sctp_setsockopt_hmac_ident(struct sock *sk, goto out; } - err = sctp_auth_ep_set_hmacs(sctp_sk(sk)->ep, hmacs); + err = sctp_auth_ep_set_hmacs(ep, hmacs); out: kfree(hmacs); return err; @@ -3392,12 +3392,12 @@ static int sctp_setsockopt_auth_key(struct sock *sk, char __user *optval, unsigned int optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authkey *authkey; struct sctp_association *asoc; int ret; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (optlen <= sizeof(struct sctp_authkey)) @@ -3418,7 +3418,7 @@ static int sctp_setsockopt_auth_key(struct sock *sk, goto out; } - ret = sctp_auth_set_key(sctp_sk(sk)->ep, asoc, authkey); + ret = sctp_auth_set_key(ep, asoc, authkey); out: kzfree(authkey); return ret; @@ -3434,11 +3434,11 @@ static int sctp_setsockopt_active_key(struct sock *sk, char __user *optval, unsigned int optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authkeyid val; struct sctp_association *asoc; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (optlen != sizeof(struct sctp_authkeyid)) @@ -3450,8 +3450,7 @@ static int sctp_setsockopt_active_key(struct sock *sk, if (!asoc && val.scact_assoc_id && sctp_style(sk, UDP)) return -EINVAL; - return sctp_auth_set_active_key(sctp_sk(sk)->ep, asoc, - val.scact_keynumber); + return sctp_auth_set_active_key(ep, asoc, val.scact_keynumber); } /* @@ -3463,11 +3462,11 @@ static int sctp_setsockopt_del_key(struct sock *sk, char __user *optval, unsigned int optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authkeyid val; struct sctp_association *asoc; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (optlen != sizeof(struct sctp_authkeyid)) @@ -3479,8 +3478,7 @@ static int sctp_setsockopt_del_key(struct sock *sk, if (!asoc && val.scact_assoc_id && sctp_style(sk, UDP)) return -EINVAL; - return sctp_auth_del_key_id(sctp_sk(sk)->ep, asoc, - val.scact_keynumber); + return sctp_auth_del_key_id(ep, asoc, val.scact_keynumber); } @@ -5387,16 +5385,16 @@ static int sctp_getsockopt_maxburst(struct sock *sk, int len, static int sctp_getsockopt_hmac_ident(struct sock *sk, int len, char __user *optval, int __user *optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_hmacalgo __user *p = (void __user *)optval; struct sctp_hmac_algo_param *hmacs; __u16 data_len = 0; u32 num_idents; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; - hmacs = sctp_sk(sk)->ep->auth_hmacs_list; + hmacs = ep->auth_hmacs_list; data_len = ntohs(hmacs->param_hdr.length) - sizeof(sctp_paramhdr_t); if (len < sizeof(struct sctp_hmacalgo) + data_len) @@ -5417,11 +5415,11 @@ static int sctp_getsockopt_hmac_ident(struct sock *sk, int len, static int sctp_getsockopt_active_key(struct sock *sk, int len, char __user *optval, int __user *optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authkeyid val; struct sctp_association *asoc; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (len < sizeof(struct sctp_authkeyid)) @@ -5436,7 +5434,7 @@ static int sctp_getsockopt_active_key(struct sock *sk, int len, if (asoc) val.scact_keynumber = asoc->active_key_id; else - val.scact_keynumber = sctp_sk(sk)->ep->active_key_id; + val.scact_keynumber = ep->active_key_id; len = sizeof(struct sctp_authkeyid); if (put_user(len, optlen)) @@ -5450,7 +5448,7 @@ static int sctp_getsockopt_active_key(struct sock *sk, int len, static int sctp_getsockopt_peer_auth_chunks(struct sock *sk, int len, char __user *optval, int __user *optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authchunks __user *p = (void __user *)optval; struct sctp_authchunks val; struct sctp_association *asoc; @@ -5458,7 +5456,7 @@ static int sctp_getsockopt_peer_auth_chunks(struct sock *sk, int len, u32 num_chunks = 0; char __user *to; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (len < sizeof(struct sctp_authchunks)) @@ -5495,7 +5493,7 @@ num: static int sctp_getsockopt_local_auth_chunks(struct sock *sk, int len, char __user *optval, int __user *optlen) { - struct net *net = sock_net(sk); + struct sctp_endpoint *ep = sctp_sk(sk)->ep; struct sctp_authchunks __user *p = (void __user *)optval; struct sctp_authchunks val; struct sctp_association *asoc; @@ -5503,7 +5501,7 @@ static int sctp_getsockopt_local_auth_chunks(struct sock *sk, int len, u32 num_chunks = 0; char __user *to; - if (!net->sctp.auth_enable) + if (!ep->auth_enable) return -EACCES; if (len < sizeof(struct sctp_authchunks)) @@ -5520,7 +5518,7 @@ static int sctp_getsockopt_local_auth_chunks(struct sock *sk, int len, if (asoc) ch = (struct sctp_chunks_param *)asoc->c.auth_chunks; else - ch = sctp_sk(sk)->ep->auth_chunk_list; + ch = ep->auth_chunk_list; if (!ch) goto num; diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index 35c8923..c82fdc1 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -64,6 +64,9 @@ static int proc_sctp_do_rto_min(struct ctl_table *ctl, int write, static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write, void __user *buffer, size_t *lenp, loff_t *ppos); +static int proc_sctp_do_auth(struct ctl_table *ctl, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos); static struct ctl_table sctp_table[] = { { @@ -266,7 +269,7 @@ static struct ctl_table sctp_net_table[] = { .data = &init_net.sctp.auth_enable, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_sctp_do_auth, }, { .procname = "addr_scope_policy", @@ -400,6 +403,37 @@ static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write, return ret; } +static int proc_sctp_do_auth(struct ctl_table *ctl, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos) +{ + struct net *net = current->nsproxy->net_ns; + struct ctl_table tbl; + int new_value, ret; + + memset(&tbl, 0, sizeof(struct ctl_table)); + tbl.maxlen = sizeof(unsigned int); + + if (write) + tbl.data = &new_value; + else + tbl.data = &net->sctp.auth_enable; + + ret = proc_dointvec(&tbl, write, buffer, lenp, ppos); + + if (write) { + struct sock *sk = net->sctp.ctl_sock; + + net->sctp.auth_enable = new_value; + /* Update the value in the control socket */ + lock_sock(sk); + sctp_sk(sk)->ep->auth_enable = new_value; + release_sock(sk); + } + + return ret; +} + int sctp_sysctl_net_register(struct net *net) { struct ctl_table *table = sctp_net_table; -- 1.7.11.7 From f94aaacc4fab84539ff8158225195cbaf4ce5f59 Mon Sep 17 00:00:00 2001 From: Andrew Lutomirski <luto@xxxxxxxxxxxxxx> Date: Wed, 16 Apr 2014 21:41:34 -0700 Subject: [PATCH 23/76] net: Fix ns_capable check in sock_diag_put_filterinfo [ Upstream commit 78541c1dc60b65ecfce5a6a096fc260219d6784e ] The caller needs capabilities on the namespace being queried, not on their own namespace. This is a security bug, although it likely has only a minor impact. Cc: stable@xxxxxxxxxxxxxxx Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx> Acked-by: Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/linux/sock_diag.h | 2 +- net/core/sock_diag.c | 4 ++-- net/packet/diag.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/linux/sock_diag.h b/include/linux/sock_diag.h index 54f91d3..302ab80 100644 --- a/include/linux/sock_diag.h +++ b/include/linux/sock_diag.h @@ -23,7 +23,7 @@ int sock_diag_check_cookie(void *sk, __u32 *cookie); void sock_diag_save_cookie(void *sk, __u32 *cookie); int sock_diag_put_meminfo(struct sock *sk, struct sk_buff *skb, int attr); -int sock_diag_put_filterinfo(struct user_namespace *user_ns, struct sock *sk, +int sock_diag_put_filterinfo(struct sock *sk, struct sk_buff *skb, int attrtype); #endif diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c index a0e9cf6..6a7fae2 100644 --- a/net/core/sock_diag.c +++ b/net/core/sock_diag.c @@ -49,7 +49,7 @@ int sock_diag_put_meminfo(struct sock *sk, struct sk_buff *skb, int attrtype) } EXPORT_SYMBOL_GPL(sock_diag_put_meminfo); -int sock_diag_put_filterinfo(struct user_namespace *user_ns, struct sock *sk, +int sock_diag_put_filterinfo(struct sock *sk, struct sk_buff *skb, int attrtype) { struct nlattr *attr; @@ -57,7 +57,7 @@ int sock_diag_put_filterinfo(struct user_namespace *user_ns, struct sock *sk, unsigned int len; int err = 0; - if (!ns_capable(user_ns, CAP_NET_ADMIN)) { + if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { nla_reserve(skb, attrtype, 0); return 0; } diff --git a/net/packet/diag.c b/net/packet/diag.c index 533ce4f..435ff99 100644 --- a/net/packet/diag.c +++ b/net/packet/diag.c @@ -172,7 +172,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, goto out_nlmsg_trim; if ((req->pdiag_show & PACKET_SHOW_FILTER) && - sock_diag_put_filterinfo(user_ns, sk, skb, PACKET_DIAG_FILTER)) + sock_diag_put_filterinfo(sk, skb, PACKET_DIAG_FILTER)) goto out_nlmsg_trim; return nlmsg_end(skb, nlh); -- 1.7.11.7 From 4a093ebeeb9761a8b0f87d071a276f4ccc8b821c Mon Sep 17 00:00:00 2001 From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Date: Thu, 24 Apr 2014 10:22:35 +1000 Subject: [PATCH 24/76] rtnetlink: Warn when interface's information won't fit in our packet [ Upstream commit 973462bbde79bb827824c73b59027a0aed5c9ca6 ] Without IFLA_EXT_MASK specified, the information reported for a single interface in response to RTM_GETLINK is expected to fit within a netlink packet of NLMSG_GOODSIZE. If it doesn't, however, things will go badly wrong, When listing all interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first message in a packet as the end of the listing and omit information for that interface and all subsequent ones. This can cause getifaddrs(3) to enter an infinite loop. This patch won't fix the problem, but it will WARN_ON() making it easier to track down what's going wrong. Signed-off-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Reviewed-by: Jiri Pirko <jpirko@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/rtnetlink.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 120eecc..68867bf 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1130,6 +1130,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) struct hlist_head *head; struct nlattr *tb[IFLA_MAX+1]; u32 ext_filter_mask = 0; + int err; s_h = cb->args[0]; s_idx = cb->args[1]; @@ -1150,11 +1151,17 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb) hlist_for_each_entry_rcu(dev, head, index_hlist) { if (idx < s_idx) goto cont; - if (rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, 0, - NLM_F_MULTI, - ext_filter_mask) <= 0) + err = rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, 0, + NLM_F_MULTI, + ext_filter_mask); + /* If we ran out of room on the first message, + * we're in trouble + */ + WARN_ON((err == -EMSGSIZE) && (skb->len == 0)); + + if (err <= 0) goto out; nl_dump_check_consistent(cb, nlmsg_hdr(skb)); -- 1.7.11.7 From 4983019758477032de5a036be6740404ab67e4bd Mon Sep 17 00:00:00 2001 From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Date: Thu, 24 Apr 2014 10:22:36 +1000 Subject: [PATCH 25/76] rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is set [ Upstream commit c53864fd60227de025cb79e05493b13f69843971 ] Since 115c9b81928360d769a76c632bae62d15206a94a (rtnetlink: Fix problem with buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST attribute if they were solicited by a GETLINK message containing an IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag. That was done because some user programs broke when they received more data than expected - because IFLA_VFINFO_LIST contains information for each VF it can become large if there are many VFs. However, the IFLA_VF_PORTS attribute, supplied for devices which implement ndo_get_vf_port (currently the 'enic' driver only), has the same problem. It supplies per-VF information and can therefore become large, but it is not currently conditional on the IFLA_EXT_MASK value. Worse, it interacts badly with the existing EXT_MASK handling. When IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at NLMSG_GOODSIZE. If the information for IFLA_VF_PORTS exceeds this, then rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet. netlink_dump() will misinterpret this as having finished the listing and omit data for this interface and all subsequent ones. That can cause getifaddrs(3) to enter an infinite loop. This patch addresses the problem by only supplying IFLA_VF_PORTS when IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set. Signed-off-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> Reviewed-by: Jiri Pirko <jiri@xxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/rtnetlink.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 68867bf..3797d96 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -774,7 +774,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev, return 0; } -static size_t rtnl_port_size(const struct net_device *dev) +static size_t rtnl_port_size(const struct net_device *dev, + u32 ext_filter_mask) { size_t port_size = nla_total_size(4) /* PORT_VF */ + nla_total_size(PORT_PROFILE_MAX) /* PORT_PROFILE */ @@ -790,7 +791,8 @@ static size_t rtnl_port_size(const struct net_device *dev) size_t port_self_size = nla_total_size(sizeof(struct nlattr)) + port_size; - if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent) + if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent || + !(ext_filter_mask & RTEXT_FILTER_VF)) return 0; if (dev_num_vf(dev->dev.parent)) return port_self_size + vf_ports_size + @@ -825,7 +827,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, + nla_total_size(ext_filter_mask & RTEXT_FILTER_VF ? 4 : 0) /* IFLA_NUM_VF */ + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */ - + rtnl_port_size(dev) /* IFLA_VF_PORTS + IFLA_PORT_SELF */ + + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */ + rtnl_link_get_size(dev) /* IFLA_LINKINFO */ + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */ + nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */ @@ -887,11 +889,13 @@ static int rtnl_port_self_fill(struct sk_buff *skb, struct net_device *dev) return 0; } -static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev) +static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev, + u32 ext_filter_mask) { int err; - if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent) + if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent || + !(ext_filter_mask & RTEXT_FILTER_VF)) return 0; err = rtnl_port_self_fill(skb, dev); @@ -1076,7 +1080,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev, nla_nest_end(skb, vfinfo); } - if (rtnl_port_fill(skb, dev)) + if (rtnl_port_fill(skb, dev, ext_filter_mask)) goto nla_put_failure; if (dev->rtnl_link_ops || rtnl_have_link_slave_info(dev)) { -- 1.7.11.7 From 8ab0b3efaaa45b4b7caa328b7cca3d1da4eafce2 Mon Sep 17 00:00:00 2001 From: Kumar Sundararajan <kumar@xxxxxx> Date: Thu, 24 Apr 2014 09:48:53 -0400 Subject: [PATCH 26/76] ipv6: fib: fix fib dump restart [ Upstream commit 1c2658545816088477e91860c3a645053719cb54 ] When the ipv6 fib changes during a table dump, the walk is restarted and the number of nodes dumped are skipped. But the existing code doesn't advance to the next node after a node is skipped. This can cause the dump to loop or produce lots of duplicates when the fib is modified during the dump. This change advances the walk to the next node if the current node is skipped after a restart. Signed-off-by: Kumar Sundararajan <kumar@xxxxxx> Signed-off-by: Chris Mason <clm@xxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv6/ip6_fib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 075602f..1e55f5e 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1418,7 +1418,7 @@ static int fib6_walk_continue(struct fib6_walker_t *w) if (w->skip) { w->skip--; - continue; + goto skip; } err = w->func(w); @@ -1428,6 +1428,7 @@ static int fib6_walk_continue(struct fib6_walker_t *w) w->count++; continue; } +skip: w->state = FWS_U; case FWS_U: if (fn == w->root) -- 1.7.11.7 From 033f35aed425cc6ff2364842c89ee007b5597579 Mon Sep 17 00:00:00 2001 From: Toshiaki Makita <makita.toshiaki@xxxxxxxxxxxxx> Date: Fri, 25 Apr 2014 17:01:18 +0900 Subject: [PATCH 27/76] bridge: Handle IFLA_ADDRESS correctly when creating bridge device [ Upstream commit 30313a3d5794472c3548d7288e306a5492030370 ] When bridge device is created with IFLA_ADDRESS, we are not calling br_stp_change_bridge_id(), which leads to incorrect local fdb management and bridge id calculation, and prevents us from receiving frames on the bridge device. Reported-by: Tom Gundersen <teg@xxxxxxx> Signed-off-by: Toshiaki Makita <makita.toshiaki@xxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/bridge/br_netlink.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index e74b6d53..e8844d9 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -445,6 +445,20 @@ static int br_validate(struct nlattr *tb[], struct nlattr *data[]) return 0; } +static int br_dev_newlink(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) +{ + struct net_bridge *br = netdev_priv(dev); + + if (tb[IFLA_ADDRESS]) { + spin_lock_bh(&br->lock); + br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS])); + spin_unlock_bh(&br->lock); + } + + return register_netdevice(dev); +} + static size_t br_get_link_af_size(const struct net_device *dev) { struct net_port_vlans *pv; @@ -473,6 +487,7 @@ struct rtnl_link_ops br_link_ops __read_mostly = { .priv_size = sizeof(struct net_bridge), .setup = br_dev_setup, .validate = br_validate, + .newlink = br_dev_newlink, .dellink = br_dev_delete, }; -- 1.7.11.7 From 3054c5cb3076f416f58845a6e57bcd9ec2cf2922 Mon Sep 17 00:00:00 2001 From: Xufeng Zhang <xufeng.zhang@xxxxxxxxxxxxx> Date: Fri, 25 Apr 2014 16:55:41 +0800 Subject: [PATCH 28/76] sctp: reset flowi4_oif parameter on route lookup [ Upstream commit 85350871317a5adb35519d9dc6fc9e80809d42ad ] commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output route lookups.) introduces another regression which is very similar to the problem of commit e6b45241c (ipv4: reset flowi parameters on route connect) wants to fix: Before we call ip_route_output_key() in sctp_v4_get_dst() to get a dst that matches a bind address as the source address, we have already called this function previously and the flowi parameters have been initialized including flowi4_oif, so when we call this function again, the process in __ip_route_output_key() will be different because of the setting of flowi4_oif, and we'll get a networking device which corresponds to the inputted flowi4_oif as the output device, this is wrong because we'll never hit this place if the previously returned source address of dst match one of the bound addresses. To reproduce this problem, a vlan setting is enough: # ifconfig eth0 up # route del default # vconfig add eth0 2 # vconfig add eth0 3 # ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0 # route add default gw 10.0.1.254 dev eth0.2 # ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0 # ip rule add from 10.0.0.14 table 4 # ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3 # sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I You'll detect that all the flow are routed to eth0.2(10.0.1.254). Signed-off-by: Xufeng Zhang <xufeng.zhang@xxxxxxxxxxxxx> Signed-off-by: Julian Anastasov <ja@xxxxxx> Acked-by: Vlad Yasevich <vyasevich@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sctp/protocol.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 4e1d0fc..a62a215 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -491,8 +491,13 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union sctp_addr *saddr, continue; if ((laddr->state == SCTP_ADDR_SRC) && (AF_INET == laddr->a.sa.sa_family)) { - fl4->saddr = laddr->a.v4.sin_addr.s_addr; fl4->fl4_sport = laddr->a.v4.sin_port; + flowi4_update_output(fl4, + asoc->base.sk->sk_bound_dev_if, + RT_CONN_FLAGS(asoc->base.sk), + daddr->v4.sin_addr.s_addr, + laddr->a.v4.sin_addr.s_addr); + rt = ip_route_output_key(sock_net(sk), fl4); if (!IS_ERR(rt)) { dst = &rt->dst; -- 1.7.11.7 From 66f09a46a862e9507300a226a04b73916b644e4e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:28 +0200 Subject: [PATCH 29/76] net: qmi_wwan: add Sierra Wireless EM7355 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit b85f5deaf052340021d025e120a9858f084a1d79 ] Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 313cb6c..0d47547 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -724,6 +724,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1199, 0x68a2, 8)}, /* Sierra Wireless MC7710 in QMI mode */ {QMI_FIXED_INTF(0x1199, 0x68a2, 19)}, /* Sierra Wireless MC7710 in QMI mode */ {QMI_FIXED_INTF(0x1199, 0x901c, 8)}, /* Sierra Wireless EM7700 */ + {QMI_FIXED_INTF(0x1199, 0x901f, 8)}, /* Sierra Wireless EM7355 */ {QMI_FIXED_INTF(0x1199, 0x9051, 8)}, /* Netgear AirCard 340U */ {QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */ {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */ -- 1.7.11.7 From 06dcd37dbbafc6f1b79a1a71ea99bb69468147fc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:29 +0200 Subject: [PATCH 30/76] net: qmi_wwan: add Sierra Wireless MC73xx MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 1c138607a7be64074d7fba68d0d533ec38f9d17b ] Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 0d47547..6289f0b 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -723,6 +723,9 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x114f, 0x68a2, 8)}, /* Sierra Wireless MC7750 */ {QMI_FIXED_INTF(0x1199, 0x68a2, 8)}, /* Sierra Wireless MC7710 in QMI mode */ {QMI_FIXED_INTF(0x1199, 0x68a2, 19)}, /* Sierra Wireless MC7710 in QMI mode */ + {QMI_FIXED_INTF(0x1199, 0x68c0, 8)}, /* Sierra Wireless MC73xx */ + {QMI_FIXED_INTF(0x1199, 0x68c0, 10)}, /* Sierra Wireless MC73xx */ + {QMI_FIXED_INTF(0x1199, 0x68c0, 11)}, /* Sierra Wireless MC73xx */ {QMI_FIXED_INTF(0x1199, 0x901c, 8)}, /* Sierra Wireless EM7700 */ {QMI_FIXED_INTF(0x1199, 0x901f, 8)}, /* Sierra Wireless EM7355 */ {QMI_FIXED_INTF(0x1199, 0x9051, 8)}, /* Netgear AirCard 340U */ -- 1.7.11.7 From d090fc14edb881011c28e8889e731edad521a929 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:30 +0200 Subject: [PATCH 31/76] net: qmi_wwan: add Sierra Wireless MC7305/MC7355 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 9214224e43e4264b02686ea8b455f310935607b5 ] Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 6289f0b..a25a5db 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -728,6 +728,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1199, 0x68c0, 11)}, /* Sierra Wireless MC73xx */ {QMI_FIXED_INTF(0x1199, 0x901c, 8)}, /* Sierra Wireless EM7700 */ {QMI_FIXED_INTF(0x1199, 0x901f, 8)}, /* Sierra Wireless EM7355 */ + {QMI_FIXED_INTF(0x1199, 0x9041, 8)}, /* Sierra Wireless MC7305/MC7355 */ {QMI_FIXED_INTF(0x1199, 0x9051, 8)}, /* Netgear AirCard 340U */ {QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */ {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */ -- 1.7.11.7 From e2ab0c042276064e3c7a40beac8f95e1fed8d486 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:31 +0200 Subject: [PATCH 32/76] net: qmi_wwan: add Olivetti Olicard 500 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit efc0b25c3add97717ece57bf5319792ca98f348e ] Device interface layout: 0: ff/ff/ff - serial 1: ff/ff/ff - serial AT+PPP 2: 08/06/50 - storage 3: ff/ff/ff - serial 4: ff/ff/ff - QMI/wwan Reported-by: Julio Araujo <julio.araujo@xxxxxxxxxxxxxx> Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index a25a5db..e1ea405 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -736,6 +736,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */ {QMI_FIXED_INTF(0x1bc7, 0x1201, 2)}, /* Telit LE920 */ {QMI_FIXED_INTF(0x0b3c, 0xc005, 6)}, /* Olivetti Olicard 200 */ + {QMI_FIXED_INTF(0x0b3c, 0xc00b, 4)}, /* Olivetti Olicard 500 */ {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)}, /* Cinterion PLxx */ {QMI_FIXED_INTF(0x1e2d, 0x0053, 4)}, /* Cinterion PHxx,PXxx */ -- 1.7.11.7 From 835c17636082ac5719978de1a0f9c5db03980e73 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:32 +0200 Subject: [PATCH 33/76] net: qmi_wwan: add Alcatel L800MA MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 75573660c47a0db7cc931dcf154945610e02130a ] Device interface layout: 0: ff/ff/ff - serial 1: ff/00/00 - serial AT+PPP 2: ff/ff/ff - QMI/wwan 3: 08/06/50 - storage Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index e1ea405..04e7326 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -731,6 +731,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1199, 0x9041, 8)}, /* Sierra Wireless MC7305/MC7355 */ {QMI_FIXED_INTF(0x1199, 0x9051, 8)}, /* Netgear AirCard 340U */ {QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */ + {QMI_FIXED_INTF(0x1bbb, 0x0203, 2)}, /* Alcatel L800MA */ {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */ {QMI_FIXED_INTF(0x2357, 0x9000, 4)}, /* TP-LINK MA260 */ {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */ -- 1.7.11.7 From 3deb263206ea0cf4b755e99c7b00396ace1b95ba Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:33 +0200 Subject: [PATCH 34/76] net: qmi_wwan: add a number of CMOTech devices MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 41be7d90993b1502d445bfc59e58348c258ce66a ] A number of older CMOTech modems are based on Qualcomm chips and exporting a QMI/wwan function. Reported-by: Lars Melin <larsm17@xxxxxxxxx> Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 04e7326..75b78eb 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -662,6 +662,22 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x05c6, 0x920d, 5)}, {QMI_FIXED_INTF(0x12d1, 0x140c, 1)}, /* Huawei E173 */ {QMI_FIXED_INTF(0x12d1, 0x14ac, 1)}, /* Huawei E1820 */ + {QMI_FIXED_INTF(0x16d8, 0x6003, 0)}, /* CMOTech 6003 */ + {QMI_FIXED_INTF(0x16d8, 0x6007, 0)}, /* CMOTech CHE-628S */ + {QMI_FIXED_INTF(0x16d8, 0x6008, 0)}, /* CMOTech CMU-301 */ + {QMI_FIXED_INTF(0x16d8, 0x6280, 0)}, /* CMOTech CHU-628 */ + {QMI_FIXED_INTF(0x16d8, 0x7001, 0)}, /* CMOTech CHU-720S */ + {QMI_FIXED_INTF(0x16d8, 0x7002, 0)}, /* CMOTech 7002 */ + {QMI_FIXED_INTF(0x16d8, 0x7003, 4)}, /* CMOTech CHU-629K */ + {QMI_FIXED_INTF(0x16d8, 0x7004, 3)}, /* CMOTech 7004 */ + {QMI_FIXED_INTF(0x16d8, 0x7006, 5)}, /* CMOTech CGU-629 */ + {QMI_FIXED_INTF(0x16d8, 0x700a, 4)}, /* CMOTech CHU-629S */ + {QMI_FIXED_INTF(0x16d8, 0x7211, 0)}, /* CMOTech CHU-720I */ + {QMI_FIXED_INTF(0x16d8, 0x7212, 0)}, /* CMOTech 7212 */ + {QMI_FIXED_INTF(0x16d8, 0x7213, 0)}, /* CMOTech 7213 */ + {QMI_FIXED_INTF(0x16d8, 0x7251, 1)}, /* CMOTech 7251 */ + {QMI_FIXED_INTF(0x16d8, 0x7252, 1)}, /* CMOTech 7252 */ + {QMI_FIXED_INTF(0x16d8, 0x7253, 1)}, /* CMOTech 7253 */ {QMI_FIXED_INTF(0x19d2, 0x0002, 1)}, {QMI_FIXED_INTF(0x19d2, 0x0012, 1)}, {QMI_FIXED_INTF(0x19d2, 0x0017, 3)}, -- 1.7.11.7 From a055b167a8fc39f2837bd807796a02d5d327c37d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 25 Apr 2014 19:00:34 +0200 Subject: [PATCH 35/76] net: qmi_wwan: add a number of Dell devices MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 6f10c5d1b1aeddb63d33070abb8bc5a177beeb1f ] Dan writes: "The Dell drivers use the same configuration for PIDs: 81A2: Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card 81A3: Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card 81A4: Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card 81A8: Dell Wireless 5808 Gobi(TM) 4G LTE Mobile Broadband Card 81A9: Dell Wireless 5808e Gobi(TM) 4G LTE Mobile Broadband Card These devices are all clearly Sierra devices, but are also definitely Gobi-based. The A8 might be the MC7700/7710 and A9 is likely a MC7750. >From DellGobi5kSetup.exe from the Dell drivers: usbif0: serial/firmware loader? usbif2: nmea usbif3: modem/ppp usbif8: net/QMI" Reported-by: AceLan Kao <acelan.kao@xxxxxxxxxxxxx> Reported-by: Dan Williams <dcbw@xxxxxxxxxx> Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/qmi_wwan.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 75b78eb..48c4902 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -756,6 +756,11 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x0b3c, 0xc00b, 4)}, /* Olivetti Olicard 500 */ {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)}, /* Cinterion PLxx */ {QMI_FIXED_INTF(0x1e2d, 0x0053, 4)}, /* Cinterion PHxx,PXxx */ + {QMI_FIXED_INTF(0x413c, 0x81a2, 8)}, /* Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card */ + {QMI_FIXED_INTF(0x413c, 0x81a3, 8)}, /* Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card */ + {QMI_FIXED_INTF(0x413c, 0x81a4, 8)}, /* Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card */ + {QMI_FIXED_INTF(0x413c, 0x81a8, 8)}, /* Dell Wireless 5808 Gobi(TM) 4G LTE Mobile Broadband Card */ + {QMI_FIXED_INTF(0x413c, 0x81a9, 8)}, /* Dell Wireless 5808e Gobi(TM) 4G LTE Mobile Broadband Card */ /* 4. Gobi 1000 devices */ {QMI_GOBI1K_DEVICE(0x05c6, 0x9212)}, /* Acer Gobi Modem Device */ -- 1.7.11.7 From 37799498b141c2086b3398353869e84a47f6c401 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp <socketcan@xxxxxxxxxxxx> Date: Sat, 26 Apr 2014 21:18:32 +0200 Subject: [PATCH 36/76] slip: fix spinlock variant [ Upstream commit ddcde142bed44490e338ed1124cb149976d355bb ] With commit cc9fa74e2a ("slip/slcan: added locking in wakeup function") a formerly missing locking was added to slip.c and slcan.c by Andre Naujoks. Alexander Stein contributed the fix 367525c8c2 ("can: slcan: Fix spinlock variant") as the kernel lock debugging advised to use spin_lock_bh() instead of just using spin_lock(). This fix has to be applied to the same code section in slip.c for the same reason too. Signed-off-by: Oliver Hartkopp <socketcan@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/slip/slip.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c index cc70ecf..ad4a94e 100644 --- a/drivers/net/slip/slip.c +++ b/drivers/net/slip/slip.c @@ -429,13 +429,13 @@ static void slip_write_wakeup(struct tty_struct *tty) if (!sl || sl->magic != SLIP_MAGIC || !netif_running(sl->dev)) return; - spin_lock(&sl->lock); + spin_lock_bh(&sl->lock); if (sl->xleft <= 0) { /* Now serial buffer is almost free & we can start * transmission of another packet */ sl->dev->stats.tx_packets++; clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags); - spin_unlock(&sl->lock); + spin_unlock_bh(&sl->lock); sl_unlock(sl); return; } @@ -443,7 +443,7 @@ static void slip_write_wakeup(struct tty_struct *tty) actual = tty->ops->write(tty, sl->xhead, sl->xleft); sl->xleft -= actual; sl->xhead += actual; - spin_unlock(&sl->lock); + spin_unlock_bh(&sl->lock); } static void sl_tx_timeout(struct net_device *dev) -- 1.7.11.7 From 2db0428d9d627e6a1aa0a2efa7ab2d63e3bad254 Mon Sep 17 00:00:00 2001 From: Karl Heiss <kheiss@xxxxxxxxx> Date: Fri, 25 Apr 2014 14:26:30 -0400 Subject: [PATCH 37/76] net: sctp: Don't transition to PF state when transport has exhausted 'Path.Max.Retrans'. [ Upstream commit 8c2eab9097dba50bcd73ed4632baccc3f34857f9 ] Don't transition to the PF state on every strike after 'Path.Max.Retrans'. Per draft-ietf-tsvwg-sctp-failover-03 Section 5.1.6: Additional (PMR - PFMR) consecutive timeouts on a PF destination confirm the path failure, upon which the destination transitions to the Inactive state. As described in [RFC4960], the sender (i) SHOULD notify ULP about this state transition, and (ii) transmit heartbeats to the Inactive destination at a lower frequency as described in Section 8.3 of [RFC4960]. This also prevents sending SCTP_ADDR_UNREACHABLE to the user as the state bounces between SCTP_INACTIVE and SCTP_PF for each subsequent strike. Signed-off-by: Karl Heiss <kheiss@xxxxxxxxx> Acked-by: Vlad Yasevich <vyasevich@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sctp/sm_sideeffect.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c index 5d6883f..fef2acd 100644 --- a/net/sctp/sm_sideeffect.c +++ b/net/sctp/sm_sideeffect.c @@ -496,11 +496,10 @@ static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands, /* If the transport error count is greater than the pf_retrans * threshold, and less than pathmaxrtx, and if the current state - * is not SCTP_UNCONFIRMED, then mark this transport as Partially - * Failed, see SCTP Quick Failover Draft, section 5.1 + * is SCTP_ACTIVE, then mark this transport as Partially Failed, + * see SCTP Quick Failover Draft, section 5.1 */ - if ((transport->state != SCTP_PF) && - (transport->state != SCTP_UNCONFIRMED) && + if ((transport->state == SCTP_ACTIVE) && (asoc->pf_retrans < transport->pathmaxrxt) && (transport->error_count > asoc->pf_retrans)) { -- 1.7.11.7 From 0ade2eb60f102fc8887b9f64d4fdf742b3eac89d Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Tue, 29 Apr 2014 10:09:50 -0400 Subject: [PATCH 38/76] mactap: Fix checksum errors for non-gso packets in bridge mode [ Upstream commit cbdb04279ccaefcc702c8757757eea8ed76e50cf ] The following is a problematic configuration: VM1: virtio-net device connected to macvtap0@eth0 VM2: e1000 device connect to macvtap1@eth0 The problem is is that virtio-net supports checksum offloading and thus sends the packets to the host with CHECKSUM_PARTIAL set. On the other hand, e1000 does not support any acceleration. For small TCP packets (and this includes the 3-way handshake), e1000 ends up receiving packets that only have a partial checksum set. This causes TCP to fail checksum validation and to drop packets. As a result tcp connections can not be established. Commit 3e4f8b787370978733ca6cae452720a4f0c296b8 macvtap: Perform GSO on forwarding path. fixes this issue for large packets wthat will end up undergoing GSO. This commit adds a check for the non-GSO case and attempts to compute the checksum for partially checksummed packets in the non-GSO case. CC: Daniel Lezcano <daniel.lezcano@xxxxxxx> CC: Patrick McHardy <kaber@xxxxxxxxx> CC: Andrian Nord <nightnord@xxxxxxxxx> CC: Eric Dumazet <eric.dumazet@xxxxxxxxx> CC: Michael S. Tsirkin <mst@xxxxxxxxxx> CC: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/macvtap.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index ff111a8..3381c4f 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -322,6 +322,15 @@ static rx_handler_result_t macvtap_handle_frame(struct sk_buff **pskb) segs = nskb; } } else { + /* If we receive a partial checksum and the tap side + * doesn't support checksum offload, compute the checksum. + * Note: it doesn't matter which checksum feature to + * check, we either support them all or none. + */ + if (skb->ip_summed == CHECKSUM_PARTIAL && + !(features & NETIF_F_ALL_CSUM) && + skb_checksum_help(skb)) + goto drop; skb_queue_tail(&q->sk.sk_receive_queue, skb); } -- 1.7.11.7 From 824f362dbfd7fa44c4960bfa6c0b1b73e70b7d01 Mon Sep 17 00:00:00 2001 From: Vlad Yasevich <vyasevic@xxxxxxxxxx> Date: Tue, 29 Apr 2014 10:09:51 -0400 Subject: [PATCH 39/76] Revert "macvlan : fix checksums error when we are in bridge mode" [ Upstream commit f114890cdf84d753f6b41cd0cc44ba51d16313da ] This reverts commit 12a2856b604476c27d85a5f9a57ae1661fc46019. The commit above doesn't appear to be necessary any more as the checksums appear to be correctly computed/validated. Additionally the above commit breaks kvm configurations where one VM is using a device that support checksum offload (virtio) and the other VM does not. In this case, packets leaving virtio device will have CHECKSUM_PARTIAL set. The packets is forwarded to a macvtap that has offload features turned off. Since we use CHECKSUM_UNNECESSARY, the host does does not update the checksum and thus a bad checksum is passed up to the guest. CC: Daniel Lezcano <daniel.lezcano@xxxxxxx> CC: Patrick McHardy <kaber@xxxxxxxxx> CC: Andrian Nord <nightnord@xxxxxxxxx> CC: Eric Dumazet <eric.dumazet@xxxxxxxxx> CC: Michael S. Tsirkin <mst@xxxxxxxxxx> CC: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: Vlad Yasevich <vyasevic@xxxxxxxxxx> Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx> Acked-by: Jason Wang <jasowang@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/macvlan.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 9dde3ea..00c3727 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -263,11 +263,9 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) const struct macvlan_dev *vlan = netdev_priv(dev); const struct macvlan_port *port = vlan->port; const struct macvlan_dev *dest; - __u8 ip_summed = skb->ip_summed; if (vlan->mode == MACVLAN_MODE_BRIDGE) { const struct ethhdr *eth = (void *)skb->data; - skb->ip_summed = CHECKSUM_UNNECESSARY; /* send to other bridge ports directly */ if (is_multicast_ether_addr(eth->h_dest)) { @@ -285,7 +283,6 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev) } xmit_world: - skb->ip_summed = ip_summed; skb->dev = vlan->lowerdev; return dev_queue_xmit(skb); } -- 1.7.11.7 From b28b7caa3ed180cdb5f6baf9342c216056e84af5 Mon Sep 17 00:00:00 2001 From: Liu Yu <allanyuliu@xxxxxxxxxxx> Date: Wed, 30 Apr 2014 17:34:09 +0800 Subject: [PATCH 40/76] tcp_cubic: fix the range of delayed_ack [ Upstream commit 0cda345d1b2201dd15591b163e3c92bad5191745 ] commit b9f47a3aaeab (tcp_cubic: limit delayed_ack ratio to prevent divide error) try to prevent divide error, but there is still a little chance that delayed_ack can reach zero. In case the param cnt get negative value, then ratio+cnt would overflow and may happen to be zero. As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero. In some old kernels, such as 2.6.32, there is a bug that would pass negative param, which then ultimately leads to this divide error. commit 5b35e1e6e9c (tcp: fix tcp_trim_head() to adjust segment count with skb MSS) fixed the negative param issue. However, it's safe that we fix the range of delayed_ack as well, to make sure we do not hit a divide by zero. CC: Stephen Hemminger <shemminger@xxxxxxxxxx> Signed-off-by: Liu Yu <allanyuliu@xxxxxxxxxxx> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Acked-by: Neal Cardwell <ncardwell@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/tcp_cubic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index 828e4c3..121a9a2 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -409,7 +409,7 @@ static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us) ratio -= ca->delayed_ack >> ACK_RATIO_SHIFT; ratio += cnt; - ca->delayed_ack = min(ratio, ACK_RATIO_LIMIT); + ca->delayed_ack = clamp(ratio, 1U, ACK_RATIO_LIMIT); } /* Some calls are for duplicates without timetamps */ -- 1.7.11.7 From 1d042336646047518d883e8c4bf98430bd9504fe Mon Sep 17 00:00:00 2001 From: John Fastabend <john.fastabend@xxxxxxxxx> Date: Thu, 1 May 2014 09:23:06 -0700 Subject: [PATCH 41/76] net: sched: lock imbalance in hhf qdisc [ Upstream commit f6a082fed1e6407c2f4437d0d963b1bcbe5f9f58 ] hhf_change() takes the sch_tree_lock and releases it but misses the error cases. Fix the missed case here. To reproduce try a command like this, # tc qdisc change dev p3p2 root hhf quantum 40960 non_hh_weight 300000 Signed-off-by: John Fastabend <john.r.fastabend@xxxxxxxxx> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sched/sch_hhf.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c index 647680b..0399778 100644 --- a/net/sched/sch_hhf.c +++ b/net/sched/sch_hhf.c @@ -553,11 +553,6 @@ static int hhf_change(struct Qdisc *sch, struct nlattr *opt) if (err < 0) return err; - sch_tree_lock(sch); - - if (tb[TCA_HHF_BACKLOG_LIMIT]) - sch->limit = nla_get_u32(tb[TCA_HHF_BACKLOG_LIMIT]); - if (tb[TCA_HHF_QUANTUM]) new_quantum = nla_get_u32(tb[TCA_HHF_QUANTUM]); @@ -567,6 +562,12 @@ static int hhf_change(struct Qdisc *sch, struct nlattr *opt) non_hh_quantum = (u64)new_quantum * new_hhf_non_hh_weight; if (non_hh_quantum > INT_MAX) return -EINVAL; + + sch_tree_lock(sch); + + if (tb[TCA_HHF_BACKLOG_LIMIT]) + sch->limit = nla_get_u32(tb[TCA_HHF_BACKLOG_LIMIT]); + q->quantum = new_quantum; q->hhf_non_hh_weight = new_hhf_non_hh_weight; -- 1.7.11.7 From f055fc4ed48e3179cce4ef7d614adc75d0d4eba4 Mon Sep 17 00:00:00 2001 From: Andy King <acking@xxxxxxxxxx> Date: Thu, 1 May 2014 15:20:43 -0700 Subject: [PATCH 42/76] vsock: Make transport the proto owner [ Upstream commit 2c4a336e0a3e203fab6aa8d8f7bb70a0ad968a6b ] Right now the core vsock module is the owner of the proto family. This means there's nothing preventing the transport module from unloading if there are open sockets, which results in a panic. Fix that by allowing the transport to be the owner, which will refcount it properly. Includes version bump to 1.0.1.0-k Passes checkpatch this time, I swear... Acked-by: Dmitry Torokhov <dtor@xxxxxxxxxx> Signed-off-by: Andy King <acking@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/net/af_vsock.h | 6 +++++- net/vmw_vsock/af_vsock.c | 47 ++++++++++++++++++++++------------------------- 2 files changed, 27 insertions(+), 26 deletions(-) diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index 7d64d36..4282778 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -155,7 +155,11 @@ struct vsock_transport { /**** CORE ****/ -int vsock_core_init(const struct vsock_transport *t); +int __vsock_core_init(const struct vsock_transport *t, struct module *owner); +static inline int vsock_core_init(const struct vsock_transport *t) +{ + return __vsock_core_init(t, THIS_MODULE); +} void vsock_core_exit(void); /**** UTILS ****/ diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 5adfd94..85d232b 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1925,9 +1925,23 @@ static struct miscdevice vsock_device = { .fops = &vsock_device_ops, }; -static int __vsock_core_init(void) +int __vsock_core_init(const struct vsock_transport *t, struct module *owner) { - int err; + int err = mutex_lock_interruptible(&vsock_register_mutex); + + if (err) + return err; + + if (transport) { + err = -EBUSY; + goto err_busy; + } + + /* Transport must be the owner of the protocol so that it can't + * unload while there are open sockets. + */ + vsock_proto.owner = owner; + transport = t; vsock_init_tables(); @@ -1951,36 +1965,19 @@ static int __vsock_core_init(void) goto err_unregister_proto; } + mutex_unlock(&vsock_register_mutex); return 0; err_unregister_proto: proto_unregister(&vsock_proto); err_misc_deregister: misc_deregister(&vsock_device); - return err; -} - -int vsock_core_init(const struct vsock_transport *t) -{ - int retval = mutex_lock_interruptible(&vsock_register_mutex); - if (retval) - return retval; - - if (transport) { - retval = -EBUSY; - goto out; - } - - transport = t; - retval = __vsock_core_init(); - if (retval) - transport = NULL; - -out: + transport = NULL; +err_busy: mutex_unlock(&vsock_register_mutex); - return retval; + return err; } -EXPORT_SYMBOL_GPL(vsock_core_init); +EXPORT_SYMBOL_GPL(__vsock_core_init); void vsock_core_exit(void) { @@ -2000,5 +1997,5 @@ EXPORT_SYMBOL_GPL(vsock_core_exit); MODULE_AUTHOR("VMware, Inc."); MODULE_DESCRIPTION("VMware Virtual Socket Family"); -MODULE_VERSION("1.0.0.0-k"); +MODULE_VERSION("1.0.1.0-k"); MODULE_LICENSE("GPL v2"); -- 1.7.11.7 From a8b9fa36d8615e4080d4ccdae3ef2a2a2bdb54cc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 2 May 2014 23:27:00 +0200 Subject: [PATCH 43/76] net: cdc_ncm: fix buffer overflow MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 9becd707841207652449a8dfd90fe9c476d88546 ] Commit 4d619f625a60 ("net: cdc_ncm: no point in filling up the NTBs if we send ZLPs") changed the padding logic for devices with the ZLP flag set. This meant that frames of any size will be sent without additional padding, except for the single byte added if the size is a multiple of the USB packet size. But if the unpadded size is identical to the maximum frame size, and the maximum size is a multiplum of the USB packet size, then this one-byte padding will overflow the buffer. Prevent padding if already at maximum frame size, letting usbnet transmit a ZLP instead in this case. Fixes: 4d619f625a60 ("net: cdc_ncm: no point in filling up the NTBs if we send ZLPs") Reported by: Yu-an Shih <yshih@xxxxxxxxxx> Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/cdc_ncm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index d350d27..75d7d9d 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -768,7 +768,7 @@ cdc_ncm_fill_tx_frame(struct usbnet *dev, struct sk_buff *skb, __le32 sign) skb_out->len > CDC_NCM_MIN_TX_PKT) memset(skb_put(skb_out, ctx->tx_max - skb_out->len), 0, ctx->tx_max - skb_out->len); - else if ((skb_out->len % dev->maxpacket) == 0) + else if (skb_out->len < ctx->tx_max && (skb_out->len % dev->maxpacket) == 0) *skb_put(skb_out, 1) = 0; /* force short packet */ /* set final frame length */ -- 1.7.11.7 From 5a3373184f1bea278b3b31337b10c2e207cf749b Mon Sep 17 00:00:00 2001 From: Eyal Perry <eyalpe@xxxxxxxxxxxx> Date: Sun, 4 May 2014 17:07:25 +0300 Subject: [PATCH 44/76] net/mlx4_core: Don't issue PCIe speed/width checks for VFs [ Upstream commit 83d3459a5928f18c9344683e31bc2a7c3c25562a ] Carrying out PCI speed/width checks through pcie_get_minimum_link() on VFs yield wrong results, so remove them. Fixes: b912b2f ('net/mlx4_core: Warn if device doesn't have enough PCI bandwidth') Signed-off-by: Eyal Perry <eyalpe@xxxxxxxxxxxx> Signed-off-by: Or Gerlitz <ogerlitz@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/ethernet/mellanox/mlx4/main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c index d413e60..95c316b 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -2399,7 +2399,8 @@ slave_start: * No return code for this call, just warn the user in case of PCI * express device capabilities are under-satisfied by the bus. */ - mlx4_check_pcie_caps(dev); + if (!mlx4_is_slave(dev)) + mlx4_check_pcie_caps(dev); /* In master functions, the communication channel must be initialized * after obtaining its address from fw */ -- 1.7.11.7 From c81f7c3c3f2604a7dbeb51e1ed3169738e86f447 Mon Sep 17 00:00:00 2001 From: Ying Cai <ycai@xxxxxxxxxx> Date: Sun, 4 May 2014 15:20:04 -0700 Subject: [PATCH 45/76] ip_tunnel: Set network header properly for IP_ECN_decapsulate() [ Upstream commit e96f2e7c430014eff52c93cabef1ad4f42ed0db1 ] In ip_tunnel_rcv(), set skb->network_header to inner IP header before IP_ECN_decapsulate(). Without the fix, IP_ECN_decapsulate() takes outer IP header as inner IP header, possibly causing error messages or packet drops. Note that this skb_reset_network_header() call was in this spot when the original feature for checking consistency of ECN bits through tunnels was added in eccc1bb8d4b4 ("tunnel: drop packet if ECN present with not-ECT"). It was only removed from this spot in 3d7b46cd20e3 ("ip_tunnel: push generic protocol handling to ip_tunnel module."). Fixes: 3d7b46cd20e3 ("ip_tunnel: push generic protocol handling to ip_tunnel module.") Reported-by: Neal Cardwell <ncardwell@xxxxxxxxxx> Signed-off-by: Ying Cai <ycai@xxxxxxxxxx> Acked-by: Neal Cardwell <ncardwell@xxxxxxxxxx> Acked-by: Eric Dumazet <edumazet@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ip_tunnel.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index a82a22d..6ec4beb54 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -438,6 +438,8 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb, tunnel->i_seqno = ntohl(tpi->seq) + 1; } + skb_reset_network_header(skb); + err = IP_ECN_decapsulate(iph, skb); if (unlikely(err)) { if (log_ecn_error) -- 1.7.11.7 From 06a164d98c481cc5d04a79c041fb0940357981e9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Timo=20Ter=C3=A4s?= <timo.teras@xxxxxx> Date: Fri, 16 May 2014 08:34:39 +0300 Subject: [PATCH 46/76] ipv4: ip_tunnels: disable cache for nbma gre tunnels MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 22fb22eaebf4d16987f3fd9c3484c436ee0badf2 ] The connected check fails to check for ip_gre nbma mode tunnels properly. ip_gre creates temporary tnl_params with daddr specified to pass-in the actual target on per-packet basis from neighbor layer. Detect these tunnels by inspecting the actual tunnel configuration. Minimal test case: ip route add 192.168.1.1/32 via 10.0.0.1 ip route add 192.168.1.2/32 via 10.0.0.2 ip tunnel add nbma0 mode gre key 1 tos c0 ip addr add 172.17.0.0/16 dev nbma0 ip link set nbma0 up ip neigh add 172.17.0.1 lladdr 192.168.1.1 dev nbma0 ip neigh add 172.17.0.2 lladdr 192.168.1.2 dev nbma0 ping 172.17.0.1 ping 172.17.0.2 The second ping should be going to 192.168.1.2 and head 10.0.0.2; but cached gre tunnel level route is used and it's actually going to 192.168.1.1 via 10.0.0.1. The lladdr's need to go to separate dst for the bug to trigger. Test case uses separate route entries, but this can also happen when the route entry is same: if there is a nexthop exception or the GRE tunnel is IPsec'ed in which case the dst points to xfrm bundle unique to the gre lladdr. Fixes: 7d442fab0a67 ("ipv4: Cache dst in tunnels") Signed-off-by: Timo Teräs <timo.teras@xxxxxx> Cc: Tom Herbert <therbert@xxxxxxxxxx> Cc: Eric Dumazet <edumazet@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ip_tunnel.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 6ec4beb54..547bd39 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -536,9 +536,10 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, unsigned int max_headroom; /* The extra header space needed */ __be32 dst; int err; - bool connected = true; + bool connected; inner_iph = (const struct iphdr *)skb_inner_network_header(skb); + connected = (tunnel->parms.iph.daddr != 0); dst = tnl_params->daddr; if (dst == 0) { -- 1.7.11.7 From 0ddca0a2d0e14ab2e01f8856e2eee0bddc7800ca Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Sat, 3 May 2014 16:12:47 +0200 Subject: [PATCH 47/76] net: cdc_mbim: __vlan_find_dev_deep need rcu_read_lock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 4f4178f3bb1f470d7fb863ec531e08e20a0fd51c ] Fixes this warning introduced by commit 5b8f15f78e6f ("net: cdc_mbim: handle IPv6 Neigbor Solicitations"): =============================== [ INFO: suspicious RCU usage. ] 3.15.0-rc3 #213 Tainted: G W O ------------------------------- net/8021q/vlan_core.c:69 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 no locks held by ksoftirqd/0/3. stack backtrace: CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G W O 3.15.0-rc3 #213 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 0000000000000001 ffff880232533bf0 ffffffff813a5ee6 0000000000000006 ffff880232530090 ffff880232533c20 ffffffff81076b94 0000000000000081 0000000000000000 ffff8802085ac000 ffff88007fc8ea00 ffff880232533c50 Call Trace: [<ffffffff813a5ee6>] dump_stack+0x4e/0x68 [<ffffffff81076b94>] lockdep_rcu_suspicious+0xfa/0x103 [<ffffffff813978a6>] __vlan_find_dev_deep+0x54/0x94 [<ffffffffa04a1938>] cdc_mbim_rx_fixup+0x379/0x66a [cdc_mbim] [<ffffffff813ab76f>] ? _raw_spin_unlock_irqrestore+0x3a/0x49 [<ffffffff81079671>] ? trace_hardirqs_on_caller+0x192/0x1a1 [<ffffffffa059bd10>] usbnet_bh+0x59/0x287 [usbnet] [<ffffffff8104067d>] tasklet_action+0xbb/0xcd [<ffffffff81040057>] __do_softirq+0x14c/0x30d [<ffffffff81040237>] run_ksoftirqd+0x1f/0x50 [<ffffffff8105f13e>] smpboot_thread_fn+0x172/0x18e [<ffffffff8105efcc>] ? SyS_setgroups+0xdf/0xdf [<ffffffff810594b0>] kthread+0xb5/0xbd [<ffffffff813a84b1>] ? __wait_for_common+0x13b/0x170 [<ffffffff810593fb>] ? __kthread_parkme+0x5c/0x5c [<ffffffff813b147c>] ret_from_fork+0x7c/0xb0 [<ffffffff810593fb>] ? __kthread_parkme+0x5c/0x5c Fixes: 5b8f15f78e6f ("net: cdc_mbim: handle IPv6 Neigbor Solicitations") Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/cdc_mbim.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/net/usb/cdc_mbim.c b/drivers/net/usb/cdc_mbim.c index c9f3281..13f7705 100644 --- a/drivers/net/usb/cdc_mbim.c +++ b/drivers/net/usb/cdc_mbim.c @@ -204,17 +204,23 @@ static void do_neigh_solicit(struct usbnet *dev, u8 *buf, u16 tci) return; /* need to send the NA on the VLAN dev, if any */ - if (tci) + rcu_read_lock(); + if (tci) { netdev = __vlan_find_dev_deep(dev->net, htons(ETH_P_8021Q), tci); - else + if (!netdev) { + rcu_read_unlock(); + return; + } + } else { netdev = dev->net; - if (!netdev) - return; + } + dev_hold(netdev); + rcu_read_unlock(); in6_dev = in6_dev_get(netdev); if (!in6_dev) - return; + goto out; is_router = !!in6_dev->cnf.forwarding; in6_dev_put(in6_dev); @@ -224,6 +230,8 @@ static void do_neigh_solicit(struct usbnet *dev, u8 *buf, u16 tci) true /* solicited */, false /* override */, true /* inc_opt */); +out: + dev_put(netdev); } static bool is_neigh_solicit(u8 *buf, size_t len) -- 1.7.11.7 From 479358ec9d72cf240ed4a01208a1e5b740a201fc Mon Sep 17 00:00:00 2001 From: Florian Westphal <fw@xxxxxxxxx> Date: Sun, 4 May 2014 23:24:31 +0200 Subject: [PATCH 48/76] net: ipv4: ip_forward: fix inverted local_df test [ Upstream commit ca6c5d4ad216d5942ae544bbf02503041bd802aa ] local_df means 'ignore DF bit if set', so if its set we're allowed to perform ip fragmentation. This wasn't noticed earlier because the output path also drops such skbs (and emits needed icmp error) and because netfilter ip defrag did not set local_df until couple of days ago. Only difference is that DF-packets-larger-than MTU now discarded earlier (f.e. we avoid pointless netfilter postrouting trip). While at it, drop the repeated test ip_exceeds_mtu, checking it once is enough... Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ip_forward.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c index f3869c1..1c6bd43 100644 --- a/net/ipv4/ip_forward.c +++ b/net/ipv4/ip_forward.c @@ -42,12 +42,12 @@ static bool ip_may_fragment(const struct sk_buff *skb) { return unlikely((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0) || - !skb->local_df; + skb->local_df; } static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu) { - if (skb->len <= mtu || skb->local_df) + if (skb->len <= mtu) return false; if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu) -- 1.7.11.7 From 082d505f6b2e79fb16d349f44231e70fc4c739e6 Mon Sep 17 00:00:00 2001 From: Florian Westphal <fw@xxxxxxxxx> Date: Mon, 5 May 2014 00:03:34 +0200 Subject: [PATCH 49/76] net: ipv6: send pkttoobig immediately if orig frag size > mtu [ Upstream commit 418a31561d594a2b636c1e2fa94ecd9e1245abb1 ] If conntrack defragments incoming ipv6 frags it stores largest original frag size in ip6cb and sets ->local_df. We must thus first test the largest original frag size vs. mtu, and not vice versa. Without this patch PKTTOOBIG is still generated in ip6_fragment() later in the stack, but 1) IPSTATS_MIB_INTOOBIGERRORS won't increment 2) packet did (needlessly) traverse netfilter postrouting hook. Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv6/ip6_output.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 3702d17..264a984 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -344,12 +344,16 @@ static unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst) static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu) { - if (skb->len <= mtu || skb->local_df) + if (skb->len <= mtu) return false; + /* ipv6 conntrack defrag sets max_frag_size + local_df */ if (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu) return true; + if (skb->local_df) + return false; + if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu) return false; -- 1.7.11.7 From ac57d37eaec9aa7bb8be546d84e69d3d130b3bb3 Mon Sep 17 00:00:00 2001 From: Sergey Popovich <popovich_sergei@xxxxxxx> Date: Tue, 6 May 2014 18:23:08 +0300 Subject: [PATCH 50/76] ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation [ Upstream commit aeefa1ecfc799b0ea2c4979617f14cecd5cccbfd ] Increment fib_info_cnt in fib_create_info() right after successfuly alllocating fib_info structure, overwise fib_metrics allocation failure leads to fib_info_cnt incorrectly decremented in free_fib_info(), called on error path from fib_create_info(). Signed-off-by: Sergey Popovich <popovich_sergei@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/fib_semantics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index b53f0bf..9d43468 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -820,13 +820,13 @@ struct fib_info *fib_create_info(struct fib_config *cfg) fi = kzalloc(sizeof(*fi)+nhs*sizeof(struct fib_nh), GFP_KERNEL); if (fi == NULL) goto failure; + fib_info_cnt++; if (cfg->fc_mx) { fi->fib_metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL); if (!fi->fib_metrics) goto failure; } else fi->fib_metrics = (u32 *) dst_default_metrics; - fib_info_cnt++; fi->fib_net = hold_net(net); fi->fib_protocol = cfg->fc_protocol; -- 1.7.11.7 From 22321abd289362abd962fcdda0333811bf493ebf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@xxxxxxx> Date: Fri, 9 May 2014 14:45:00 +0200 Subject: [PATCH 51/76] net: cdc_mbim: handle unaccelerated VLAN tagged frames MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 6b5eeb7f874b689403e52a646e485d0191ab9507 ] This driver maps 802.1q VLANs to MBIM sessions. The mapping is based on a bogus assumption that all tagged frames will use the acceleration API because we enable NETIF_F_HW_VLAN_CTAG_TX. This fails for e.g. frames tagged in userspace using packet sockets. Such frames will erroneously be considered as untagged and silently dropped based on not being IP. Fix by falling back to looking into the ethernet header for a tag if no accelerated tag was found. Fixes: a82c7ce5bc5b ("net: cdc_ncm: map MBIM IPS SessionID to VLAN ID") Cc: Greg Suarez <gsuarez@xxxxxxxxxxxxxx> Signed-off-by: Bjørn Mork <bjorn@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/usb/cdc_mbim.c | 39 ++++++++++++++++++++++++++++----------- 1 file changed, 28 insertions(+), 11 deletions(-) diff --git a/drivers/net/usb/cdc_mbim.c b/drivers/net/usb/cdc_mbim.c index 13f7705..2e025dd 100644 --- a/drivers/net/usb/cdc_mbim.c +++ b/drivers/net/usb/cdc_mbim.c @@ -120,6 +120,16 @@ static void cdc_mbim_unbind(struct usbnet *dev, struct usb_interface *intf) cdc_ncm_unbind(dev, intf); } +/* verify that the ethernet protocol is IPv4 or IPv6 */ +static bool is_ip_proto(__be16 proto) +{ + switch (proto) { + case htons(ETH_P_IP): + case htons(ETH_P_IPV6): + return true; + } + return false; +} static struct sk_buff *cdc_mbim_tx_fixup(struct usbnet *dev, struct sk_buff *skb, gfp_t flags) { @@ -128,6 +138,7 @@ static struct sk_buff *cdc_mbim_tx_fixup(struct usbnet *dev, struct sk_buff *skb struct cdc_ncm_ctx *ctx = info->ctx; __le32 sign = cpu_to_le32(USB_CDC_MBIM_NDP16_IPS_SIGN); u16 tci = 0; + bool is_ip; u8 *c; if (!ctx) @@ -137,25 +148,32 @@ static struct sk_buff *cdc_mbim_tx_fixup(struct usbnet *dev, struct sk_buff *skb if (skb->len <= ETH_HLEN) goto error; + /* Some applications using e.g. packet sockets will + * bypass the VLAN acceleration and create tagged + * ethernet frames directly. We primarily look for + * the accelerated out-of-band tag, but fall back if + * required + */ + skb_reset_mac_header(skb); + if (vlan_get_tag(skb, &tci) < 0 && skb->len > VLAN_ETH_HLEN && + __vlan_get_tag(skb, &tci) == 0) { + is_ip = is_ip_proto(vlan_eth_hdr(skb)->h_vlan_encapsulated_proto); + skb_pull(skb, VLAN_ETH_HLEN); + } else { + is_ip = is_ip_proto(eth_hdr(skb)->h_proto); + skb_pull(skb, ETH_HLEN); + } + /* mapping VLANs to MBIM sessions: * no tag => IPS session <0> * 1 - 255 => IPS session <vlanid> * 256 - 511 => DSS session <vlanid - 256> * 512 - 4095 => unsupported, drop */ - vlan_get_tag(skb, &tci); - switch (tci & 0x0f00) { case 0x0000: /* VLAN ID 0 - 255 */ - /* verify that datagram is IPv4 or IPv6 */ - skb_reset_mac_header(skb); - switch (eth_hdr(skb)->h_proto) { - case htons(ETH_P_IP): - case htons(ETH_P_IPV6): - break; - default: + if (!is_ip) goto error; - } c = (u8 *)&sign; c[3] = tci; break; @@ -169,7 +187,6 @@ static struct sk_buff *cdc_mbim_tx_fixup(struct usbnet *dev, struct sk_buff *skb "unsupported tci=0x%04x\n", tci); goto error; } - skb_pull(skb, ETH_HLEN); } spin_lock_bh(&ctx->mtx); -- 1.7.11.7 From a264086933d7b5a9a0d9f6277fc076e908ade374 Mon Sep 17 00:00:00 2001 From: Peter Christensen <pch@xxxxxxxxxxxx> Date: Thu, 8 May 2014 11:15:37 +0200 Subject: [PATCH 52/76] macvlan: Don't propagate IFF_ALLMULTI changes on down interfaces. [ Upstream commit bbeb0eadcf9fe74fb2b9b1a6fea82cd538b1e556 ] Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti overflow on the underlying interface. Attempting the set IFF_ALLMULTI on the underlying interface would cause an error and the log message: "allmulti touches root, set allmulti failed." Signed-off-by: Peter Christensen <pch@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/macvlan.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 00c3727..20bb669 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -458,8 +458,10 @@ static void macvlan_change_rx_flags(struct net_device *dev, int change) struct macvlan_dev *vlan = netdev_priv(dev); struct net_device *lowerdev = vlan->lowerdev; - if (change & IFF_ALLMULTI) - dev_set_allmulti(lowerdev, dev->flags & IFF_ALLMULTI ? 1 : -1); + if (dev->flags & IFF_UP) { + if (change & IFF_ALLMULTI) + dev_set_allmulti(lowerdev, dev->flags & IFF_ALLMULTI ? 1 : -1); + } } static void macvlan_set_mac_lists(struct net_device *dev) -- 1.7.11.7 From 69bcbfc3e228ff326dc12bea891a3ca55f5f486a Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov <nikolay@xxxxxxxxxx> Date: Fri, 9 May 2014 11:11:39 +0200 Subject: [PATCH 53/76] sfc: fix calling of free_irq with already free vector [ Upstream commit 1c3639005f48492e5f2d965779efd814e80f8b15 ] If the sfc driver is in legacy interrupt mode (either explicitly by using interrupt_mode module param or by falling back to it) it will hit a warning at kernel/irq/manage.c because it will try to free an irq which wasn't allocated by it in the first place because the MSI(X) irqs are zero and it'll try to free them unconditionally. So fix it by checking if we're in legacy mode and freeing the appropriate irqs. CC: Zenghui Shi <zshi@xxxxxxxxxx> CC: Ben Hutchings <ben@xxxxxxxxxxxxxxx> CC: <linux-net-drivers@xxxxxxxxxxxxxx> CC: Shradha Shah <sshah@xxxxxxxxxxxxxx> CC: David S. Miller <davem@xxxxxxxxxxxxx> Fixes: 1899c111a535 ("sfc: Fix IRQ cleanup in case of a probe failure") Reported-by: Zenghui Shi <zshi@xxxxxxxxxx> Signed-off-by: Nikolay Aleksandrov <nikolay@xxxxxxxxxx> Acked-by: Shradha Shah <sshah@xxxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/ethernet/sfc/nic.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c index 79226b1..cb3fb9d 100644 --- a/drivers/net/ethernet/sfc/nic.c +++ b/drivers/net/ethernet/sfc/nic.c @@ -156,13 +156,15 @@ void efx_nic_fini_interrupt(struct efx_nic *efx) efx->net_dev->rx_cpu_rmap = NULL; #endif - /* Disable MSI/MSI-X interrupts */ - efx_for_each_channel(channel, efx) - free_irq(channel->irq, &efx->msi_context[channel->channel]); - - /* Disable legacy interrupt */ - if (efx->legacy_irq) + if (EFX_INT_MODE_USE_MSI(efx)) { + /* Disable MSI/MSI-X interrupts */ + efx_for_each_channel(channel, efx) + free_irq(channel->irq, + &efx->msi_context[channel->channel]); + } else { + /* Disable legacy interrupt */ free_irq(efx->legacy_irq, efx); + } } /* Register dump */ -- 1.7.11.7 From 7f8b114eb600e80506b59b8fb0849d7b6e65ed53 Mon Sep 17 00:00:00 2001 From: Susant Sahani <susant@xxxxxxxxxx> Date: Sat, 10 May 2014 00:11:32 +0530 Subject: [PATCH 54/76] ip6_tunnel: fix potential NULL pointer dereference [ Upstream commit c8965932a2e3b70197ec02c6741c29460279e2a8 ] The function ip6_tnl_validate assumes that the rtnl attribute IFLA_IPTUN_PROTO always be filled . If this attribute is not filled by the userspace application kernel get crashed with NULL pointer dereference. This patch fixes the potential kernel crash when IFLA_IPTUN_PROTO is missing . Signed-off-by: Susant Sahani <susant@xxxxxxxxxx> Acked-by: Thomas Graf <tgraf@xxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv6/ip6_tunnel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 5db8d31..0e51f68 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1564,7 +1564,7 @@ static int ip6_tnl_validate(struct nlattr *tb[], struct nlattr *data[]) { u8 proto; - if (!data) + if (!data || !data[IFLA_IPTUN_PROTO]) return 0; proto = nla_get_u8(data[IFLA_IPTUN_PROTO]); -- 1.7.11.7 From 71870cd91c25ec4f34dee6da320fde7ae6b3f7ee Mon Sep 17 00:00:00 2001 From: Duan Jiong <duanj.fnst@xxxxxxxxxxxxxx> Date: Fri, 9 May 2014 13:16:48 +0800 Subject: [PATCH 55/76] neigh: set nud_state to NUD_INCOMPLETE when probing router reachability [ Upstream commit 2176d5d41891753774f648b67470398a5acab584 ] Since commit 7e98056964("ipv6: router reachability probing"), a router falls into NUD_FAILED will be probed. Now if function rt6_select() selects a router which neighbour state is NUD_FAILED, and at the same time function rt6_probe() changes the neighbour state to NUD_PROBE, then function dst_neigh_output() can directly send packets, but actually the neighbour still is unreachable. If we set nud_state to NUD_INCOMPLETE instead NUD_PROBE, packets will not be sent out until the neihbour is reachable. In addition, because the route should be probes with a single NS, so we must set neigh->probes to neigh_max_probes(), then the neigh timer timeout and function neigh_timer_handler() will not send other NS Messages. Signed-off-by: Duan Jiong <duanj.fnst@xxxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/neighbour.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index e161290..7d95f69 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1247,8 +1247,8 @@ void __neigh_set_probe_once(struct neighbour *neigh) neigh->updated = jiffies; if (!(neigh->nud_state & NUD_FAILED)) return; - neigh->nud_state = NUD_PROBE; - atomic_set(&neigh->probes, NEIGH_VAR(neigh->parms, UCAST_PROBES)); + neigh->nud_state = NUD_INCOMPLETE; + atomic_set(&neigh->probes, neigh_max_probes(neigh)); neigh_add_timer(neigh, jiffies + NEIGH_VAR(neigh->parms, RETRANS_TIME)); } -- 1.7.11.7 From 3e5c42bcf76f962ddc55afc233c8f48c868dc7d5 Mon Sep 17 00:00:00 2001 From: Simon Wunderlich <simon@xxxxxxxxxxxxx> Date: Wed, 26 Mar 2014 15:46:21 +0100 Subject: [PATCH 56/76] batman-adv: fix neigh_ifinfo imbalance [ Upstream commit c1e517fbbcdb13f50662af4edc11c3251fe44f86 ] The neigh_ifinfo object must be freed if it has been used in batadv_iv_ogm_process_per_outif(). This is a regression introduced by 89652331c00f43574515059ecbf262d26d885717 ("batman-adv: split tq information in neigh_node struct") Reported-by: Antonio Quartulli <antonio@xxxxxxxxxxxxx> Signed-off-by: Simon Wunderlich <simon@xxxxxxxxxxxxx> Signed-off-by: Marek Lindner <mareklindner@xxxxxxxxxxxxx> Signed-off-by: Antonio Quartulli <antonio@xxxxxxxxxxxxxx> --- net/batman-adv/bat_iv_ogm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index 8323bce..d074d06 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -1545,6 +1545,8 @@ out_neigh: if ((orig_neigh_node) && (!is_single_hop_neigh)) batadv_orig_node_free_ref(orig_neigh_node); out: + if (router_ifinfo) + batadv_neigh_ifinfo_free_ref(router_ifinfo); if (router) batadv_neigh_node_free_ref(router); if (router_router) -- 1.7.11.7 From 42024bdb4aaf65f824a8857ef9304f6df6e85070 Mon Sep 17 00:00:00 2001 From: Simon Wunderlich <simon@xxxxxxxxxxxxx> Date: Wed, 26 Mar 2014 15:46:22 +0100 Subject: [PATCH 57/76] batman-adv: fix neigh reference imbalance [ Upstream commit 000c8dff97311357535d64539e58990526e4de70 ] When an interface is removed from batman-adv, the orig_ifinfo of a orig_node may be removed without releasing the router first. This will prevent the reference for the neighbor pointed at by the orig_ifinfo->router to be released, and this leak may result in reference leaks for the interface used by this neighbor. Fix that. This is a regression introduced by 7351a4822d42827ba0110677c0cbad88a3d52585 ("batman-adv: split out router from orig_node"). Reported-by: Antonio Quartulli <antonio@xxxxxxxxxxxxx> Signed-off-by: Simon Wunderlich <simon@xxxxxxxxxxxxx> Signed-off-by: Marek Lindner <mareklindner@xxxxxxxxxxxxx> Signed-off-by: Antonio Quartulli <antonio@xxxxxxxxxxxxxx> --- net/batman-adv/originator.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index 8539416..25df60d 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -500,12 +500,17 @@ batadv_neigh_node_get(const struct batadv_orig_node *orig_node, static void batadv_orig_ifinfo_free_rcu(struct rcu_head *rcu) { struct batadv_orig_ifinfo *orig_ifinfo; + struct batadv_neigh_node *router; orig_ifinfo = container_of(rcu, struct batadv_orig_ifinfo, rcu); if (orig_ifinfo->if_outgoing != BATADV_IF_DEFAULT) batadv_hardif_free_ref_now(orig_ifinfo->if_outgoing); + /* this is the last reference to this object */ + router = rcu_dereference_protected(orig_ifinfo->router, true); + if (router) + batadv_neigh_node_free_ref_now(router); kfree(orig_ifinfo); } -- 1.7.11.7 From 267517437494f9644ddefbe1362d3bf9c66f8771 Mon Sep 17 00:00:00 2001 From: Simon Wunderlich <simon@xxxxxxxxxxxxx> Date: Wed, 26 Mar 2014 15:46:23 +0100 Subject: [PATCH 58/76] batman-adv: always run purge_orig_neighbors [ Upstream commit 7b955a9fc164487d7c51acb9787f6d1b01b35ef6 ] The current code will not execute batadv_purge_orig_neighbors() when an orig_ifinfo has already been purged. However we need to run it in any case. Fix that. This is a regression introduced by 7351a4822d42827ba0110677c0cbad88a3d52585 ("batman-adv: split out router from orig_node") Signed-off-by: Simon Wunderlich <simon@xxxxxxxxxxxxx> Signed-off-by: Marek Lindner <mareklindner@xxxxxxxxxxxxx> Signed-off-by: Antonio Quartulli <antonio@xxxxxxxxxxxxxx> --- net/batman-adv/originator.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index 25df60d..47b0886 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -857,7 +857,7 @@ static bool batadv_purge_orig_node(struct batadv_priv *bat_priv, { struct batadv_neigh_node *best_neigh_node; struct batadv_hard_iface *hard_iface; - bool changed; + bool changed_ifinfo, changed_neigh; if (batadv_has_timed_out(orig_node->last_seen, 2 * BATADV_PURGE_TIMEOUT)) { @@ -867,10 +867,10 @@ static bool batadv_purge_orig_node(struct batadv_priv *bat_priv, jiffies_to_msecs(orig_node->last_seen)); return true; } - changed = batadv_purge_orig_ifinfo(bat_priv, orig_node); - changed = changed || batadv_purge_orig_neighbors(bat_priv, orig_node); + changed_ifinfo = batadv_purge_orig_ifinfo(bat_priv, orig_node); + changed_neigh = batadv_purge_orig_neighbors(bat_priv, orig_node); - if (!changed) + if (!changed_ifinfo && !changed_neigh) return false; /* first for NULL ... */ -- 1.7.11.7 From 29f5de556ef733aa5d96b82858a19781f3f67184 Mon Sep 17 00:00:00 2001 From: Simon Wunderlich <simon@xxxxxxxxxxxxx> Date: Wed, 26 Mar 2014 15:46:24 +0100 Subject: [PATCH 59/76] batman-adv: fix removing neigh_ifinfo [ Upstream commit 709de13f0c532fe9c468c094aff069a725ed57fe ] When an interface is removed separately, all neighbors need to be checked if they have a neigh_ifinfo structure for that particular interface. If that is the case, remove that ifinfo so any references to a hard interface can be freed. This is a regression introduced by 89652331c00f43574515059ecbf262d26d885717 ("batman-adv: split tq information in neigh_node struct") Reported-by: Antonio Quartulli <antonio@xxxxxxxxxxxxx> Signed-off-by: Simon Wunderlich <simon@xxxxxxxxxxxxx> Signed-off-by: Marek Lindner <mareklindner@xxxxxxxxxxxxx> Signed-off-by: Antonio Quartulli <antonio@xxxxxxxxxxxxxx> --- net/batman-adv/originator.c | 46 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index 47b0886..3e6dd91 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -702,6 +702,47 @@ free_orig_node: } /** + * batadv_purge_neigh_ifinfo - purge obsolete ifinfo entries from neighbor + * @bat_priv: the bat priv with all the soft interface information + * @neigh: orig node which is to be checked + */ +static void +batadv_purge_neigh_ifinfo(struct batadv_priv *bat_priv, + struct batadv_neigh_node *neigh) +{ + struct batadv_neigh_ifinfo *neigh_ifinfo; + struct batadv_hard_iface *if_outgoing; + struct hlist_node *node_tmp; + + spin_lock_bh(&neigh->ifinfo_lock); + + /* for all ifinfo objects for this neighinator */ + hlist_for_each_entry_safe(neigh_ifinfo, node_tmp, + &neigh->ifinfo_list, list) { + if_outgoing = neigh_ifinfo->if_outgoing; + + /* always keep the default interface */ + if (if_outgoing == BATADV_IF_DEFAULT) + continue; + + /* don't purge if the interface is not (going) down */ + if ((if_outgoing->if_status != BATADV_IF_INACTIVE) && + (if_outgoing->if_status != BATADV_IF_NOT_IN_USE) && + (if_outgoing->if_status != BATADV_IF_TO_BE_REMOVED)) + continue; + + batadv_dbg(BATADV_DBG_BATMAN, bat_priv, + "neighbor/ifinfo purge: neighbor %pM, iface: %s\n", + neigh->addr, if_outgoing->net_dev->name); + + hlist_del_rcu(&neigh_ifinfo->list); + batadv_neigh_ifinfo_free_ref(neigh_ifinfo); + } + + spin_unlock_bh(&neigh->ifinfo_lock); +} + +/** * batadv_purge_orig_ifinfo - purge obsolete ifinfo entries from originator * @bat_priv: the bat priv with all the soft interface information * @orig_node: orig node which is to be checked @@ -800,6 +841,11 @@ batadv_purge_orig_neighbors(struct batadv_priv *bat_priv, hlist_del_rcu(&neigh_node->list); batadv_neigh_node_free_ref(neigh_node); + } else { + /* only necessary if not the whole neighbor is to be + * deleted, but some interface has been removed. + */ + batadv_purge_neigh_ifinfo(bat_priv, neigh_node); } } -- 1.7.11.7 From 30f3a851b6e09131944e88e063075240deaed255 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov <ast@xxxxxxxxxxxx> Date: Tue, 13 May 2014 15:05:55 -0700 Subject: [PATCH 60/76] net: filter: x86: fix JIT address randomization [ Upstream commit 773cd38f40b8834be991dbfed36683acc1dd41ee ] bpf_alloc_binary() adds 128 bytes of room to JITed program image and rounds it up to the nearest page size. If image size is close to page size (like 4000), it is rounded to two pages: round_up(4000 + 4 + 128) == 8192 then 'hole' is computed as 8192 - (4000 + 4) = 4188 If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header) then kernel will crash during bpf_jit_free(): kernel BUG at arch/x86/mm/pageattr.c:887! Call Trace: [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460 [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff810378ff>] set_memory_rw+0x2f/0x40 [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60 [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0 [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0 [<ffffffff8106c90c>] worker_thread+0x11c/0x370 since bpf_jit_free() does: unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK; struct bpf_binary_header *header = (void *)addr; to compute start address of 'bpf_binary_header' and header->pages will pass junk to: set_memory_rw(addr, header->pages); Fix it by making sure that &header->image[prandom_u32() % hole] and &header are in the same page Fixes: 314beb9bcabfd ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxxxx> Acked-by: Eric Dumazet <edumazet@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- arch/x86/net/bpf_jit_comp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 4ed75dd..af2d431 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -171,7 +171,7 @@ static struct bpf_binary_header *bpf_alloc_binary(unsigned int proglen, memset(header, 0xcc, sz); /* fill whole space with int3 instructions */ header->pages = sz / PAGE_SIZE; - hole = sz - (proglen + sizeof(*header)); + hole = min(sz - (proglen + sizeof(*header)), PAGE_SIZE - sizeof(*header)); /* insert a random number of int3 instructions before BPF code */ *image_ptr = &header->image[prandom_u32() % hole]; -- 1.7.11.7 From 7e4343e4f22c78c61fc406b2ba9523b90a2f3013 Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens@xxxxxxxxxx> Date: Wed, 14 May 2014 09:48:21 +0200 Subject: [PATCH 61/76] net: filter: s390: fix JIT address randomization [ Upstream commit e84d2f8d2ae33c8215429824e1ecf24cbca9645e ] This is the s390 variant of Alexei's JIT bug fix. (patch description below stolen from Alexei's patch) bpf_alloc_binary() adds 128 bytes of room to JITed program image and rounds it up to the nearest page size. If image size is close to page size (like 4000), it is rounded to two pages: round_up(4000 + 4 + 128) == 8192 then 'hole' is computed as 8192 - (4000 + 4) = 4188 If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header) then kernel will crash during bpf_jit_free(): kernel BUG at arch/x86/mm/pageattr.c:887! Call Trace: [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460 [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff810378ff>] set_memory_rw+0x2f/0x40 [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60 [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0 [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0 [<ffffffff8106c90c>] worker_thread+0x11c/0x370 since bpf_jit_free() does: unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK; struct bpf_binary_header *header = (void *)addr; to compute start address of 'bpf_binary_header' and header->pages will pass junk to: set_memory_rw(addr, header->pages); Fix it by making sure that &header->image[prandom_u32() % hole] and &header are in the same page. Fixes: aa2d2c73c21f2 ("s390/bpf,jit: address randomize and write protect jit code") Reported-by: Alexei Starovoitov <ast@xxxxxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> # v3.11+ Signed-off-by: Heiko Carstens <heiko.carstens@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- arch/s390/net/bpf_jit_comp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index a778ee2..8e08c67 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -811,7 +811,7 @@ static struct bpf_binary_header *bpf_alloc_binary(unsigned int bpfsize, return NULL; memset(header, 0, sz); header->pages = sz / PAGE_SIZE; - hole = sz - (bpfsize + sizeof(*header)); + hole = min(sz - (bpfsize + sizeof(*header)), PAGE_SIZE - sizeof(*header)); /* Insert random number of illegal instructions before BPF code * and make sure the first instruction starts at an even address. */ -- 1.7.11.7 From 134c43864aaab734be8d8ac77e5fc716a96d9789 Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx> Date: Sun, 11 May 2014 22:59:30 +0200 Subject: [PATCH 62/76] net: avoid dependency of net_get_random_once on nop patching MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 3d4405226d27b3a215e4d03cfa51f536244e5de7 ] net_get_random_once depends on the static keys infrastructure to patch up the branch to the slow path during boot. This was realized by abusing the static keys api and defining a new initializer to not enable the call site while still indicating that the branch point should get patched up. This was needed to have the fast path considered likely by gcc. The static key initialization during boot up normally walks through all the registered keys and either patches in ideal nops or enables the jump site but omitted that step on x86 if ideal nops where already placed at static_key branch points. Thus net_get_random_once branches not always became active. This patch switches net_get_random_once to the ordinary static_key api and thus places the kernel fast path in the - by gcc considered - unlikely path. Microbenchmarks on Intel and AMD x86-64 showed that the unlikely path actually beats the likely path in terms of cycle cost and that different nop patterns did not make much difference, thus this switch should not be noticeable. Fixes: a48e42920ff38b ("net: introduce new macro net_get_random_once") Reported-by: Tuomas Räsänen <tuomasjjrasanen@xxxxxxx> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/linux/net.h | 15 ++++----------- net/core/utils.c | 8 ++++---- 2 files changed, 8 insertions(+), 15 deletions(-) diff --git a/include/linux/net.h b/include/linux/net.h index 94734a6..17d8339 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -248,24 +248,17 @@ do { \ bool __net_get_random_once(void *buf, int nbytes, bool *done, struct static_key *done_key); -#ifdef HAVE_JUMP_LABEL -#define ___NET_RANDOM_STATIC_KEY_INIT ((struct static_key) \ - { .enabled = ATOMIC_INIT(0), .entries = (void *)1 }) -#else /* !HAVE_JUMP_LABEL */ -#define ___NET_RANDOM_STATIC_KEY_INIT STATIC_KEY_INIT_FALSE -#endif /* HAVE_JUMP_LABEL */ - #define net_get_random_once(buf, nbytes) \ ({ \ bool ___ret = false; \ static bool ___done = false; \ - static struct static_key ___done_key = \ - ___NET_RANDOM_STATIC_KEY_INIT; \ - if (!static_key_true(&___done_key)) \ + static struct static_key ___once_key = \ + STATIC_KEY_INIT_TRUE; \ + if (static_key_true(&___once_key)) \ ___ret = __net_get_random_once(buf, \ nbytes, \ &___done, \ - &___done_key); \ + &___once_key); \ ___ret; \ }) diff --git a/net/core/utils.c b/net/core/utils.c index 2f737bf..eed3433 100644 --- a/net/core/utils.c +++ b/net/core/utils.c @@ -348,8 +348,8 @@ static void __net_random_once_deferred(struct work_struct *w) { struct __net_random_once_work *work = container_of(w, struct __net_random_once_work, work); - if (!static_key_enabled(work->key)) - static_key_slow_inc(work->key); + BUG_ON(!static_key_enabled(work->key)); + static_key_slow_dec(work->key); kfree(work); } @@ -367,7 +367,7 @@ static void __net_random_once_disable_jump(struct static_key *key) } bool __net_get_random_once(void *buf, int nbytes, bool *done, - struct static_key *done_key) + struct static_key *once_key) { static DEFINE_SPINLOCK(lock); unsigned long flags; @@ -382,7 +382,7 @@ bool __net_get_random_once(void *buf, int nbytes, bool *done, *done = true; spin_unlock_irqrestore(&lock, flags); - __net_random_once_disable_jump(done_key); + __net_random_once_disable_jump(once_key); return true; } -- 1.7.11.7 From eb682baf32e96c3d145c89eab5e0432483da92d9 Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx> Date: Sun, 11 May 2014 23:01:13 +0200 Subject: [PATCH 63/76] ipv6: fix calculation of option len in ip6_append_data [ Upstream commit 3a1cebe7e05027a1c96f2fc1a8eddf5f19b78f42 ] tot_len does specify the size of struct ipv6_txoptions. We need opt_flen + opt_nflen to calculate the overall length of additional ipv6 extensions. I found this while auditing the ipv6 output path for a memory corruption reported by Alexey Preobrazhensky while he fuzzed an instrumented AddressSanitizer kernel with trinity. This may or may not be the cause of the original bug. Fixes: 4df98e76cde7c6 ("ipv6: pmtudisc setting not respected with UFO/CORK") Reported-by: Alexey Preobrazhensky <preobr@xxxxxxxxxx> Signed-off-by: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv6/ip6_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 264a984..a62b610 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1229,7 +1229,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, unsigned int maxnonfragsize, headersize; headersize = sizeof(struct ipv6hdr) + - (opt ? opt->tot_len : 0) + + (opt ? opt->opt_flen + opt->opt_nflen : 0) + (dst_allfrag(&rt->dst) ? sizeof(struct frag_hdr) : 0) + rt->rt6i_nfheader_len; -- 1.7.11.7 From 2e76993bcc68fcfa8936862cfa501c51f349eeb4 Mon Sep 17 00:00:00 2001 From: Cong Wang <cwang@xxxxxxxxxxxxxxxx> Date: Mon, 12 May 2014 15:11:20 -0700 Subject: [PATCH 64/76] rtnetlink: wait for unregistering devices in rtnl_link_unregister() [ Upstream commit 200b916f3575bdf11609cb447661b8d5957b0bbf ] From: Cong Wang <cwang@xxxxxxxxxxxxxxxx> commit 50624c934db18ab90 (net: Delay default_device_exit_batch until no devices are unregistering) introduced rtnl_lock_unregistering() for default_device_exit_batch(). Same race could happen we when rmmod a driver which calls rtnl_link_unregister() as we call dev->destructor without rtnl lock. For long term, I think we should clean up the mess of netdev_run_todo() and net namespce exit code. Cc: Eric W. Biederman <ebiederm@xxxxxxxxxxxx> Cc: David S. Miller <davem@xxxxxxxxxxxxx> Signed-off-by: Cong Wang <xiyou.wangcong@xxxxxxxxx> Signed-off-by: Cong Wang <cwang@xxxxxxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- include/linux/rtnetlink.h | 5 +++++ net/core/dev.c | 2 +- net/core/net_namespace.c | 2 +- net/core/rtnetlink.c | 33 ++++++++++++++++++++++++++++++++- 4 files changed, 39 insertions(+), 3 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 8e3e66a..953937e 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -4,6 +4,7 @@ #include <linux/mutex.h> #include <linux/netdevice.h> +#include <linux/wait.h> #include <uapi/linux/rtnetlink.h> extern int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, u32 group, int echo); @@ -22,6 +23,10 @@ extern void rtnl_lock(void); extern void rtnl_unlock(void); extern int rtnl_trylock(void); extern int rtnl_is_locked(void); + +extern wait_queue_head_t netdev_unregistering_wq; +extern struct mutex net_mutex; + #ifdef CONFIG_PROVE_LOCKING extern int lockdep_rtnl_is_held(void); #else diff --git a/net/core/dev.c b/net/core/dev.c index 7c22974..6de8401 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5573,7 +5573,7 @@ static int dev_new_index(struct net *net) /* Delayed registration/unregisteration */ static LIST_HEAD(net_todo_list); -static DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wq); +DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wq); static void net_set_todo(struct net_device *dev) { diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 81d3a9a..7c8ffd9 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -24,7 +24,7 @@ static LIST_HEAD(pernet_list); static struct list_head *first_device = &pernet_list; -static DEFINE_MUTEX(net_mutex); +DEFINE_MUTEX(net_mutex); LIST_HEAD(net_namespace_list); EXPORT_SYMBOL_GPL(net_namespace_list); diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 3797d96..83b9d6a 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -353,15 +353,46 @@ void __rtnl_link_unregister(struct rtnl_link_ops *ops) } EXPORT_SYMBOL_GPL(__rtnl_link_unregister); +/* Return with the rtnl_lock held when there are no network + * devices unregistering in any network namespace. + */ +static void rtnl_lock_unregistering_all(void) +{ + struct net *net; + bool unregistering; + DEFINE_WAIT(wait); + + for (;;) { + prepare_to_wait(&netdev_unregistering_wq, &wait, + TASK_UNINTERRUPTIBLE); + unregistering = false; + rtnl_lock(); + for_each_net(net) { + if (net->dev_unreg_count > 0) { + unregistering = true; + break; + } + } + if (!unregistering) + break; + __rtnl_unlock(); + schedule(); + } + finish_wait(&netdev_unregistering_wq, &wait); +} + /** * rtnl_link_unregister - Unregister rtnl_link_ops from rtnetlink. * @ops: struct rtnl_link_ops * to unregister */ void rtnl_link_unregister(struct rtnl_link_ops *ops) { - rtnl_lock(); + /* Close the race with cleanup_net() */ + mutex_lock(&net_mutex); + rtnl_lock_unregistering_all(); __rtnl_link_unregister(ops); rtnl_unlock(); + mutex_unlock(&net_mutex); } EXPORT_SYMBOL_GPL(rtnl_link_unregister); -- 1.7.11.7 From 8f5ff1e5baae6e3a6ecf4d9c4cbf208ef5e905d5 Mon Sep 17 00:00:00 2001 From: Guenter Roeck <linux@xxxxxxxxxxxx> Date: Wed, 14 May 2014 13:12:49 -0700 Subject: [PATCH 65/76] net: phy: Don't call phy_resume if phy_init_hw failed [ Upstream commit b394745df2d9d4c30bf1bcc55773bec6f3bc7c67 ] After the call to phy_init_hw failed in phy_attach_direct, phy_detach is called to detach the phy device from its network device. If the attached driver is a generic phy driver, this also detaches the driver. Subsequently phy_resume is called, which assumes without checking that a driver is attached to the device. This will result in a crash such as Unable to handle kernel paging request for data at address 0xffffffffffffff90 Faulting instruction address: 0xc0000000003a0e18 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c0000000003a0e18] .phy_attach_direct+0x68/0x17c LR [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c Call Trace: [c0000003fc0475d0] [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c (unreliable) [c0000003fc047670] [c0000000003a0ff8] .phy_connect_direct+0x28/0x98 [c0000003fc047700] [c0000000003f0074] .of_phy_connect+0x4c/0xa4 Only call phy_resume if phy_init_hw was successful. Signed-off-by: Guenter Roeck <linux@xxxxxxxxxxxx> Acked-by: Florian Fainelli <f.fainelli@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/phy/phy_device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 2f6989b..3653754 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -613,8 +613,8 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, err = phy_init_hw(phydev); if (err) phy_detach(phydev); - - phy_resume(phydev); + else + phy_resume(phydev); return err; } -- 1.7.11.7 From 07239cc9b23ba0c3b4e741fe5b5838be1bc1b943 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov <nikolay@xxxxxxxxxx> Date: Thu, 15 May 2014 13:35:23 +0200 Subject: [PATCH 66/76] bonding: fix out of range parameters for bond_intmax_tbl [ Upstream commit 81c708068dfedece038e07d818ba68333d8d885d ] I've missed to add a NULL entry to the bond_intmax_tbl when I introduced it with the conversion of arp_interval so add it now. CC: Jay Vosburgh <j.vosburgh@xxxxxxxxx> CC: Veaceslav Falico <vfalico@xxxxxxxxx> CC: Andy Gospodarek <andy@xxxxxxxxxxxxx> Fixes: 7bdb04ed0dbf ("bonding: convert arp_interval to use the new option API") Signed-off-by: Nikolay Aleksandrov <nikolay@xxxxxxxxxx> Acked-by: Veaceslav Falico <vfalico@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- drivers/net/bonding/bond_options.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index 298c265..a937a37 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -70,6 +70,7 @@ static struct bond_opt_value bond_fail_over_mac_tbl[] = { static struct bond_opt_value bond_intmax_tbl[] = { { "off", 0, BOND_VALFLAG_DEFAULT}, { "maxval", INT_MAX, BOND_VALFLAG_MAX}, + { NULL, -1, 0} }; static struct bond_opt_value bond_lacp_rate_tbl[] = { -- 1.7.11.7 From b2f7c70b41e577652a0622002eb32497eee8b69b Mon Sep 17 00:00:00 2001 From: Eric Dumazet <edumazet@xxxxxxxxxx> Date: Fri, 16 May 2014 11:34:37 -0700 Subject: [PATCH 67/76] net: gro: make sure skb->cb[] initial content has not to be zero MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 29e98242783ed3ba569797846a606ba66f781625 ] Starting from linux-3.13, GRO attempts to build full size skbs. Problem is the commit assumed one particular field in skb->cb[] was clean, but it is not the case on some stacked devices. Timo reported a crash in case traffic is decrypted before reaching a GRE device. Fix this by initializing NAPI_GRO_CB(skb)->last at the right place, this also removes one conditional. Thanks a lot to Timo for providing full reports and bisecting this. Fixes: 8a29111c7ca6 ("net: gro: allow to build full sized skb") Bisected-by: Timo Teras <timo.teras@xxxxxx> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Tested-by: Timo Teräs <timo.teras@xxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/dev.c | 1 + net/core/skbuff.c | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 6de8401..62d29d9 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3944,6 +3944,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff } NAPI_GRO_CB(skb)->count = 1; NAPI_GRO_CB(skb)->age = jiffies; + NAPI_GRO_CB(skb)->last = skb; skb_shinfo(skb)->gso_size = skb_gro_len(skb); skb->next = napi->gro_list; napi->gro_list = skb; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 31d7cd7..e5ae776e 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3076,7 +3076,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb) if (unlikely(p->len + len >= 65536)) return -E2BIG; - lp = NAPI_GRO_CB(p)->last ?: p; + lp = NAPI_GRO_CB(p)->last; pinfo = skb_shinfo(lp); if (headlen <= offset) { @@ -3192,7 +3192,7 @@ merge: __skb_pull(skb, offset); - if (!NAPI_GRO_CB(p)->last) + if (NAPI_GRO_CB(p)->last == p) skb_shinfo(p)->frag_list = skb; else NAPI_GRO_CB(p)->last->next = skb; -- 1.7.11.7 From bb98c19578226cb5bff5116fbfcec0b10f2cf360 Mon Sep 17 00:00:00 2001 From: Marek Lindner <mareklindner@xxxxxxxxxxxxx> Date: Thu, 24 Apr 2014 03:44:25 +0800 Subject: [PATCH 68/76] batman-adv: fix indirect hard_iface NULL dereference [ Upstream commit 16a4142363b11952d3aa76ac78004502c0c2fe6e ] If hard_iface is NULL and goto out is made batadv_hardif_free_ref() doesn't check for NULL before dereferencing it to get to refcount. Introduced in cb1c92ec37fb70543d133a1fa7d9b54d6f8a1ecd ("batman-adv: add debugfs support to view multiif tables"). Reported-by: Sven Eckelmann <sven@xxxxxxxxxxxxx> Signed-off-by: Marek Lindner <mareklindner@xxxxxxxxxxxxx> Acked-by: Antonio Quartulli <antonio@xxxxxxxxxxxxxx> Signed-off-by: Antonio Quartulli <antonio@xxxxxxxxxxxxxx> --- net/batman-adv/originator.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index 3e6dd91..abf612d 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -1074,7 +1074,8 @@ int batadv_orig_hardif_seq_print_text(struct seq_file *seq, void *offset) bat_priv->bat_algo_ops->bat_orig_print(bat_priv, seq, hard_iface); out: - batadv_hardif_free_ref(hard_iface); + if (hard_iface) + batadv_hardif_free_ref(hard_iface); return 0; } -- 1.7.11.7 From 1bfc41f24dfe65a5bb0ab4e2f92d4e3a42ac1b2d Mon Sep 17 00:00:00 2001 From: Antonio Quartulli <antonio@xxxxxxxxxxxxx> Date: Wed, 23 Apr 2014 14:05:16 +0200 Subject: [PATCH 69/76] batman-adv: fix reference counting imbalance while sending fragment MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit be181015a189cd141398b761ba4e79d33fe69949 ] In the new fragmentation code the batadv_frag_send_packet() function obtains a reference to the primary_if, but it does not release it upon return. This reference imbalance prevents the primary_if (and then the related netdevice) to be properly released on shut down. Fix this by releasing the primary_if in batadv_frag_send_packet(). Introduced by ee75ed88879af88558818a5c6609d85f60ff0df4 ("batman-adv: Fragment and send skbs larger than mtu") Cc: Martin Hundebøll <martin@xxxxxxxxxxxxx> Signed-off-by: Antonio Quartulli <antonio@xxxxxxxxxxxxx> Signed-off-by: Marek Lindner <mareklindner@xxxxxxxxxxxxx> Acked-by: Martin Hundebøll <martin@xxxxxxxxxxxxx> --- net/batman-adv/fragmentation.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/net/batman-adv/fragmentation.c b/net/batman-adv/fragmentation.c index 88df9b1..cc1cfd6 100644 --- a/net/batman-adv/fragmentation.c +++ b/net/batman-adv/fragmentation.c @@ -418,12 +418,13 @@ bool batadv_frag_send_packet(struct sk_buff *skb, struct batadv_neigh_node *neigh_node) { struct batadv_priv *bat_priv; - struct batadv_hard_iface *primary_if; + struct batadv_hard_iface *primary_if = NULL; struct batadv_frag_packet frag_header; struct sk_buff *skb_fragment; unsigned mtu = neigh_node->if_incoming->net_dev->mtu; unsigned header_size = sizeof(frag_header); unsigned max_fragment_size, max_packet_size; + bool ret = false; /* To avoid merge and refragmentation at next-hops we never send * fragments larger than BATADV_FRAG_MAX_FRAG_SIZE @@ -483,7 +484,11 @@ bool batadv_frag_send_packet(struct sk_buff *skb, skb->len + ETH_HLEN); batadv_send_skb_packet(skb, neigh_node->if_incoming, neigh_node->addr); - return true; + ret = true; + out_err: - return false; + if (primary_if) + batadv_hardif_free_ref(primary_if); + + return ret; } -- 1.7.11.7 From 820d23dbf22567aaebbceeeddeeb47c0e8a12588 Mon Sep 17 00:00:00 2001 From: Antonio Quartulli <antonio@xxxxxxxxxxxxx> Date: Fri, 2 May 2014 01:35:13 +0200 Subject: [PATCH 70/76] batman-adv: increase orig refcount when storing ref in gw_node [ Upstream commit 377fe0f968b30a1a714fab53a908061914f30e26 ] A pointer to the orig_node representing a bat-gateway is stored in the gw_node->orig_node member, but the refcount for such orig_node is never increased. This leads to memory faults when gw_node->orig_node is accessed and the originator has already been freed. Fix this by increasing the refcount on gw_node creation and decreasing it on gw_node free. Signed-off-by: Antonio Quartulli <antonio@xxxxxxxxxxxxx> Signed-off-by: Marek Lindner <mareklindner@xxxxxxxxxxxxx> --- net/batman-adv/gateway_client.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c index 55cf226..36b9ae6 100644 --- a/net/batman-adv/gateway_client.c +++ b/net/batman-adv/gateway_client.c @@ -42,8 +42,10 @@ static void batadv_gw_node_free_ref(struct batadv_gw_node *gw_node) { - if (atomic_dec_and_test(&gw_node->refcount)) + if (atomic_dec_and_test(&gw_node->refcount)) { + batadv_orig_node_free_ref(gw_node->orig_node); kfree_rcu(gw_node, rcu); + } } static struct batadv_gw_node * @@ -408,9 +410,14 @@ static void batadv_gw_node_add(struct batadv_priv *bat_priv, if (gateway->bandwidth_down == 0) return; + if (!atomic_inc_not_zero(&orig_node->refcount)) + return; + gw_node = kzalloc(sizeof(*gw_node), GFP_ATOMIC); - if (!gw_node) + if (!gw_node) { + batadv_orig_node_free_ref(orig_node); return; + } INIT_HLIST_NODE(&gw_node->list); gw_node->orig_node = orig_node; -- 1.7.11.7 From 95560a081a3f09663a6809908207b04e52e1b37e Mon Sep 17 00:00:00 2001 From: Antonio Quartulli <antonio@xxxxxxxxxxxxx> Date: Sat, 29 Mar 2014 17:27:38 +0100 Subject: [PATCH 71/76] batman-adv: fix local TT check for outgoing arp requests in DAT [ Upstream commit cc2f33860cea0e48ebec096130bd0f7c4bf6e0bc ] Change introduced by 88e48d7b3340ef07b108eb8a8b3813dd093cc7f7 ("batman-adv: make DAT drop ARP requests targeting local clients") implements a check that prevents DAT from using the caching mechanism when the client that is supposed to provide a reply to an arp request is local. However change brought by be1db4f6615b5e6156c807ea8985171c215c2d57 ("batman-adv: make the Distributed ARP Table vlan aware") has not converted the above check into its vlan aware version thus making it useless when the local client is behind a vlan. Fix the behaviour by properly specifying the vlan when checking for a client being local or not. Reported-by: Simon Wunderlich <simon@xxxxxxxxxxxxx> Signed-off-by: Antonio Quartulli <antonio@xxxxxxxxxxxxx> Signed-off-by: Marek Lindner <mareklindner@xxxxxxxxxxxxx> --- net/batman-adv/distributed-arp-table.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c index edee504..bd8219a 100644 --- a/net/batman-adv/distributed-arp-table.c +++ b/net/batman-adv/distributed-arp-table.c @@ -940,8 +940,7 @@ bool batadv_dat_snoop_outgoing_arp_request(struct batadv_priv *bat_priv, * additional DAT answer may trigger kernel warnings about * a packet coming from the wrong port. */ - if (batadv_is_my_client(bat_priv, dat_entry->mac_addr, - BATADV_NO_FLAGS)) { + if (batadv_is_my_client(bat_priv, dat_entry->mac_addr, vid)) { ret = true; goto out; } -- 1.7.11.7 From 3354f258016419ccf8877cc93a72e4afbfbf14ce Mon Sep 17 00:00:00 2001 From: Steffen Klassert <steffen.klassert@xxxxxxxxxxx> Date: Mon, 19 May 2014 11:36:56 +0200 Subject: [PATCH 72/76] ip_tunnel: Initialize the fallback device properly [ Upstream commit 78ff4be45a4c51d8fb21ad92e4fabb467c6c3eeb ] We need to initialize the fallback device to have a correct mtu set on this device. Otherwise the mtu is set to null and the device is unusable. Fixes: fd58156e456d ("IPIP: Use ip-tunneling code.") Cc: Pravin B Shelar <pshelar@xxxxxxxxxx> Signed-off-by: Steffen Klassert <steffen.klassert@xxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/ip_tunnel.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 547bd39..0c3a5d1 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -875,6 +875,7 @@ int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id, */ if (!IS_ERR(itn->fb_tunnel_dev)) { itn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; + itn->fb_tunnel_dev->mtu = ip_tunnel_bind_dev(itn->fb_tunnel_dev); ip_tunnel_add(itn, netdev_priv(itn->fb_tunnel_dev)); } rtnl_unlock(); -- 1.7.11.7 From 5cbe7e31f804ad60cc53f4bad41489aaa468e4a1 Mon Sep 17 00:00:00 2001 From: Cong Wang <xiyou.wangcong@xxxxxxxxx> Date: Mon, 19 May 2014 12:15:49 -0700 Subject: [PATCH 73/76] net_sched: fix an oops in tcindex filter [ Upstream commit bf63ac73b3e132e6bf0c8798aba7b277c3316e19 ] Kelly reported the following crash: IP: [<ffffffff817a993d>] tcf_action_exec+0x46/0x90 PGD 3009067 PUD 300c067 PMD 11ff30067 PTE 800000011634b060 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000 RIP: 0010:[<ffffffff817a993d>] [<ffffffff817a993d>] tcf_action_exec+0x46/0x90 RSP: 0018:ffff8800d21b9b90 EFLAGS: 00010283 RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8 RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840 RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001 R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840 R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8 FS: 00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0 Stack: ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000 ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90 ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18 Call Trace: [<ffffffff817c55bb>] tcindex_classify+0x88/0x9b [<ffffffff817a7f7d>] tc_classify_compat+0x3e/0x7b [<ffffffff817a7fdf>] tc_classify+0x25/0x9f [<ffffffff817b0e68>] htb_enqueue+0x55/0x27a [<ffffffff817b6c2e>] dsmark_enqueue+0x165/0x1a4 [<ffffffff81775642>] __dev_queue_xmit+0x35e/0x536 [<ffffffff8177582a>] dev_queue_xmit+0x10/0x12 [<ffffffff818f8ecd>] packet_sendmsg+0xb26/0xb9a [<ffffffff810b1507>] ? __lock_acquire+0x3ae/0xdf3 [<ffffffff8175cf08>] __sock_sendmsg_nosec+0x25/0x27 [<ffffffff8175d916>] sock_aio_write+0xd0/0xe7 [<ffffffff8117d6b8>] do_sync_write+0x59/0x78 [<ffffffff8117d84d>] vfs_write+0xb5/0x10a [<ffffffff8117d96a>] SyS_write+0x49/0x7f [<ffffffff8198e212>] system_call_fastpath+0x16/0x1b This is because we memcpy struct tcindex_filter_result which contains struct tcf_exts, obviously struct list_head can not be simply copied. This is a regression introduced by commit 33be627159913b094bb578 (net_sched: act: use standard struct list_head). It's not very easy to fix it as the code is a mess: if (old_r) memcpy(&cr, r, sizeof(cr)); else { memset(&cr, 0, sizeof(cr)); tcf_exts_init(&cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); } ... tcf_exts_change(tp, &cr.exts, &e); ... memcpy(r, &cr, sizeof(cr)); the above code should equal to: tcindex_filter_result_init(&cr); if (old_r) cr.res = r->res; ... if (old_r) tcf_exts_change(tp, &r->exts, &e); else tcf_exts_change(tp, &cr.exts, &e); ... r->res = cr.res; after this change, since there is no need to copy struct tcf_exts. And it also fixes other places zero'ing struct's contains struct tcf_exts. Fixes: commit 33be627159913b0 (net_sched: act: use standard struct list_head) Reported-by: Kelly Anderson <kelly@xxxxxxxxx> Tested-by: Kelly Anderson <kelly@xxxxxxxxx> Cc: David S. Miller <davem@xxxxxxxxxxxxx> Signed-off-by: Cong Wang <xiyou.wangcong@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/sched/cls_tcindex.c | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c index eed8404..f435a88 100644 --- a/net/sched/cls_tcindex.c +++ b/net/sched/cls_tcindex.c @@ -188,6 +188,12 @@ static const struct nla_policy tcindex_policy[TCA_TCINDEX_MAX + 1] = { [TCA_TCINDEX_CLASSID] = { .type = NLA_U32 }, }; +static void tcindex_filter_result_init(struct tcindex_filter_result *r) +{ + memset(r, 0, sizeof(*r)); + tcf_exts_init(&r->exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); +} + static int tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, u32 handle, struct tcindex_data *p, @@ -207,15 +213,11 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, return err; memcpy(&cp, p, sizeof(cp)); - memset(&new_filter_result, 0, sizeof(new_filter_result)); - tcf_exts_init(&new_filter_result.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); + tcindex_filter_result_init(&new_filter_result); + tcindex_filter_result_init(&cr); if (old_r) - memcpy(&cr, r, sizeof(cr)); - else { - memset(&cr, 0, sizeof(cr)); - tcf_exts_init(&cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); - } + cr.res = r->res; if (tb[TCA_TCINDEX_HASH]) cp.hash = nla_get_u32(tb[TCA_TCINDEX_HASH]); @@ -267,9 +269,14 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, err = -ENOMEM; if (!cp.perfect && !cp.h) { if (valid_perfect_hash(&cp)) { + int i; + cp.perfect = kcalloc(cp.hash, sizeof(*r), GFP_KERNEL); if (!cp.perfect) goto errout; + for (i = 0; i < cp.hash; i++) + tcf_exts_init(&cp.perfect[i].exts, TCA_TCINDEX_ACT, + TCA_TCINDEX_POLICE); balloc = 1; } else { cp.h = kcalloc(cp.hash, sizeof(f), GFP_KERNEL); @@ -295,14 +302,17 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, tcf_bind_filter(tp, &cr.res, base); } - tcf_exts_change(tp, &cr.exts, &e); + if (old_r) + tcf_exts_change(tp, &r->exts, &e); + else + tcf_exts_change(tp, &cr.exts, &e); tcf_tree_lock(tp); if (old_r && old_r != r) - memset(old_r, 0, sizeof(*old_r)); + tcindex_filter_result_init(old_r); memcpy(p, &cp, sizeof(cp)); - memcpy(r, &cr, sizeof(cr)); + r->res = cr.res; if (r == &new_filter_result) { struct tcindex_filter **fp; -- 1.7.11.7 From 8f182806c2821beada7800f8506f28a8673cb350 Mon Sep 17 00:00:00 2001 From: Eric Dumazet <edumazet@xxxxxxxxxx> Date: Mon, 19 May 2014 21:56:34 -0700 Subject: [PATCH 74/76] ipv6: gro: fix CHECKSUM_COMPLETE support [ Upstream commit 4de462ab63e23953fd05da511aeb460ae10cc726 ] When GRE support was added in linux-3.14, CHECKSUM_COMPLETE handling broke on GRE+IPv6 because we did not update/use the appropriate csum : GRO layer is supposed to use/update NAPI_GRO_CB(skb)->csum instead of skb->csum Tested using a GRE tunnel and IPv6 traffic. GRO aggregation now happens at the first level (ethernet device) instead of being done in gre tunnel. Native IPv6+TCP is still properly aggregated. Fixes: bf5a755f5e918 ("net-gre-gro: Add GRE support to the GRO stack") Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Cc: Jerry Chu <hkchu@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv6/ip6_offload.c | 6 +----- net/ipv6/tcpv6_offload.c | 2 +- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c index 59f95af..b2f0915 100644 --- a/net/ipv6/ip6_offload.c +++ b/net/ipv6/ip6_offload.c @@ -196,7 +196,6 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head, unsigned int off; u16 flush = 1; int proto; - __wsum csum; off = skb_gro_offset(skb); hlen = off + sizeof(*iph); @@ -264,13 +263,10 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head, NAPI_GRO_CB(skb)->flush |= flush; - csum = skb->csum; - skb_postpull_rcsum(skb, iph, skb_network_header_len(skb)); + skb_gro_postpull_rcsum(skb, iph, nlen); pp = ops->callbacks.gro_receive(head, skb); - skb->csum = csum; - out_unlock: rcu_read_unlock(); diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c index 0d78132..8517d3c 100644 --- a/net/ipv6/tcpv6_offload.c +++ b/net/ipv6/tcpv6_offload.c @@ -42,7 +42,7 @@ static struct sk_buff **tcp6_gro_receive(struct sk_buff **head, if (NAPI_GRO_CB(skb)->flush) goto skip_csum; - wsum = skb->csum; + wsum = NAPI_GRO_CB(skb)->csum; switch (skb->ip_summed) { case CHECKSUM_NONE: -- 1.7.11.7 From ebfc00e3b9fa0cdc22bb574cc4ca0957450685ab Mon Sep 17 00:00:00 2001 From: Li RongQing <roy.qing.li@xxxxxxxxx> Date: Thu, 22 May 2014 16:36:55 +0800 Subject: [PATCH 75/76] ipv4: initialise the itag variable in __mkroute_input [ Upstream commit fbdc0ad095c0a299e9abf5d8ac8f58374951149a ] the value of itag is a random value from stack, and may not be initiated by fib_validate_source, which called fib_combine_itag if CONFIG_IP_ROUTE_CLASSID is not set This will make the cached dst uncertainty Signed-off-by: Li RongQing <roy.qing.li@xxxxxxxxx> Acked-by: Alexei Starovoitov <ast@xxxxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/ipv4/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index db66057..1344373 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1526,7 +1526,7 @@ static int __mkroute_input(struct sk_buff *skb, struct in_device *out_dev; unsigned int flags = 0; bool do_cache; - u32 itag; + u32 itag = 0; /* get a working reference to the output device */ out_dev = __in_dev_get_rcu(FIB_RES_DEV(*res)); -- 1.7.11.7 From e27b790ee06a5b3d917023588bec8ab1dbbeab82 Mon Sep 17 00:00:00 2001 From: Eric Dumazet <edumazet@xxxxxxxxxx> Date: Thu, 3 Apr 2014 09:28:10 -0700 Subject: [PATCH 76/76] net-gro: reset skb->truesize in napi_reuse_skb() [ Upstream commit e33d0ba8047b049c9262fdb1fcafb93cb52ceceb ] Recycling skb always had been very tough... This time it appears GRO layer can accumulate skb->truesize adjustments made by drivers when they attach a fragment to skb. skb_gro_receive() can only subtract from skb->truesize the used part of a fragment. I spotted this problem seeing TcpExtPruneCalled and TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where TCP receive window should be sized properly to accept traffic coming from a driver not overshooting skb->truesize. Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> --- net/core/dev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/core/dev.c b/net/core/dev.c index 62d29d9..fccc195 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4051,6 +4051,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb) skb->vlan_tci = 0; skb->dev = napi->dev; skb->skb_iif = 0; + skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)); napi->skb = skb; } -- 1.7.11.7