[stable request 4.4] tcp: fix tcp_mark_head_lost to check skb len before fragmenting

Vinson Lee <vlee@xxxxxxxxxxxxxxx> · Mon, 19 Jun 2017 14:51:20 -0700

Hi.

I am seeing this warning on this 4.4.62 based kernel from Ubuntu 16.04.

------------[ cut here ]------------
WARNING: CPU: 13 PID: 0 at
/build/linux-0XAgc4/linux-4.4.0/net/ipv4/tcp_output.c:1145
tcp_fragment+0x34d/0x370()
Modules linked in: ip6table_filter ipip tunnel4 ip_tunnel cpuid 8021q
garp mrp stp llc ip6_tables binfmt_misc xt_comment nf_log_ipv4
nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype iptable_filter
iptable_mangle iptable_raw ip_tables x_tables intel_rapl
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
ipmi_ssif aes_x86_64 lrw gf128mul glue_helper ablk_helper hpilo cryptd
serio_raw sb_edac edac_core ioatdma lpc_ich shpchp 8250_fintek mac_hid
acpi_power_meter ipmi_si ipmi_devintf ipmi_msghandler autofs4 raid10
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq libcrc32c raid1 raid0 multipath linear ixgbe dca vxlan
ip6_udp_tunnel udp_tunnel ptp ahci pps_core psmouse libahci
 mdio wmi fjes [last unloaded: nf_conntrack]
CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.4.0-78-generic #99-Ubuntu
 0000000000000286 24b314fc942cf971 ffff88105f3439a8 ffffffff813f8dd3
 0000000000000000 ffffffff81d71d78 ffff88105f3439e0 ffffffff81081302
 ffff88088a1c5000 ffff880cc988f800 000000000000004b 0000000000000000
Call Trace:
 <IRQ>  [<ffffffff813f8dd3>] dump_stack+0x63/0x90
 [<ffffffff81081302>] warn_slowpath_common+0x82/0xc0
 [<ffffffff8108144a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff817903ad>] tcp_fragment+0x34d/0x370
 [<ffffffff81785b7f>] tcp_mark_head_lost+0x14f/0x240
 [<ffffffff8178692c>] tcp_update_scoreboard+0x4c/0x70
 [<ffffffff8178c3d2>] tcp_fastretrans_alert+0x6f2/0xaa0
 [<ffffffff8178cbeb>] tcp_ack+0x46b/0x800
 [<ffffffff8178da39>] tcp_rcv_established+0x1d9/0x780
 [<ffffffff8174c232>] ? sk_filter_trim_cap+0x42/0x160
 [<ffffffff81798a05>] tcp_v4_do_rcv+0x145/0x200
 [<ffffffff81799bf2>] tcp_v4_rcv+0x872/0xa20
 [<ffffffffc02290c7>] ? iptable_filter_hook+0x27/0x56 [iptable_filter]
 [<ffffffff8176c172>] ? nf_iterate+0x62/0x80
 [<ffffffff81773504>] ip_local_deliver_finish+0x94/0x1e0
 [<ffffffff8177380f>] ip_local_deliver+0x6f/0xe0
 [<ffffffff81773470>] ? ip_rcv_finish+0x320/0x320
 [<ffffffff817731e2>] ip_rcv_finish+0x92/0x320
 [<ffffffff81773b11>] ip_rcv+0x291/0x3a0
 [<ffffffff81773150>] ? inet_del_offload+0x40/0x40
 [<ffffffff817357a4>] __netif_receive_skb_core+0x704/0xa60
 [<ffffffff8179ec00>] ? tcp4_gro_receive+0x130/0x1d0
 [<ffffffff81735b18>] __netif_receive_skb+0x18/0x60
 [<ffffffff81735b92>] netif_receive_skb_internal+0x32/0xa0
 [<ffffffff81736813>] napi_gro_receive+0xc3/0x120
 [<ffffffffc02b8dfa>] gro_cell_poll+0x5a/0x90 [ip_tunnel]
 [<ffffffff8173605e>] net_rx_action+0x21e/0x360
 [<ffffffff81085de1>] __do_softirq+0x101/0x290
 [<ffffffff810860e3>] irq_exit+0xa3/0xb0
 [<ffffffff81050e13>] smp_call_function_single_interrupt+0x33/0x40
 [<ffffffff81841e62>] call_function_single_interrupt+0x82/0x90
 <EOI>  [<ffffffff816d4131>] ? cpuidle_enter_state+0x111/0x2b0
 [<ffffffff816d4307>] cpuidle_enter+0x17/0x20
 [<ffffffff810c4592>] call_cpuidle+0x32/0x60
 [<ffffffff816d42e3>] ? cpuidle_select+0x13/0x20
 [<ffffffff810c4850>] cpu_startup_entry+0x290/0x350
 [<ffffffff810517c4>] start_secondary+0x154/0x190
---[ end trace f0d076c2d7e8bb40 ]---

This might be fixed by upstream Linux 4.5 commit "tcp: fix
tcp_mark_head_lost to check skb len before fragmenting". If so, would
you please consider backporting this patch to stable series Linux 4.4?

commit d88270eef4b56bd7973841dd1fed387ccfa83709
Author: Neal Cardwell <ncardwell@xxxxxxxxxx>
Date:   Mon Jan 25 14:01:53 2016 -0800

    tcp: fix tcp_mark_head_lost to check skb len before fragmenting

    This commit fixes a corner case in tcp_mark_head_lost() which was
    causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.

    tcp_mark_head_lost() was assuming that if a packet has
    tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
    M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
    maintained, this is not always true.

    For example, suppose the sender sends 4 1-byte packets and have the
    last 3 packet sacked. It will merge the last 3 packets in the write
    queue into an skb with pcount = 3 and len = 3 bytes. If another
    recovery happens after a sack reneging event, tcp_mark_head_lost()
    may attempt to split the skb assuming it has more than 2*MSS bytes.

    This sounds very counterintuitive, but as the commit description for
    the related commit c0638c247f55 ("tcp: don't fragment SACKed skbs in
    tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
    coalesces adjacent regions of SACKed skbs, and when doing this it
    preserves the sum of their packet counts in order to reflect the
    real-world dynamics on the wire. The c0638c247f55 commit tried to
    avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
    where the non-proportionality between pcount and skb->len/mss is known
    to be possible. However, that commit did not handle the case where
    during a reneging event one of these weird SACKed skbs becomes an
    un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.

    The fix is to simply mark the entire skb lost when this happens.
    This makes the recovery slightly more aggressive in such corner
    cases before we detect reordering. But once we detect reordering
    this code path is by-passed because FACK is disabled.

    Signed-off-by: Neal Cardwell <ncardwell@xxxxxxxxxx>
    Signed-off-by: Yuchung Cheng <ycheng@xxxxxxxxxx>
    Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

Cheers,
Vinson