On Tue, Aug 6, 2019 at 5:09 PM Matthieu Baerts <matthieu.baerts@xxxxxxxxxxxx> wrote:
>
> From: Eric Dumazet <edumazet@xxxxxxxxxx>
>
> commit b617158dc096709d8600c53b6052144d12b89fab upstream.
>
> Some applications set tiny SO_SNDBUF values and expect
> TCP to just work. Recent patches to address CVE-2019-11478
> broke them in case of losses, since retransmits might
> be prevented.
>
> We should allow these flows to make progress.
>
> This patch allows the first and last skb in the retransmit queue
> to be split even if memory limits are hit.
>
> It also adds some room due to the fact that tcp_sendmsg()
> and tcp_sendpage() might overshoot sk_wmem_queued by about one full
> TSO skb (64KB size). Note this allowance was already present
> in stable backports for kernels < 4.15.
>
> Note for < 4.15 backports:
> tcp_rtx_queue_tail() will probably look like:
>
> static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
> {
>         struct sk_buff *skb = tcp_send_head(sk);
>
>         return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
> }
>
> Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Reported-by: Andrew Prout <aprout@xxxxxxxxxx>
> Tested-by: Andrew Prout <aprout@xxxxxxxxxx>
> Tested-by: Jonathan Lemon <jonathan.lemon@xxxxxxxxx>
> Tested-by: Michal Kubecek <mkubecek@xxxxxxx>
> Acked-by: Neal Cardwell <ncardwell@xxxxxxxxxx>
> Acked-by: Yuchung Cheng <ycheng@xxxxxxxxxx>
> Acked-by: Christoph Paasch <cpaasch@xxxxxxxxx>
> Cc: Jonathan Looney <jtl@xxxxxxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
> Signed-off-by: Matthieu Baerts <matthieu.baerts@xxxxxxxxxxxx>
> ---
>
> Notes:
>     Hello,
>
>     Here is the backport for the linux-4.14.y branch, done simply by
>     implementing the functions written by Eric in the commit message
>     and in this email thread. It might be valid for older versions as
>     well; I did not check.
>
>     In my setup with MPTCP, I had the same bug others had, where TCP
>     flows were stalled. The initial fix b6653b3629e5 ("tcp: refine
>     memory limit test in tcp_fragment()") from Eric was helping, but
>     the backport in < 4.15 was not looking safe: 1bc13903773b ("tcp:
>     refine memory limit test in tcp_fragment()").
>
>     I then decided to test the new fix and it is working fine in my
>     test environment: no stalled TCP flows in a few hours.
>
>     In this email thread I see that Eric will push a patch for v4.14.
>     I absolutely do not want to add pressure or steal his work, but
>     because I have the patch here and it is tested, I was thinking it
>     could be a good idea to share it with others. Feel free to ignore
>     this patch if needed. But if it can reduce Eric's workload a tiny
>     bit, I would be very glad if it helps :)
>
>     Because in the end it is Eric's work, feel free to change my
>     "Signed-off-by" to "Tested-by" if that is how it works, or drop it
>     entirely if you prefer to wait for Eric's patch.

This patch is fine, I was simply on vacation last week, and wanted to
truly take full advantage of it ;)

Thanks !
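One more note on the arithmetic, since the bound changed shape:
2 * SKB_TRUESIZE(GSO_MAX_SIZE) works out to roughly the old hard-coded
0x20000, so the real relief for tiny-SO_SNDBUF flows is the head/tail
exemption. If anyone wants to play with the numbers, here is a rough
userspace sketch; the SKB_TRUESIZE() overhead below is an approximation,
not the exact kernel value, and the sndbuf/wmem figures are made up:

    /* Rough userspace sketch of the old vs new tcp_fragment() bound.
     * SKB_OVERHEAD_APPROX stands in for the real sk_buff + shared_info
     * truesize overhead; all numbers are illustrative only.
     */
    #include <stdio.h>

    #define GSO_MAX_SIZE        65536
    #define SKB_OVERHEAD_APPROX 512
    #define SKB_TRUESIZE(x)     ((x) + SKB_OVERHEAD_APPROX)

    int main(void)
    {
            long sk_sndbuf = 4096;              /* tiny SO_SNDBUF set by the app */
            long sk_wmem_queued = 512 * 1024;   /* queued bytes after losses */

            long old_limit = sk_sndbuf + 0x20000;   /* old hard-coded allowance */
            long new_limit = sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);

            printf("old check: refuse split = %d\n",
                   (sk_wmem_queued >> 1) > old_limit);
            printf("new check: refuse split = %d, and only when skb is neither\n"
                   "           rtx queue head nor tail\n",
                   (sk_wmem_queued >> 1) > new_limit);
            return 0;
    }

With those made-up numbers both the old and new limits are exceeded, yet
the new code still lets the first and last skb in the retransmit queue be
split, which is what keeps small-SO_SNDBUF flows moving after a loss.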
>
> Cheers,
> Matt
>
>  include/net/tcp.h     | 17 +++++++++++++++++
>  net/ipv4/tcp_output.c | 11 ++++++++++-
>  2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 0b477a1e1177..7994e569644e 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1688,6 +1688,23 @@ static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unli
>                 tcp_sk(sk)->highest_sack = NULL;
>  }
>
> +static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk)
> +{
> +       struct sk_buff *skb = tcp_write_queue_head(sk);
> +
> +       if (skb == tcp_send_head(sk))
> +               skb = NULL;
> +
> +       return skb;
> +}
> +
> +static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
> +{
> +       struct sk_buff *skb = tcp_send_head(sk);
> +
> +       return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
> +}
> +
>  static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
>  {
>         __skb_queue_tail(&sk->sk_write_queue, skb);
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index a5960b9b6741..a99086bf26ea 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1264,6 +1264,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
>         struct tcp_sock *tp = tcp_sk(sk);
>         struct sk_buff *buff;
>         int nsize, old_factor;
> +       long limit;
>         int nlen;
>         u8 flags;
>
> @@ -1274,7 +1275,15 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
>         if (nsize < 0)
>                 nsize = 0;
>
> -       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf + 0x20000)) {
> +       /* tcp_sendmsg() can overshoot sk_wmem_queued by one full size skb.
> +        * We need some allowance to not penalize applications setting small
> +        * SO_SNDBUF values.
> +        * Also allow first and last skb in retransmit queue to be split.
> +        */
> +       limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
> +       if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
> +                    skb != tcp_rtx_queue_head(sk) &&
> +                    skb != tcp_rtx_queue_tail(sk))) {
>                 NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>                 return -ENOMEM;
>         }
> --
> 2.20.1
>
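For reviewers comparing against upstream: if I remember the >= 4.15 tree
correctly, these helpers live on the rb-tree retransmit queue there,
roughly as below (please double-check against include/net/tcp.h before
trusting my memory), which is why the 4.14 backport has to rebuild them
on top of the single write queue and tcp_send_head():

    /* Upstream (>= 4.15) definitions, quoted from memory -- verify before use. */
    static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk)
    {
            return skb_rb_first(&sk->tcp_rtx_queue);
    }

    static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
    {
            return skb_rb_last(&sk->tcp_rtx_queue);
    }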