On Wed, 13 May 2020, Lorenz Bauer wrote: > > > Option 1: always downgrade UNNECESSARY to NONE > > > - Easiest to back port > > > - The helper is safe by default > > > - Performance impact unclear > > > - No escape hatch for Cilium > > > > > > Option 2: add a flag to force CHECKSUM_NONE > > > - New UAPI, can this be backported? > > > - The helper isn't safe by default, needs documentation > > > - Escape hatch for Cilium > > > > > > Option 3: downgrade to CHECKSUM_NONE, add flag to skip this > > > - New UAPI, can this be backported? > > > - The helper is safe by default > > > - Escape hatch for Cilium (though you'd need to detect availability of the > > > flag somehow) > > > > This seems most reasonable to me; I can try and cook a proposal for tomorrow as > > potential fix. Even if we add a flag, this is still backportable to stable (as > > long as the overall patch doesn't get too complex and the backport itself stays > > compatible uapi-wise to latest kernels. We've done that before.). I happen to > > have two ixgbe NICs on some of my test machines which seem to be setting the > > CHECKSUM_UNNECESSARY, so I'll run some experiments from over here as well. > > Great! I'm happy to test, of course. > I had a go at implementing option 3 as a few colleagues ran into this problem. They confirmed the fix below resolved the issue. Daniel is this roughly what you had in mind? I can submit a patch for the bpf tree if that's acceptable with the new flag. Do we need a few tests though? >From 7e0b0c78530f3800e5c40aa1fe87e5db82c5fb59 Mon Sep 17 00:00:00 2001 From: Alan Maguire <alan.maguire@xxxxxxxxxx> Date: Mon, 1 Jun 2020 13:10:37 +0200 Subject: [PATCH bpf-next 1/2] bpf: fix bpf_skb_adjust_room decap for CHECKSUM_UNNECESSESARY skbs When hardware verifies checksums for some of the headers it will set CHECKSUM_UNNECESSESARY and csum_level indicates the number of consecutive checksums found. If we de-encapsulate data however these values become invalid since we likely just removed the checksum-validated headers. The best option in such cases is to revert to CHECKSUM_NONE as all checksums will then be checked in software. Otherwise such checks can be skipped. Other checksum states are handled via skb_postpull_rcsum(). Reported-by: Lorenz Bauer <lmb@xxxxxxxxxxxxxx> Suggested-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx> Signed-off-by: Alan Maguire <alan.maguire@xxxxxxxxxx> --- include/uapi/linux/bpf.h | 7 +++++++ net/core/filter.c | 15 ++++++++++++++- 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 974ca6e..03ab70c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1646,6 +1646,12 @@ struct bpf_stack_build_id { * * **BPF_F_ADJ_ROOM_FIXED_GSO**: Do not adjust gso_size. * Adjusting mss in this way is not allowed for datagrams. * + * * **BPF_F_ADJ_ROOM_SKIP_CSUM_RESET**: When shrinking skbs + * marked CHECKSUM_UNNECESSARY, avoid default behavior which + * resets to CHECKSUM_NONE. In most cases, this flag will + * not be needed as the default behavior ensures checksums + * will be verified in sofware. + * * * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV4**, * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV6**: * Any new space is reserved to hold a tunnel header. @@ -3431,6 +3437,7 @@ enum { BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2), BPF_F_ADJ_ROOM_ENCAP_L4_GRE = (1ULL << 3), BPF_F_ADJ_ROOM_ENCAP_L4_UDP = (1ULL << 4), + BPF_F_ADJ_ROOM_SKIP_CSUM_RESET = (1ULL << 5), }; enum { diff --git a/net/core/filter.c b/net/core/filter.c index a6fc234..47c8a31 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3113,7 +3113,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, { int ret; - if (flags & ~BPF_F_ADJ_ROOM_FIXED_GSO) + if (flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO | + BPF_F_ADJ_ROOM_SKIP_CSUM_RESET)) return -EINVAL; if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) { @@ -3143,6 +3144,18 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, shinfo->gso_segs = 0; } + /* + * Decap should invalidate checksum checks done by hardware. + * skb_csum_unnecessary() is not used as the other conditions + * in that predicate do not need to be considered here; we only + * wish to downgrade CHECKSUM_UNNECESSARY to CHECKSUM_NONE. + */ + if (unlikely(!(flags & BPF_F_ADJ_ROOM_SKIP_CSUM_RESET) && + skb->ip_summed == CHECKSUM_UNNECESSARY)) { + skb->ip_summed = CHECKSUM_NONE; + skb->csum_level = 0; + } + return 0; } -- 1.8.3.1