On 5/6/20 6:24 PM, Lorenz Bauer wrote:
On Wed, 6 May 2020 at 02:28, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
On Mon, May 4, 2020 at 9:12 AM Lorenz Bauer <lmb@xxxxxxxxxxxxxx> wrote:
In our TC classifier cls_redirect [1], we use the following sequence of
helper calls to decapsulate a GUE (basically IP + UDP + custom header)
encapsulated packet:
skb_adjust_room(skb, -encap_len,
BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
bpf_redirect(skb->ifindex, BPF_F_INGRESS)
It seems like some checksums of the inner headers are not validated in
this case.
For example, a TCP SYN packet with an invalid TCP checksum is still accepted
by the network stack and elicits a SYN-ACK.
Is this known but undocumented behaviour, or a bug? In either case, is there
a workaround I'm not aware of?
I thought inner and outer csums are covered by different flags, and the
driver is supposed to set the right one depending on the level of in-hw
checking it did.
I've figured out what the problem is. We receive the following packet from
the driver:
| ETH | IP | UDP | GUE | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx checksum
offloading. On this packet we run the skb_adjust_room(skb, -encap_len,
BPF_ADJ_ROOM_MAC, ...) call from above, and get the following:
| ETH | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
Note that ip_summed is still CHECKSUM_UNNECESSARY. After bpf_redirect()ing
the packet to ingress, we end up in tcp_v4_rcv, where skb_checksum_init
becomes a no-op due to CHECKSUM_UNNECESSARY.
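Spelled out, the skb checksum state across the decap looks roughly like this
(a sketch; the csum_level of 0 is an assumption about what the driver reports
for single-level rx checksum offload):

	/* Before skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, ...):
	 *   | ETH | IP | UDP | GUE | IP | TCP |
	 *   skb->ip_summed  == CHECKSUM_UNNECESSARY
	 *   skb->csum_level == 0  -> outermost csum (outer UDP) was verified
	 *
	 * After the pop:
	 *   | ETH | IP | TCP |
	 *   skb->ip_summed  == CHECKSUM_UNNECESSARY  (unchanged)
	 *   skb->csum_level == 0  -> now read as "inner TCP csum was verified",
	 *                           which it never was, so the stack skips
	 *                           software validation in tcp_v4_rcv().
	 */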
I think this boils down to bpf_skb_generic_pop not adjusting ip_summed
accordingly. Unfortunately I don't understand checksum handling well enough
to fix this myself. Daniel, it seems like you wrote the helper; could you
take a look?
Right, so in the skb_adjust_room() case we're not aware of protocol
specifics. We do handle the CHECKSUM_COMPLETE case via skb_postpull_rcsum(),
but not CHECKSUM_UNNECESSARY at the moment. I presume in your case the
skb->csum_level of the original skb prior to the skb_adjust_room() call
might have been 0 (that is, covering the UDP csum)? So if we added the
possibility to call __skb_decr_checksum_unnecessary() via a flag, it would
become skb->ip_summed = CHECKSUM_NONE? And to be generic, we'd need to do
the same for the reverse case. Below is a quick hack (compile-tested only);
would this resolve your case ...
>>> skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO|BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)
>>> bpf_redirect(skb->ifindex, BPF_F_INGRESS)
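For reference, the csum_level helpers this relies on behave roughly as
follows; this is a paraphrase of the include/linux/skbuff.h helpers to show
the intent, not the exact upstream code:

#include <linux/skbuff.h>

/* csum_level counts how many checksum levels beyond the outermost one
 * the device claims to have verified. */
static inline void sketch_decr_csum_unnecessary(struct sk_buff *skb)
{
	if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
		if (skb->csum_level == 0)
			skb->ip_summed = CHECKSUM_NONE;
		else
			skb->csum_level--;
	}
}

static inline void sketch_incr_csum_unnecessary(struct sk_buff *skb)
{
	if (skb->ip_summed == CHECKSUM_NONE) {
		skb->ip_summed = CHECKSUM_UNNECESSARY;
		skb->csum_level = 0;
	} else if (skb->ip_summed == CHECKSUM_UNNECESSARY &&
		   skb->csum_level < SKB_MAX_CSUM_LEVEL) {
		skb->csum_level++;
	}
}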
From 7439724fcfff7742223198c620349a4fc89d4835 Mon Sep 17 00:00:00 2001
Message-Id: <7439724fcfff7742223198c620349a4fc89d4835.1588801971.git.daniel@xxxxxxxxxxxxx>
From: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Date: Wed, 6 May 2020 23:50:31 +0200
Subject: [PATCH bpf-next] bpf: inc/dec csum level for csum_unnecessary in skb_adjust_room
Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
---
include/uapi/linux/bpf.h | 2 ++
net/core/filter.c | 23 ++++++++++++++++++++---
2 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b3643e27e264..9877807b8f28 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3279,6 +3279,8 @@ enum {
BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2),
BPF_F_ADJ_ROOM_ENCAP_L4_GRE = (1ULL << 3),
BPF_F_ADJ_ROOM_ENCAP_L4_UDP = (1ULL << 4),
+ BPF_F_ADJ_ROOM_INC_CSUM_LEVEL = (1ULL << 5),
+ BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL = (1ULL << 6),
};
enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index dfaf5df13722..10551dabb7b5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3008,7 +3008,9 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
BPF_F_ADJ_ROOM_ENCAP_L2( \
- BPF_ADJ_ROOM_ENCAP_L2_MASK))
+ BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
+ BPF_F_ADJ_ROOM_INC_CSUM_LEVEL | \
+ BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)
static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
u64 flags)
@@ -3019,6 +3021,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
unsigned int gso_type = SKB_GSO_DODGY;
int ret;
+ if (unlikely(flags & ~(BPF_F_ADJ_ROOM_MASK &
+ ~(BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL))))
+ return -EINVAL;
+
if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
/* udp gso_size delineates datagrams, only allow if fixed */
if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) ||
@@ -3105,6 +3111,9 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
shinfo->gso_segs = 0;
}
+ if (flags & BPF_F_ADJ_ROOM_INC_CSUM_LEVEL)
+ __skb_incr_checksum_unnecessary(skb);
+
return 0;
}
@@ -3113,7 +3122,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
{
int ret;
- if (flags & ~BPF_F_ADJ_ROOM_FIXED_GSO)
+ if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
+ BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)))
return -EINVAL;
if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
@@ -3143,6 +3153,9 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
shinfo->gso_segs = 0;
}
+ if (flags & BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)
+ __skb_decr_checksum_unnecessary(skb);
+
return 0;
}
@@ -3163,7 +3176,11 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
u32 off;
int ret;
- if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK))
+ if (unlikely((flags & ~BPF_F_ADJ_ROOM_MASK) ||
+ ((flags & (BPF_F_ADJ_ROOM_INC_CSUM_LEVEL |
+ BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)) ==
+ (BPF_F_ADJ_ROOM_INC_CSUM_LEVEL |
+ BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL))))
return -EINVAL;
if (unlikely(len_diff_abs > 0xfffU))
return -EFAULT;
--
2.21.0
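For completeness, here is a sketch of how a cls_redirect-style decap program
could use the proposed flag once a change like the above is applied. The GUE
header length, section name and overall layout are illustrative assumptions,
not the actual cls_redirect code, and BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL only
exists with the hack applied:

#include <linux/bpf.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

/* Illustrative only; the real cls_redirect GUE/custom header differs. */
#define GUE_HDR_LEN 8

SEC("classifier")
int decap_and_redirect(struct __sk_buff *skb)
{
	__s32 encap_len = sizeof(struct iphdr) + sizeof(struct udphdr) +
			  GUE_HDR_LEN;

	/* Pop outer IP/UDP/GUE and, via the proposed flag, drop one
	 * CHECKSUM_UNNECESSARY level so the inner TCP csum gets verified. */
	if (bpf_skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC,
				BPF_F_ADJ_ROOM_FIXED_GSO |
				BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL))
		return TC_ACT_SHOT;

	/* Loop the decapsulated packet back into the ingress path. */
	return bpf_redirect(skb->ifindex, BPF_F_INGRESS);
}

char _license[] SEC("license") = "GPL";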