Re: Checksum behaviour of bpf_redirected packets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/6/20 6:24 PM, Lorenz Bauer wrote:
On Wed, 6 May 2020 at 02:28, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
On Mon, May 4, 2020 at 9:12 AM Lorenz Bauer <lmb@xxxxxxxxxxxxxx> wrote:

In our TC classifier cls_redirect [1], we use the following sequence
of helper calls to
decapsulate a GUE (basically IP + UDP + custom header) encapsulated packet:

   skb_adjust_room(skb, -encap_len,
BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
   bpf_redirect(skb->ifindex, BPF_F_INGRESS)

It seems like some checksums of the inner headers are not validated in
this case.
For example, a TCP SYN packet with invalid TCP checksum is still accepted by the
network stack and elicits a SYN ACK.

Is this known but undocumented behaviour or a bug? In either case, is
there a work
around I'm not aware of?

I thought inner and outer csums are covered by different flags and driver
suppose to set the right one depending on level of in-hw checking it did.

I've figured out what the problem is. We receive the following packet from
the driver:

     | ETH | IP | UDP | GUE | IP | TCP |
     skb->ip_summed == CHECKSUM_UNNECESSARY

ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx
checksum offloading. On this packet we run skb_adjust_room_mac(-encap),
and get the following:

     | ETH | IP | TCP |
     skb->ip_summed == CHECKSUM_UNNECESSARY

Note that ip_summed is still CHECKSUM_UNNECESSARY. After
bpf_redirect()ing into the ingress, we end up in tcp_v4_rcv. There
skb_checksum_init is turned into a no-op due to
CHECKSUM_UNNECESSARY.

I think this boils down to bpf_skb_generic_pop not adjusting ip_summed
accordingly. Unfortunately I don't understand how checksums work
sufficiently. Daniel, it seems like you wrote the helper, could you
take a look?

Right, so in the skb_adjust_room() case we're not aware of protocol
specifics. We do handle the csum complete case via skb_postpull_rcsum(),
but not CHECKSUM_UNNECESSARY at the moment. I presume in your case the
skb->csum_level of the original skb prior to skb_adjust_room() call
might have been 0 (that is, covering UDP)? So if we'd add the possibility
to __skb_decr_checksum_unnecessary() via flag, then it would become
skb->ip_summed = CHECKSUM_NONE? And to be generic, we'd need to do the
same for the reverse case. Below is a quick hack (compile tested-only);
would this resolve your case ...

>>>    skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO|BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)
>>>    bpf_redirect(skb->ifindex, BPF_F_INGRESS)

From 7439724fcfff7742223198c620349a4fc89d4835 Mon Sep 17 00:00:00 2001
Message-Id: <7439724fcfff7742223198c620349a4fc89d4835.1588801971.git.daniel@xxxxxxxxxxxxx>
From: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Date: Wed, 6 May 2020 23:50:31 +0200
Subject: [PATCH bpf-next] bpf: inc/dec csum level for csum_unnecessary in skb_adjust_room

Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
---
 include/uapi/linux/bpf.h |  2 ++
 net/core/filter.c        | 23 ++++++++++++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b3643e27e264..9877807b8f28 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3279,6 +3279,8 @@ enum {
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
 	BPF_F_ADJ_ROOM_ENCAP_L4_GRE	= (1ULL << 3),
 	BPF_F_ADJ_ROOM_ENCAP_L4_UDP	= (1ULL << 4),
+	BPF_F_ADJ_ROOM_INC_CSUM_LEVEL	= (1ULL << 5),
+	BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL	= (1ULL << 6),
 };

 enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index dfaf5df13722..10551dabb7b5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3008,7 +3008,9 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
 					 BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
 					 BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
 					 BPF_F_ADJ_ROOM_ENCAP_L2( \
-					  BPF_ADJ_ROOM_ENCAP_L2_MASK))
+					  BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
+					 BPF_F_ADJ_ROOM_INC_CSUM_LEVEL | \
+					 BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)

 static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 			    u64 flags)
@@ -3019,6 +3021,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 	unsigned int gso_type = SKB_GSO_DODGY;
 	int ret;

+	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_MASK &
+			       ~(BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL))))
+		return -EINVAL;
+
 	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
 		/* udp gso_size delineates datagrams, only allow if fixed */
 		if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) ||
@@ -3105,6 +3111,9 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 		shinfo->gso_segs = 0;
 	}

+	if (flags & BPF_F_ADJ_ROOM_INC_CSUM_LEVEL)
+		__skb_incr_checksum_unnecessary(skb);
+
 	return 0;
 }

@@ -3113,7 +3122,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 {
 	int ret;

-	if (flags & ~BPF_F_ADJ_ROOM_FIXED_GSO)
+	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
+			       BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)))
 		return -EINVAL;

 	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
@@ -3143,6 +3153,9 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 		shinfo->gso_segs = 0;
 	}

+	if (flags & BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)
+		__skb_decr_checksum_unnecessary(skb);
+
 	return 0;
 }

@@ -3163,7 +3176,11 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 	u32 off;
 	int ret;

-	if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK))
+	if (unlikely((flags & ~BPF_F_ADJ_ROOM_MASK) ||
+		     ((flags & (BPF_F_ADJ_ROOM_INC_CSUM_LEVEL |
+				BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL)) ==
+		      (BPF_F_ADJ_ROOM_INC_CSUM_LEVEL |
+		       BPF_F_ADJ_ROOM_DEC_CSUM_LEVEL))))
 		return -EINVAL;
 	if (unlikely(len_diff_abs > 0xfffU))
 		return -EFAULT;
--
2.21.0





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux