When using calico bpf based NAT, hits a kernel BUG_ON at function skb_segment(), line 4560. Performing NAT translation when accessing a Service IP across subnets, the calico will encap vxlan and calls the bpf_skb_adjust_room to decrease the gso_size, and then call bpf_redirect send packets out. 4438 struct sk_buff *skb_segment(struct sk_buff *head_skb, 4439 netdev_features_t features) 4440 { 4441 struct sk_buff *segs = NULL; 4442 struct sk_buff *tail = NULL; ... 4558 if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) && 4559 (skb_headlen(list_skb) == len || sg)) { 4560 BUG_ON(skb_headlen(list_skb) > len); 4561 4562 nskb = skb_clone(list_skb, GFP_ATOMIC); 4563 if (unlikely(!nskb)) 4564 goto err; call stack: ... [exception RIP: skb_segment+3016] RIP: ffffffffb97df2a8 RSP: ffffa3f2cce08728 RFLAGS: 00010293 RAX: 000000000000007d RBX: 00000000fffff7b3 RCX: 0000000000000011 RDX: 0000000000000000 RSI: ffff895ea32c76c0 RDI: 00000000000008c1 RBP: ffffa3f2cce087f8 R8: 000000000000088f R9: 0000000000000011 R10: 000000000000090c R11: ffff895e47e68000 R12: ffff895eb2022f00 R13: 000000000000004b R14: ffff895ecdaf2000 R15: ffff895eb2023f00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffffa3f2cce08720] skb_segment at ffffffffb97ded63 ... The skb has the following properties: doffset = 66 list_skb = skb_shinfo(skb)->frag_list list_skb->head_frag = true skb->len = 2441 && skb->data_len = 2250 skb_shinfo(skb)->nr_frags = 17 skb_shinfo(skb)->gso_size = 75 skb_shinfo(skb)->frags[0...16].bv_len = 125 list_skb->len = 125 list_skb->data_len = 0 When slicing the frag_list skb, there three cases: 1. Only *non* head_frag sg will be false, only when skb_headlen(list_skb)==len is satisfied, it will enter the branch at line 4560, and there will be no crash. 2. Mixed head_frag sg will be false, Only when skb_headlen(list_skb)==len is satisfied, it will enter the branch at line 4560, and there will be no crash. 3. Only frag_list with head_frag=true sg is true, three cases below: (1) skb_headlen(list_skb)==len is satisfied, it will enter the branch at line 4560, and there will be no crash. (2) skb_headlen(list_skb)<len is satisfied, it will enter the branch at line 4560, and there will be no crash. (3) skb_headlen(list_skb)>len is satisfied, it will be crash. Applying this patch, three cases will be: 1. Only *non* head_frag sg will be false. No difference with before. 2. Mixed head_frag sg will be false. No difference with before. 3. Only frag_list with head_frag=true sg is true, there also three cases: (1) skb_headlen(list_skb)==len is satisfied, no difference with before. (2) skb_headlen(list_skb)<len is satisfied, will be revert to copying in this case. (3) skb_headlen(list_skb)>len is satisfied, will be revert to copying in this case. Since commit 13acc94eff122("net: permit skb_segment on head_frag frag_list skb"), it is allowed to segment the head_frag frag_list skb. Signed-off-by: Fred Li <dracodingfly@xxxxxxxxx> --- net/core/skbuff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index f0a9ef1aeaa2..b1dab1b071fc 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4556,7 +4556,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, hsize = skb_headlen(head_skb) - offset; if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) && - (skb_headlen(list_skb) == len || sg)) { + (skb_headlen(list_skb) == len)) { BUG_ON(skb_headlen(list_skb) > len); nskb = skb_clone(list_skb, GFP_ATOMIC); -- 2.33.0