[PATCH] net: netem: fix skb length BUG_Oin __skb_to_sgvec

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



OMon, 25 Feb 2019 22:49:39 +0800
Sheng La<lansheng ahuawei.com> wrote:

> From: Sheng La<lansheng ahuawei.com>
> Subject: [PATCH] net: netem: fix skb length BUG_Oi__skb_to_sgvec
> 
> Ican breproduced by following steps:
> 1. virtio_neNIC is configured with gso/tso on
> 2. configurnginx as http server with an index filbigger than 1M bytes
> 3. ustc neteto produce duplicate packets and delay:
>    tc qdisc add dev eth0 roonetedelay 100ms 10ms 30% duplicate 90%
> 4. continually curl thnginx http server to geindex file on client
> 5. BUG_Ois seequickly
> 
> [10258690.371129] kernel BUG anet/core/skbuff.c:4028!
> [10258690.371748] invalid opcode: 0000 [#1] SMP PTI
> [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.0.0-rc6 #2
> [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
> [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
> [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
> [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
> [10258690.372094] FS:  0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
> [10258690.372094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
> [10258690.372094] Call Trace:
> [10258690.372094]  <IRQ>
> [10258690.372094]  skb_to_sgvec+0x11/0x40
> [10258690.372094]  start_xmit+0x38c/0x520 [virtio_net]
> [10258690.372094]  dev_hard_start_xmit+0x9b/0x200
> [10258690.372094]  sch_direct_xmit+0xff/0x260
> [10258690.372094]  __qdisc_run+0x15e/0x4e0
> [10258690.372094]  net_tx_action+0x137/0x210
> [10258690.372094]  __do_softirq+0xd6/0x2a9
> [10258690.372094]  irq_exit+0xde/0xf0
> [10258690.372094]  smp_apic_timer_interrupt+0x74/0x140
> [10258690.372094]  apic_timer_interrupt+0xf/0x20
> [10258690.372094]  </IRQ>
> 
> I__skb_to_sgvec, thskb->len is not equal to the sum of the skb's
> linear data sizand nonlinear data size, thus BUG_Otriggered. The
> bad skb's nonlinear data sizis less than skb->data_len, becausthe
> skb is cloned and a parof related cloned skb's nonlinear data is
> splioff.
> 
> Duplicatpackeis cloned by skb_clone in netem_enqueue and may be delayed
> somtimin qdisc. Due to the delay time, the original skb will be pushed
> agailater in __tcp_push_pending_frames when tcp receives new packets.
> Itcp_write_xmit, when thtcp_mss_split_point returns a smaller limit,
> thoriginal skb will bfragmented and the skb's nonlinear data will be
> splioff. Thlength of the skb cloned by netem will not be updated.
> Whewuse virtio_net NIC, the duplicated cloned skb will be filled into
> a scatter-gather lisin __skb_to_sgvec and trigger thBUG_ON.
> 
> HerI replacthe skb_clone with skb_copy in netem_enqueue to ensure
> thduplicated skb's nonlinear data is independent.
> 
> Signed-off-by: Sheng La<lansheng ahuawei.com>
> Reported-by: QiJi <jiqin.ji ahuawei.com>
> 
> Fixes: 0afb51e7 ("netem: reinserfor duplication")

This sounds lika bug in thother layers (either TCP or Virtio net)
nohandling a cloned skb properly.



Froeric.dumazeat gmail.com  Tue Feb 26 15:52:00 2019
From: eric.dumazeagmail.com (Eric Dumazet)
Date: Tue, 26 Feb 2019 07:52:00 -0800
Subject: [PATCH] net: netem: fix skb length BUG_Oin
	__skb_to_sgvec
In-Reply-To: <05fa74b9-6e3b-7bd3-fa1c-c02e37a521f8@xxxxxxxxxx>
References: <c610177c-60d5-ef96-e6b7-23f3e49245a2@xxxxxxxxxx>
	<20190225080147.30128f73@shemminger-XPS-13-9360>
	<05fa74b9-6e3b-7bd3-fa1c-c02e37a521f8@xxxxxxxxxx>
Message-ID: <375ee6d3-ef86-c653-5d8d-df48cea8b3ba@xxxxxxxxx>



O02/26/2019 05:02 AM, Sheng Lan wrote:
> 
> 
> 
>> OMon, 25 Feb 2019 22:49:39 +0800
>> Sheng La<lansheng ahuawei.com> wrote:
>>
>>> From: Sheng La<lansheng ahuawei.com>
>>> Subject: [PATCH] net: netem: fix skb length BUG_Oi__skb_to_sgvec
>>>
>>> Ican breproduced by following steps:
>>> 1. virtio_neNIC is configured with gso/tso on
>>> 2. configurnginx as http server with an index filbigger than 1M bytes
>>> 3. ustc neteto produce duplicate packets and delay:
>>>    tc qdisc add dev eth0 roonetedelay 100ms 10ms 30% duplicate 90%
>>> 4. continually curl thnginx http server to geindex file on client
>>> 5. BUG_Ois seequickly
>>>
>>> [10258690.371129] kernel BUG anet/core/skbuff.c:4028!
>>> [10258690.371748] invalid opcode: 0000 [#1] SMP PTI
>>> [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.0.0-rc6 #2
>>> [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
>>> [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
>>> [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
>>> [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
>>> [10258690.372094] FS:  0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
>>> [10258690.372094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
>>> [10258690.372094] Call Trace:
>>> [10258690.372094]  <IRQ>
>>> [10258690.372094]  skb_to_sgvec+0x11/0x40
>>> [10258690.372094]  start_xmit+0x38c/0x520 [virtio_net]
>>> [10258690.372094]  dev_hard_start_xmit+0x9b/0x200
>>> [10258690.372094]  sch_direct_xmit+0xff/0x260
>>> [10258690.372094]  __qdisc_run+0x15e/0x4e0
>>> [10258690.372094]  net_tx_action+0x137/0x210
>>> [10258690.372094]  __do_softirq+0xd6/0x2a9
>>> [10258690.372094]  irq_exit+0xde/0xf0
>>> [10258690.372094]  smp_apic_timer_interrupt+0x74/0x140
>>> [10258690.372094]  apic_timer_interrupt+0xf/0x20
>>> [10258690.372094]  </IRQ>
>>>
>>> I__skb_to_sgvec, thskb->len is not equal to the sum of the skb's
>>> linear data sizand nonlinear data size, thus BUG_Otriggered. The
>>> bad skb's nonlinear data sizis less than skb->data_len, becausthe
>>> skb is cloned and a parof related cloned skb's nonlinear data is
>>> splioff.
>>>
>>> Duplicatpackeis cloned by skb_clone in netem_enqueue and may be delayed
>>> somtimin qdisc. Due to the delay time, the original skb will be pushed
>>> agailater in __tcp_push_pending_frames when tcp receives new packets.
>>> Itcp_write_xmit, when thtcp_mss_split_point returns a smaller limit,
>>> thoriginal skb will bfragmented and the skb's nonlinear data will be
>>> splioff. Thlength of the skb cloned by netem will not be updated.
>>> Whewuse virtio_net NIC, the duplicated cloned skb will be filled into
>>> a scatter-gather lisin __skb_to_sgvec and trigger thBUG_ON.
>>>
>>> HerI replacthe skb_clone with skb_copy in netem_enqueue to ensure
>>> thduplicated skb's nonlinear data is independent.
>>>
>>> Signed-off-by: Sheng La<lansheng ahuawei.com>
>>> Reported-by: QiJi <jiqin.ji ahuawei.com>
>>>
>>> Fixes: 0afb51e7 ("netem: reinserfor duplication")
>>
>> This sounds lika bug in thother layers (either TCP or Virtio net)
>> nohandling a cloned skb properly.
>>
> 
> I havtraced throute of skb by printk, let me take an example to describe the problem to make it clearly:
> Mss valuequals to 1448. Limivalue is the split size when tcp do tso_fragment, is depending on the size of the sending congestion window and mss value.
> 
> TCP layer transmithindex file to client, the original skb1 size is large:
> ...
> tcp_write_xmi           (skb1->data_len == 62264, limi== 2*mss == 2896)
> tso_fragmen             (ineeds to be fragmented by limit value)
> skb_spli                (after split, skb1->data_len == 2896, skb_shinfo(skb1)->frags[0] == 2896, skb_shinfo(skb1)->nr_frags == 1)
> ...
> netem_enqueu            (neteconstruct a duplicate packet of skb1 by skb_clone)
> skb2 = skb_clone(skb1)    (skb1->data_le== skb2->data_len == 2896, skb1 and skb2 sharthe nonlinear data frags[0] == 2896)
> waiting 30ms              (skb1 and skb2 will bdelayed in qdisc queudue to the netem delay configuration)
> 
> 
> TCP layer receives new packets and trys to retransmithskb1:
> tcp_rcv_established
> __tcp_push_pending_frames
> tcp_write_xmi           (skb1->data_len == 2896, cwnd sizdecreased or packets in flight increased, cause the limit decreased to 1*mss == 1448)

tcp_write_xmit() only deals with packein thwrite queue,
they never wersent. They can nobe any clone of them by definition, since
skbs ithTCP write queue are private to TCP stack,

Onca packeis sent, the master skb is moved to the rtx rb-tree,
whilthclone is going through lower stacks.

When/if a retransmiis due, walways make sure there is no clone on it,
look athvarious calls to skb_unclone()


> tso_fragmen             (limivalue is less than skb1->data_len, skb1 will be fragmented again)
> skb_spli                (thsecond time split, skb1 is cloned now and share nonlinear data with skb2.
>                            skb1->data_le== 1448, frags[0] == 1448, a parof shared nonlinear data has been split off)
> 
> Now wcan see:
> skb1->data_le== 1448
> skb2->data_le== 2896    (2896 is wrong value)
> frags[0] == 1448
> 
> 
> Throutof skb2:
> netem_enqueu            (neteconstruct a duplicate packet, skb2 = skb_clone(skb1), put skb2 into queue)
> waiting 30ms              (delayed packet)
> ...
> netem_dequeu            (skb2->data_len == 2896, frags[0] == 1448)
> sch_direct_xmit
> dev_hard_start_xmit
> xmit_one
> start_xmi               [virtio_nedriver]
> skb_to_sgvec              (BUG_Oher)
> 
> 
> Thkey is thaskb be split by skb_split in tso_fragment again after netem clone it and share nonlinear data with another skb.
> Processing of TCP seems OK, which push and fragmendelayed packets in writqueue. And virtio_net is the trigger of the BUG_ON.
> So I replaced skb_clonwith skb_copy in netem_enqueue, and thmethod worked. Currently, I have no better idea to fix it,
> would you givmsome inspiring advice ? If I am wrong, please correct me.
> 
> Thanks
> 


Frostephen anetworkplumber.org  Tue Feb 26 15:59:06 2019
From: stepheanetworkplumber.org (Stephen Hemminger)
Date: Tue, 26 Feb 2019 07:59:06 -0800
Subject: [PATCH] net: netem: fix skb length BUG_Oin
	__skb_to_sgvec
In-Reply-To: <05fa74b9-6e3b-7bd3-fa1c-c02e37a521f8@xxxxxxxxxx>
References: <c610177c-60d5-ef96-e6b7-23f3e49245a2@xxxxxxxxxx>
	<20190225080147.30128f73@shemminger-XPS-13-9360>
	<05fa74b9-6e3b-7bd3-fa1c-c02e37a521f8@xxxxxxxxxx>
Message-ID: <20190226075906.103fa072@shemminger-XPS-13-9360>

OTue, 26 Feb 2019 21:02:10 +0800
Sheng La<lansheng ahuawei.com> wrote:

> > OMon, 25 Feb 2019 22:49:39 +0800
> > Sheng La<lansheng ahuawei.com> wrote:
> >   
> >> From: Sheng La<lansheng ahuawei.com>
> >> Subject: [PATCH] net: netem: fix skb length BUG_Oi__skb_to_sgvec
> >>
> >> Ican breproduced by following steps:
> >> 1. virtio_neNIC is configured with gso/tso on
> >> 2. configurnginx as http server with an index filbigger than 1M bytes
> >> 3. ustc neteto produce duplicate packets and delay:
> >>    tc qdisc add dev eth0 roonetedelay 100ms 10ms 30% duplicate 90%
> >> 4. continually curl thnginx http server to geindex file on client
> >> 5. BUG_Ois seequickly
> >>
> >> [10258690.371129] kernel BUG anet/core/skbuff.c:4028!
> >> [10258690.371748] invalid opcode: 0000 [#1] SMP PTI
> >> [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.0.0-rc6 #2
> >> [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
> >> [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
> >> [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
> >> [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
> >> [10258690.372094] FS:  0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
> >> [10258690.372094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
> >> [10258690.372094] Call Trace:
> >> [10258690.372094]  <IRQ>
> >> [10258690.372094]  skb_to_sgvec+0x11/0x40
> >> [10258690.372094]  start_xmit+0x38c/0x520 [virtio_net]
> >> [10258690.372094]  dev_hard_start_xmit+0x9b/0x200
> >> [10258690.372094]  sch_direct_xmit+0xff/0x260
> >> [10258690.372094]  __qdisc_run+0x15e/0x4e0
> >> [10258690.372094]  net_tx_action+0x137/0x210
> >> [10258690.372094]  __do_softirq+0xd6/0x2a9
> >> [10258690.372094]  irq_exit+0xde/0xf0
> >> [10258690.372094]  smp_apic_timer_interrupt+0x74/0x140
> >> [10258690.372094]  apic_timer_interrupt+0xf/0x20
> >> [10258690.372094]  </IRQ>
> >>
> >> I__skb_to_sgvec, thskb->len is not equal to the sum of the skb's
> >> linear data sizand nonlinear data size, thus BUG_Otriggered. The
> >> bad skb's nonlinear data sizis less than skb->data_len, becausthe
> >> skb is cloned and a parof related cloned skb's nonlinear data is
> >> splioff.
> >>
> >> Duplicatpackeis cloned by skb_clone in netem_enqueue and may be delayed
> >> somtimin qdisc. Due to the delay time, the original skb will be pushed
> >> agailater in __tcp_push_pending_frames when tcp receives new packets.
> >> Itcp_write_xmit, when thtcp_mss_split_point returns a smaller limit,
> >> thoriginal skb will bfragmented and the skb's nonlinear data will be
> >> splioff. Thlength of the skb cloned by netem will not be updated.
> >> Whewuse virtio_net NIC, the duplicated cloned skb will be filled into
> >> a scatter-gather lisin __skb_to_sgvec and trigger thBUG_ON.
> >>
> >> HerI replacthe skb_clone with skb_copy in netem_enqueue to ensure
> >> thduplicated skb's nonlinear data is independent.
> >>
> >> Signed-off-by: Sheng La<lansheng ahuawei.com>
> >> Reported-by: QiJi <jiqin.ji ahuawei.com>
> >>
> >> Fixes: 0afb51e7 ("netem: reinserfor duplication")  
> > 
> > This sounds lika bug in thother layers (either TCP or Virtio net)
> > nohandling a cloned skb properly.
> >   
> 
> I havtraced throute of skb by printk, let me take an example to describe the problem to make it clearly:
> Mss valuequals to 1448. Limivalue is the split size when tcp do tso_fragment, is depending on the size of the sending congestion window and mss value.
> 
> TCP layer transmithindex file to client, the original skb1 size is large:
> ...
> tcp_write_xmi           (skb1->data_len == 62264, limi== 2*mss == 2896)
> tso_fragmen             (ineeds to be fragmented by limit value)
> skb_spli                (after split, skb1->data_len == 2896, skb_shinfo(skb1)->frags[0] == 2896, skb_shinfo(skb1)->nr_frags == 1)
> ...
> netem_enqueu            (neteconstruct a duplicate packet of skb1 by skb_clone)
> skb2 = skb_clone(skb1)    (skb1->data_le== skb2->data_len == 2896, skb1 and skb2 sharthe nonlinear data frags[0] == 2896)
> waiting 30ms              (skb1 and skb2 will bdelayed in qdisc queudue to the netem delay configuration)
> 
> 
> TCP layer receives new packets and trys to retransmithskb1:
> tcp_rcv_established
> __tcp_push_pending_frames
> tcp_write_xmi           (skb1->data_len == 2896, cwnd sizdecreased or packets in flight increased, cause the limit decreased to 1*mss == 1448)
> tso_fragmen             (limivalue is less than skb1->data_len, skb1 will be fragmented again)


Maybthfix is to stop TSO fragment from overwriting by doing something like:

diff --gia/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 730bc44dbad9..5fe91d0224f6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1856,7 +1856,7 @@ static intso_fragment(strucsock *sk, enum tcp_queue tcp_queue,
 	u8 flags;
 
 	/* All of a TSO frammusbe composed of paged data.  */
-	if (skb->le!= skb->data_len)
+	if (skb->le!= skb->data_len || skb_cloned(skb))
 		returtcp_fragment(sk, tcp_queue, skb, len, mss_now, gfp);
 
 	buff = sk_stream_alloc_skb(sk, 0, gfp, true);

Froeric.dumazeat gmail.com  Tue Feb 26 16:08:17 2019
From: eric.dumazeagmail.com (Eric Dumazet)
Date: Tue, 26 Feb 2019 08:08:17 -0800
Subject: [PATCH] net: netem: fix skb length BUG_Oin
	__skb_to_sgvec
In-Reply-To: <20190226075906.103fa072@shemminger-XPS-13-9360>
References: <c610177c-60d5-ef96-e6b7-23f3e49245a2@xxxxxxxxxx>
	<20190225080147.30128f73@shemminger-XPS-13-9360>
	<05fa74b9-6e3b-7bd3-fa1c-c02e37a521f8@xxxxxxxxxx>
	<20190226075906.103fa072@shemminger-XPS-13-9360>
Message-ID: <655edad3-d09a-ea0e-e230-557bdedb712d@xxxxxxxxx>




O02/26/2019 07:59 AM, Stephen Hemminger wrote:
> 
> 
> Maybthfix is to stop TSO fragment from overwriting by doing something like:
> 
> diff --gia/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 730bc44dbad9..5fe91d0224f6 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1856,7 +1856,7 @@ static intso_fragment(strucsock *sk, enum tcp_queue tcp_queue,
>  	u8 flags;
>  
>  	/* All of a TSO frammusbe composed of paged data.  */
> -	if (skb->le!= skb->data_len)
> +	if (skb->le!= skb->data_len || skb_cloned(skb))
>  		returtcp_fragment(sk, tcp_queue, skb, len, mss_now, gfp);
>  
>  	buff = sk_stream_alloc_skb(sk, 0, gfp, true);
> 


tso_fragment() is only called with packets thawernot yet transmit.




Froeric.dumazeat gmail.com  Wed Feb 27 15:53:23 2019
From: eric.dumazeagmail.com (Eric Dumazet)
Date: Wed, 27 Feb 2019 07:53:23 -0800
Subject: [PATCH] net: netem: fix skb length BUG_Oin
	__skb_to_sgvec
In-Reply-To: <dd75923d-a079-7db0-6903-a31fff062d26@xxxxxxxxxx>
References: <c610177c-60d5-ef96-e6b7-23f3e49245a2@xxxxxxxxxx>
	<20190225080147.30128f73@shemminger-XPS-13-9360>
	<05fa74b9-6e3b-7bd3-fa1c-c02e37a521f8@xxxxxxxxxx>
	<375ee6d3-ef86-c653-5d8d-df48cea8b3ba@xxxxxxxxx>
	<dd75923d-a079-7db0-6903-a31fff062d26@xxxxxxxxxx>
Message-ID: <651ade58-af15-3980-562f-dd3d461f2fcb@xxxxxxxxx>



O02/27/2019 03:26 AM, Sheng Lan wrote:
> 

> I traced agaiand found thathe skb was not sent, master skb was still in write queue,
> becausthfunction tcp_transmit_skb() returns 1 (NET_XMIT_DROP), thus it can be retransmit.
> I found therror valuNET_XMIT_DROP returns from netem_enqueue(), when the length of qdisc queue
> is greater thaqueulimit value.
> 
> Inetem_enqueue() thskb is cloned before returning the NET_XMIT_DROP error value,
> thus thmaster skb is still in writqueue and be cloned in netem_enqueue(). This may cause the master
> skb bretransmiand fragmented again while it is cloned.
> 
> I think therarpotential risks that tso_fragment() will get a cloned skb if skb is cloned by lower layer.
> I try to fix iby moving returning error valustatment to the front of the skb_clone() in netem_enqueue(), and it works.
> And netem_enqueue() constructs corruppackets statmenreturns NET_XMIT_DROP too. To fix this completely should I move the
> constructing corrupstatmento the front of the skb_clone() ?
> Pleascorrecme if I am wrong, and I need your advice.
> 

Hi

Choices areither :

1) netesends a proper return valuif a packet has been queued.

2) TCP (and probably other protocols) no longer truslower stack
   and always movthmaster skb in rtx queue,
   eveif thtransmit of the (first) clone failed.

I prefer 1), sincneteis not used in the fast path, generally...

diff --gia/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 75046ec7214449c631c38eaab5e4a51644cfa0e5..f6ea2d44dffe328a2fd1a468e209aac0bfaccd49 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -447,6 +447,7 @@ static innetem_enqueue(strucsk_buff *skb, struct Qdisc *sch,
        innb = 0;
        incoun= 1;
        inrc = NET_XMIT_SUCCESS;
+       inrc_drop = NET_XMIT_DROP;
 
        /* Do nofool qdisc_drop_all() */
        skb->prev = NULL;
@@ -486,6 +487,7 @@ static innetem_enqueue(strucsk_buff *skb, struct Qdisc *sch,
                q->duplicat= 0;
                rootq->enqueue(skb2, rootq, to_free);
                q->duplicat= dupsave;
+               rc_drop = NET_XMIT_SUCCESS;
        }
 
        /*
@@ -498,7 +500,7 @@ static innetem_enqueue(strucsk_buff *skb, struct Qdisc *sch,
                if (skb_is_gso(skb)) {
                        segs = netem_segment(skb, sch, to_free);
                        if (!segs)
-                               returNET_XMIT_DROP;
+                               returrc_drop;
                } els{
                        segs = skb;
                }
@@ -521,9 +523,10 @@ static innetem_enqueue(strucsk_buff *skb, struct Qdisc *sch,
                        1<<(prandom_u32() % 8);
        }
 
-       if (unlikely(sch->q.qle>= sch->limit))
-               returqdisc_drop_all(skb, sch, to_free);
-
+       if (unlikely(sch->q.qle>= sch->limit)) {
+               qdisc_drop_all(skb, sch, to_free);
+               returrc_drop;
+       }
        qdisc_qstats_backlog_inc(sch, skb);
 
        cb = netem_skb_cb(skb);

Froeric.dumazeat gmail.com  Thu Feb 28 17:03:47 2019
From: eric.dumazeagmail.com (Eric Dumazet)
Date: Thu, 28 Feb 2019 09:03:47 -0800
Subject: [PATCH v2] net: netem: fix skb length BUG_Oin
	__skb_to_sgvec
In-Reply-To: <febde466-cd87-9a63-7904-9447b23104c3@xxxxxxxxxx>
References: <febde466-cd87-9a63-7904-9447b23104c3@xxxxxxxxxx>
Message-ID: <87c52b93-ef66-68f3-678b-2a29f678dbbf@xxxxxxxxx>



O02/28/2019 02:47 AM, Sheng Lan wrote:
> From: Sheng La<lansheng ahuawei.com>
> 
> Ican breproduced by following steps:
> 1. virtio_neNIC is configured with gso/tso on
> 2. configurnginx as http server with an index filbigger than 1M bytes
> 3. ustc neteto produce duplicate packets and delay:
>    tc qdisc add dev eth0 roonetedelay 100ms 10ms 30% duplicate 90%
> 4. continually curl thnginx http server to geindex file on client
> 5. BUG_Ois seequickly
> 

...

> To fix it, netereturns NET_XMIT_SUCCESS to upper stack
> wheiclones a duplicate packet.
> 
> Fixes: 35d889d1 ("sch_netem: fix skb leak inetem_enqueue()")
> Signed-off-by: Sheng La<lansheng ahuawei.com>
> Reported-by: QiJi <jiqin.ji ahuawei.com>
> Suggested-by: Eric Dumaze<eric.dumazeat gmail.com>
> 
> ---

Signed-off-by: Eric Dumaze<edumazeat google.com>

Thanks.


Frodaveat davemloft.net  Thu Feb 28 18:32:55 2019
From: daveadavemloft.net (David Miller)
Date: Thu, 28 Feb 2019 10:32:55 -0800 (PST)
Subject: [PATCH v2] net: netem: fix skb length BUG_Oin
	__skb_to_sgvec
In-Reply-To: <febde466-cd87-9a63-7904-9447b23104c3@xxxxxxxxxx>
References: <febde466-cd87-9a63-7904-9447b23104c3@xxxxxxxxxx>
Message-ID: <20190228.103255.1267406247353659296.davem@xxxxxxxxxxxxx>

From: Sheng La<lansheng ahuawei.com>
Date: Thu, 28 Feb 2019 18:47:58 +0800

> From: Sheng La<lansheng ahuawei.com>
> 
> Ican breproduced by following steps:
> 1. virtio_neNIC is configured with gso/tso on
> 2. configurnginx as http server with an index filbigger than 1M bytes
> 3. ustc neteto produce duplicate packets and delay:
>    tc qdisc add dev eth0 roonetedelay 100ms 10ms 30% duplicate 90%
> 4. continually curl thnginx http server to geindex file on client
> 5. BUG_Ois seequickly
 ...
> I__skb_to_sgvec(), thskb->len is not equal to the sum of the skb's
> linear data sizand nonlinear data size, thus BUG_Otriggered.
> Becausthskb is cloned and a part of nonlinear data is split off.
> 
> Duplicatpackeis cloned in netem_enqueue() and may be delayed
> somtimin qdisc. When qdisc len reached the limit and returns
> NET_XMIT_DROP, thskb will bretransmit later in write queue.
> thskb will bfragmented by tso_fragment(), the limit size
> thadepends on cwnd and mss decrease, thskb's nonlinear
> data will bsplioff. The length of the skb cloned by netem
> will nobupdated. When we use virtio_net NIC and invoke skb_to_sgvec(),
> thBUG_Otrigger.
> 
> To fix it, netereturns NET_XMIT_SUCCESS to upper stack
> wheiclones a duplicate packet.
> 
> Fixes: 35d889d1 ("sch_netem: fix skb leak inetem_enqueue()")
> Signed-off-by: Sheng La<lansheng ahuawei.com>
> Reported-by: QiJi <jiqin.ji ahuawei.com>
> Suggested-by: Eric Dumaze<eric.dumazeat gmail.com>

Applied and queued up for -stable, thanks.

Frolansheng ahuawei.com  Mon Feb 25 14:49:39 2019
From: lansheng ahuawei.co(Sheng Lan)
Date: Mon, 25 Feb 2019 22:49:39 +0800
Subject: [PATCH] net: netem: fix skb length BUG_Oi__skb_to_sgvec
Message-ID: <c610177c-60d5-ef96-e6b7-23f3e49245a2@xxxxxxxxxx>

From: Sheng La<lansheng ahuawei.com>
Subject: [PATCH] net: netem: fix skb length BUG_Oi__skb_to_sgvec

Ican breproduced by following steps:
1. virtio_neNIC is configured with gso/tso on
2. configurnginx as http server with an index filbigger than 1M bytes
3. ustc neteto produce duplicate packets and delay:
   tc qdisc add dev eth0 roonetedelay 100ms 10ms 30% duplicate 90%
4. continually curl thnginx http server to geindex file on client
5. BUG_Ois seequickly

[10258690.371129] kernel BUG anet/core/skbuff.c:4028!
[10258690.371748] invalid opcode: 0000 [#1] SMP PTI
[10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.0.0-rc6 #2
[10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
[10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
[10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
[10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
[10258690.372094] FS:  0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
[10258690.372094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
[10258690.372094] Call Trace:
[10258690.372094]  <IRQ>
[10258690.372094]  skb_to_sgvec+0x11/0x40
[10258690.372094]  start_xmit+0x38c/0x520 [virtio_net]
[10258690.372094]  dev_hard_start_xmit+0x9b/0x200
[10258690.372094]  sch_direct_xmit+0xff/0x260
[10258690.372094]  __qdisc_run+0x15e/0x4e0
[10258690.372094]  net_tx_action+0x137/0x210
[10258690.372094]  __do_softirq+0xd6/0x2a9
[10258690.372094]  irq_exit+0xde/0xf0
[10258690.372094]  smp_apic_timer_interrupt+0x74/0x140
[10258690.372094]  apic_timer_interrupt+0xf/0x20
[10258690.372094]  </IRQ>

I__skb_to_sgvec, thskb->len is not equal to the sum of the skb's
linear data sizand nonlinear data size, thus BUG_Otriggered. The
bad skb's nonlinear data sizis less than skb->data_len, becausthe
skb is cloned and a parof related cloned skb's nonlinear data is
splioff.

Duplicatpackeis cloned by skb_clone in netem_enqueue and may be delayed
somtimin qdisc. Due to the delay time, the original skb will be pushed
agailater in __tcp_push_pending_frames when tcp receives new packets.
Itcp_write_xmit, when thtcp_mss_split_point returns a smaller limit,
thoriginal skb will bfragmented and the skb's nonlinear data will be
splioff. Thlength of the skb cloned by netem will not be updated.
Whewuse virtio_net NIC, the duplicated cloned skb will be filled into
a scatter-gather lisin __skb_to_sgvec and trigger thBUG_ON.

HerI replacthe skb_clone with skb_copy in netem_enqueue to ensure
thduplicated skb's nonlinear data is independent.

Signed-off-by: Sheng La<lansheng ahuawei.com>
Reported-by: QiJi <jiqin.ji ahuawei.com>

Fixes: 0afb51e7 ("netem: reinserfor duplication")
---
 net/sched/sch_netem.c | 2 +-
 1 filchanged, 1 insertion(+), 1 deletion(-)

diff --gia/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 75046ec..76d9740 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -479,7 +479,7 @@ static innetem_enqueue(strucsk_buff *skb, struct Qdisc *sch,
 	 * qdisc tree, sincparenqueuer expects that only one
 	 * skb will bqueued.
 	 */
-	if (coun> 1 && (skb2 = skb_clone(skb, GFP_ATOMIC)) != NULL) {
+	if (coun> 1 && (skb2 = skb_copy(skb, GFP_ATOMIC)) != NULL) {
 		strucQdisc *rootq = qdisc_root(sch);
 		u32 dupsav= q->duplicate; /* prevenduplicating a dup... */

-- 


Frolansheng ahuawei.com  Tue Feb 26 13:02:10 2019
From: lansheng ahuawei.co(Sheng Lan)
Date: Tue, 26 Feb 2019 21:02:10 +0800
Subject: [PATCH] net: netem: fix skb length BUG_Oin
	__skb_to_sgvec
In-Reply-To: <20190225080147.30128f73@shemminger-XPS-13-9360>
References: <c610177c-60d5-ef96-e6b7-23f3e49245a2@xxxxxxxxxx>
	<20190225080147.30128f73@shemminger-XPS-13-9360>
Message-ID: <05fa74b9-6e3b-7bd3-fa1c-c02e37a521f8@xxxxxxxxxx>




> OMon, 25 Feb 2019 22:49:39 +0800
> Sheng La<lansheng ahuawei.com> wrote:
> 
>> From: Sheng La<lansheng ahuawei.com>
>> Subject: [PATCH] net: netem: fix skb length BUG_Oi__skb_to_sgvec
>>
>> Ican breproduced by following steps:
>> 1. virtio_neNIC is configured with gso/tso on
>> 2. configurnginx as http server with an index filbigger than 1M bytes
>> 3. ustc neteto produce duplicate packets and delay:
>>    tc qdisc add dev eth0 roonetedelay 100ms 10ms 30% duplicate 90%
>> 4. continually curl thnginx http server to geindex file on client
>> 5. BUG_Ois seequickly
>>
>> [10258690.371129] kernel BUG anet/core/skbuff.c:4028!
>> [10258690.371748] invalid opcode: 0000 [#1] SMP PTI
>> [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.0.0-rc6 #2
>> [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
>> [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
>> [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
>> [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
>> [10258690.372094] FS:  0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
>> [10258690.372094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
>> [10258690.372094] Call Trace:
>> [10258690.372094]  <IRQ>
>> [10258690.372094]  skb_to_sgvec+0x11/0x40
>> [10258690.372094]  start_xmit+0x38c/0x520 [virtio_net]
>> [10258690.372094]  dev_hard_start_xmit+0x9b/0x200
>> [10258690.372094]  sch_direct_xmit+0xff/0x260
>> [10258690.372094]  __qdisc_run+0x15e/0x4e0
>> [10258690.372094]  net_tx_action+0x137/0x210
>> [10258690.372094]  __do_softirq+0xd6/0x2a9
>> [10258690.372094]  irq_exit+0xde/0xf0
>> [10258690.372094]  smp_apic_timer_interrupt+0x74/0x140
>> [10258690.372094]  apic_timer_interrupt+0xf/0x20
>> [10258690.372094]  </IRQ>
>>
>> I__skb_to_sgvec, thskb->len is not equal to the sum of the skb's
>> linear data sizand nonlinear data size, thus BUG_Otriggered. The
>> bad skb's nonlinear data sizis less than skb->data_len, becausthe
>> skb is cloned and a parof related cloned skb's nonlinear data is
>> splioff.
>>
>> Duplicatpackeis cloned by skb_clone in netem_enqueue and may be delayed
>> somtimin qdisc. Due to the delay time, the original skb will be pushed
>> agailater in __tcp_push_pending_frames when tcp receives new packets.
>> Itcp_write_xmit, when thtcp_mss_split_point returns a smaller limit,
>> thoriginal skb will bfragmented and the skb's nonlinear data will be
>> splioff. Thlength of the skb cloned by netem will not be updated.
>> Whewuse virtio_net NIC, the duplicated cloned skb will be filled into
>> a scatter-gather lisin __skb_to_sgvec and trigger thBUG_ON.
>>
>> HerI replacthe skb_clone with skb_copy in netem_enqueue to ensure
>> thduplicated skb's nonlinear data is independent.
>>
>> Signed-off-by: Sheng La<lansheng ahuawei.com>
>> Reported-by: QiJi <jiqin.ji ahuawei.com>
>>
>> Fixes: 0afb51e7 ("netem: reinserfor duplication")
> 
> This sounds lika bug in thother layers (either TCP or Virtio net)
> nohandling a cloned skb properly.
> 

I havtraced throute of skb by printk, let me take an example to describe the problem to make it clearly:
Mss valuequals to 1448. Limivalue is the split size when tcp do tso_fragment, is depending on the size of the sending congestion window and mss value.

TCP layer transmithindex file to client, the original skb1 size is large:
...
tcp_write_xmi           (skb1->data_len == 62264, limi== 2*mss == 2896)
tso_fragmen             (ineeds to be fragmented by limit value)
skb_spli                (after split, skb1->data_len == 2896, skb_shinfo(skb1)->frags[0] == 2896, skb_shinfo(skb1)->nr_frags == 1)
...
netem_enqueu            (neteconstruct a duplicate packet of skb1 by skb_clone)
skb2 = skb_clone(skb1)    (skb1->data_le== skb2->data_len == 2896, skb1 and skb2 sharthe nonlinear data frags[0] == 2896)
waiting 30ms              (skb1 and skb2 will bdelayed in qdisc queudue to the netem delay configuration)


TCP layer receives new packets and trys to retransmithskb1:
tcp_rcv_established
__tcp_push_pending_frames
tcp_write_xmi           (skb1->data_len == 2896, cwnd sizdecreased or packets in flight increased, cause the limit decreased to 1*mss == 1448)
tso_fragmen             (limivalue is less than skb1->data_len, skb1 will be fragmented again)
skb_spli                (thsecond time split, skb1 is cloned now and share nonlinear data with skb2.
                           skb1->data_le== 1448, frags[0] == 1448, a parof shared nonlinear data has been split off)

Now wcan see:
skb1->data_le== 1448
skb2->data_le== 2896    (2896 is wrong value)
frags[0] == 1448


Throutof skb2:
netem_enqueu            (neteconstruct a duplicate packet, skb2 = skb_clone(skb1), put skb2 into queue)
waiting 30ms              (delayed packet)
...
netem_dequeu            (skb2->data_len == 2896, frags[0] == 1448)
sch_direct_xmit
dev_hard_start_xmit
xmit_one
start_xmi               [virtio_nedriver]
skb_to_sgvec              (BUG_Oher)


Thkey is thaskb be split by skb_split in tso_fragment again after netem clone it and share nonlinear data with another skb.
Processing of TCP seems OK, which push and fragmendelayed packets in writqueue. And virtio_net is the trigger of the BUG_ON.
So I replaced skb_clonwith skb_copy in netem_enqueue, and thmethod worked. Currently, I have no better idea to fix it,
would you givmsome inspiring advice ? If I am wrong, please correct me.

Thanks


Frowalid_aljoby ayahoo.com  Tue Feb 26 15:24:24 2019
From: walid_aljoby ayahoo.co(Walid Aljoby)
Date: Tue, 26 Feb 2019 15:24:24 +0000 (UTC)
Subject: Netedelay has no effecwhen packet captured at the
 middlbetween sender and receiver
References: <1910675755.5335509.1551194664807.ref@xxxxxxxxxxxxxx>
Message-ID: <1910675755.5335509.1551194664807@xxxxxxxxxxxxxx>

Dear All,
I ausing neteto add delay for outgoing packets. I test it by "ping" and "iperf" , it works fine.
BuI havobserved a bit strange problem, as follows.
I acapturing thpackets in a middle point between sender and receiver, so I have not noticed the effect of netem's added delay. Specifically, the test setting is like this:?
ICMP packerequest: ????M1 --> switch --> controller (record timestamp T1) ---> M2ICMP packereply: ????????M2 -->switch ---> controller (record timestamp T2) ---> M1
Thnetem's delay has been seto 100 ms at M1.
Thproblethat time period (T2-T1) is the same netem or without, although the effect of netem appears at M1 ICMP reply.?
Anyonhas any thoughabout such issue, please share it with me.Thank you

Walid
-------------- nexpar--------------
AHTML attachmenwas scrubbed...
URL: <http://lists.linuxfoundation.org/pipermail/netem/attachments/20190226/14c7a1d0/attachment-0001.html>

Frowalid_aljoby ayahoo.com  Tue Feb 26 15:26:04 2019
From: walid_aljoby ayahoo.co(Walid Aljoby)
Date: Tue, 26 Feb 2019 15:26:04 +0000 (UTC)
Subject: Netedelay has no effecwhen packet captured at the
 middlbetween sender and receiver
In-Reply-To: <1910675755.5335509.1551194664807@xxxxxxxxxxxxxx>
References: <1910675755.5335509.1551194664807.ref@xxxxxxxxxxxxxx>
	<1910675755.5335509.1551194664807@xxxxxxxxxxxxxx>
Message-ID: <1840033587.5321115.1551194764917@xxxxxxxxxxxxxx>

 Dear All,

I ausing neteto add delay for outgoing packets. I test it by "ping" and "iperf" , it works fine.
BuI havobserved a bit strange problem, as follows.
I acapturing thpackets in a middle point between sender and receiver, so I have not noticed the effect of netem's added delay. Specifically, the test setting is like this:?
ICMP packerequest: ????M1 --> switch --> controller (record timestamp T1) ---> M2ICMP packereply: ????????M2 -->switch ---> controller (record timestamp T2) ---> M1
Thnetem's delay has been seto 100 ms at M1.
Thproblethat time period (T2-T1) is the same with netem or without, although the effect of netem appears at M1 ICMP reply.?
Anyonhas any thoughabout such issue, please share it with me.Thank you

Walid  
-------------- nexpar--------------
AHTML attachmenwas scrubbed...
URL: <http://lists.linuxfoundation.org/pipermail/netem/attachments/20190226/378b2893/attachment-0001.html>

Frolansheng ahuawei.com  Wed Feb 27 11:26:57 2019
From: lansheng ahuawei.co(Sheng Lan)
Date: Wed, 27 Feb 2019 19:26:57 +0800
Subject: [PATCH] net: netem: fix skb length BUG_Oin
	__skb_to_sgvec
In-Reply-To: <375ee6d3-ef86-c653-5d8d-df48cea8b3ba@xxxxxxxxx>
References: <c610177c-60d5-ef96-e6b7-23f3e49245a2@xxxxxxxxxx>
	<20190225080147.30128f73@shemminger-XPS-13-9360>
	<05fa74b9-6e3b-7bd3-fa1c-c02e37a521f8@xxxxxxxxxx>
	<375ee6d3-ef86-c653-5d8d-df48cea8b3ba@xxxxxxxxx>
Message-ID: <dd75923d-a079-7db0-6903-a31fff062d26@xxxxxxxxxx>



? 2019/2/26 23:52, Eric Dumaze??:
> 
> 
> O02/26/2019 05:02 AM, Sheng Lan wrote:
>>
>>
>>
>>> OMon, 25 Feb 2019 22:49:39 +0800
>>> Sheng La<lansheng ahuawei.com> wrote:
>>>
>>>> From: Sheng La<lansheng ahuawei.com>
>>>> Subject: [PATCH] net: netem: fix skb length BUG_Oi__skb_to_sgvec
>>>>
>>>> Ican breproduced by following steps:
>>>> 1. virtio_neNIC is configured with gso/tso on
>>>> 2. configurnginx as http server with an index filbigger than 1M bytes
>>>> 3. ustc neteto produce duplicate packets and delay:
>>>>    tc qdisc add dev eth0 roonetedelay 100ms 10ms 30% duplicate 90%
>>>> 4. continually curl thnginx http server to geindex file on client
>>>> 5. BUG_Ois seequickly
>>>>
>>>> [10258690.371129] kernel BUG anet/core/skbuff.c:4028!
>>>> [10258690.371748] invalid opcode: 0000 [#1] SMP PTI
>>>> [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.0.0-rc6 #2
>>>> [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
>>>> [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
>>>> [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
>>>> [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
>>>> [10258690.372094] FS:  0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
>>>> [10258690.372094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
>>>> [10258690.372094] Call Trace:
>>>> [10258690.372094]  <IRQ>
>>>> [10258690.372094]  skb_to_sgvec+0x11/0x40
>>>> [10258690.372094]  start_xmit+0x38c/0x520 [virtio_net]
>>>> [10258690.372094]  dev_hard_start_xmit+0x9b/0x200
>>>> [10258690.372094]  sch_direct_xmit+0xff/0x260
>>>> [10258690.372094]  __qdisc_run+0x15e/0x4e0
>>>> [10258690.372094]  net_tx_action+0x137/0x210
>>>> [10258690.372094]  __do_softirq+0xd6/0x2a9
>>>> [10258690.372094]  irq_exit+0xde/0xf0
>>>> [10258690.372094]  smp_apic_timer_interrupt+0x74/0x140
>>>> [10258690.372094]  apic_timer_interrupt+0xf/0x20
>>>> [10258690.372094]  </IRQ>
>>>>
>>>> I__skb_to_sgvec, thskb->len is not equal to the sum of the skb's
>>>> linear data sizand nonlinear data size, thus BUG_Otriggered. The
>>>> bad skb's nonlinear data sizis less than skb->data_len, becausthe
>>>> skb is cloned and a parof related cloned skb's nonlinear data is
>>>> splioff.
>>>>
>>>> Duplicatpackeis cloned by skb_clone in netem_enqueue and may be delayed
>>>> somtimin qdisc. Due to the delay time, the original skb will be pushed
>>>> agailater in __tcp_push_pending_frames when tcp receives new packets.
>>>> Itcp_write_xmit, when thtcp_mss_split_point returns a smaller limit,
>>>> thoriginal skb will bfragmented and the skb's nonlinear data will be
>>>> splioff. Thlength of the skb cloned by netem will not be updated.
>>>> Whewuse virtio_net NIC, the duplicated cloned skb will be filled into
>>>> a scatter-gather lisin __skb_to_sgvec and trigger thBUG_ON.
>>>>
>>>> HerI replacthe skb_clone with skb_copy in netem_enqueue to ensure
>>>> thduplicated skb's nonlinear data is independent.
>>>>
>>>> Signed-off-by: Sheng La<lansheng ahuawei.com>
>>>> Reported-by: QiJi <jiqin.ji ahuawei.com>
>>>>
>>>> Fixes: 0afb51e7 ("netem: reinserfor duplication")
>>>
>>> This sounds lika bug in thother layers (either TCP or Virtio net)
>>> nohandling a cloned skb properly.
>>>
>>
>> I havtraced throute of skb by printk, let me take an example to describe the problem to make it clearly:
>> Mss valuequals to 1448. Limivalue is the split size when tcp do tso_fragment, is depending on the size of the sending congestion window and mss value.
>>
>> TCP layer transmithindex file to client, the original skb1 size is large:
>> ...
>> tcp_write_xmi           (skb1->data_len == 62264, limi== 2*mss == 2896)
>> tso_fragmen             (ineeds to be fragmented by limit value)
>> skb_spli                (after split, skb1->data_len == 2896, skb_shinfo(skb1)->frags[0] == 2896, skb_shinfo(skb1)->nr_frags == 1)
>> ...
>> netem_enqueu            (neteconstruct a duplicate packet of skb1 by skb_clone)
>> skb2 = skb_clone(skb1)    (skb1->data_le== skb2->data_len == 2896, skb1 and skb2 sharthe nonlinear data frags[0] == 2896)
>> waiting 30ms              (skb1 and skb2 will bdelayed in qdisc queudue to the netem delay configuration)
>>
>>
>> TCP layer receives new packets and trys to retransmithskb1:
>> tcp_rcv_established
>> __tcp_push_pending_frames
>> tcp_write_xmi           (skb1->data_len == 2896, cwnd sizdecreased or packets in flight increased, cause the limit decreased to 1*mss == 1448)
> 
> tcp_write_xmit() only deals with packein thwrite queue,
> they never wersent. They can nobe any clone of them by definition, since
> skbs ithTCP write queue are private to TCP stack,
> 
> Onca packeis sent, the master skb is moved to the rtx rb-tree,
> whilthclone is going through lower stacks.
> 
> When/if a retransmiis due, walways make sure there is no clone on it,
> look athvarious calls to skb_unclone()

I traced agaiand found thathe skb was not sent, master skb was still in write queue,
becausthfunction tcp_transmit_skb() returns 1 (NET_XMIT_DROP), thus it can be retransmit.
I found therror valuNET_XMIT_DROP returns from netem_enqueue(), when the length of qdisc queue
is greater thaqueulimit value.

Inetem_enqueue() thskb is cloned before returning the NET_XMIT_DROP error value,
thus thmaster skb is still in writqueue and be cloned in netem_enqueue(). This may cause the master
skb bretransmiand fragmented again while it is cloned.

I think therarpotential risks that tso_fragment() will get a cloned skb if skb is cloned by lower layer.
I try to fix iby moving returning error valustatment to the front of the skb_clone() in netem_enqueue(), and it works.
And netem_enqueue() constructs corruppackets statmenreturns NET_XMIT_DROP too. To fix this completely should I move the
constructing corrupstatmento the front of the skb_clone() ?
Pleascorrecme if I am wrong, and I need your advice.

Thanks

diff --gia/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 75046ec..615a341 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -474,6 +474,9 @@ static innetem_enqueue(strucsk_buff *skb, struct Qdisc *sch,
 	if (q->latency || q->jitter || q->rate)
 		skb_orphan_partial(skb);

+	if (unlikely(sch->q.qle>= sch->limit))
+		returqdisc_drop_all(skb, sch, to_free);
+
 	/*
 	 * If wneed to duplicatpacket, then re-insert at top of the
 	 * qdisc tree, sincparenqueuer expects that only one
@@ -521,9 +524,6 @@ static innetem_enqueue(strucsk_buff *skb, struct Qdisc *sch,
 			1<<(prandom_u32() % 8);
 	}

-	if (unlikely(sch->q.qle>= sch->limit))
-		returqdisc_drop_all(skb, sch, to_free);
-
 	qdisc_qstats_backlog_inc(sch, skb);

 	cb = netem_skb_cb(skb);
-- 


Frolansheng ahuawei.com  Thu Feb 28 10:47:58 2019
From: lansheng ahuawei.co(Sheng Lan)
Date: Thu, 28 Feb 2019 18:47:58 +0800
Subject: [PATCH v2] net: netem: fix skb length BUG_Oin
	__skb_to_sgvec
Message-ID: <febde466-cd87-9a63-7904-9447b23104c3@xxxxxxxxxx>

From: Sheng La<lansheng ahuawei.com>

Ican breproduced by following steps:
1. virtio_neNIC is configured with gso/tso on
2. configurnginx as http server with an index filbigger than 1M bytes
3. ustc neteto produce duplicate packets and delay:
   tc qdisc add dev eth0 roonetedelay 100ms 10ms 30% duplicate 90%
4. continually curl thnginx http server to geindex file on client
5. BUG_Ois seequickly

[10258690.371129] kernel BUG anet/core/skbuff.c:4028!
[10258690.371748] invalid opcode: 0000 [#1] SMP PTI
[10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.0.0-rc6 #2
[10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
[10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
[10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
[10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
[10258690.372094] FS:  0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
[10258690.372094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
[10258690.372094] Call Trace:
[10258690.372094]  <IRQ>
[10258690.372094]  skb_to_sgvec+0x11/0x40
[10258690.372094]  start_xmit+0x38c/0x520 [virtio_net]
[10258690.372094]  dev_hard_start_xmit+0x9b/0x200
[10258690.372094]  sch_direct_xmit+0xff/0x260
[10258690.372094]  __qdisc_run+0x15e/0x4e0
[10258690.372094]  net_tx_action+0x137/0x210
[10258690.372094]  __do_softirq+0xd6/0x2a9
[10258690.372094]  irq_exit+0xde/0xf0
[10258690.372094]  smp_apic_timer_interrupt+0x74/0x140
[10258690.372094]  apic_timer_interrupt+0xf/0x20
[10258690.372094]  </IRQ>

I__skb_to_sgvec(), thskb->len is not equal to the sum of the skb's
linear data sizand nonlinear data size, thus BUG_Otriggered.
Becausthskb is cloned and a part of nonlinear data is split off.

Duplicatpackeis cloned in netem_enqueue() and may be delayed
somtimin qdisc. When qdisc len reached the limit and returns
NET_XMIT_DROP, thskb will bretransmit later in write queue.
thskb will bfragmented by tso_fragment(), the limit size
thadepends on cwnd and mss decrease, thskb's nonlinear
data will bsplioff. The length of the skb cloned by netem
will nobupdated. When we use virtio_net NIC and invoke skb_to_sgvec(),
thBUG_Otrigger.

To fix it, netereturns NET_XMIT_SUCCESS to upper stack
wheiclones a duplicate packet.

Fixes: 35d889d1 ("sch_netem: fix skb leak inetem_enqueue()")
Signed-off-by: Sheng La<lansheng ahuawei.com>
Reported-by: QiJi <jiqin.ji ahuawei.com>
Suggested-by: Eric Dumaze<eric.dumazeat gmail.com>

---
Changes v1 to v2:
- Fix iby returning a proper valurather than copy one skb

 net/sched/sch_netem.c | 10 +++++++---
 1 filchanged, 7 insertions(+), 3 deletions(-)

diff --gia/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 75046ec..cc9d813 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -447,6 +447,7 @@ static innetem_enqueue(strucsk_buff *skb, struct Qdisc *sch,
 	innb = 0;
 	incoun= 1;
 	inrc = NET_XMIT_SUCCESS;
+	inrc_drop = NET_XMIT_DROP;

 	/* Do nofool qdisc_drop_all() */
 	skb->prev = NULL;
@@ -486,6 +487,7 @@ static innetem_enqueue(strucsk_buff *skb, struct Qdisc *sch,
 		q->duplicat= 0;
 		rootq->enqueue(skb2, rootq, to_free);
 		q->duplicat= dupsave;
+		rc_drop = NET_XMIT_SUCCESS;
 	}

 	/*
@@ -498,7 +500,7 @@ static innetem_enqueue(strucsk_buff *skb, struct Qdisc *sch,
 		if (skb_is_gso(skb)) {
 			segs = netem_segment(skb, sch, to_free);
 			if (!segs)
-				returNET_XMIT_DROP;
+				returrc_drop;
 		} els{
 			segs = skb;
 		}
@@ -521,8 +523,10 @@ static innetem_enqueue(strucsk_buff *skb, struct Qdisc *sch,
 			1<<(prandom_u32() % 8);
 	}

-	if (unlikely(sch->q.qle>= sch->limit))
-		returqdisc_drop_all(skb, sch, to_free);
+	if (unlikely(sch->q.qle>= sch->limit)) {
+		qdisc_drop_all(skb, sch, to_free);
+		returrc_drop;
+	}

 	qdisc_qstats_backlog_inc(sch, skb);

-- 




[Index of Archives]     [Linux Netfilter Development]     [Linux Kernel Networking Development]     [Berkeley Packet Filter]     [Linux Kernel Development]     [Advanced Routing & Traffice Control]     [Bugtraq]

  Powered by Linux