Rate throttling behaves unexpectedly

Hi Joseph,

Thanks for the suggestion. I read through your paper and the results
were quite enlightening; you pointed out many pitfalls with other
approaches that I would not have anticipated. I re-ran the tests with
your suggested configuration:

ip netns exec net2 tc qdisc delete dev veth root
ip netns exec net2 tc qdisc add dev veth root handle 1:0 netem
  rate 512kbit limit 100
ip netns exec net2 tc qdisc add dev veth parent 1:1 handle 10:
  netem delay 100ms

This yields very similar bandwidth results to the test with netem/tbf,
although there is an extra 1ms latency that doesn't concern me too much.
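
(For reference, the qdisc hierarchy and per-qdisc statistics produced by
the commands above can be inspected with something like
"ip netns exec net2 tc -s qdisc show dev veth".)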

However, this setup does not resolve the unexpected behavior I'm seeing
with low latencies. For example, when running the above test on my
faster machine but setting the delay on the second netem qdisc to 10ms
rather than 100ms, iperf measures a bandwidth of 908
Kbits/sec--substantially higher than the 512kbit limit specified in the
first qdisc. I wonder if I am misunderstanding iperf or netem, or if
there might be a bug somewhere.

Unfortunately, it looks like the Ubuntu image referenced in the paper is
no longer available, so I wasn't able to check to see if the same effect
is visible in that setting. I did confirm that a variant of the patch
presented in the paper has made its way into my kernel (4.9.10-1),
though. Do you see similar results for this test?

Thanks,
~Nik

On 02/21/2017 04:33 PM, Beshay, Joseph wrote:
> Hi Nik,
> 
> I haven't looked into the details of the issue, but I have observed similar performance issues in some experiments with my research on TCP. I actually published a paper about it: http://ieeexplore.ieee.org/abstract/document/7330147/
> 
> Most of my issues went away when I used two netem qdiscs one after the other: one for bandwidth limitation and the other for adding delay.
> 
> Hope this helps.
> 
> Joseph Beshay
> 
> P.S.: I can send you a PDF copy of the paper if you would like to check it.
> 
> -----Original Message-----
> From: netem-bounces at lists.linux-foundation.org [mailto:netem-bounces at lists.linux-foundation.org] On Behalf Of Nik Unger
> Sent: Tuesday, February 21, 2017 2:25 PM
> To: netem at lists.linux-foundation.org
> Subject: Rate throttling behaves unexpectedly
> 
> Hello,
> 
> I am using netem as part of some network emulation software based on network namespaces. However, the rate throttling (applied via tc's "rate" argument) does not behave as I would expect. I could not find any clues in the man page, and the online documentation about rate throttling is sparse (since it is comparatively new), so I am not sure if it is working as intended.
> 
> Specifically:
> - The measured link bandwidth appears higher than the specified limit
> - The measured link bandwidth *increases* when a higher delay is added
> - The measured link bandwidth is substantially different than when using a netem/tbf qdisc combination
> - The measured link bandwidth for the same very slow settings varies significantly across machines
> 
> Here are the steps to reproduce these observations:
> ====================================================================
> # Set up two network namespaces and link them with a veth pair
> # This uses static ARP entries to avoid ARP lookup delays
> ip netns add net1
> ip netns add net2
> ip link add name veth address 00:00:00:00:00:01 netns net1 type veth
>    peer name veth address 00:00:00:00:00:02 netns net2
> ip netns exec net1 ip addr add 10.0.0.1/24 dev veth
> ip netns exec net2 ip addr add 10.0.0.2/24 dev veth
> ip netns exec net1 ip link set dev veth up
> ip netns exec net2 ip link set dev veth up
> ip netns exec net1 ip neigh add 10.0.0.2 lladdr 00:00:00:00:00:02
>    dev veth
> ip netns exec net2 ip neigh add 10.0.0.1 lladdr 00:00:00:00:00:01
>    dev veth
> 
> # Test the delay and rate without any qdisc applied. I'm using iperf
> # to measure the bandwidth here. The server should remain running when
> # testing with the iperf client
> ip netns exec net2 ping 10.0.0.1 -c 4
> ip netns exec net1 iperf -s
> ip netns exec net2 iperf -c 10.0.0.1
> # On my machine: rtt min/avg/max/mdev = 0.049/0.052/0.062/0.010 ms
> #                Bandwidth: 31.2 Gbits/sec
> # (Results are as expected)
> 
> # Now test with a 512kb/s netem rate throttle
> ip netns exec net2 tc qdisc add dev veth root netem rate 512kbit
> ip netns exec net2 ping 10.0.0.1 -c 4
> ip netns exec net1 iperf -s
> ip netns exec net2 iperf -c 10.0.0.1
> # On my machine: rtt min/avg/max/mdev = 1.662/1.664/1.667/0.028 ms
> #                Bandwidth: 640 Kbits/sec
> # (Expected results: bandwidth should be less than 512 Kbits/sec since
> #  TCP won't perfectly saturate the link)
> 
> # Add 100ms delay to the rate throttle
> ip netns exec net2 tc qdisc change dev veth root netem rate 512kbit
>    delay 100ms
> ip netns exec net2 ping 10.0.0.1 -c 4
> ip netns exec net1 iperf -s
> ip netns exec net2 iperf -c 10.0.0.1
> # On my machine: rtt min/avg/max/mdev = 101.597/101.658/101.708/0.039 ms
> #                Bandwidth: 1.17 Mbits/sec
> # (Expected results: bandwidth should be less than the previous test)
> 
> # Now test the same condition using tbf for rate throttling instead
> ip netns exec net2 tc qdisc delete dev veth root
> ip netns exec net2 tc qdisc add dev veth root handle 1:0 netem
>    delay 100ms
> ip netns exec net2 tc qdisc add dev veth parent 1:1 handle 10: tbf
>    rate 512kbit latency 5ms burst 2048
> ip netns exec net2 ping 10.0.0.1 -c 4
> ip netns exec net1 iperf -s
> ip netns exec net2 iperf -c 10.0.0.1
> # On my machine: rtt min/avg/max/mdev = 100.069/100.110/100.152/0.031 ms
> #                Bandwidth: 270 Kbits/sec
> # (Results are as expected)
> 
> # Cleanup
> ip netns del net1
> ip netns del net2
> ====================================================================
> 
> My uname -a:
> Linux 4.8.0-1-amd64 #1 SMP Debian 4.8.7-1 (2016-11-13) x86_64 GNU/Linux
> 
> I get similar results on my faster machine using 4.9.0-2-amd64, except that the results with the same commands are more dramatic: roughly 80 Gbits/sec unthrottled, 1 Mbit/sec with a 512kbit throttle and no delay, and almost 5 Mbits/sec with a 512kbit throttle and 100ms delay.
> 
> Applying qdiscs on both ends of the veth pair does not substantially affect the results.
> 
> Am I missing something about the way that netem's rate throttling works in relation to tbf, network namespaces, and iperf?
> 
> Thanks,
> ~Nik
> _______________________________________________
> Netem mailing list
> Netem at lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/netem
> 


From njunger at uwaterloo.ca  Sun Mar 12 04:59:27 2017
From: njunger at uwaterloo.ca (Nik Unger)
Date: Sat, 11 Mar 2017 23:59:27 -0500
Subject: [PATCH] netem: apply correct delay when rate throttling
In-Reply-To: <9cb484ed-b0a1-debb-a47f-01e4922f08f6@xxxxxxxxxxxx>
Message-ID: <20170312045927.30090-1-njunger@xxxxxxxxxxxx>

I recently reported on the netem list that iperf network benchmarks show unexpected results when a bandwidth throttling rate has been configured for netem. Specifically:
1) The measured link bandwidth *increases* when a higher delay is added
2) The measured link bandwidth appears higher than the specified limit
3) The measured link bandwidth for the same very slow settings varies significantly across machines

This issue can be reproduced by using tc to configure netem with a 512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a veth pair between network namespaces, and then using iperf (or any other network benchmarking tool) to test throughput. Complete detailed instructions are in the original email chain here:
https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html

There appear to be two underlying bugs causing these effects:

- The first issue causes long delays when the rate is slow and no delay is configured (e.g., "rate 512kbit"). This is because SKBs are not orphaned when no delay is configured, so orphaning does not occur until *after* the rate-induced delay has been applied. For this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us") dramatically increases the measured bandwidth.

- The second issue is that rate-induced delays are not correctly applied, allowing SKB delays to occur in parallel. The intended approach is to compute the delay for an SKB and to add this delay to the end of the current queue. However, the code does not detect existing SKBs in the queue due to improperly testing sch->q.qlen, which is nonzero even when packets exist only in the rbtree. Consequently, new SKBs do not wait for the current queue to empty. When packet delays vary significantly (e.g., if packet sizes are different), then this also causes unintended reordering.
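
  (As a rough numerical illustration, using figures not taken from the tests above: at 512 kbit/s a full-size 1500-byte packet needs 1500 * 8 / 512000 s, i.e. about 23.4 ms, to serialize, so back-to-back packets should leave the qdisc roughly 23 ms apart; if each packet's rate-induced delay is instead measured from its own enqueue time, a burst of packets enqueued together all become eligible to dequeue within the same ~23 ms window, and the measured throughput exceeds the configured rate.)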

I modified the code to expect a delay (and orphan the SKB) when a rate is configured. I also added some defensive tests that correctly find the latest scheduled delivery time, even if it is (unexpectedly) for a packet in sch->q. I have tested these changes on the latest kernel (4.11.0-rc1+) and the iperf / ping test results are as expected.

Signed-off-by: Nik Unger <njunger at uwaterloo.ca>
CC: Stephen Hemminger <stephen at networkplumber.org>
CC: netem at lists.linux-foundation.org
---
 net/sched/sch_netem.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index c8bb62a1e744..94b4928ad413 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -462,7 +462,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	/* If a delay is expected, orphan the skb. (orphaning usually takes
 	 * place at TX completion time, so _before_ the link transit delay)
 	 */
-	if (q->latency || q->jitter)
+	if (q->latency || q->jitter || q->rate)
 		skb_orphan_partial(skb);
 
 	/*
@@ -530,21 +530,31 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		now = psched_get_time();
 
 		if (q->rate) {
-			struct sk_buff *last;
+			struct netem_skb_cb *last = NULL;
+
+			if (sch->q.tail)
+				last = netem_skb_cb(sch->q.tail);
+			if (q->t_root.rb_node) {
+				struct sk_buff *t_skb;
+				struct netem_skb_cb *t_last;
+
+				t_skb = netem_rb_to_skb(rb_last(&q->t_root));
+				t_last = netem_skb_cb(t_skb);
+				if (!last ||
+				    t_last->time_to_send > last->time_to_send) {
+					last = t_last;
+				}
+			}
 
-			if (sch->q.qlen)
-				last = sch->q.tail;
-			else
-				last = netem_rb_to_skb(rb_last(&q->t_root));
 			if (last) {
 				/*
 				 * Last packet in queue is reference point (now),
 				 * calculate this time bonus and subtract
 				 * from delay.
 				 */
-				delay -= netem_skb_cb(last)->time_to_send - now;
+				delay -= last->time_to_send - now;
 				delay = max_t(psched_tdiff_t, 0, delay);
-				now = netem_skb_cb(last)->time_to_send;
+				now = last->time_to_send;
 			}
 
 			delay += packet_len_2_sched_time(qdisc_pkt_len(skb), q);
-- 
2.11.0


From ben at decadent.org.uk  Fri Mar 10 11:46:23 2017
From: ben at decadent.org.uk (Ben Hutchings)
Date: Fri, 10 Mar 2017 11:46:23 +0000
Subject: [PATCH 3.16 315/370] netem: Segment GSO packets on enqueue
In-Reply-To: <lsq.1489146380.780052105@xxxxxxxxxxxxxxx>
Message-ID: <lsq.1489146383.223399984@xxxxxxxxxxxxxxx>

3.16.42-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Neil Horman <nhorman at tuxdriver.com>

[ Upstream commit 6071bd1aa13ed9e41824bafad845b7b7f4df5cfd ]

This was recently reported to me, and reproduced on the latest net kernel,
when attempting to run netperf from a host that had a netem qdisc attached
to the egress interface:

[  788.073771] ---------------------[ cut here ]---------------------------
[  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
[  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
data_len=0 gso_size=1448 gso_type=1 ip_summed=3
[  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
dm_mod
[  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
------------   3.10.0-327.el7.x86_64 #1
[  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
[  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
ffffffff816351f1
[  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
ffff880231674000
[  788.611943]  0000000000000001 0000000000000003 0000000000000000
ffff880437c03710
[  788.647241] Call Trace:
[  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
[  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
[  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
[  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
[  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
[  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
[  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
[  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
[  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
...

The problem occurs because netem is not prepared to handle GSO packets (as it
uses skb_checksum_help in its enqueue path, which cannot manipulate these
frames).

The solution I think is to simply segment the skb in a similar fashion to the
way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
the first segment, and enqueue the remaining ones.
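
(A minimal way to exercise this path, assuming a GSO-enabled egress interface
named eth0, is something along the lines of "tc qdisc add dev eth0 root netem
corrupt 1%" and then pushing bulk TCP traffic, e.g. netperf, out through eth0;
the corrupt option is what sends GSO skbs into the checksum/corruption code
changed below.)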

Tested successfully by myself on the latest net kernel, to which this applies.

Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
CC: Jamal Hadi Salim <jhs at mojatatu.com>
CC: "David S. Miller" <davem at davemloft.net>
CC: netem at lists.linux-foundation.org
CC: eric.dumazet at gmail.com
CC: stephen at networkplumber.org
Acked-by: Eric Dumazet <edumazet at google.com>
Signed-off-by: David S. Miller <davem at davemloft.net>
[bwh: Backported to 3.16: open-code qdisc_qstats_drop()]
Signed-off-by: Ben Hutchings <ben at decadent.org.uk>
---
 net/sched/sch_netem.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -408,6 +408,25 @@ static void tfifo_enqueue(struct sk_buff
 	sch->q.qlen++;
 }
 
+/* netem can't properly corrupt a megapacket (like we get from GSO), so instead
+ * when we statistically choose to corrupt one, we instead segment it, returning
+ * the first packet to be corrupted, and re-enqueue the remaining frames
+ */
+static struct sk_buff *netem_segment(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct sk_buff *segs;
+	netdev_features_t features = netif_skb_features(skb);
+
+	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+
+	if (IS_ERR_OR_NULL(segs)) {
+		qdisc_reshape_fail(skb, sch);
+		return NULL;
+	}
+	consume_skb(skb);
+	return segs;
+}
+
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -420,7 +439,11 @@ static int netem_enqueue(struct sk_buff
 	/* We don't fill cb now as skb_unshare() may invalidate it */
 	struct netem_skb_cb *cb;
 	struct sk_buff *skb2;
+	struct sk_buff *segs = NULL;
+	unsigned int len = 0, last_len, prev_len = qdisc_pkt_len(skb);
+	int nb = 0;
 	int count = 1;
+	int rc = NET_XMIT_SUCCESS;
 
 	/* Random duplication */
 	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
@@ -466,10 +489,23 @@ static int netem_enqueue(struct sk_buff
 	 * do it now in software before we mangle it.
 	 */
 	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
+		if (skb_is_gso(skb)) {
+			segs = netem_segment(skb, sch);
+			if (!segs)
+				return NET_XMIT_DROP;
+		} else {
+			segs = skb;
+		}
+
+		skb = segs;
+		segs = segs->next;
+
 		if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
 		    (skb->ip_summed == CHECKSUM_PARTIAL &&
-		     skb_checksum_help(skb)))
-			return qdisc_drop(skb, sch);
+		     skb_checksum_help(skb))) {
+			rc = qdisc_drop(skb, sch);
+			goto finish_segs;
+		}
 
 		skb->data[prandom_u32() % skb_headlen(skb)] ^=
 			1<<(prandom_u32() % 8);
@@ -529,6 +565,27 @@ static int netem_enqueue(struct sk_buff
 		sch->qstats.requeues++;
 	}
 
+finish_segs:
+	if (segs) {
+		while (segs) {
+			skb2 = segs->next;
+			segs->next = NULL;
+			qdisc_skb_cb(segs)->pkt_len = segs->len;
+			last_len = segs->len;
+			rc = qdisc_enqueue(segs, sch);
+			if (rc != NET_XMIT_SUCCESS) {
+				if (net_xmit_drop_count(rc))
+					sch->qstats.drops++;
+			} else {
+				nb++;
+				len += last_len;
+			}
+			segs = skb2;
+		}
+		sch->q.qlen += nb;
+		if (nb > 1)
+			qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
+	}
 	return NET_XMIT_SUCCESS;
 }
 


From stephen at networkplumber.org  Mon Mar 13 17:05:30 2017
From: stephen at networkplumber.org (Stephen Hemminger)
Date: Mon, 13 Mar 2017 10:05:30 -0700
Subject: [PATCH] netem: apply correct delay when rate throttling
In-Reply-To: <20170312045927.30090-1-njunger@xxxxxxxxxxxx>
References: <9cb484ed-b0a1-debb-a47f-01e4922f08f6@xxxxxxxxxxxx>
	<20170312045927.30090-1-njunger@xxxxxxxxxxxx>
Message-ID: <20170313100530.5ba0eb0f@xeon-e3>

On Sat, 11 Mar 2017 23:59:27 -0500
Nik Unger <njunger at uwaterloo.ca> wrote:

> I recently reported on the netem list that iperf network benchmarks show unexpected results when a bandwidth throttling rate has been configured for netem. Specifically:
> 1) The measured link bandwidth *increases* when a higher delay is added
> 2) The measured link bandwidth appears higher than the specified limit
> 3) The measured link bandwidth for the same very slow settings varies significantly across machines
> 
> This issue can be reproduced by using tc to configure netem with a 512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a veth pair between network namespaces, and then using iperf (or any other network benchmarking tool) to test throughput. Complete detailed instructions are in the original email chain here:
> https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html
> 
> There appear to be two underlying bugs causing these effects:
> 
> - The first issue causes long delays when the rate is slow and no delay is configured (e.g., "rate 512kbit"). This is because SKBs are not orphaned when no delay is configured, so orphaning does not occur until *after* the rate-induced delay has been applied. For this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us") dramatically increases the measured bandwidth.
> 
> - The second issue is that rate-induced delays are not correctly applied, allowing SKB delays to occur in parallel. The intended approach is to compute the delay for an SKB and to add this delay to the end of the current queue. However, the code does not detect existing SKBs in the queue due to improperly testing sch->q.qlen, which is nonzero even when packets exist only in the rbtree. Consequently, new SKBs do not wait for the current queue to empty. When packet delays vary significantly (e.g., if packet sizes are different), then this also causes unintended reordering.
> 
> I modified the code to expect a delay (and orphan the SKB) when a rate is configured. I also added some defensive tests that correctly find the latest scheduled delivery time, even if it is (unexpectedly) for a packet in sch->q. I have tested these changes on the latest kernel (4.11.0-rc1+) and the iperf / ping test results are as expected.
> 
> Signed-off-by: Nik Unger <njunger at uwaterloo.ca>
> CC: Stephen Hemminger <stephen at networkplumber.org>
> CC: netem at lists.linux-foundation.org

Looks correct. I will forward (and reformat commit message) for netdev

