Re: tc packet drop in high priority queue

Linux Advanced Routing and Traffic Control


Knowledge is power! You've given me a lot of good things to digest.
Although we gained performance on the network side, we did not get any
additional performance in our VMs, so I'm now looking at QEMU and all
that to remove the next bottleneck.

Thanks again.
----------------
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Jun 1, 2015 at 5:49 PM, Dave Taht  wrote:
> On Mon, Jun 1, 2015 at 4:34 PM, Robert LeBlanc  wrote:
>>
>> Dave,
>>
>> Thanks for the info about a newer kernel. I installed 4.0.4 (from
>> 3.18) and there was a 6x improvement on both send and receive. I'm not
>> seeing the high ksoftirqd usage on either host now. I only saw the
>> ksoftirqd load when running iperf tests, so I assumed it was related to
>> the network adapter. It also moved to the cores that I pinned the
>> network interrupts to.
>>
>> I'll still need to work on tuning other network aspects, but so far,
>> this is looking much more acceptable.
>
> Awesome.
>
> Work on scaling the kernel's networking stack continues.
>
> A good place to see the ongoing path forward is Dave Miller's keynote
> at netdev 01: https://www.youtube.com/watch?v=QDxM83YaI0E
>
> The early FIB rework I mentioned was also at that conference (there is
> a video somewhere)
> https://netdev01.org/docs/duyck-fib-trie.pdf (see page 10)
>
> And Jesper led the bulk BQL work:
>
> http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html
>
> There have been so many exciting e2e and server-based improvements of
> late, and so few on the edge routers in between, that it has been quite
> lonely looking at all that good stuff happening here and then going off
> to reflash another router with a would-be wifi improvement.
>
>> I was not using HTB (I understand that it doesn't work well with
>> high-speed links as the buckets don't get filled up fast enough),
>
> I believe HTB was thoroughly improved in the 3.10.12 timeframe.
>
>> it was
>> straight prio with sfq. Right now I'm using fq_codel, and once we get
>> all of our boxes upgraded to 4.0.4, I'm going to do my saturation
>> tests again and see how it performs.
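>>
>> For reference, the swap itself is tiny; a minimal sketch (eth0 here is
>> just a placeholder for the real interface):
>>
>>   # replace the old prio+sfq hierarchy with a single fq_codel root
>>   tc qdisc replace dev eth0 root fq_codel
>>   # confirm it took, and watch the drop/backlog counters under load
>>   tc -s qdisc show dev eth0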
>
> you might want to try enabling ECN on the TCP stacks, and shortening
> the default tcp_limit_output_bytes to 4k or 8k.
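>
> In sysctl form that suggestion looks roughly like this (a sketch to
> experiment with, not a blessed tuning; 8k shown, try 4k as well):
>
>   # negotiate ECN on outgoing connections too, not just when asked
>   sysctl -w net.ipv4.tcp_ecn=1
>   # cap how much data a socket may queue below the qdisc layer
>   sysctl -w net.ipv4.tcp_limit_output_bytes=8192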
>
>> Thanks again.
>
> de nada. it is not every day I can help improve something by 6x merely
> by spreading some knowledge around more widely. :)
>
>> ----------------
>> Robert LeBlanc
>> GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Mon, Jun 1, 2015 at 4:08 PM, Dave Taht  wrote:
>>> Note: I am at nanog this week, expect replies to be sparse to non-existent.
>>>
>>> On Mon, Jun 1, 2015 at 12:17 PM, Robert LeBlanc  wrote:
>>>> I wondered about the queue depth of sfq as well. I changed out the
>>>> prio and sfq to fq_codel. The problem persisted, although it looks
>>>> like we found another issue.
>>>
>>> I am not a believer in strict prio queuing, preferring weighted DRR if
>>> you must have classes of traffic.
>>>
>>>> Performance on the target host just tanked and never recovered. We
>>>> found ksoftirqd eating tons of CPU time.
>>>
>>>
>>>> From what I understand
>>>> (please help my understanding if I'm wrong) the kernel stopped
>>>> handling interrupts in hardware and started handling them all in the
>>>> kernel, dropping performance. A reboot of the host restored performance
>>>> and lowered ksoftirqd utilization, although it is still nowhere near the 40
>>>
>>> ksoftirqd is used for many things. Were you also attempting to use htb
>>> to rate limit?
>>>
>>> Don't do that. If you must do more prioritization, try something that
>>> has no inherent rate limits - but look over sch_fq's features
>>> carefully; it probably does what you want.
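>>>
>>> A minimal sketch of that (eth0 is a placeholder; the maxrate line is
>>> optional and only there to show that sch_fq can cap per-flow pacing
>>> without an HTB-style shaper):
>>>
>>>   # fair queuing with pacing, no token-bucket rate limiter involved
>>>   tc qdisc replace dev eth0 root fq
>>>   # optionally cap the pacing rate of each individual flow
>>>   tc qdisc change dev eth0 root fq maxrate 10gbit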
>>>
>>>> Gb line rate of the adapter, or even a quarter of that. I think the
>>>> i40e driver still has a way to go before good performance can be
>>>> expected.
>>>
>>> The latest kernels have quite a few optimizations for 40GigE, notably
>>> BQL bulking and FIB table improvements.
>>>
>>> Prior to the BQL bulking changes, 40GigE was unachievable.
>>>
>>>> I was looking for some good info on sch_fq last week, but couldn't
>>>> find anything. Do you have links?
>>>
>>> It has recently become apparent that better documentation and
>>> discussion of sch_fq is needed. I am not the author (nor am I of
>>> fq_codel!), and my own focus these days is primarily on fixing the
>>> edge networks, particularly wifi; as huge a fan as I am of sch_fq and
>>> pacing, its (very strong!) applicability in the data center is not
>>> territory I venture into personally.
>>>
>>> I strongly encourage experimentation and good measurements on your own
>>> workloads, and publishing your scripts and results. :)
>>>
>>> Give it a shot.
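>>>
>>> Even just watching the qdisc counters during the saturation runs tells
>>> you a lot about where drops land (a sketch; substitute your interface):
>>>
>>>   # per-qdisc drops, overlimits, backlog and requeues, refreshed live
>>>   watch -n 1 tc -s qdisc show dev eth0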
>>>>
>>>> Thanks,
>>>> ----------------
>>>> Robert LeBlanc
>>>> GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>
>>>>
>>>> On Mon, Jun 1, 2015 at 12:12 PM, Dave Taht  wrote:
>>>>> On Mon, Jun 1, 2015 at 10:06 AM, jsullivan@xxxxxxxxxxxxxxxxxxx
>>>>>  wrote:
>>>>>>
>>>>>>> On June 1, 2015 at 12:36 PM Andy Furniss  wrote:
>>>>>>>
>>>>>>>
>>>>>>> Robert LeBlanc wrote:
>>>>>>> >
>>>>>>> > Any ideas on this?
>>>>>>>
>>>>>>> 40 gig NICs are way beyond anything I've ever done with tc, and I
>>>>>>> guess they involve some offload = huge "packets".
>>>>>>>
>>>>>>> It could be that, as sfq has a default qlen of 128 and you are not
>>>>>>> actually rate limiting (and to do that may be "interesting" at 80 gig),
>>>>>>> the prio relies on some downstream buffer being full. Perhaps it's just
>>>>>>> that at these rates prio cannot dequeue anything for periods of time, so
>>>>>>> the 128-packet limit of sfq is overrun even for the highest prio band.
>>>>>>>
>>>>>>> This is pure guesswork.
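>>>>>>>
>>>>>>> If that is what is happening, one way to test the theory would be to
>>>>>>> give each prio band a deeper, actively managed queue in place of sfq's
>>>>>>> 128 packets (a sketch; device and band count are placeholders):
>>>>>>>
>>>>>>>   tc qdisc add dev eth0 root handle 1: prio bands 3
>>>>>>>   tc qdisc add dev eth0 parent 1:1 fq_codel
>>>>>>>   tc qdisc add dev eth0 parent 1:2 fq_codel
>>>>>>>   tc qdisc add dev eth0 parent 1:3 fq_codel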
>>>>>>
>>>>>> Alas, I haven't yet read the details of the original post, but I know
>>>>>> we just replaced all our sfq leaves with fq_codel because of concerns
>>>>>> about the per-flow packet depth on high-speed, high-latency networks.
>>>>>
>>>>> Yes, older versions of SFQ had a hard limit on queue depth. This was
>>>>> "improved" in Linux 3.6 and later, but that work ultimately pointed at
>>>>> a need to actively manage the queue depth, which begat sfqred and,
>>>>> ultimately, fq_codel.
>>>>>
>>>>> I note that, these days, the best results we get for TCP-heavy
>>>>> *servers and hosts* (not routers, not UDP-heavy services; and the bare
>>>>> metal under a VM is a "router" in this context), in the data center,
>>>>> at these speeds, now come from sch_fq (from the pacing, fq, and TSO
>>>>> fixes) and a low setting for tcp_limit_output_bytes.
>>>>>
>>>>> example:
>>>>>
>>>>> https://fasterdata.es.net/host-tuning/linux/fair-queuing-scheduler/
>>>>>
>>>>> fq_codel remains a great all-around choice, but what sch_fq's pacing
>>>>> does for servers is really remarkable.
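>>>>>
>>>>> The system-wide way to get there, rather than per-device tc commands,
>>>>> is roughly this sketch:
>>>>>
>>>>>   # make sch_fq the default root qdisc for interfaces attached from
>>>>>   # now on (re-add the root qdisc, or reboot, for existing ones)
>>>>>   sysctl -w net.core.default_qdisc=fq
>>>>>   # per-connection pacing rates can then be inspected with:
>>>>>   ss -tin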
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dave Täht
>>>>> What will it take to vastly improve wifi for everyone?
>>>>> https://plus.google.com/u/0/explore/makewififast
>>>
>>>
>>>
>>> --
>>> Dave Täht
>>> What will it take to vastly improve wifi for everyone?
>>> https://plus.google.com/u/0/explore/makewififast
>
>
>
> --
> Dave Täht
> What will it take to vastly improve wifi for everyone?
> https://plus.google.com/u/0/explore/makewififast




