On Thu, Mar 19, 2015 at 10:21 AM, Robert LeBlanc <robert@xxxxxxxxxx> wrote:
>
> I'm not familiar with fq_codel, but it doesn't look to have priority
> queues, which is one of our primary concerns.

You generally don't need priority queues per se with fq_codel. See more
below. You CAN, if you wish, use specialized tc rules for priority
traffic.

We have generally found that by using fair queueing (flow queueing) and
an AQM to keep queue lengths short, nearly all of the perceived need for
explicit prioritization vanishes.

Try it. You are a single sysctl away from trying it.

> 40 Gb is pretty fast,

Well, an ongoing problem, much discussed at the netconf01 conference, is
that the tx path is now capable of 40Gbit but the rx path is not.

> but we are concerned about runaway processes in our cluster that could
> saturate uplinks or even the PCI bus at these speeds.

Well, fq_codel is best on your saturated uplinks, to be sure, and it is
very helpful against simple misbehaved loads. For example, we had a bug
in dhcpv6 that triggered after 51 days, totally saturating the cable
uplink with requests. The fq_codel users never noticed it! The link was
mildly slower, that's all.

http://www.bufferbloat.net/issues/441

I am sure I still have older users of cerowrt out there flooding comcast
with dhcpv6 requests from that particular bug, and not noticing it.

> We need to keep heartbeats and management traffic alive to be able to
> solve these problems.

Heartbeats and management traffic are naturally sparse, and thus
automagically prioritized by fq_codel. It also uses ECN if you choose,
which allows for low latency without packet loss on hosts that use ECN
by default. The relevant sysctls are:

net.ipv4.tcp_ecn=1
net.core.default_qdisc=fq_codel

which I usually put in a file in /etc/sysctl.d. There is also now
support for enabling ECN on a per-route basis in the latest kernels, as
people are still dubious about being able to use ECN everywhere.
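A minimal sketch of that sysctl.d approach (the file name
90-fq_codel.conf is my choice, and I write it to /tmp here so the sketch
runs without root; on a real system the file belongs in /etc/sysctl.d):

```shell
# Write the two settings to a drop-in file
# (use /etc/sysctl.d/90-fq_codel.conf on a real system).
cat > /tmp/90-fq_codel.conf <<'EOF'
# Make fq_codel the default qdisc for newly created interfaces
net.core.default_qdisc = fq_codel
# Negotiate ECN on TCP connections
net.ipv4.tcp_ecn = 1
EOF

# Apply immediately (needs root; a reboot also picks the file up):
# sysctl -p /tmp/90-fq_codel.conf

# Sanity check: both keys are present
grep -c '^net\.' /tmp/90-fq_codel.conf   # -> 2
```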
But ECN is not required to be fully enabled for fq_codel to work its
magic. That said, I know of at least one large web service provider
enabling ECN AND fq_codel by default, everywhere now (archive.org), and
there are no doubt countless others.

In your case, with the 10+GigE hardware queues running mq, you can have
a top-level tc filter to redirect traffic based on markings into the
"right" hardware queues, and have fq_codel or sch_fq attached underneath
it for each hardware queue to further manage the traffic. The latter
part happens automagically with the default_qdisc sysctl above; the tc
part is kind of expensive to write rules for. I wish we had an explicit
"diffserv" filter to use instead of one tc rule per diffserv codepoint
you care about.

sch_fq is a host-only solution, mostly targeted at TCP traffic, and it
*does* support explicit prioritization. It's pretty amazing what it
does...

All I have ever argued on these fronts is merely that people try these
new qdiscs in *their* realistic scenarios, with tools like
netperf-wrapper, to see if they help or not, and (hopefully) share their
results so we can further improve matters.

We are well aware of how much sqm-scripts + fq_codel help cable and DSL
modems in particular (see for example:
http://burntchrome.blogspot.com/2014/05/fixing-bufferbloat-on-comcasts-blast.html
or http://snapon.lab.bufferbloat.net/~cero2/jimreisert/results.html ),
and we are working on some derivative ideas to improve wifi as part of
the make-wifi-fast effort.
( http://snapon.lab.bufferbloat.net/~d/ieee802.11-sept-17-2014/11-14-1265-00-0wng-More-on-Bufferbloat.pdf )

In other news... work is resuming on "sch_cake", which combines htb +
fq_codel + diffserv (and priority) classification into one qdisc
(replacing the sqm-scripts entirely), and I do hope we can get some more
testers of the patches over the next few months.
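The per-hardware-queue arrangement above can be sketched roughly as
follows (eth0, root privileges, and at most 9 tx queues are assumptions;
with net.core.default_qdisc=fq_codel the loop is unnecessary, since
fq_codel then attaches under mq automatically):

```shell
DEV=eth0  # assumed interface name

# Multiqueue root qdisc: one class per hardware tx queue
tc qdisc replace dev $DEV root handle 1: mq

# Attach fq_codel under each hardware queue.
# Note: mq class minor ids are hexadecimal, so this simple loop
# only works as-is for up to 9 queues.
NQ=$(ls -d /sys/class/net/$DEV/queues/tx-* | wc -l)
for i in $(seq 1 "$NQ"); do
    tc qdisc replace dev $DEV parent 1:$i fq_codel
done

# Inspect the result
tc -s qdisc show dev $DEV
```

The DSCP-based steering filters mentioned above would still need to be
written separately, one tc rule per codepoint.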
The discussion of "cake" is over on the codel list on bufferbloat.net,
where the current patch set is linked to and the flaws discussed. I do
hope this qdisc will also scale up to 40GigE and be of use in doing
traffic engineering at these speeds, and PLEASE join us in discussing
the needed feature set(s) for your use cases.

And I do generally hope that these algorithms will start appearing in
basic ethernet and switch hardware at some point soon. I am backing a
kickstarter project to get it out there in an FPGA at least... (see:
https://www.kickstarter.com/projects/onetswitch/onetswitch-open-source-hardware-for-networking )

> It seems that fq_codel really helps with latency and bufferbloat
> (which would also be good for us) and it is possible that it would be
> adequate since we are DSCP marking packets and the switches are
> applying L2 COS. I'll have to look over it. If you have some
> information that would be helpful, I'd be happy to have it.

Oy. More talks and documentation than you can shake a stick at. We have
put a collection of the best up here:

http://www.bufferbloat.net/projects/cerowrt/wiki/Bloat-videos

I enjoy Stephen Hemminger's presos a lot, although my modena talks go
into more detail. Van Jacobson's talk is essential, and the shortest
talk I have ever given on these subjects was the uknof talk.

Most of my work has been focused on fixing the edge of the internet,
rather than stuff in the data center, but I am increasingly interested
in what good all this stuff does in these environments, as I do hope to
see these algorithms start showing up soon in load balancers, BRASes,
CMTSes, and ISP gear like that.

Certainly sch_fq is rather widely deployed at at least one large web
service provider, in particular. :) I don't have any data as to where
else it is deployed...
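Since the question above mentions DSCP marking: that is done per-socket
via IP_TOS. A minimal Python sketch (the EF codepoint and the UDP socket
are illustrative choices of mine, not from the original mail):

```python
import socket

# DSCP EF (Expedited Forwarding) is codepoint 46; the kernel's IP_TOS
# option takes the whole TOS byte, so shift past the two ECN bits.
DSCP_EF = 46
tos = DSCP_EF << 2  # 0xb8

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)

# Read the value back to confirm it took effect.
print(hex(s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)))  # 0xb8
s.close()
```

Sparse, marked heartbeat traffic like this is exactly what fq_codel
implicitly prioritizes, and tc filters can also match on the marking.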
> On Wed, Mar 18, 2015 at 4:41 PM, Dave Taht <dave.taht@xxxxxxxxx> wrote:
> > On Wed, Mar 18, 2015 at 3:03 PM, Robert LeBlanc <robert@xxxxxxxxxx> wrote:
> >>
> >> I can't speak for the specific configuration, but we are having
> >> challenges using the mqprio qdisc on the Intel XL710 adapters. We have
> >> engaged Intel, but we are still having trouble. If you want priority
> >> queuing and using the hardware TX queues, you may have problems with
> >> this family of cards.
> >
> > I am very curious about benchmarks of fq_codel (vs mqprio or
> > pfifo_fast) at these speeds on adaptors like this, preferably driven
> > by repeatable tests like the rrul test in netperf-wrapper.
> >
> > In particular, what happens if only one tx queue is enabled with fq_codel?
> >
> >> On Wed, Mar 18, 2015 at 1:07 PM, Nieścierowicz Adam
> >> <adam.niescierowicz@xxxxxxxxxx> wrote:
> >> > Hi,
> >> >
> >> > in the near future we are planning to purchase new equipment for the router.
> >> >
> >> > Our requirements are:
> >> >
> >> > - four or more 1GbE interfaces
> >> > - two-four 10GbE interfaces
> >> > - two 40GbE interfaces
> >> > - multicast routing with throughput input 1-2 Gb/s output 4x1-2Gb/s
> >> >
> >> > Can someone share your experience what equipment you chose for such a
> >> > configuration
> >> >
> >> > ---
> >> > Thanks,
> >> > Adam Nieścierowicz
> >
> > --
> > Dave Täht
> > Let's make wifi fast, less jittery and reliable again!
> > https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

--
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
--
To unsubscribe from this list: send the line "unsubscribe lartc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html