By way of introduction, I'm a long-time Linux user/admin (since 1992), but I have only recently stumbled into the area of traffic shaping/control in a Linux environment. The FireQOS package looked like a great way to ease into this, but I'm obviously missing something fundamental about the way this is supposed to work, at least in terms of what I expected to see.

The software environment is a somewhat customized CentOS 6.8 with kernel version 2.6.32-642.4.2.el6.x86_64: please consider that "invariant" for now, because the OS image is provided by a third-party vendor. The hardware environment is an IBM blade server with Broadcom NetXtreme II BCM57111 NICs (10G PCIe). There is an "eth5" interface associated with the "external" network and an "eth7" interface associated with the "internal" network. The two interfaces are bridged: the idea is for traffic to pass transparently between them, subject to whatever shaping/control I might want to apply. I have access to a BreakingPoint appliance I can use to generate different mixes of simulated application traffic at various rates, and that's what I've been using for testing. For now, I'm limiting things to IPv4 UDP traffic and looking at various scenarios involving unidirectional and bidirectional traffic flows.

Per the recommendations in the FireQOS tutorial and elsewhere, I did a bit of tuning with "ethtool" on the "eth5" and "eth7" interfaces. Specifically, I turned off "gro" and "lro", and increased the number of receive ring buffers ("rx") to the indicated maximum of 4078. Early testing of a trivial fireqos configuration resulted in a massive number of receiver overruns; raising the "rx" value was part of the solution to that problem, along with adjusting the interrupt coalescing parameters (rx-usecs 0, tx-usecs 0, rx-frames 1, tx-frames 1). Another individual suggested setting "net.core.netdev_max_backlog = 5000" (the default is 1000): the explanation offered was that this is the maximum number of packets that can be queued on the input side when an interface receives packets faster than the kernel can process them.

With fireqos inactive, the bridge has no problems processing traffic at an aggregate (bidirectional) rate of 10 Gbit/s. Neither the BreakingPoint nor the SUT reports any issues whatsoever.

Given the following trivial "fireqos.conf", as suggested by the tutorial:

    DEVICE=eth5
    IN_SPD=9000mbit
    OUT_SPD=10000mbit
    LINKTYPE=ethernet

    interface $DEVICE ext-in input rate $IN_SPD $LINKTYPE
    interface $DEVICE ext-out output rate $OUT_SPD $LINKTYPE

After typing "fireqos start", I see the expected "eth5-ifb" device created to handle the "input" side of things.

With the default "sfq" qdisc, throughput with this configuration would best be described as abysmal. A 1.0 Gbit/s aggregate (bidirectional) rate is completely error-free, and 1.5 Gbit/s aggregate is pretty good at 98%+, but at a 2.0 Gbit/s aggregate rate, tc reports I'm dropping upwards of 120,000 packets per second. Appending "qdisc pfifo" to each interface statement in the conf file (replacing the default "sfq" qdisc with "pfifo") helps somewhat, which I would expect, since a plain FIFO is computationally cheaper than SFQ. The odd thing is that the SUT doesn't seem to be in any "distress" in terms of CPU utilization, inability to service interrupts promptly, memory/buffer issues, etc. The SUT has no other job to do than process traffic, and it has 24 hyperthreaded 2.4 GHz Xeon cores and 64 GB of RAM available to throw at the task.
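In case the exact details matter, the tuning described above was applied with commands along these lines (eth5 shown; eth7 got the same treatment, and 4078 is simply the maximum that "ethtool -g" reported for these NICs):

    # disable generic/large receive offload, per the FireQOS tutorial
    ethtool -K eth5 gro off lro off
    # raise the receive ring to the reported maximum
    ethtool -G eth5 rx 4078
    # interrupt per frame, no coalescing delay
    ethtool -C eth5 rx-usecs 0 tx-usecs 0 rx-frames 1 tx-frames 1
    # larger input backlog, per the suggestion mentioned above
    sysctl -w net.core.netdev_max_backlog=5000

The drop counts I quote come from the per-qdisc statistics, e.g. "tc -s qdisc show dev eth5" and "tc -s qdisc show dev eth5-ifb".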
Other things I've tried, a few of which seem to have helped:

(1) Since "eth5" and "eth7" are bridged, why not use "eth7" as the "ext-in" interface and "eth5" as the "ext-out" interface, both in the "output" direction only? I can then run both at "rate 10000mbit", since shaping/control is only being applied to the outbound direction of each interface. This has the further advantage of eliminating the IFB layer and whatever latency it introduces.

(2) For each "interface" statement, I experimented with adding "class default commit X%" for various values of X. So far, this has had the biggest effect on improving overall throughput while fireqos is active: aggregate rates of up to 3.0 Gbit/s look pretty good with a commit value of 90%. (A sketch of the combined configuration is in the P.S. below.)

So, is this behaving "as designed"? I would expect higher throughput in the absence of any explicit controls, but I'm obviously missing something. Thanks in advance for improving my understanding of what's going on here. Do feel free to point me at any potentially relevant archived discussion threads: I'm pretty sure I'm at the stage where I don't know what I don't know :-).

Respectfully,
--Bob
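P.S. For completeness, the egress-only configuration from items (1) and (2) currently looks roughly like this (variable names are just for readability, and the 90% commit is simply the value that has tested best so far):

    DEVICE_IN=eth7
    DEVICE_OUT=eth5
    SPD=10000mbit
    LINKTYPE=ethernet

    # traffic arriving from the external network leaves the bridge via eth7,
    # so shaping eth7's output covers the "inbound" direction without an IFB
    interface $DEVICE_IN ext-in output rate $SPD $LINKTYPE
        class default commit 90%

    # traffic headed for the external network leaves via eth5
    interface $DEVICE_OUT ext-out output rate $SPD $LINKTYPE
        class default commit 90%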